Citation: LIU Qinghai, WU Qianlin, LUO Jia, TANG Lun, XU Liming. Cross Modal Hashing of Medical Image Semantic Mining for Large Language Model[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250529.
[1] HUANG S C, PAREEK A, SEYYEDI S, et al. Fusion of medical imaging and electronic health records using deep learning: A systematic review and implementation guidelines[J]. NPJ Digital Medicine, 2020, 3: 136. doi: 10.1038/s41746-020-00341-z.
[2] HOLSTE G, PARTRIDGE S C, RAHBAR H, et al. End-to-end learning of fused image and non-image features for improved breast cancer classification from MRI[C]. 2021 IEEE/CVF International Conference on Computer Vision Workshops, Montreal, Canada, 2021: 3287–3296. doi: 10.1109/ICCVW54120.2021.00368.
[3] FANG Shichao, HONG Shenda, LI Qing, et al. Cross-modal similar clinical case retrieval using a modular model based on contrastive learning and k-nearest neighbor search[J]. International Journal of Medical Informatics, 2025, 193: 105680. doi: 10.1016/j.ijmedinf.2024.105680.
[4] ZHANG Yilin. Multi-modal medical image matching based on multi-task learning and semantic-enhanced cross-modal retrieval[J]. Traitement du Signal, 2023, 40(5): 2041–2049. doi: 10.18280/ts.400522.
[5] ZHU Xiangru, LI Zhixu, WANG Xiaodan, et al. Multi-modal knowledge graph construction and application: A survey[J]. IEEE Transactions on Knowledge and Data Engineering, 2024, 36(2): 715–735. doi: 10.1109/TKDE.2022.3224228.
[6] XU Liming, ZENG Xianhua, ZHENG Bochuan, et al. Multi-manifold deep discriminative cross-modal hashing for medical image retrieval[J]. IEEE Transactions on Image Processing, 2022, 31: 3371–3385. doi: 10.1109/TIP.2022.3171081.
[7] FANG Jiansheng, FU Huazhu, and LIU Jiang. Deep triplet hashing network for case-based medical image retrieval[J]. Medical Image Analysis, 2021, 69: 101981. doi: 10.1016/j.media.2021.101981.
[8] LI Junnan, LI Dongxu, SAVARESE S, et al. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models[C]. 40th International Conference on Machine Learning, Honolulu, USA, 2023: 19730–19742.
[9] ZHANG Pan, DONG Xiaoyi, WANG Bin, et al. InternLM-XComposer: A vision-language large model for advanced text-image comprehension and composition[EB/OL]. https://doi.org/10.48550/arXiv.2309.15112, 2023.
[10] ZHU Hongyi, HUANG Jiahong, RUDINAC S, et al. Enhancing interactive image retrieval with query rewriting using large language models and vision language models[C]. 2024 International Conference on Multimedia Retrieval, Phuket, Thailand, 2024: 978–987. doi: 10.1145/3652583.3658032.
[11] LEE J, YOON W, KIM S, et al. BioBERT: A pre-trained biomedical language representation model for biomedical text mining[J]. Bioinformatics, 2020, 36(4): 1234–1240. doi: 10.1093/bioinformatics/btz682.
[12] SANDERSON K. GPT-4 is here: What scientists think[J]. Nature, 2023, 615(7954): 773. doi: 10.1038/d41586-023-00816-5.
[13] JIANG Qingyuan and LI Wujun. Deep cross-modal hashing[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 3270–3278. doi: 10.1109/CVPR.2017.348.
[14] WU Gengshen, LIN Zijia, HAN Jungong, et al. Unsupervised deep hashing via binary latent factor models for large-scale cross-modal retrieval[C]. 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 2854–2860. doi: 10.24963/ijcai.2018/396.
[15] LI Chao, DENG Cheng, LI Ning, et al. Self-supervised adversarial hashing networks for cross-modal retrieval[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 4242–4251. doi: 10.1109/CVPR.2018.00446.
[16] LI Tieying, YANG Xiaochun, WANG Bin, et al. Bi-CMR: Bidirectional reinforcement guided hashing for effective cross-modal retrieval[C]. 36th AAAI Conference on Artificial Intelligence, 2022: 10275–10282. doi: 10.1609/aaai.v36i9.21268.
[17] BAO Hangbo, WANG Wenhui, DONG Li, et al. VLMo: Unified vision-language pre-training with mixture-of-modality-experts[C]. 36th International Conference on Neural Information Processing Systems, New Orleans, USA, 2022: 2384.
[18] MELCHIOR J, WANG Nan, and WISKOTT L. Gaussian-binary restricted Boltzmann machines for modeling natural image statistics[J]. PLoS One, 2017, 12(2): e0171015. doi: 10.1371/journal.pone.0171015.
[19] JOHNSON A E W, POLLARD T J, GREENBAUM N R, et al. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs[EB/OL]. https://doi.org/10.48550/arXiv.1901.07042, 2023.
[20] DEMNER-FUSHMAN D, KOHLI M D, ROSENMAN M B, et al. Preparing a collection of radiology examinations for distribution and retrieval[J]. Journal of the American Medical Informatics Association, 2016, 23(2): 304–310. doi: 10.1093/jamia/ocv080.
[21] SHARMA P, DING Nan, GOODMAN S, et al. Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning[C]. 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 2018: 2556–2565. doi: 10.18653/v1/P18-1238.
[22] DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 248–255. doi: 10.1109/CVPR.2009.5206848.
[23] GILARDI F, ALIZADEH M, and KUBLI M. ChatGPT outperforms crowd workers for text-annotation tasks[J]. Proceedings of the National Academy of Sciences of the United States of America, 2023, 120(30): e2305016120. doi: 10.1073/pnas.2305016120.
[24] DEVLIN J, CHANG Mingwei, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, USA, 2019: 4171–4186. doi: 10.18653/v1/N19-1423.
[25] LESTER B, AL-RFOU R, and CONSTANT N. The power of scale for parameter-efficient prompt tuning[C]. 2021 Conference on Empirical Methods in Natural Language Processing, 2021: 3045–3059. doi: 10.18653/v1/2021.emnlp-main.243.
[26] LI X L and LIANG P. Prefix-tuning: Optimizing continuous prompts for generation[C]. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021: 4582–4597. doi: 10.18653/v1/2021.acl-long.353.
[27] 吴钱林, 唐伦, 刘青海, 等. 基于Transformer语义对齐的医学图像跨模态哈希检索[J]. 生物医学工程学杂志, 2025, 42(1): 156–163. doi: 10.7507/1001-5515.202407034.
WU Qianlin, TANG Lun, LIU Qinghai, et al. Cross-modal hash retrieval of medical images based on Transformer semantic alignment[J]. Journal of Biomedical Engineering, 2025, 42(1): 156–163. doi: 10.7507/1001-5515.202407034.
[28] TU Rongcheng, MAO Xianling, MA Bing, et al. Deep cross-modal hashing with hashing functions and unified hash codes jointly learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(2): 560–572. doi: 10.1109/TKDE.2020.2987312.
[29] XIE De, DENG Cheng, LI Chao, et al. Multi-task consistency-preserving adversarial hashing for cross-modal retrieval[J]. IEEE Transactions on Image Processing, 2020, 29: 3626–3637. doi: 10.1109/TIP.2020.2963957.
[30] SUN Yuan, REN Zhenwen, HU Peng, et al. Hierarchical consensus hashing for cross-modal retrieval[J]. IEEE Transactions on Multimedia, 2024, 26: 824–836. doi: 10.1109/TMM.2023.3272169.
[31] BAI Cong, ZENG Chao, MA Qing, et al. Graph convolutional network discrete hashing for cross-modal retrieval[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(4): 4756–4767. doi: 10.1109/TNNLS.2022.3174970.
[32] TU Junfeng, LIU Xueliang, LIN Zongxiang, et al. Differentiable cross-modal hashing via multimodal transformers[C]. 30th ACM International Conference on Multimedia, Lisboa, Portugal, 2022: 453–461. doi: 10.1145/3503161.3548187.
[33] HUO Yadong, QIN Qibing, ZHANG Wenfeng, et al. Deep hierarchy-aware proxy hashing with self-paced learning for cross-modal retrieval[J]. IEEE Transactions on Knowledge and Data Engineering, 2024, 36(11): 5926–5939. doi: 10.1109/TKDE.2024.3401050.