
Semi-paired Multi-modal Query Hashing Method

YU Jun, MA Jiangtao, XIAN Yang, HOU Ruixia, SUN Wei

Citation: YU Jun, MA Jiangtao, XIAN Yang, HOU Ruixia, SUN Wei. Semi-paired Multi-modal Query Hashing Method[J]. Journal of Electronics & Information Technology, 2024, 46(2): 481-491. doi: 10.11999/JEIT231072


doi: 10.11999/JEIT231072
Funds: The National Natural Science Foundation of China (32271880), the Science and Technology Research Project of Henan Province (222102210064), the Natural Science Foundation of Henan Province (232300420150)
About the authors:

    YU Jun: Male, Ph.D., Lecturer. Research interests: multimedia information retrieval, pattern recognition, deep learning

    MA Jiangtao: Male, Ph.D., Associate Professor. Research interests: knowledge graphs, deep learning

    XIAN Yang: Male, Ph.D., Lecturer. Research interests: speech recognition, deep learning

    HOU Ruixia: Female, Ph.D., Associate Research Fellow. Research interests: big data analysis, data mining

    SUN Wei: Male, Ph.D., Research Fellow. Research interests: intelligent analysis of agricultural and forestry spatio-temporal information

    Corresponding author: HOU Ruixia, houreix@ifrit.ac.cn

  • CLC number: TN911.7; TP391

  • Abstract: Multi-modal hashing transforms heterogeneous multi-modal data into joint binary codes. Owing to its low storage cost and fast Hamming-distance ranking, it has attracted wide attention in large-scale multimedia retrieval. Existing multi-modal hashing methods assume that every query sample carries complete information from all modalities when its joint hash code is generated. In practice, however, fully complete multi-modal information is hard to obtain. For the semi-paired query scenario, in which some modality information is missing, this paper proposes a novel Semi-Paired Query Hashing (SPQH) method to solve the joint encoding problem of semi-paired query samples. First, the proposed method performs projection learning and cross-modal reconstruction learning to preserve the semantic consistency among multi-modal data. Then, the semantic similarity structure of the label space and the complementary information among modalities are captured to learn discriminative hash functions. In the query encoding stage, the missing modality features of unpaired query samples are completed with the learned cross-modal reconstruction matrices, and hash codes are then generated by the learned joint hash functions. Compared with the state-of-the-art baselines, the average retrieval precision on the Pascal Sentence, NUS-WIDE, and IAPR TC-12 datasets improves by 2.48%. Experimental results show that the proposed algorithm effectively encodes semi-paired multi-modal query data and achieves superior retrieval performance.
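    Below is a minimal sketch of the query encoding stage described in the abstract: an unpaired query first has its missing modality completed by a learned cross-modal reconstruction matrix, and the completed feature pair is then mapped to a binary code by the joint hash functions. The matrix names (E1, E2 for reconstruction; W1, W2 for hash projections; alpha for modality weights) follow Algorithm 1 below, but the fusion rule itself is an assumption for illustration, not the paper's exact formulation.

    ```python
    import numpy as np

    def encode_query(x_img=None, x_txt=None, *, W1, W2, E1, E2, alpha=(0.5, 0.5)):
        """Sketch of semi-paired query encoding (assumed form, not the paper's exact one).

        x_img, x_txt: kernelized feature vectors; either may be None (unpaired query).
        E1 maps image features to pseudo-text features, E2 the reverse.
        W1, W2 are the learned modality-specific hash projections.
        """
        if x_img is None and x_txt is None:
            raise ValueError("at least one modality is required")
        # Complete the missing modality with the learned reconstruction matrix.
        if x_txt is None:
            x_txt = E1 @ x_img                  # pseudo-text features
        if x_img is None:
            x_img = E2 @ x_txt                  # pseudo-image features
        # Joint hash function: weighted fusion of the two projections, then sign.
        h = alpha[0] * (W1 @ x_img) + alpha[1] * (W2 @ x_txt)
        return np.where(h >= 0, 1, -1)          # binary code in {-1, +1}
    ```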
  • Figure 1  Schematic diagram of the SPQH framework

    Figure 2  PR curves under the fully paired query scenario

    Figure 3  PR curves under the fully unpaired image query scenario

    Figure 4  PR curves under the fully unpaired text query scenario

    Figure 5  t-SNE visualization of different feature spaces

    Figure 6  mAP of SPQH with different proportions of paired query samples in the query encoding stage

    Figure 7  Convergence curves of the proposed method on the three datasets

    Algorithm 1  Semi-paired multi-modal query hashing model

     Input: training set ${\boldsymbol{O}}=\left({\boldsymbol{y}}_1^{(1)},{\boldsymbol{y}}_1^{(2)},{\boldsymbol{l}}_1\right),\left({\boldsymbol{y}}_2^{(1)},{\boldsymbol{y}}_2^{(2)},{\boldsymbol{l}}_2\right),\cdots,\left({\boldsymbol{y}}_n^{(1)},{\boldsymbol{y}}_n^{(2)},{\boldsymbol{l}}_n\right)$; kernel feature matrices ${{\boldsymbol{X}}_1},{{\boldsymbol{X}}_2}$.
     Output: ${{\boldsymbol{P}}_1},{{\boldsymbol{P}}_2},{{\boldsymbol{U}}_1},{{\boldsymbol{U}}_2},{{\boldsymbol{W}}_1},{{\boldsymbol{W}}_2},{\alpha _m}$.
     ① Initialize ${{\boldsymbol{U}}_1},{{\boldsymbol{U}}_2},{{\boldsymbol{E}}_1},{{\boldsymbol{E}}_2},{{\boldsymbol{W}}_1},{{\boldsymbol{W}}_2},{\boldsymbol{B}},{\alpha _1},{\alpha _2}$.
     ② for iter = 1 : $\xi$ do
     ③  Update ${{\boldsymbol{P}}_1}$ and ${{\boldsymbol{P}}_2}$ by Eq. (5) and Eq. (7), respectively;
     ④  Update ${{\boldsymbol{E}}_1}$ and ${{\boldsymbol{E}}_2}$ by Eq. (9) and Eq. (11), respectively;
     ⑤  Update ${{\boldsymbol{U}}_1}$ and ${{\boldsymbol{U}}_2}$ by Eq. (13) and Eq. (15), respectively;
     ⑥  Update ${{\boldsymbol{W}}_1}$ and ${{\boldsymbol{W}}_2}$ by Eq. (17) and Eq. (18), respectively;
     ⑦  Update ${\alpha _1}$ and ${\alpha _2}$ by Eq. (20);
     ⑧  Update ${\boldsymbol{B}}$ by Eq. (23) and Eq. (24);
     ⑨ end for
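    Algorithm 1 is a block-coordinate (alternating) minimization: each variable block is solved in closed form while the others are held fixed, for a fixed number of outer iterations $\xi$. The skeleton below is a minimal sketch of that control flow only; the ridge-regression updates are illustrative stand-ins, NOT the paper's Eqs. (5)–(24), and the P and U blocks are omitted for brevity.

    ```python
    import numpy as np

    def train_spqh_skeleton(X1, X2, xi=20, r=32, lam=1e-3, seed=0):
        """Control-flow sketch of Algorithm 1 (alternating minimization).

        X1, X2: kernelized feature matrices of shape (d1, n) and (d2, n).
        The closed-form updates below are illustrative ridge regressions,
        not the paper's actual update rules.
        """
        rng = np.random.default_rng(seed)
        n = X1.shape[1]
        B = np.sign(rng.standard_normal((r, n)))            # codes in {-1, +1}
        alpha = np.array([0.5, 0.5])                        # modality weights

        def ridge(T, X):
            # argmin_M ||T - M X||_F^2 + lam ||M||_F^2  (closed form)
            return T @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(X.shape[0]))

        for _ in range(xi):                                 # step ②: outer loop
            E1, E2 = ridge(X2, X1), ridge(X1, X2)           # step ④: cross-modal reconstruction
            W1, W2 = ridge(B, X1), ridge(B, X2)             # step ⑥: hash projections
            # step ⑦: re-weight modalities by inverse fitting error
            err = np.array([np.linalg.norm(B - W1 @ X1),
                            np.linalg.norm(B - W2 @ X2)]) + 1e-12
            alpha = (1.0 / err) / (1.0 / err).sum()
            # step ⑧: discrete update of B via the sign of the fused projections
            B = np.sign(alpha[0] * (W1 @ X1) + alpha[1] * (W2 @ X2))
            B[B == 0] = 1
        return W1, W2, E1, E2, alpha, B
    ```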

    Table 1  Statistics of the three benchmark datasets

                          Pascal Sentence   NUS-WIDE   IAPR TC-12
    Dataset size          1 000             186 577    20 000
    Training set size     600               5 000      5 000
    Retrieval set size    600               186 577    18 000
    Test set size         400               1 866      2 000
    Number of classes     20                10         255

    Table 2  mAP comparison on the multi-modal retrieval task under the fully paired query scenario at different code lengths

                          Pascal Sentence                  NUS-WIDE                         IAPR TC-12
    Task  Method          16     32     64     128        16     32     64     128        16     32     64     128
    O2O   ITQ             0.3602 0.3523 0.3675 0.3803     0.3724 0.3751 0.3776 0.3789     0.3730 0.3844 0.3936 0.4020
          LSH             0.1011 0.1243 0.1572 0.2129     0.3421 0.3554 0.3544 0.3672     0.3251 0.3363 0.3509 0.3686
          DLLH            0.3631 0.3720 0.3971 0.3959     0.3738 0.3782 0.3794 0.3823     0.3644 0.3796 0.3863 0.3868
          HCOH            0.2135 0.4812 0.4846 0.4860     0.3232 0.3451 0.3434 0.3645     0.3082 0.3581 0.3717 0.3712
          MFH             0.1834 0.2399 0.2729 0.2731     0.3673 0.3752 0.3803 0.3815     0.3263 0.3374 0.3435 0.3451
          MVLH            0.1192 0.1347 0.1200 0.1202     0.3363 0.3339 0.3324 0.3284     0.3394 0.3401 0.3409 0.3499
          OMH-DQ          0.4177 0.6719 0.7414 0.7622     0.5223 0.5381 0.5823 0.5957     0.3949 0.4200 0.4446 0.4642
          SIDMH           0.6681 0.7479 0.7596 0.7660     0.5828 0.5976 0.6055 0.6120     0.4131 0.4277 0.4364 0.4706
          AMFH            0.6837 0.7501 0.7511 0.7519     0.6190 0.6240 0.6271 0.6385     0.4198 0.4374 0.4571 0.4887
          SPQH_all        0.5176 0.7641 0.7799 0.7689     0.6208 0.6269 0.6353 0.6410     0.4060 0.4601 0.4691 0.4964
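    The mAP values reported in Tables 2–6 are mean average precision computed over Hamming-ranked retrieval lists. As a reference point, here is a minimal sketch of that metric under a common formulation (the paper's exact evaluation protocol is not reproduced on this page):

    ```python
    import numpy as np

    def mean_average_precision(query_codes, db_codes, relevance):
        """mAP over Hamming-ranked lists (a common formulation, assumed here).

        query_codes: (q, r) and db_codes: (n, r), entries in {-1, +1}.
        relevance[i, j] = True if database item j is relevant to query i.
        """
        aps = []
        for q, rel in zip(query_codes, relevance):
            # Hamming distance for {-1, +1} codes: (r - <q, b>) / 2
            dist = (db_codes.shape[1] - db_codes @ q) / 2
            order = np.argsort(dist, kind="stable")
            hits = rel[order]
            if not hits.any():
                continue
            ranks = np.flatnonzero(hits) + 1              # 1-based ranks of relevant items
            precision_at_hits = np.arange(1, len(ranks) + 1) / ranks
            aps.append(precision_at_hits.mean())          # average precision of this query
        return float(np.mean(aps)) if aps else 0.0
    ```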

    Table 4  mAP comparison between SPQH and deep methods on the Pascal Sentence dataset

    Method      Hash code length
                32     64     128
    DMHOR       0.5930 0.6693 0.6860
    DMVH        0.5301 0.6010 0.6720
    SIDMH       0.7479 0.7596 0.7660
    FGCMH       0.7089 0.7433 0.7511
    SPQH_all    0.7641 0.7799 0.7689

    Table 3  mAP comparison on the multi-modal retrieval tasks under the fully unpaired query scenario at different code lengths

                          Pascal Sentence                  NUS-WIDE                         IAPR TC-12
    Task  Method          16     32     64     128        16     32     64     128        16     32     64     128
    I2I   SPH             0.2425 0.2839 0.2919 0.3034     0.3142 0.3210 0.3465 0.3609     0.3352 0.3500 0.3634 0.3798
          SSMH            0.2600 0.3240 0.3499 0.3817     0.3468 0.3591 0.3877 0.4047     0.3258 0.3528 0.3503 0.3699
          UCMH            0.2766 0.3908 0.4778 0.5119     0.3500 0.3588 0.4051 0.4166     0.3487 0.3744 0.3795 0.3798
          STCH            0.2999 0.4126 0.4953 0.5648     0.3589 0.3603 0.3926 0.4260     0.3561 0.3807 0.3822 0.3912
          SPQH_img        0.3168 0.4533 0.5996 0.6111     0.3597 0.3612 0.4286 0.4319     0.3777 0.3823 0.3902 0.4003
    T2T   SPH             0.3717 0.3958 0.4122 0.4534     0.4017 0.4128 0.4203 0.4399     0.2922 0.3021 0.3448 0.3485
          SSMH            0.3916 0.4443 0.4929 0.5341     0.5322 0.5538 0.5727 0.5894     0.3306 0.3529 0.3674 0.3879
          UCMH            0.4633 0.5324 0.5911 0.6317     0.5512 0.5677 0.5824 0.5878     0.3527 0.3742 0.3922 0.4157
          STCH            0.4718 0.6377 0.6651 0.6818     0.5677 0.6025 0.6148 0.6281     0.3789 0.3879 0.4206 0.4379
          SPQH_txt        0.5058 0.7433 0.7578 0.7600     0.5801 0.6199 0.6312 0.6327     0.3801 0.4054 0.4377 0.4416

    Table 5  Ablation study of the semantic structure preservation term

    Method      Pascal Sentence   NUS-WIDE   IAPR TC-12
    SPQH-DSP    0.4614            0.4166     0.3949
    SPQH        0.7799            0.6353     0.4691

    Table 6  mAP comparison of SPQH with different modality features encoded at the query stage

    Method (query feature type)                  Pascal Sentence   NUS-WIDE   IAPR TC-12
    SPQH1 (text features)                        0.6159            0.4702     0.4162
    SPQH_txt (text + pseudo-image features)      0.7578            0.6312     0.4377
    SPQH2 (image features)                       0.4837            0.4023     0.3680
    SPQH_img (image + pseudo-text features)      0.5996            0.4286     0.3902
    SPQH_all (image + text features)             0.7799            0.6353     0.4691
  • [1] GEETHA V and SUJATHA N. A survey on divergent classification of social media networking[C]. 2022 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India, 2022: 203–207. doi: 10.1109/ICCCIS56430.2022.10037606.
    [2] GU Guanghua, HUO Wenhua, SU Mingyue, et al. Asymmetric supervised deep discrete hashing based image retrieval[J]. Journal of Electronics & Information Technology, 2021, 43(12): 3530–3537. doi: 10.11999/JEIT200988.
    [3] GONG Yunchao, LAZEBNIK S, GORDO A, et al. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2916–2929. doi: 10.1109/TPAMI.2012.193.
    [4] DATAR M, IMMORLICA N, INDYK P, et al. Locality-sensitive hashing scheme based on p-stable distributions[C]. The 20th Annual Symposium on Computational Geometry, Brooklyn, USA, 2004: 253–262. doi: 10.1145/997817.997857.
    [5] SHEN Fumin, SHEN Chunhua, LIU Wei, et al. Supervised discrete hashing[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, USA, 2015: 37–45. doi: 10.1109/CVPR.2015.7298598.
    [6] JI Rongrong, LIU Hong, CAO Liujuan, et al. Toward optimal manifold hashing via discrete locally linear embedding[J]. IEEE Transactions on Image Processing, 2017, 26(11): 5411–5420. doi: 10.1109/TIP.2017.2735184.
    [7] KOUTAKI G, SHIRAI K, and AMBAI M. Hadamard coding for supervised discrete hashing[J]. IEEE Transactions on Image Processing, 2018, 27(11): 5378–5392. doi: 10.1109/TIP.2018.2855427.
    [8] LIN Mingbao, JI Rongrong, LIU Hong, et al. Supervised online hashing via hadamard codebook learning[C]. The 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 2018: 1635–1643. doi: 10.1145/3240508.3240519.
    [9] LIN Mingbao, JI Rongrong, CHEN Shen, et al. Similarity-preserving linkage hashing for online image retrieval[J]. IEEE Transactions on Image Processing, 2020, 29: 5289–5300. doi: 10.1109/TIP.2020.2981879.
    [10] JIN Lu, LI Zechao, PAN Yonghua, et al. Weakly-supervised image hashing through masked visual-semantic graph-based reasoning[C]. The 28th ACM International Conference on Multimedia, Seattle, USA, 2020: 916–924. doi: 10.1145/3394171.3414022.
    [11] LI Zechao, TANG Jinhui, ZHANG Liyan, et al. Weakly-supervised semantic guided hashing for social image retrieval[J]. International Journal of Computer Vision, 2020, 128(8/9): 2265–2278. doi: 10.1007/s11263-020-01331-0.
    [12] SONG Jingkuan, YANG Yi, HUANG Zi, et al. Effective multiple feature hashing for large-scale near-duplicate video retrieval[J]. IEEE Transactions on Multimedia, 2013, 15(8): 1997–2008. doi: 10.1109/TMM.2013.2271746.
    [13] SHEN Xiaobo, SHEN Fumin, SUN Quansen, et al. Multi-view latent hashing for efficient multimedia search[C]. The 23rd ACM International Conference on Multimedia, Brisbane, Australia, 2015: 831–834. doi: 10.1145/2733373.2806342.
    [14] LIU Li, YU Mengyang, and SHAO Ling. Multiview alignment hashing for efficient image search[J]. IEEE Transactions on Image Processing, 2015, 24(3): 956–966. doi: 10.1109/TIP.2015.2390975.
    [15] LU Xu, LIU Li, NIE Liqiang, et al. Semantic-driven interpretable deep multi-modal hashing for large-scale multimedia retrieval[J]. IEEE Transactions on Multimedia, 2021, 23: 4541–4554. doi: 10.1109/TMM.2020.3044473.
    [16] YU Jun, HUANG Wei, LI Zuhe, et al. Hadamard matrix-guided multi-modal hashing for multi-modal retrieval[J]. Digital Signal Processing, 2022, 130: 103743. doi: 10.1016/j.dsp.2022.103743.
    [17] YU Jun, HUANG Wei, ZHANG Xiaobo, et al. Multimodal fusion hash learning method based on relaxed Hadamard matrix[J]. Acta Electronica Sinica, 2022, 50(4): 909–920. doi: 10.12263/DZXB.20210760.
    [18] LU Xu, ZHU Lei, CHENG Zhiyong, et al. Online multi-modal hashing with dynamic query-adaption[C]. The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 2019: 715–724. doi: 10.1145/3331184.3331217.
    [19] YU Jun, ZHANG Donglin, SHU Zhenqiu, et al. Adaptive multi-modal fusion hashing via hadamard matrix[J]. Applied Intelligence, 2022, 52(15): 17170–17184. doi: 10.1007/s10489-022-03367-w.
    [20] SHEN Xiaobo, SUN Quansen, and YUAN Yunhao. Semi-paired hashing for cross-view retrieval[J]. Neurocomputing, 2016, 213: 14–23. doi: 10.1016/j.neucom.2016.01.121.
    [21] WANG Di, SHANG Bin, WANG Quan, et al. Semi-paired and semi-supervised multimodal hashing via cross-modality label propagation[J]. Multimedia Tools and Applications, 2019, 78(17): 24167–24185. doi: 10.1007/s11042-018-6858-8.
    [22] GAO Jing, ZHANG Wenjun, ZHONG Fangming, et al. UCMH: Unpaired cross-modal hashing with matrix factorization[J]. Neurocomputing, 2020, 418: 178–190. doi: 10.1016/j.neucom.2020.08.029.
    [23] JING Rongrong, TIAN Hu, ZHANG Xingwei, et al. Self-training based semi-supervised and semi-paired hashing cross-modal retrieval[C]. 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022: 1–8. doi: 10.1109/IJCNN55064.2022.9892301.
    [24] RASHTCHIAN C, YOUNG P, HODOSH M, et al. Collecting image annotations using Amazon's Mechanical Turk[C]. The NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, Los Angeles, USA, 2010: 139–147.
    [25] CHUA T S, TANG Jinhui, HONG Richang, et al. NUS-WIDE: A real-world web image database from national university of Singapore[C]. The ACM International Conference on Image and Video Retrieval, Santorini, Greece, 2009: 48. doi: 10.1145/1646396.1646452.
    [26] ESCALANTE H J, HERNÁNDEZ C A, GONZALEZ J A, et al. The segmented and annotated IAPR TC-12 benchmark[J]. Computer Vision and Image Understanding, 2010, 114(4): 419–428. doi: 10.1016/j.cviu.2009.03.008.
    [27] WANG Daixin, CUI Peng, OU Mingdong, et al. Deep multimodal hashing with orthogonal regularization[C]. The 24th International Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015: 2291–2297.
    [28] YANG Rui, SHI Yuliang, and XU Xinshun. Discrete multi-view hashing for effective image retrieval[C]. 2017 ACM on International Conference on Multimedia Retrieval, Bucharest, Romania, 2017: 175–183. doi: 10.1145/3078971.3078981.
    [29] LU Xu, ZHU Lei, LIU Li, et al. Graph convolutional multi-modal hashing for flexible multimedia retrieval[C/OL]. The 29th ACM International Conference on Multimedia, Chengdu, China, 2021: 1414–1422. doi: 10.1145/3474085.3475598.
Publication history
  • Received: 2023-10-08
  • Revised: 2024-01-31
  • Published online: 2024-01-31
  • Issue date: 2024-02-29
