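User Matching Method for Cross Social Networks Based on Deep Learning

MA Qiang; DAI Jun

doi: 10.11999/JEIT220702 cstr: 32379.14.JEIT220702

School of Information Engineering, Southwest University of Science and Technology, Mianyang 621010, China

Funds: The National Natural Science Foundation of China (62071170, 62072158), Henan Science Foundation for Distinguished Young Scholars (222300420006), Henan Support Plan for Science and Technology Innovation Team of Universities (21IRTSTHN015), The Doctoral Foundation of Southwest University of Science and Technology (17zx7158)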

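Details

About the authors:
MA Qiang: male, associate professor; research interests: intelligent data processing and social media computing

DAI Jun: male, master's student; research interests: cross-social-media computing and user account matching

Corresponding author:
DAI Jun, arturia_pendragon@163.com

CLC number: TN915; TP391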
Publication history
- Received: 2022-05-31
- Revised: 2022-08-27
- Accepted: 2022-09-06
- Published online: 2022-09-09
- Issue date: 2023-07-10

Abstract: Existing user matching schemes for cross social networks that rely on spatio-temporal information have difficulty coupling spatial and temporal information and extracting features, which reduces matching accuracy. A Deep Learning based User Matching method for Cross social Networks (DLUMCN) is proposed. First, user sign-in data are mapped onto grids at the spatial and temporal scales to generate a set of sign-in matrices containing user features, which is normalized to form the user sign-in map. Second, convolution is used to generate high-dimensional spatio-temporal feature maps from the sign-in map, depthwise separable convolution performs weight transformation and feature fusion on the feature maps, and the feature maps are flattened into a one-dimensional feature vector. Finally, a fully connected feedforward network is used to build a classifier that outputs the user matching score. Experiments on two real social network datasets show that the proposed method improves both matching accuracy and F1-value compared with existing related methods, which verifies its effectiveness.

Keywords:
- Cross social networks
- User matching
- Deep learning
- Sign-in similarity

Figure 1 Matching model

Figure 2 Accuracy and loss curves of the model

Figure 3 Model accuracy under different conditions

Figure 4 Determination of the association coefficient s and the fill coefficient p

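Algorithm 1 Single-point filling algorithm
Input: user sign-in set S, sign-in set S_match to be matched with S, grid density coefficient k
Output: user sign-in matrix set SMS
Initialization: initialize the set of k-dimensional zero matrices SMS={A_s1, A_s2, A_t1, A_t2}; set the global spatio-temporal domain from S and S_match
(1) Traverse the sign-in set S:
(2) Obtain the spatial and temporal grid cells: g_s=(x_s, y_s), g_t=(x_t, y_t)
(3) Link and fill the matrices: A_s1[x_s, y_s] += 1; A_t1[x_t, y_t] += 1
(4) Set the local spatio-temporal domain from S and S_match, replace A_s1 and A_t1 with A_s2 and A_t2, and repeat steps 1 to 3
(5) Output SMS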
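
The single-point filling step can be sketched in Python as follows. This is a minimal illustration under several assumptions that the listing does not spell out: each sign-in is a `(t, lat, lon)` tuple, every matrix is k×k, the temporal cell is obtained by splitting the time range into k² equal slots laid out row by row, the global domain is the bounding box of S and S_match together, and the local domain is the bounding box of S alone. The helper names (`to_bin`, `bounds`, `fill`, `build_sms`) are hypothetical.

```python
import numpy as np

def to_bin(value, lo, hi, n):
    """Map value from [lo, hi] to an integer bin in 0..n-1 (clipped at the edges)."""
    if hi <= lo:
        return 0
    return min(max(int((value - lo) / (hi - lo) * n), 0), n - 1)

def bounds(signins):
    """Bounding box of a list of (t, lat, lon) sign-ins."""
    ts, lats, lons = zip(*signins)
    return {"t": (min(ts), max(ts)),
            "lat": (min(lats), max(lats)),
            "lon": (min(lons), max(lons))}

def fill(signins, domain, k):
    """Steps (1) to (3): count sign-ins per spatial and per temporal grid cell."""
    A_s = np.zeros((k, k))
    A_t = np.zeros((k, k))
    for t, lat, lon in signins:
        x_s = to_bin(lat, *domain["lat"], k)
        y_s = to_bin(lon, *domain["lon"], k)
        # Assumed temporal layout: k*k equal time slots, unrolled row by row.
        x_t, y_t = divmod(to_bin(t, *domain["t"], k * k), k)
        A_s[x_s, y_s] += 1
        A_t[x_t, y_t] += 1
    return A_s, A_t

def build_sms(S, S_match, k):
    """Return the matrix set {A_s1, A_t1, A_s2, A_t2} for sign-in set S."""
    A_s1, A_t1 = fill(S, bounds(S + S_match), k)   # assumed global domain
    A_s2, A_t2 = fill(S, bounds(S), k)             # assumed local domain
    return {"A_s1": A_s1, "A_t1": A_t1, "A_s2": A_s2, "A_t2": A_t2}

# Toy data, not taken from the paper's datasets.
S = [(0, 0.0, 0.0), (10, 1.0, 1.0), (10, 1.0, 1.0)]
S_match = [(5, 0.5, 2.0)]
sms = build_sms(S, S_match, k=2)
```

Passing the domain in explicitly keeps the counting loop identical for the global and local passes, which mirrors step (4) of the listing.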

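Algorithm 2 Association filling algorithm
Input: user sign-in set S, sign-in set S_match to be matched with S, grid density coefficient k, association coefficient s, fill coefficient p
Output: user sign-in matrix set SMS
(1) Obtain SMS={A_s1, A_s2, A_t1, A_t2} with the single-point filling algorithm; define empty sets A and B
(2) Set the global spatio-temporal domain from S and S_match; select A_s = A_s1, A_t = A_t1
(3) Traverse each sign-in sign in S:
(4) Obtain the spatial and temporal grid representations of sign: g_s=(x_s, y_s) and g_t=(x_t, y_t)
(5) Add every grid cell g satisfying d(g, g_s) <= s to the temporary set A_t (a set of grid cells, distinct from the matrix A_t selected in step 2)
(6) Add every grid cell g satisfying d(g, g_t) <= s to the temporary set B_t
(7) For any grid cell g=(x, y):
(8) If g ∈ A_t∩A: A_s[x, y] += p
(9) If g ∈ B_t∩B: A_t[x, y] += p
(10) Update the sets A and B: A=A_t, B=B_t
(11) If the current mapping space is the global spatio-temporal domain: set the local spatio-temporal domain from S and S_match, select A_s = A_s2, A_t = A_t2, and repeat steps 3 to 10
(12) Otherwise: output SMS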
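
Steps (3) to (10) of Algorithm 2 can be illustrated with the short Python sketch below. It assumes the sign-ins have already been mapped to grid cells, that matrices are k×k, and that the distance d is the Chebyshev distance between grid indices; the listing does not define d, so that choice, like the function names, is only an assumption.

```python
import numpy as np

def neighbourhood(cell, s, k):
    """Grid cells within (assumed) Chebyshev distance s of cell, inside a k x k grid."""
    x0, y0 = cell
    return {(x, y)
            for x in range(max(0, x0 - s), min(k, x0 + s + 1))
            for y in range(max(0, y0 - s), min(k, y0 + s + 1))}

def association_fill(A_s, A_t, cells, s, p):
    """Add p to every cell shared by the neighbourhoods of consecutive sign-ins.

    cells: list of (g_s, g_t) grid-cell pairs, one per sign-in, in traversal order.
    Returns filled copies of the spatial and temporal matrices.
    """
    A_s, A_t = A_s.copy(), A_t.copy()
    k = A_s.shape[0]
    prev_s, prev_t = set(), set()            # sets A and B in the listing
    for g_s, g_t in cells:
        cur_s = neighbourhood(g_s, s, k)     # temporary set A_t in the listing
        cur_t = neighbourhood(g_t, s, k)     # temporary set B_t in the listing
        for x, y in cur_s & prev_s:          # step (8)
            A_s[x, y] += p
        for x, y in cur_t & prev_t:          # step (9)
            A_t[x, y] += p
        prev_s, prev_t = cur_s, cur_t        # step (10)
    return A_s, A_t

# Toy example: two sign-ins on a 4 x 4 grid.
zeros = np.zeros((4, 4))
cells = [((0, 0), (0, 0)), ((2, 2), (3, 3))]
out_s, out_t = association_fill(zeros, zeros, cells, s=1, p=0.5)
```

In this toy run the two spatial neighbourhoods share only cell (1, 1), so that is the only entry that receives the fill value p, while the temporal neighbourhoods do not overlap and the temporal matrix is left unchanged.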

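Algorithm 3 Model optimization algorithm
Input: training sample set train_data, number of epochs epoch, batch size bs, learning rate α
Output: model
(1) Randomly initialize model{W_C1, ···, W_F1, W_OUT, B_C1, ···, B_F1, b}
(2) for i = 1 to epoch:
(3)     for batch_data in train_data: # traverse the whole training set batch by batch
(4)         for sample in batch_data: # traverse a single batch sample by sample
(5)             Build the sample sign-in map MAP
(6)             Perform forward propagation according to Eq. (12) and compute the output of each layer
(7)         Compute the prediction loss and the gradients of each layer's parameters according to Eqs. (13) to (16)
(8)         Update the parameters of each layer: W -= α×δW, B -= α×δB
(9) Output model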
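
The loop structure of Algorithm 3 (epochs, batches, and the update W -= α×δW) can be sketched with NumPy as below. The convolutional model and Eqs. (12) to (16) are not reproduced here; a single sigmoid unit with cross-entropy loss stands in for the network, which is purely an assumption for illustration, and the per-sample loop of steps (4) to (6) is vectorized over the batch.

```python
import numpy as np

def train(X, y, epoch, bs, alpha, seed=0):
    """Mini-batch gradient descent on a stand-in one-layer sigmoid classifier."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=X.shape[1])        # step (1): random initialization
    b = 0.0
    for _ in range(epoch):                             # step (2)
        for start in range(0, len(X), bs):             # step (3): batch by batch
            xb, yb = X[start:start + bs], y[start:start + bs]
            score = 1.0 / (1.0 + np.exp(-(xb @ W + b)))  # forward pass (stand-in model)
            err = score - yb                           # cross-entropy gradient w.r.t. the logit
            dW = xb.T @ err / len(xb)                  # step (7): parameter gradients
            db = err.mean()
            W -= alpha * dW                            # step (8): W -= alpha * dW
            b -= alpha * db
    return W, b

# Toy, linearly separable data (not from the paper).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float)
W, b = train(X, y, epoch=50, bs=32, alpha=0.5)
pred = (1.0 / (1.0 + np.exp(-(X @ W + b))) >= 0.5).astype(float)
accuracy = (pred == y).mean()
```

The 0.5 cut-off used for `pred` plays the role of a matching threshold on the output score; the learning rate and batch size here are toy values chosen for the small example.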

Table 1 Overview of the datasets

Dataset	Brightkite (50686 samples)		Gowalla (107092 samples)	
Partition	a	b	a	b
Average number of sign-ins	111	111	75	75
Start time	2008-03-22 06:34:37	2008-03-21 20:36:21	2009-02-05 06:27:43	2009-02-04 05:17:38
End time	2010-10-18 18:34:01	2010-10-18 18:39:58	2010-10-23 05:22:06	2010-10-23 05:22:06
Longitude range	-163.193 to 151.198	-163.193 to 151.198	-90.011 to 105.659	-90.011 to 105.625
Latitude range	-179.824 to 179.999	-179.824 to 179.999	-176.309 to 177.463	-166.525 to 177.453

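Table 2 Other parameters of the matching model

Parameter	Value	Description
k	65 (Brightkite) / 75 (Gowalla)	Grid density coefficient
det/mt	1×10^-4 / 0.5	Adjustment factor / matching threshold
batch-size, α	256, 1×10^-3	Training batch size, learning rate
kcc1(fc1), kcc2(fc2)	16(3), 32(3)	Number (size) of convolution kernels in CON1, CON2
kcs1(fs1), kcs2(fs2)	1(2), 64(1)	Number (size) of convolution kernels in SCON1, SCON2
n_F1, n_F2, n_F3	16, 12, 8	Number of neurons in the hidden layers of the feedforward network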
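
One possible reading of the SCON1 and SCON2 rows is a depthwise separable convolution: a depthwise stage with one 2×2 kernel per input channel, followed by a pointwise stage of 64 kernels of size 1×1. The NumPy sketch below illustrates that reading only. The stride of 1, the absence of padding, the 32 input channels (taken from the CON2 kernel count) and the 5×5 toy feature map are all assumptions, and biases and activations are omitted.

```python
import numpy as np

def depthwise_conv(x, w):
    """Per-channel 2-D cross-correlation, stride 1, no padding.

    x: (C, H, W) feature maps; w: (C, f, f), one kernel per channel.
    Returns an array of shape (C, H - f + 1, W - f + 1).
    """
    C, H, W = x.shape
    f = w.shape[1]
    out = np.zeros((C, H - f + 1, W - f + 1))
    for i in range(f):
        for j in range(f):
            # Shifted window of every channel, scaled by that channel's kernel tap.
            out += w[:, i, j][:, None, None] * x[:, i:i + H - f + 1, j:j + W - f + 1]
    return out

def pointwise_conv(x, w):
    """1x1 convolution that mixes channels. x: (C, H, W); w: (M, C) -> (M, H, W)."""
    return np.einsum("mc,chw->mhw", w, x)

# Toy input with all-ones weights so the result is easy to check by hand.
x = np.ones((32, 5, 5))
w_dw = np.ones((32, 2, 2))     # assumed SCON1: one 2x2 kernel per channel
w_pw = np.ones((64, 32))       # assumed SCON2: 64 kernels of size 1x1
y = pointwise_conv(depthwise_conv(x, w_dw), w_pw)

n_separable = w_dw.size + w_pw.size   # weights in the two-stage version
n_standard = 64 * 32 * 2 * 2          # weights in a plain 2x2 convolution, 32 -> 64 channels
```

Under these assumptions the two-stage version uses 2176 weights where a standard 2×2 convolution with the same channel counts would use 8192, which is the usual motivation for the separable form.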

Table 3 Complexity and running time (s) of different algorithms

Algorithm	Time complexity	Space complexity	Brightkite training time	Brightkite test time	Gowalla training time	Gowalla test time
UNICORN	O(N²·S·k²)	O(N·S)	-	11.85	-	25.12
STUL	O(N·S³)	O(N·S)	-	451.663	-	1394.156
CDTraj2vec	O(N·S²)	O(N·S²)	375.833	11.85	587.174	26.385
UIDwST	O(N²·S²)	O(N·S)	-	217.775	-	631.105
DLUMCN	O(N·S·k²)	O(N·k²)	69.678	0.475	200.066	1.311
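
To make the test-time column easier to compare, the short Python snippet below divides each baseline's test time by that of DLUMCN. The numbers are copied from Table 3; the dictionary layout and the `speedup` helper are hypothetical conveniences, not part of the paper.

```python
# Test times in seconds, copied from Table 3.
test_time = {
    "Brightkite": {"UNICORN": 11.85, "STUL": 451.663, "CDTraj2vec": 11.85,
                   "UIDwST": 217.775, "DLUMCN": 0.475},
    "Gowalla": {"UNICORN": 25.12, "STUL": 1394.156, "CDTraj2vec": 26.385,
                "UIDwST": 631.105, "DLUMCN": 1.311},
}

def speedup(times, ref="DLUMCN"):
    """Ratio of each algorithm's test time to the reference algorithm's test time."""
    return {name: t / times[ref] for name, t in times.items() if name != ref}

ratios = {dataset: speedup(times) for dataset, times in test_time.items()}

for dataset, per_algorithm in ratios.items():
    for name, ratio in per_algorithm.items():
        print(f"{dataset:10s} {name:10s} {ratio:8.1f}x")
```

On both datasets every baseline ratio is well above 1, so the reported DLUMCN test time is the shortest in the table by a wide margin.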

Table 4 Test results of different algorithms on the two datasets

Algorithm	Brightkite acc	Brightkite pre	Brightkite rec	Brightkite f1	Gowalla acc	Gowalla pre	Gowalla rec	Gowalla f1
UNICORN	0.8801	0.8085	0.9960	0.8925	0.8847	0.8194	0.9870	0.8954
STUL	0.9253	0.9039	0.9516	0.9271	0.9281	0.9094	0.9447	0.9267
CDTraj2vec	0.9529	0.9833	0.9212	0.9512	0.9567	0.9686	0.9360	0.9520
UIDwST	0.9740	0.9594	0.9897	0.9743	0.9773	0.9619	0.9870	0.9742
DLUMCN	0.9881	0.9949	0.9814	0.9881	0.9890	0.9969	0.9812	0.9889
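
The f1 column can be cross-checked against the pre and rec columns with the standard definition F1 = 2·pre·rec / (pre + rec). The snippet below does this for every row of Table 4; the variable names are illustrative, and small deviations are expected because the tabulated precision and recall are themselves rounded to four decimals.

```python
# (precision, recall, reported f1) copied from Table 4.
results = {
    ("Brightkite", "UNICORN"): (0.8085, 0.9960, 0.8925),
    ("Brightkite", "STUL"): (0.9039, 0.9516, 0.9271),
    ("Brightkite", "CDTraj2vec"): (0.9833, 0.9212, 0.9512),
    ("Brightkite", "UIDwST"): (0.9594, 0.9897, 0.9743),
    ("Brightkite", "DLUMCN"): (0.9949, 0.9814, 0.9881),
    ("Gowalla", "UNICORN"): (0.8194, 0.9870, 0.8954),
    ("Gowalla", "STUL"): (0.9094, 0.9447, 0.9267),
    ("Gowalla", "CDTraj2vec"): (0.9686, 0.9360, 0.9520),
    ("Gowalla", "UIDwST"): (0.9619, 0.9870, 0.9742),
    ("Gowalla", "DLUMCN"): (0.9969, 0.9812, 0.9889),
}

def f1_score(pre, rec):
    """Harmonic mean of precision and recall."""
    return 2 * pre * rec / (pre + rec)

# Largest gap between the recomputed and the reported F1 over all rows.
max_gap = max(abs(f1_score(pre, rec) - f1) for pre, rec, f1 in results.values())
print(f"largest |recomputed - reported| F1 gap: {max_gap:.6f}")
```

The recomputed values agree with the reported f1 entries to within about 1e-4, which is consistent with rounding of the inputs.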
