Author Name Disambiguation Based on Semi-supervised Learning with Graph Convolutional Network

Xiaoguang SHENG; Ying WANG; Li QIAN; Ying WANG

doi:10.11999/JEIT200905

Volume 43 Issue 12

Dec. 2021


Xiaoguang SHENG, Ying WANG, Li QIAN, Ying WANG. Author Name Disambiguation Based on Semi-supervised Learning with Graph Convolutional Network[J]. Journal of Electronics & Information Technology, 2021, 43(12): 3442-3450. doi: 10.11999/JEIT200905


cstr: 32379.14.JEIT200905

Xiaoguang SHENG^{1},
Ying WANG^{2},
Li QIAN^{2, 3},
Ying WANG^{1}

1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
2. National Science Library, Chinese Academy of Sciences, Beijing 100190, China
3. Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China

Funds: The National Natural Science Foundation of China (61702038), The National Social Science Foundation of China (15CTQ006)

Received Date: 2020-10-23
Revision Received Date: 2021-09-23
Accepted Date: 2021-11-04

Available Online: 2021-11-10

Publish Date: 2021-12-21

Abstract

To match scholars with their articles accurately, a new author name disambiguation method is proposed, based on semi-supervised learning with a graph convolutional network. In this method, the pre-trained SciBERT language model computes a semantic embedding vector for each paper from its title and keywords. The authors and organizations of the papers are used to build the adjacency matrices of the papers' co-author network and co-organization network. Pseudo labels are collected from the co-author network to obtain positive and negative samples. The semantic embedding vectors, the adjacency matrices, and the positive and negative samples are then fed into a Graph Convolutional Network (GCN). Through semi-supervised learning, the GCN learns an embedding vector for each paper, and these vectors are clustered to disambiguate the author names. Experimental results show that this method achieves better results on the experimental dataset than the other disambiguation methods it was compared with.
Keywords: Name disambiguation, Graph Convolutional Network (GCN), BERT language model
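
The abstract gives no implementation details, so the following is only a minimal sketch of two steps it names: building a co-author adjacency matrix over papers and passing paper features through one graph convolution. It assumes that two papers are linked when they share a co-author other than the ambiguous name, and it uses the widely used propagation rule of Kipf and Welling, ReLU(D^{-1/2}(A + I)D^{-1/2} X W). The function names, the toy papers, and the two-dimensional features (stand-ins for SciBERT embeddings) are all hypothetical and are not taken from the paper.

```python
import math


def coauthor_adjacency(papers, target_name):
    """Link two papers if they share a co-author other than the ambiguous name.

    The linking rule is an assumption made for illustration only.
    """
    n = len(papers)
    coauthors = [set(p["authors"]) - {target_name} for p in papers]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if coauthors[i] & coauthors[j]:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    z = matmul(matmul(a_norm, features), weights)
    return [[max(v, 0.0) for v in row] for row in z]


# Hypothetical toy data: three papers that all list the ambiguous name.
papers = [
    {"authors": ["Ying Wang", "A. Li"]},
    {"authors": ["Ying Wang", "A. Li", "B. Chen"]},
    {"authors": ["Ying Wang", "C. Zhao"]},
]
adj = coauthor_adjacency(papers, "Ying Wang")
a_norm = normalize_adjacency(adj)

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-in for SciBERT vectors
weights = [[1.0, 0.0], [0.0, 1.0]]               # untrained identity weights
hidden = gcn_layer(a_norm, features, weights)
print(hidden)
```

In this toy graph the first two papers share the co-author "A. Li", so the convolution averages their features, while the third paper has no link and keeps its own. A full pipeline along the lines of the abstract would also need the co-organization matrix, a loss over the pseudo-labelled positive and negative pairs, and a clustering step over the learned embeddings, none of which is sketched here.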

