Extracting Dimension Hierarchy of Tweeters Interests for On-line Analytical Processing

YU Dongjin (俞东进); NI Zhiyong (倪智勇); SUN Jingchao (孙景超)

doi: 10.11999/JEIT170030
cstr: 32379.14.JEIT170030

Publication history
- Received: 2017-01-11
- Revised: 2017-08-16
- Published: 2017-09-19

Funds:

National Natural Science Foundation of China (61100043, 61472112); Natural Science Foundation of Zhejiang Province (LY12F02003); Key Science and Technology Project of Zhejiang Province (2017C01010, 2016F50014)

Abstract: Exploring the distribution and correlation of user interests in massive Twitter data helps to achieve accurate personalized recommendation. On-Line Analytical Processing (OLAP) provides an intuitive form well suited to exploring data. The key to applying OLAP to Twitter data is how to mine and build the dimension hierarchy of tweeters' interests. Because existing approaches can extract interests at only a single level, this paper proposes an approach to extracting the dimension hierarchy of tweeters' interests that supports OLAP. The approach first retrieves Twitter data through the REST API, then mines users' interests and sub-interests with an improved Latent Dirichlet Allocation (LDA) model, and finally constructs the interest dimension hierarchy from the extracted interests and sub-interests. Experiments evaluate the model quality and scalability of the approach and show that, compared with LDA and hLDA, it extracts the dimension hierarchy of tweeters' interests more effectively and can be applied to OLAP.
- On-Line Analytical Processing (OLAP) /
- Twitter /
- Dimension hierarchy /
- Interests /
- Latent Dirichlet Allocation (LDA) model
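
To make the role of an interest dimension hierarchy in OLAP concrete, the following is a minimal sketch of roll-up and drill-down over a two-level hierarchy (interest, sub-interest). It is not the paper's method: the hierarchy here is written by hand rather than mined with the improved LDA model, and the names `INTEREST_HIERARCHY`, `FACTS`, `roll_up`, and `drill_down`, as well as all labels and counts, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical two-level interest dimension: sub-interest -> parent interest.
# These labels are illustrative only; they are not taken from the paper.
INTEREST_HIERARCHY = {
    "football": "sports",
    "basketball": "sports",
    "smartphones": "technology",
    "programming": "technology",
}

# Hypothetical fact rows: (user, sub_interest, tweet_count).
FACTS = [
    ("alice", "football", 12),
    ("alice", "programming", 5),
    ("bob", "basketball", 7),
    ("bob", "football", 3),
    ("carol", "smartphones", 9),
]


def roll_up(facts, hierarchy):
    """Aggregate tweet counts from the sub-interest level to the interest level."""
    totals = defaultdict(int)
    for _user, sub_interest, count in facts:
        totals[hierarchy[sub_interest]] += count
    return dict(totals)


def drill_down(facts, hierarchy, interest):
    """Break one interest back down into per-sub-interest tweet counts."""
    totals = defaultdict(int)
    for _user, sub_interest, count in facts:
        if hierarchy[sub_interest] == interest:
            totals[sub_interest] += count
    return dict(totals)
```

Calling `roll_up(FACTS, INTEREST_HIERARCHY)` aggregates the tweet counts at the interest level, while `drill_down(FACTS, INTEREST_HIERARCHY, "sports")` returns the per-sub-interest breakdown for a single interest. A hierarchy with only one level would support neither operation, which is the limitation of single-level interest extraction that the abstract points to.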
