Personality Prediction and Group Profiling Method Based on Web Log (基于Web日志的性格预测与群体画像方法研究)  Cited by: 12


Authors: KANG Haiyan (康海燕)[1], LI Hao (李昊)

Affiliations: [1] School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China; [2] School of Computer Science, Beijing Information Science and Technology University, Beijing 100192, China

Source: Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》), 2020, No. 1, pp. 39-46 (8 pages)

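Funding: Beijing Information Science and Technology University Research Level Improvement Project (5211910933); National Natural Science Foundation of China (61370139)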

Abstract: A method for user personality prediction and group profiling is proposed that combines data mining, machine learning, and user profiling techniques. First, the traditional TF-IDF algorithm is improved to address its failure to account for article structure, which raises the accuracy of web page topic mining. Second, a "personality-theme-keywords" (PTK) model is constructed from the Big Five personality traits to derive each user's interest attributes and personality attributes, which are then combined with the user's basic attributes to form a comprehensive profile. Third, the K-means method is used to cluster users who share the same attribute characteristics, describing the collective profile of groups with similar characteristics in society. Finally, experiments show that the improved TF-IDF method outperforms the LDA topic model for mining web page text, and that the method can effectively predict user personality and build group profiles.
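The abstract does not spell out how article structure enters the improved TF-IDF score. The sketch below is only one plausible reading, not the authors' algorithm: it assumes each page is split into a `title` field and a `body` field, and that a term occurrence in the title counts more than one in the body. The field names, the weights in `FIELD_WEIGHTS`, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical field weights. The source only says that article structure
# is taken into account; these particular values are assumptions.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_tf(doc):
    """Term frequency where each occurrence counts by its field weight."""
    counts = Counter()
    for field, tokens in doc.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for token in tokens:
            counts[token] += weight
    total = sum(counts.values())
    return {term: value / total for term, value in counts.items()}


def structure_aware_tfidf(docs):
    """Return one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at all.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update({t for tokens in doc.values() for t in tokens})
    scores = []
    for doc in docs:
        tf = weighted_tf(doc)
        scores.append({
            # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1
            term: value * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, value in tf.items()
        })
    return scores


# Toy pages, already tokenized; the content is invented for the example.
docs = [
    {"title": ["football", "match"],
     "body": ["team", "wins", "match", "news"]},
    {"title": ["stock", "market"],
     "body": ["market", "news", "today"]},
]
scores = structure_aware_tfidf(docs)
top_term = max(scores[0], key=scores[0].get)
print(top_term)
```

Under these assumptions, a term that appears in the title outranks body-only terms of the same raw count, which is the kind of effect a structure-aware weighting is meant to produce. The resulting per-page vectors could then feed a clustering step such as K-means, as the abstract describes.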

Keywords: web log; data mining; user profiling; personality prediction; TF-IDF; K-means

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
