Authors: KANG Haiyan (康海燕)[1], LI Hao (李昊)
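Affiliations: [1] School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China; [2] School of Computer Science, Beijing Information Science and Technology University, Beijing 100192, China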
Source: Journal of Zhengzhou University (Natural Science Edition), 2020, No. 1, pp. 39-46 (8 pages)
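Funding: Beijing Information Science and Technology University Research Level Improvement Project (5211910933); National Natural Science Foundation of China (61370139)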
Abstract: A method for user personality prediction and group profiling is proposed that combines data mining, machine learning, and user profiling techniques. First, it addresses the failure of the traditional TF-IDF algorithm to take article structure into account, which improves the accuracy of web page topic mining. Second, a "personality-topic-keyword" (PTK) model is built on the Big Five personality traits to derive each user's interest attributes and personality attributes; these are combined with the user's basic attributes to form a comprehensive user profile. Third, the K-means method is used to cluster users who share the same attribute features, depicting the collective characteristics of groups with similar features in society. Finally, experiments show that the improved TF-IDF method outperforms the LDA topic model for web text mining, and that the method can effectively predict users' personalities and form group profiles.
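The abstract says only that the improved TF-IDF takes article structure into account; it does not give the weighting scheme. The sketch below shows one common way to do that, under stated assumptions: each occurrence of a term is weighted by the section it appears in (the hypothetical weights `title = 3.0`, `body = 1.0`), and a smoothed IDF is applied. `SECTION_WEIGHTS`, `structured_tfidf`, and `top_terms` are illustrative names, not the authors' code.

```python
import math
from collections import Counter

# ASSUMPTION: hypothetical per-section weights; this record does not say
# how the authors actually encode article structure.
SECTION_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_tf(doc):
    """Term frequency where each occurrence counts with its section's weight.

    doc maps a section name (e.g. "title", "body") to a list of tokens.
    """
    counts = Counter()
    for section, tokens in doc.items():
        weight = SECTION_WEIGHTS.get(section, 1.0)
        for token in tokens:
            counts[token] += weight
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def smoothed_idf(docs):
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1, so every term keeps a positive weight."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update({t for tokens in doc.values() for t in tokens})
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def structured_tfidf(docs):
    """Return one {term: score} dict per document."""
    idf = smoothed_idf(docs)
    return [{t: tf * idf[t] for t, tf in weighted_tf(doc).items()} for doc in docs]


def top_terms(scores, k=3):
    """Highest-scoring terms of one document, used as candidate topic words."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


pages = [
    {"title": ["python"], "body": ["code", "python"]},
    {"title": ["cooking"], "body": ["code"]},
]
scores = structured_tfidf(pages)
print(top_terms(scores[0]))  # ['python', 'code']
```

Putting the structure into the TF term keeps the IDF part unchanged, so the weighting can be tuned per section without touching the corpus statistics.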
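The abstract describes the PTK model only as a "personality-topic-keyword" chain built on the Big Five traits. A minimal sketch of such a chain, assuming a hand-made keyword-to-topic table and topic-to-trait weights (both entirely hypothetical, since the record gives neither), maps a user's browsed keywords to a topic distribution and then to Big Five scores.

```python
from collections import Counter

TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")

# ASSUMPTION: hypothetical keyword -> topic table, for illustration only.
KEYWORD_TOPIC = {
    "travel": "adventure", "hiking": "adventure",
    "party": "social", "concert": "social",
    "budget": "planning", "schedule": "planning",
}

# ASSUMPTION: hypothetical topic -> Big Five weights, for illustration only.
TOPIC_TRAITS = {
    "adventure": {"openness": 0.8, "extraversion": 0.4},
    "social": {"extraversion": 0.9, "agreeableness": 0.3},
    "planning": {"conscientiousness": 0.9},
}


def topic_distribution(keywords):
    """Share of each known topic among the user's keywords (unknown words ignored)."""
    topics = Counter(KEYWORD_TOPIC[k] for k in keywords if k in KEYWORD_TOPIC)
    total = sum(topics.values())
    return {t: c / total for t, c in topics.items()} if total else {}


def predict_personality(keywords):
    """Big Five scores as topic shares weighted by each topic's trait weights."""
    scores = dict.fromkeys(TRAITS, 0.0)
    for topic, share in topic_distribution(keywords).items():
        for trait, weight in TOPIC_TRAITS[topic].items():
            scores[trait] += share * weight
    return scores


scores = predict_personality(["party", "concert", "budget", "unknown"])
print(max(scores, key=scores.get))  # extraversion
```

In a real system the two tables would be learned or taken from a validated lexicon rather than written by hand.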
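For the group-profiling step, the abstract names K-means. The sketch below is plain Lloyd's K-means with a deterministic farthest-first initialization (a choice made here for reproducibility, not taken from the paper), applied to hypothetical two-feature user vectors (extraversion, conscientiousness). Each resulting centroid can be read as the "portrait" of one group.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-first initialization.

    points: list of equal-length numeric tuples (one per user).
    Returns (labels, centroids).
    """
    # Farthest-first seeding: start from the first point, then repeatedly add
    # the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# ASSUMPTION: hypothetical users described by (extraversion, conscientiousness).
users = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25), (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]
labels, centroids = kmeans(users, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Here the first group's centroid is near (0.85, 0.25), an outgoing but less organized group, and the second near (0.15, 0.85).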
Keywords: Web log; data mining; user profiling; personality prediction; TF-IDF; K-means
CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]