Detection model of effectiveness of Chinese online reviews based on logistic regression  Cited by: 11

Authors: Wu Hanqian[1], Zhu Yunjie[1], Xie Jue[2]

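Affiliations: [1] School of Computer Science and Engineering, Southeast University, Nanjing 210018; [2] Southeast University-Monash University Joint Graduate School (Suzhou), Suzhou 215123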

Source: Journal of Southeast University (Natural Science Edition), 2015, No. 3, pp. 433-437 (5 pages)

Funding: National Natural Science Foundation of China (60803057); National High Technology Research and Development Program of China (863 Program) (2015AA015904)

Abstract: To automate the detection of the effectiveness of Chinese online reviews in e-commerce and social networks, a spam review detection model based on logistic regression is proposed for the single-topic setting. Detecting the effectiveness of Chinese online reviews can be treated as a classification problem, and nine features are extracted according to the characteristics of Chinese online reviews to build the classification model. To obtain the core feature, topic relevance, a review noun pattern based on association rules is used to optimize topic identification in ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), and a cross language model is then used to obtain the topic relevance of online reviews. In the experiment, 1,000 manually labeled reviews are used as samples, a support vector machine (SVM) classification model serves as the comparison, and the computation is carried out with the data mining tool Weka. The results show that the logistic regression spam review detection model with the optimized review noun pattern achieves an accuracy of 83.54%, which is 2.10% higher than that of the SVM classification model.

Keywords: effectiveness of online reviews; logistic regression; association rules

Classification number: P315.69 [Astronomy and Earth Sciences - Seismology]
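
The classification setup described in the abstract (a binary spam/valid decision over nine features per review, fitted with logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not list the nine features, so the data here is synthetic, and plain-Python batch gradient descent stands in for the Weka tooling the paper reports using. All function names (`make_synthetic`, `train_logistic`, and so on) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(spam | x) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (spam) if the estimated probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted label matches the given label."""
    hits = sum(1 for xi, yi in zip(X, y) if predict(w, b, xi) == yi)
    return hits / len(X)


def make_synthetic(n, n_features=9, seed=0):
    """ASSUMPTION: synthetic stand-in for nine-feature review vectors."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


# 1,000 samples mirrors the sample size in the abstract; the 80/20 split is an assumption.
X, y = make_synthetic(1000)
w, b = train_logistic(X[:800], y[:800])
print("held-out accuracy:", accuracy(w, b, X[800:], y[800:]))
```

Because the data is synthetic and well separated, the accuracy printed here says nothing about the 83.54% figure reported in the abstract; the sketch only shows the shape of the classifier.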

 
