Product Feature Extraction Based on Improved LDA Topic Model  Cited by: 7

Original title: 基于改进LDA主题模型的产品特征抽取

Authors: She Weijun (佘维军), Liu Ziping (刘子平)[1], Yang Weifang (杨卫芳)[1]

Affiliation: [1] College of Computer Science, Chongqing University, Chongqing 400030, China

Source: Computer and Modernization (《计算机与现代化》), 2016, No. 11, pp. 1-6, 57 (7 pages in total)

Funding: National Natural Science Foundation of China (90818028)

Abstract: To address the problems that arise when the LDA topic model is applied to product feature extraction, this paper proposes SA-LDA, a method that combines syntactic analysis with a topic model. First, syntactic analysis is applied to all reviews of the products in the target product's category to extract explicit features, which are then clustered into a feature set and an opinion set; these sets are used to build a corpus. Next, for each review of the product under analysis, the subjective (opinion) sentences are extracted and their topics are learned with the improved LDA model. Must-link and cannot-link constraints built from the corpus constrain and guide the topic updates, so that each topic corresponds to one feature cluster. Experiments show that the method performs well on both explicit and implicit features and that, compared with traditional methods and other improved methods, it improves precision to some extent while maintaining recall.

Keywords: latent Dirichlet allocation; topic model; syntactic analysis; feature extraction; constraints

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
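
Illustrative sketch: the abstract describes guiding LDA topic updates with must-link and cannot-link constraints but does not give the update rule. The following is a minimal sketch of one common way to do this, assuming a collapsed Gibbs sampler in which the constraints act as soft multiplicative factors on the topic sampling weights. It is not the paper's SA-LDA implementation; the function name `constrained_lda` and the parameters `boost` and `damp` are hypothetical, and the syntactic-analysis step that would produce the word pairs is omitted.

```python
import random
from collections import defaultdict


def constrained_lda(docs, n_topics, must_link, cannot_link,
                    alpha=0.1, beta=0.01, boost=2.0, damp=0.5,
                    n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA with soft word-pair constraints.

    docs: list of token lists.
    must_link / cannot_link: iterables of (word, word) pairs.
    boost / damp: assumed multiplicative factors (not from the paper).
    Returns (per-token topic assignments, per-topic word counts).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Symmetric lookup tables: word -> set of linked words.
    ml, cl = defaultdict(set), defaultdict(set)
    for a, b in must_link:
        ml[a].add(b)
        ml[b].add(a)
    for a, b in cannot_link:
        cl[a].add(b)
        cl[b].add(a)

    n_dk = [[0] * n_topics for _ in docs]               # doc-topic counts
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                                # tokens per topic

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1

                weights = []
                for t in range(n_topics):
                    # Standard collapsed Gibbs weight for topic t.
                    p = ((n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                         / (n_k[t] + vocab_size * beta))
                    # Favor topics that already hold a must-link partner.
                    if any(n_kw[t][u] > 0 for u in ml[w]):
                        p *= boost
                    # Penalize topics that hold a cannot-link partner.
                    if any(n_kw[t][u] > 0 for u in cl[w]):
                        p *= damp
                    weights.append(p)

                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_kw


if __name__ == "__main__":
    reviews = [["screen", "display", "bright"],
               ["battery", "charge", "power"],
               ["screen", "battery"]]
    z, n_kw = constrained_lda(reviews, 2,
                              must_link=[("screen", "display")],
                              cannot_link=[("screen", "battery")])
    print(z)
```

Soft factors are used here rather than hard constraints so that every topic keeps a nonzero sampling weight; a hard variant would set the weight to zero for topics that violate a cannot-link pair.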

 
