A parallel adaptive news topic tracking algorithm based on the N-Gram language model

Cited by: 11


Authors: QU Qingtao; LIU Qicheng; MU Chunxiao (School of Computer and Control Engineering, Yantai University, Yantai 264005, Shandong, China)

Affiliation: [1] School of Computer and Control Engineering, Yantai University, Yantai 264005, Shandong, China

Source: Journal of Shandong University (Engineering Science), 2018, No. 6, pp. 37-43 (7 pages)

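Funding: Shandong Provincial Natural Science Foundation (ZR2016FM42); Shandong Provincial Key Research and Development Program (2016GGX109004); State Oceanic Administration "13th Five-Year Plan" Marine Economy Innovation and Development Demonstration Key Project (YHC-ZB-P201701); National Natural Science Foundation of China (61702439)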

Abstract: Traditional vector space models and unigram models ignore the word order between words when representing the text features of a topic. To address this problem, a parallel adaptive news topic tracking algorithm based on the N-Gram language model (PATT-Gram) is proposed. The N-Gram language model is used to represent text by exploiting the word-order relations between words in news reports, and a Bayes classification algorithm performs the topic tracking. Under a minimum feature average confidence threshold update strategy, test news reports are used to update the training set and refine the topic model. The algorithm is implemented on the MapReduce distributed computing model. Experiments show that it effectively improves topic tracking performance and achieves good parallel speedup and scalability.
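The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea, not the PATT-Gram implementation. It assumes bigrams (N = 2), whitespace-tokenized English text, add-alpha smoothing, and a "confidence" defined as the average per-bigram log-odds of on-topic versus off-topic. The class name `BigramTopicTracker`, the `update_threshold` parameter, and the sample stories are all hypothetical.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Adjacent word pairs; unlike a bag of words, these keep local word order."""
    return list(zip(tokens, tokens[1:]))


class BigramTopicTracker:
    """Toy two-class (on/off topic) naive Bayes over bigram features."""

    def __init__(self, alpha=1.0, update_threshold=1.0):
        self.alpha = alpha                        # add-alpha smoothing (assumed)
        self.update_threshold = update_threshold  # hypothetical confidence cutoff
        self.counts = {"on": Counter(), "off": Counter()}
        self.docs = {"on": 0, "off": 0}

    def train(self, tokens, label):
        """Add one labeled story to the bigram counts of its class."""
        self.counts[label].update(bigrams(tokens))
        self.docs[label] += 1

    def _log_prob(self, feature, label):
        """Smoothed log P(feature | label); +1 reserves mass for unseen bigrams."""
        vocab = len(set(self.counts["on"]) | set(self.counts["off"])) + 1
        total = sum(self.counts[label].values())
        return math.log((self.counts[label][feature] + self.alpha)
                        / (total + self.alpha * vocab))

    def confidence(self, tokens):
        """Average per-bigram log-odds of on-topic versus off-topic."""
        feats = bigrams(tokens)
        if not feats:
            return 0.0
        odds = sum(self._log_prob(f, "on") - self._log_prob(f, "off")
                   for f in feats)
        return odds / len(feats)

    def track(self, tokens):
        """Classify a story; fold confident on-topic stories back into training."""
        prior = math.log((self.docs["on"] + 1) / (self.docs["off"] + 1))
        conf = self.confidence(tokens)
        on_topic = prior + conf * len(bigrams(tokens)) > 0
        if on_topic and conf >= self.update_threshold:
            self.train(tokens, "on")  # adaptive update of the topic model
        return on_topic, conf


tracker = BigramTopicTracker(update_threshold=0.5)
tracker.train("earthquake hits coastal city".split(), "on")
tracker.train("rescue teams reach coastal city".split(), "on")
tracker.train("stock market closes higher".split(), "off")
tracker.train("team wins league title".split(), "off")

on_topic, conf = tracker.track("earthquake hits coastal city again".split())
print(on_topic, round(conf, 2))
```

Because per-story bigram counts are combined by simple addition, the counting step is the part that would most naturally map onto a MapReduce job (map: emit bigram counts per story; reduce: sum them per class). How the paper actually partitions the work is not stated in the abstract.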

Keywords: topic tracking; N-Gram language model; naive Bayes classification; MapReduce computing model

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
