Authors: QU Qingtao; LIU Qicheng; MU Chunxiao (School of Computer and Control Engineering, Yantai University, Yantai 264005, Shandong, China)
Institution: [1] School of Computer and Control Engineering, Yantai University, Yantai 264005, Shandong, China
Source: Journal of Shandong University (Engineering Science), 2018, No. 6, pp. 37-43 (7 pages)
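Funding: Natural Science Foundation of Shandong Province (ZR2016FM42); Key Research and Development Program of Shandong Province (2016GGX109004); State Oceanic Administration "13th Five-Year Plan" Key Demonstration Project for Innovative Development of the Marine Economy (YHC-ZB-P201701); National Natural Science Foundation of China (61702439)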
Abstract: When the traditional vector space model and the unigram model represent the text features of a topic, they ignore the word-order relations between words. To address this problem, a parallel adaptive news topic tracking algorithm based on the N-Gram language model (PATT-Gram) is proposed. The N-Gram language model is used for text representation, exploiting the word-order relations between words in news reports. Topic tracking is performed with a Bayes classification algorithm, and a minimum feature average confidence threshold update strategy uses test news reports to update the training set and refine the topic model. The algorithm is implemented on the MapReduce distributed computing model. Experiments show that the algorithm not only effectively improves topic tracking performance but also has good parallel speedup and scalability.
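The abstract combines three ingredients: N-Gram features that keep word order, Bayes classification for tracking, and an adaptive update that feeds confident test stories back into the training set, with counting done as a MapReduce job. The following is a minimal Python sketch of how these pieces could fit together, not the authors' implementation. Assumptions: bigrams (N = 2), multinomial naive Bayes with Laplace smoothing, an on-topic/off-topic label pair, and pre-segmented tokens (Chinese text would first need word segmentation). The "confidence" used here is a hypothetical stand-in (the log-score margin between the best and runner-up topic, averaged per n-gram feature), since this record does not define the paper's minimum feature average confidence threshold. The `mapper`/`reducer` functions only mimic the shape of a MapReduce job in-process; they are not Hadoop API calls.

```python
import math
from collections import Counter, defaultdict
from itertools import chain

N = 2  # assumed n-gram order (bigrams); the paper's choice of N is not stated here


def ngrams(tokens, n=N):
    """Contiguous n-grams of a token list, as tuples (this preserves word order)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mapper(story):
    """Map step (hypothetical): emit ((kind, topic, gram), 1) for one labelled story."""
    tokens, topic = story
    yield ("doc", topic, None), 1
    for g in ngrams(tokens):
        yield ("gram", topic, g), 1


def reducer(pairs):
    """Reduce step (hypothetical): sum the emitted counts per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


class NGramNaiveBayes:
    """Multinomial naive Bayes over n-gram features with Laplace smoothing."""

    def __init__(self, totals):
        self.doc_counts = Counter()              # topic -> number of stories
        self.gram_counts = defaultdict(Counter)  # topic -> n-gram frequencies
        for (kind, topic, g), c in totals.items():
            if kind == "doc":
                self.doc_counts[topic] += c
            else:
                self.gram_counts[topic][g] += c
        self.vocab = {g for cnt in self.gram_counts.values() for g in cnt}

    def add(self, tokens, topic):
        """Fold one more story into the counts (the adaptive training-set update)."""
        self.doc_counts[topic] += 1
        for g in ngrams(tokens):
            self.gram_counts[topic][g] += 1
            self.vocab.add(g)

    def log_scores(self, tokens):
        """Log prior plus smoothed log likelihood of the story's n-grams, per topic."""
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab) + 1  # one extra slot reserved for unseen n-grams
        scores = {}
        for topic, n_docs in self.doc_counts.items():
            cnt = self.gram_counts[topic]
            denom = sum(cnt.values()) + v
            s = math.log(n_docs / total_docs)
            for g in ngrams(tokens):
                s += math.log((cnt[g] + 1) / denom)
            scores[topic] = s
        return scores


def track(model, tokens, threshold=0.5):
    """Classify a test story and add it to the training data if confident.

    The confidence is a hypothetical stand-in: the log-score margin between
    the best and runner-up topic, averaged over the story's n-gram features.
    Requires at least two topics in the model.
    """
    scores = model.log_scores(tokens)
    best, second = sorted(scores, key=scores.get, reverse=True)[:2]
    margin = (scores[best] - scores[second]) / max(len(ngrams(tokens)), 1)
    if margin >= threshold:
        model.add(tokens, best)
    return best, margin


# Usage: two tiny labelled stories, already tokenized.
train = [
    ("the typhoon hit the coast".split(), "on_topic"),
    ("the stock market fell sharply".split(), "off_topic"),
]
model = NGramNaiveBayes(reducer(chain.from_iterable(map(mapper, train))))
label, margin = track(model, "the typhoon hit the city".split())
print(label, round(margin, 3))  # prints: on_topic 0.52
```

Because the test story shares three word-ordered bigrams with the on-topic story, its average margin clears the assumed 0.5 threshold, so it is added to the on-topic training data. A unigram model would also credit "the" and "hit" regardless of order; the bigram features only match when the words appear in the same sequence.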
Keywords: topic tracking; N-Gram language model; naive Bayes classification; MapReduce computing model
CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]