Research on Regional Microblog Summarization Based on the Suffix Tree Clustering Algorithm


Authors: GAO Yongbing [1], ZHANG Guijuan, HU Wenjiang [1], MA Zhanfei [2]

Affiliations: [1] School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China; [2] Department of Computer, Baotou Teachers College, Baotou, Inner Mongolia 014010, China

Source: Computer Engineering and Applications, 2018, No. 9, pp. 126-132, 144 (8 pages)

Funding: National Natural Science Foundation of China (No. 61163025); Natural Science Foundation of Inner Mongolia Autonomous Region (No. 2015MS0621)

Abstract: Regional official microblogs contain a large amount of information about local events, and aggregating their data can reveal the important events in a region. Drawing on the characteristics of regional microblog data, such as regional nicknames, multiple levels, and prominent regional label attributes, this paper proposes a regional microblog summarization method based on the Suffix Tree Clustering (STC) algorithm. First, a regional weight tree and HowNet are used to preprocess the data, replacing words of similar meaning with a unified term. Next, the microblogs are clustered using STC together with Singular Value Decomposition (SVD). Finally, each microblog is given a composite score based on regional microblog features, and representative sentences are selected to form the summary. Experiments verify the feasibility of the method and indicate that it can effectively identify local events and generate highly readable event summaries.
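The clustering step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It enumerates shared word phrases directly instead of building a suffix tree, and it leaves out the regional weight tree, HowNet, and SVD steps. The function names, the score weighting (document count times phrase length), the 0.5 overlap threshold, and whitespace tokenization are all assumptions made for illustration. Chinese text would need word segmentation first.

```python
from collections import defaultdict
from itertools import combinations


def base_clusters(docs, max_len=3):
    """Map each shared word phrase to the set of document indices containing it."""
    phrase_docs = defaultdict(set)
    for idx, doc in enumerate(docs):
        words = doc.lower().split()  # assumption: whitespace-tokenized text
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[tuple(words[i:i + n])].add(idx)
    # A base cluster needs a phrase shared by at least two documents.
    return {p: d for p, d in phrase_docs.items() if len(d) >= 2}


def score(phrase, doc_ids):
    """Assumed base-cluster score: number of documents times phrase length."""
    return len(doc_ids) * len(phrase)


def merge_clusters(clusters, threshold=0.5):
    """Merge base clusters whose document sets overlap heavily in both directions."""
    phrases = list(clusters)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(clusters[a] & clusters[b])
        if (common / len(clusters[a]) > threshold
                and common / len(clusters[b]) > threshold):
            parent[find(a)] = find(b)

    merged = defaultdict(set)
    for p in phrases:
        merged[find(p)] |= clusters[p]
    return sorted(sorted(ids) for ids in merged.values())
```

Calling `merge_clusters(base_clusters(docs))` on a list of microblog texts returns groups of document indices. Within each group, the highest-scoring phrases could serve as a rough event label.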

Keywords: regional microblog; regional weight tree; HowNet; suffix tree clustering; summarization

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
