An Improved Single-Pass Algorithm and Its Applications in Topic Detection and Tracking


Authors: QIN Yongzhen (覃永震), MIAO Quanxing (妙全兴) [2]

Affiliations: [1] Postgraduate Brigade, Engineering University of PAP, Xi'an 710086, China; [2] Department of Information Engineering, Engineering University of PAP, Xi'an 710086, China

Source: Journal of Engineering University of the Chinese People's Armed Police Force (《武警工程大学学报》), 2016, No. 6, pp. 24-28 (5 pages)

Abstract: To address the low clustering accuracy of the single-pass algorithm in online topic detection and tracking, an improved single-pass algorithm based on weighting coefficients and dynamic adjustment of dimension is proposed. Anaphora resolution is added to the conventional preprocessing of the original texts, which includes word segmentation and part-of-speech tagging. To accommodate different kinds of web documents, feature words of the texts to be clustered are dynamically added or removed according to the type of the original webpage document, and specific key feature words are assigned higher weights. Experiments show that the improved algorithm adapts well to various document types and improves clustering accuracy.
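The abstract describes the approach only at a high level, so the following is a minimal sketch of a single-pass clustering loop with boosted weights for key feature words, not the authors' implementation. The function names, the cosine similarity measure, the summed-vector centroid, and the values of `threshold` and `boost` are all assumptions made for illustration. Anaphora resolution and the document-type-dependent adjustment of feature dimensions are not modeled here.

```python
import math
from collections import Counter


def weighted_vector(tokens, key_terms=frozenset(), boost=2.0):
    """Build a term-frequency vector; terms in key_terms are multiplied by boost.

    The multiplicative boost is an assumed weighting scheme, not the paper's.
    """
    counts = Counter(tokens)
    return {t: c * (boost if t in key_terms else 1.0) for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.5, key_terms=frozenset(), boost=2.0):
    """Cluster tokenized documents in one pass over the input.

    Each document joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": dict, "members": [doc indices]}
    for i, tokens in enumerate(docs):
        vec = weighted_vector(tokens, key_terms, boost)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # The centroid is kept as the sum of member vectors; cosine
            # similarity ignores scale, so this matches the mean direction.
            for t, w in vec.items():
                best["centroid"][t] = best["centroid"].get(t, 0.0) + w
        else:
            clusters.append({"centroid": dict(vec), "members": [i]})
    return clusters
```

Because each document is compared only against the clusters that exist when it arrives, the result depends on input order and on the threshold, which is one reason feature weighting can matter for accuracy in this family of algorithms.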

Keywords: single-pass; anaphora resolution; weighting coefficient; dynamic adjustment of dimension

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
