Authors: QIN Yongzhen[1], MIAO Quanxing[2] (Postgraduate Brigade; Department of Information Engineering, Engineering University of PAP, Xi'an 710086, China)
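Affiliations: [1] Postgraduate Management Brigade, Engineering University of PAP, Xi'an 710086; [2] Department of Information Engineering, Engineering University of PAP, Xi'an 710086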
Source: Journal of Engineering University of the Chinese People's Armed Police Force, 2016, No. 6, pp. 24-28 (5 pages)
Abstract: To address the low clustering accuracy of the single-pass algorithm in online topic detection and tracking, an improved single-pass algorithm based on weighting coefficients and dynamic dimension adjustment is proposed. Anaphora resolution is added to the conventional preprocessing of the original texts, which includes word segmentation and part-of-speech tagging. To adapt to different kinds of web documents, feature words of the texts to be clustered are dynamically added or removed according to the type of the original web page, and specific key feature words are assigned higher weights. Experiments show that the improved algorithm adapts noticeably better to various types of documents and improves clustering accuracy.
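The abstract gives no formulas, so the following is only a minimal sketch of plain single-pass clustering with a keyword boost, assuming pre-segmented tokens, term-frequency vectors and cosine similarity. The `key_terms` set, the `boost` factor, the `max_features` cap (a hypothetical stand-in for per-document-type dimension adjustment) and the `threshold` value are illustrative assumptions, not the paper's actual weighting scheme or adjustment rule; anaphora resolution and segmentation are omitted.

```python
import math
from collections import Counter


def weighted_vector(tokens, key_terms, boost=2.0, max_features=None):
    """Term-frequency vector in which terms from key_terms are multiplied by boost.

    max_features (hypothetical) keeps only the highest-weighted terms, a crude
    stand-in for adjusting the feature dimension per document type.
    """
    tf = Counter(tokens)
    vec = {t: c * (boost if t in key_terms else 1.0) for t, c in tf.items()}
    if max_features is not None:
        # Sort by descending weight, then alphabetically for a stable result.
        top = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:max_features]
        vec = dict(top)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(vectors, threshold=0.5):
    """Cluster vectors in arrival order with a single pass.

    Each vector joins the most similar existing cluster if the similarity
    reaches threshold; otherwise it starts a new cluster. Centroids are kept
    as summed vectors, which is enough because cosine is scale-invariant.
    """
    centroids = []
    labels = []
    for vec in vectors:
        best, best_sim = -1, 0.0
        for i, c in enumerate(centroids):
            s = cosine(vec, c)
            if s > best_sim:
                best, best_sim = i, s
        if best >= 0 and best_sim >= threshold:
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
            labels.append(best)
        else:
            centroids.append(dict(vec))
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    docs = [
        ["earthquake", "rescue", "yunnan"],
        ["earthquake", "yunnan", "casualties"],
        ["football", "match", "final"],
        ["football", "final", "score"],
    ]
    key_terms = {"earthquake", "football"}  # hypothetical key feature words
    vecs = [weighted_vector(d, key_terms) for d in docs]
    labels = single_pass(vecs, threshold=0.5)
    print(labels)  # [0, 0, 1, 1]
```

Because single-pass never revisits earlier assignments, the result depends on document order and on the threshold; reweighting key terms is one way to make the first comparison more discriminative without changing that one-pass structure.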
Keywords: single-pass; anaphora resolution; weighting coefficient; dynamic dimension adjustment
Chinese Library Classification (CLC) number: TP391 [Automation and Computer Technology / Computer Application Technology]