Authors: Zhang Fan (张帆), Pan Yaxiong (潘亚雄), Hu Yong (胡勇) (College of Cybersecurity, Sichuan University, Chengdu 610065; Chengdu Science and Technology Development Center of China Academy of Engineering Physics, Chengdu 610200)
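Affiliations: [1] College of Cybersecurity, Sichuan University, Chengdu 610065; [2] Chengdu Science and Technology Development Center, China Academy of Engineering Physics, Chengdu 610041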
Source: Journal of Information Security Research (《信息安全研究》), 2020, No. 5, pp. 396-403 (8 pages)
Abstract: To detect and track target topics in large volumes of news reports, this study examines the incremental Single-Pass clustering algorithm and proposes an improved version of it. The improvements to the original algorithm are threefold: (1) when selecting feature words from news text, a weight coefficient is added to encode each feature word's position in the text; (2) a temporal feature is incorporated into the news-text similarity calculation; and (3) a sub-topic threshold check is added to the steps of the Single-Pass algorithm. Experiments show that the improved algorithm not only produces topic clusters at different granularities but also clusters more efficiently. Under the same conditions, the improved Single-Pass algorithm achieves markedly lower missed-detection and false-detection rates than the original.
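The three modifications described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the weight coefficients, the form of the temporal feature, or the threshold values, so everything below is an assumption chosen for illustration. The hypothetical constants `TITLE_WEIGHT` and `BODY_WEIGHT` stand in for the position coefficients, an exponential decay `exp(-|Δt|/tau)` stands in for the temporal feature, and `topic_th` / `sub_th` are made-up thresholds. Each document is compared once against existing topic centroids (the defining "single pass"); if it joins a topic, a second, stricter threshold decides whether it joins an existing sub-topic or starts a new one.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical position weights (assumption): title words count more than body words.
TITLE_WEIGHT = 2.0
BODY_WEIGHT = 1.0

def weighted_vector(title_tokens, body_tokens):
    """Term-frequency vector in which each occurrence is scaled by its position weight."""
    vec = Counter()
    for tok in title_tokens:
        vec[tok] += TITLE_WEIGHT
    for tok in body_tokens:
        vec[tok] += BODY_WEIGHT
    return vec

def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def time_factor(t1, t2, tau=3.0):
    """Assumed temporal feature: similarity decays exponentially with the time gap (days)."""
    return math.exp(-abs(t1 - t2) / tau)

@dataclass
class Cluster:
    centroid: Counter
    time: float                      # timestamp of the most recent member
    members: list = field(default_factory=list)

    def add(self, doc_id, vec, t):
        self.members.append(doc_id)
        n = len(self.members)
        keys = set(self.centroid) | set(vec)
        # Incremental mean: new = old + (x - old) / n
        self.centroid = Counter({k: self.centroid[k] + (vec[k] - self.centroid[k]) / n for k in keys})
        self.time = t

@dataclass
class Topic(Cluster):
    subtopics: list = field(default_factory=list)

def similarity(vec, t, cluster):
    return cosine(vec, cluster.centroid) * time_factor(t, cluster.time)

def single_pass(docs, topic_th=0.3, sub_th=0.6):
    """docs: iterable of (doc_id, title_tokens, body_tokens, timestamp), in arrival order."""
    topics = []
    for doc_id, title, body, t in docs:
        vec = weighted_vector(title, body)
        best, best_sim = None, 0.0
        for topic in topics:
            s = similarity(vec, t, topic)
            if s > best_sim:
                best, best_sim = topic, s
        if best is None or best_sim < topic_th:
            # No sufficiently similar topic: start a new topic with one sub-topic.
            topic = Topic(Counter(vec), t, [doc_id])
            topic.subtopics.append(Cluster(Counter(vec), t, [doc_id]))
            topics.append(topic)
            continue
        best.add(doc_id, vec, t)
        # Sub-topic threshold check inside the chosen topic.
        sub, sub_sim = max(((c, similarity(vec, t, c)) for c in best.subtopics), key=lambda p: p[1])
        if sub_sim >= sub_th:
            sub.add(doc_id, vec, t)
        else:
            best.subtopics.append(Cluster(Counter(vec), t, [doc_id]))
    return topics
```

A toy run (tokens and timestamps are invented):

```python
docs = [
    (1, ["earthquake", "sichuan"], ["rescue", "earthquake"], 0.0),
    (2, ["earthquake", "sichuan"], ["rescue", "teams"], 0.5),
    (3, ["football", "league"], ["match", "goal"], 1.0),
    (4, ["earthquake", "aid"], ["donation", "earthquake"], 1.0),
]
topics = single_pass(docs)
# Topic 1 holds docs [1, 2, 4] split into sub-topics [1, 2] and [4]; topic 2 holds [3].
```

Using two thresholds is what yields clusters at two granularities: the looser one groups broad topics, the stricter one separates finer sub-topics within each topic.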
Keywords: news topic; Single-Pass clustering algorithm; temporal feature; similarity; sub-topic
CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]