Affiliation: [1] School of Information Management, Wuhan University
Source: Library and Information Service (《图书情报工作》), 2018, Issue 1, pp. 96-105 (10 pages)
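Funding: One of the research outcomes of the National Social Science Fund of China major project "Research on Deep Aggregation of and Services for Subject-Oriented Network Information Resources" (Project No. 12&ZD221)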
Abstract: [Purpose/significance] To help readers quickly understand how a trending event unfolded from the mass of microblog posts it generates, and to improve the accuracy and readability of microblog event summaries, this paper proposes a multi-model method, based on event elements, for extracting timeline summaries of trending microblog events. [Method/process] Taking the textual characteristics of microblogs into account, the method combines the strengths of the latent Dirichlet allocation (LDA) topic model and the mutual information maximum entropy model (MaxEnt-MI) to extract event summary keywords, screens microblog posts using their propagation value and their relevance to the event subject as indicators, and generates a timeline summary in the form time / summary keywords / summary microblog. [Result/conclusion] On a manually annotated test set, the method improves the F value by 8% to 13% compared with the traditional TextRank method, and internal tests show a marked improvement in summary readability. The number of experimental texts and test sets, as well as the variety of events covered, needs to be expanded, and more weighting strategies should be considered to further improve summary accuracy. The experimental results and test feedback indicate that the proposed method is feasible and effective: it meets users' needs for summary information on trending events and improves the accuracy of microblog summary extraction.
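The abstract describes the pipeline only at a high level. Below is a minimal, hypothetical sketch of the output form it names (time / summary keywords / summary microblog). It assumes posts are already word-segmented and carry repost and comment counts. The keyword step is a plain frequency-times-lift stand-in, not the LDA + MaxEnt-MI combination the paper uses. The spread/coverage weighting (`alpha`) is an illustrative assumption. The names `Post`, `slice_keywords`, `pick_post` and `build_timeline` are invented here for illustration.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    day: str        # hypothetical time-slice label, e.g. "d1"
    tokens: list    # pre-segmented words of the post
    reposts: int
    comments: int


def slice_keywords(posts, k=3):
    """Stand-in keyword step (NOT LDA + MaxEnt-MI): rank each day's terms by
    document frequency times lift over the whole corpus."""
    n_all = len(posts)
    df_all = Counter(t for p in posts for t in set(p.tokens))
    by_day = defaultdict(list)
    for p in posts:
        by_day[p.day].append(p)
    keywords = {}
    for day, day_posts in by_day.items():
        n_day = len(day_posts)
        df_day = Counter(t for p in day_posts for t in set(p.tokens))
        # lift > 1 means the term is more common on this day than overall
        score = {t: c * (c / n_day) / (df_all[t] / n_all) for t, c in df_day.items()}
        keywords[day] = sorted(score, key=lambda t: (-score[t], t))[:k]
    return keywords, by_day


def pick_post(day_posts, kws, alpha=0.5):
    """Hypothetical score: alpha * normalised spread + (1 - alpha) * keyword coverage."""
    max_spread = max(math.log1p(p.reposts + p.comments) for p in day_posts) or 1.0

    def score(p):
        spread = math.log1p(p.reposts + p.comments) / max_spread
        coverage = len(set(kws) & set(p.tokens)) / len(kws) if kws else 0.0
        return alpha * spread + (1 - alpha) * coverage

    return max(day_posts, key=score)


def build_timeline(posts, k=3, alpha=0.5):
    """Return [(day, keywords, representative post)] in day order."""
    keywords, by_day = slice_keywords(posts, k)
    return [(day, keywords[day], pick_post(by_day[day], keywords[day], alpha))
            for day in sorted(by_day)]
```

Usage: `build_timeline(posts, k=2)` yields one `(day, keywords, post)` triple per time slice. A post with wide propagation that also covers the slice keywords is preferred over a popular but off-topic one. That loosely mirrors the "propagation value plus subject relevance" screening the abstract describes, without claiming to reproduce the authors' weights.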
Keywords: text mining; event summarization; latent Dirichlet allocation; mutual information maximum entropy model
Classification number: TP391 [Automation and Computer Technology / Computer Application Technology]