Affiliation: [1] School of Electrical and Information Engineering, Jiangsu University of Science and Technology, Zhangjiagang 215600
Source: Computer & Digital Engineering (《计算机与数字工程》), 2013, No. 6, pp. 943-946, 995 (5 pages)
Abstract: This paper summarizes the process of generating multi-document summaries for Chinese web page text and analyzes the key techniques in detail: text preprocessing, sentence similarity calculation, local topic region discovery, difference detection, and summary generation. For content selection, sentence relevance is assessed with a similarity measure that fuses the intrinsic features of keywords and sentences, and text clustering is used to find the differences between sentences. On the Java ME platform in the MyEclipse environment, combined with the lightweight UI toolkit LWUIT and with WTK as the development tool, an automatic summarization system for mobile phone terminals is designed and implemented. Nearly 200 articles were selected as a test corpus for an acceptability evaluation and a Q&A-based informativeness evaluation, and the test results were fairly satisfactory.
Keywords: text summarization; sentence similarity; text clustering; Java ME; LWUIT; WTK
Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
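
The abstract describes scoring sentence relevance by fusing keyword and sentence features, then clustering sentences to find differences. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions: sentences are already word-segmented into token lists, the keyword feature is a Jaccard overlap, the sentence feature is a term-count cosine, and the names `similarity`, `cluster`, `alpha`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def keyword_overlap(a, b, keywords):
    """Jaccard overlap of the two sentences, restricted to the keyword set."""
    ka, kb = set(a) & keywords, set(b) & keywords
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def similarity(a, b, keywords, alpha=0.5):
    """Assumed fusion: weighted sum of keyword overlap and sentence cosine."""
    return alpha * keyword_overlap(a, b, keywords) + (1 - alpha) * cosine(a, b)


def cluster(sentences, keywords, threshold=0.5):
    """Greedy single-pass clustering against each cluster's first sentence.

    This stands in for the paper's text clustering step, which the
    abstract does not specify.
    """
    clusters = []
    for s in sentences:
        for c in clusters:
            if similarity(s, c[0], keywords) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters
```

Under these assumptions, one representative sentence per cluster could then be selected for the summary, so that each cluster contributes a distinct point.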