Research on an Automatic Summarization System for Chinese Webpage Text Based on Mobile Phone Terminals

Automatic Summarization for Chinese Webpage Text Based on Mobile Phone Terminal


Authors: Lu Ye (卢冶)[1], Su Yong (苏勇)[1], Xu Lei (须磊)[1]

Affiliation: [1] School of Electrical and Information Engineering, Jiangsu University of Science and Technology, Zhangjiagang 215600

Source: Computer & Digital Engineering (《计算机与数字工程》), 2013, No. 6, pp. 943-946, 995 (5 pages)

Abstract: This paper summarizes the process of generating multi-document automatic summaries for Chinese webpage text and analyzes in detail the key techniques involved: text preprocessing, sentence similarity calculation, local topic region discovery, difference detection, and summary generation. For content selection, sentence relevance is measured by a similarity calculation that fuses the inherent features of keywords and sentences, and text clustering is used to find the differences between sentences. A mobile phone terminal based automatic summarization system was designed and implemented on the Java ME platform in the MyEclipse environment, combining the lightweight UI toolkit LWUIT and using WTK as the development tool. Nearly 200 articles were selected as the test corpus for an acceptability evaluation and a Q&A-based informativeness evaluation, and the results were fairly satisfactory.
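The abstract does not give the similarity formula, so the following is only a minimal sketch of what "fusing keyword and sentence features" could look like. It assumes, purely for illustration, a Jaccard overlap of keyword sets combined with a Jaccard overlap of character bigrams, weighted by a hypothetical parameter `alpha`. The class name, method names, and weighting scheme are assumptions, not the authors' method. It is written for standard Java SE (generics, `HashSet`) and would need adapting to run on Java ME.

```java
import java.util.HashSet;
import java.util.Set;

public class SentenceSimilarity {

    /** Jaccard overlap of two sets: |A ∩ B| / |A ∪ B|; 0 when both are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Character bigrams of a sentence: a surface feature that needs no word segmentation. */
    static Set<String> charBigrams(String sentence) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 1 < sentence.length(); i++) {
            grams.add(sentence.substring(i, i + 2));
        }
        return grams;
    }

    /**
     * Hypothetical fused score: alpha weights the keyword overlap,
     * (1 - alpha) weights the sentence-level bigram overlap.
     */
    static double similarity(String s1, Set<String> keywords1,
                             String s2, Set<String> keywords2,
                             double alpha) {
        double keywordScore = jaccard(keywords1, keywords2);
        double sentenceScore = jaccard(charBigrams(s1), charBigrams(s2));
        return alpha * keywordScore + (1.0 - alpha) * sentenceScore;
    }
}
```

Under these assumptions, a pairwise score like this could serve both steps mentioned in the abstract: ranking sentences by relevance, and acting as the distance input to a clustering step that separates sentences covering different content.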

Keywords: text summarization; sentence similarity; text clustering; Java ME; LWUIT; WTK

Classification No.: TP393 [Automation and Computer Technology: Computer Application Technology]

 
