Study on Topic Partition in Automatic Abstracting System (自动文摘系统中的主题划分问题研究)  Cited by: 13

Authors: Fu Jianlian (傅间莲)[1], Chen Qunxiu (陈群秀)[1]

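Affiliation: [1] State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science, Tsinghua University, Beijing 100084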

Source: Journal of Chinese Information Processing (《中文信息学报》), 2005, No. 6, pp. 28-35 (8 pages)

Abstract: With the growth of the network, electronic text is appearing in large volumes. Automatic abstracting is faster, more convenient, more efficient, and more objective than manual abstracting, so it has wide application and has become an active research topic. Topic partition is an important problem in the text structure analysis stage of an automatic abstracting system. This paper builds a vector space model of the article at the paragraph level and proposes an algorithm that partitions a text into topics according to the similarity between consecutive paragraphs. The algorithm addresses the discourse structure analysis of multi-topic articles, so that their abstracts cover the content more completely and are more balanced in structure. In a closed test, the precision of topic partition is 92.2% for multi-topic articles and 99.1% for single-topic articles.
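The partitioning idea in the abstract (paragraph vectors, then topic boundaries chosen from the similarity of consecutive paragraphs) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the term weighting, the similarity measure, or the boundary rule, so the raw term-frequency vectors, cosine similarity, whitespace tokenization, and fixed `threshold` below are all assumptions made for illustration. Chinese text would first need word segmentation, which is omitted here.

```python
import math
from collections import Counter


def paragraph_vector(paragraph):
    """Term-frequency vector for one paragraph.

    Assumption: tokens are whitespace-separated; real Chinese text
    would need a word segmenter before this step.
    """
    return Counter(paragraph.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def partition_topics(paragraphs, threshold=0.1):
    """Group consecutive paragraph indices into topic segments.

    Assumption: a new topic starts whenever the similarity between a
    paragraph and its predecessor falls below a fixed threshold.
    """
    if not paragraphs:
        return []
    vectors = [paragraph_vector(p) for p in paragraphs]
    segments = [[0]]
    for i in range(1, len(paragraphs)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            segments.append([i])      # low similarity: open a new topic
        else:
            segments[-1].append(i)    # similar enough: same topic
    return segments


if __name__ == "__main__":
    text = [
        "cats purr and cats sleep",
        "cats sleep all day",
        "stock markets fell sharply",
        "markets and stock prices recovered",
    ]
    print(partition_topics(text))
```

A single-topic article comes out as one segment under this rule, and a multi-topic article as several; an abstract could then draw sentences from each segment so that every topic is represented.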

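Keywords: computer application; Chinese information processing; automatic abstracting; vector space model; paragraph similarity; topic partition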

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
