Design and Implementation of a Compound Keyword Extraction System Based on SWN Theory  Cited by: 4

Design and implement of a system extracting keywords using SWN Theory


Authors: Zhou Yafu (周雅夫)[1], Ma Li (马力)[2], Dong Luobing (董洛兵)[3]

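Affiliations: [1] Department of Computer Science, Xi'an Institute of Posts and Telecommunications, Xi'an, Shaanxi 710121; [2] Information Center, Xi'an Institute of Posts and Telecommunications, Xi'an, Shaanxi 710121; [3] Library, Xidian University, Xi'an, Shaanxi 710071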

Source: Journal of Xi'an Institute of Posts and Telecommunications (《西安邮电学院学报》), 2007, No. 5, pp. 82-86 (5 pages)

Abstract: This paper implements a system that extracts keywords from Chinese documents using the small-world network (SWN) model. A small-world network is characterized by two statistical properties: average path length and clustering coefficient. The system's algorithm first segments the document into terms and builds a document structure graph in which the terms are nodes and the adjacency (co-occurrence) of terms forms the edges; these edges describe the semantic association between the terms of the document. It then computes, for each term, the change in average path length and the change in clustering coefficient, and uses these two changes as the criteria for extracting keywords. Finally, the extracted keywords are merged into compound keywords according to a merging strategy. The paper first introduces the concept of the small-world network model and its application to keyword extraction, then describes the design and implementation of the system, and finally presents experiments demonstrating the correctness and effectiveness of the algorithm.
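The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the input is already segmented into terms; a term's "change" is measured by removing its node from the graph and recomputing each statistic; unreachable node pairs are given a penalty distance equal to the number of nodes; and compound keywords are formed by joining keywords that appear consecutively in the text. All function names here are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_graph(tokens):
    """Undirected graph: terms are nodes, adjacent terms share an edge."""
    graph = {t: set() for t in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def average_path_length(graph):
    """Mean BFS distance over ordered node pairs.

    Assumption: an unreachable pair counts as distance n (the node count),
    so disconnecting the graph increases the value.
    """
    n = len(graph)
    if n < 2:
        return 0.0
    total = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values()) + n * (n - len(dist))
    return total / (n * (n - 1))


def clustering_coefficient(graph):
    """Average local clustering coefficient (0 for nodes of degree < 2)."""
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def without(graph, node):
    """Copy of the graph with one node and its edges removed."""
    return {u: nbrs - {node} for u, nbrs in graph.items() if u != node}


def score_terms(tokens):
    """Map each term to (change in path length, change in clustering)."""
    graph = build_graph(tokens)
    base_l = average_path_length(graph)
    base_c = clustering_coefficient(graph)
    scores = {}
    for term in graph:
        reduced = without(graph, term)
        scores[term] = (average_path_length(reduced) - base_l,
                        clustering_coefficient(reduced) - base_c)
    return scores


def merge_adjacent(tokens, keywords, sep=""):
    """Hypothetical merge rule: join runs of consecutive keyword tokens."""
    compounds, run = [], []
    for tok in list(tokens) + [None]:  # None flushes the final run
        if tok in keywords:
            run.append(tok)
        else:
            if len(run) > 1:
                compounds.append(sep.join(run))
            run = []
    return compounds


tokens = ["a", "b", "c", "b", "d"]  # stand-in for segmented terms
scores = score_terms(tokens)
ranked = sorted(scores, key=lambda t: scores[t][0], reverse=True)
```

Ranking by the path-length change alone, as in the last line, is only one possible choice; the abstract states that both changes serve as extraction criteria but does not say how they are combined or thresholded.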

Keywords: small-world network; keyword extraction; change in average path length; change in clustering coefficient

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]

 
