Research on a Compound Keywords Detection Method Based on Small World Model  Cited by: 14


Authors: Ma Li (马力)[1,2], Jiao Licheng (焦李成)[1], Bai Lin (白琳)[2], Zhou Yafu (周雅夫)[2], Dong Luobing (董洛兵)[3]

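Affiliations: [1] Institute of Intelligent Information Processing, Xidian University, Xi'an, Shaanxi 710071; [2] Information Center, Xi'an Institute of Posts and Telecommunications, Xi'an, Shaanxi 710061; [3] Library, Xidian University, Xi'an, Shaanxi 710071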

Source: Journal of Chinese Information Processing (《中文信息学报》), 2009, No. 3, pp. 121-128 (8 pages)

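Funding: National Natural Science Foundation of China (60803162); Natural Science Foundation of Shaanxi Province (SJ08-ZT15); Scientific Research Program of the Shaanxi Provincial Department of Education (08JK245)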

Abstract: This paper proposes a new algorithm, based on the properties of small-world networks, for extracting compound keywords from Chinese documents. First, a document is represented as a word network constructed in the manner of a k-nearest-neighbor coupled graph: nodes represent terms, and edges represent co-occurrence between terms. Two quantities, the change in clustering coefficient and the change in average shortest path length, are then introduced to measure the importance of each term, and the most important terms form the candidate keyword set. Finally, using the positional adjacency of terms in the candidate set and the part-of-speech collocation patterns of Chinese, related candidates are combined into compound keywords. Experimental results, compared against keywords extracted manually from the same documents, show that the method is feasible and effective, and that compound keywords express a document's meaning more clearly than single keywords, making the text easier to understand.
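The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the window size `k`, the use of absolute differences, the equal weighting of the two changes, and the choice to skip unreachable node pairs when averaging path lengths are all assumptions. The part-of-speech collocation check is omitted, so candidates are merged on positional adjacency alone.

```python
from collections import deque
from itertools import combinations


def build_word_network(tokens, k=2):
    """Link each token to the tokens at most k positions after it."""
    graph = {t: set() for t in tokens}
    for i, t in enumerate(tokens):
        for u in tokens[i + 1:i + k + 1]:
            if u != t:
                graph[t].add(u)
                graph[u].add(t)
    return graph


def avg_clustering(graph):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
            total += 2.0 * links / (d * (d - 1))
    return total / len(graph) if graph else 0.0


def avg_path_length(graph):
    """Mean BFS distance over reachable ordered pairs (assumption:
    unreachable pairs are skipped rather than penalised)."""
    total, pairs = 0, 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in graph[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def remove_node(graph, node):
    """Return a copy of the graph without `node` and its edges."""
    return {n: nbrs - {node} for n, nbrs in graph.items() if n != node}


def importance(graph):
    """Score each term by how much removing it changes the two metrics
    (assumption: absolute changes, summed with equal weight)."""
    base_c, base_l = avg_clustering(graph), avg_path_length(graph)
    scores = {}
    for node in graph:
        reduced = remove_node(graph, node)
        delta_c = abs(avg_clustering(reduced) - base_c)
        delta_l = abs(avg_path_length(reduced) - base_l)
        scores[node] = delta_c + delta_l
    return scores


def compound_keywords(tokens, candidates):
    """Join runs of two or more adjacent candidate tokens (POS check omitted)."""
    compounds, run = [], []
    for tok in list(tokens) + [None]:  # the None sentinel flushes the last run
        if tok in candidates:
            run.append(tok)
        else:
            if len(run) > 1:
                compounds.append(" ".join(run))
            run = []
    return compounds
```

In use, a segmented document would be passed to `build_word_network`, the highest-scoring terms from `importance` would be taken as the candidate set, and `compound_keywords` would then merge candidates that sit next to each other in the text. How many candidates to keep is a tuning choice that the abstract does not specify.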

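Keywords: computer applications; Chinese information processing; small-world network; word network; change in average shortest path length; change in clustering coefficient; compound keywords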

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
