基于词共现矩阵的项目关键词词库和关键词语义网络  被引量:11

Project keyword lexicon and keyword semantic network based on word co-occurrence matrix

在线阅读下载全文

作  者:王庆[1,2] 陈泽亚[1,2] 郭静[1,2] 陈晰[3] 王晶华[3] 

机构地区:[1]中国科学技术大学计算机科学与技术学院,合肥230027 [2]中国科学技术大学苏州研究院,江苏苏州215123 [3]国家电网公司信息通信分公司,北京100761

出  处:《计算机应用》2015年第6期1649-1653,共5页journal of Computer Applications

摘  要:针对专业领域中科技项目的关键词提取和项目词库建立的问题,提出了一种基于语义关系、利用共现矩阵建立项目关键词词库的方法。该方法在传统的基于共现矩阵提取关键词研究的基础上,综合考虑了关键词在文章中的位置、词性以及逆向文件频率(IDF)等因素,对传统算法进行改进。另外,给出一种利用共现矩阵建立关键词关联网络,并通过计算与语义基向量相似度识别热点关键词的方法。使用882篇电力项目数据进行仿真实验,实验结果表明改进后的方法能够有效对科技项目进行关键词提取,建立关键词关联网络,并在准确率、召回率以及平衡F分数(F1-score)等指标上明显优于基于多特征融合的中文文本关键词提取方法。In order to solve the problems of keyword extraction and project keyword lexicon establishment of technological projects in professional fields, an algorithm for building the lexicon based on semantic relation and co-occurrence matrix was proposed. On the basis of conventional keyword extraction research based on co-occurrence matrix, the algorithm considered several advanced factors such as the location, property and Inverse Document Frequency (IDF) index of the keywords to improve the traditional approach. Meanwhile, a method was given for the establishment of keyword semantic network using co- occurrence matrix and hot keyword identification through computing the similarity with semantic base vector. At last, 882 project experiment documents in power field were used to perform the simulation. And the experimental results show that the proposed algorithm can effectively extract the keywords for the technological projects, establish the keyword correlation network, and has better performance in precision, recall rate and Fl-score than the keyword extraction algorithm of Chinese text based on multi-feature fusion.

关 键 词:关键词提取 共现矩阵 关键词词库 关键词语义网络 电力项目 

分 类 号:TP391.1[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象