An Algorithm for Chinese Full Cover Keyword Extraction Based on Semantic Extension (基于语义扩展的汉语全覆盖关键词提取算法)  Cited by: 1


Authors: LI Yan-wu (李言武) [1]; ZHENG Yong (郑勇) [1] (Department of Electrical and Information Engineering, Anhui Vocational and Technical College of Industry and Trade, Huainan 23200)

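Affiliation: [1] Department of Electrical and Electronic Engineering, Anhui Vocational and Technical College of Industry and Trade (安徽工贸职业技术学院)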

Source: Control Engineering of China (《控制工程》), 2018, No. 7, pp. 1326-1334 (9 pages)

Abstract: Synonymy, polysemy, and the difficulty of expressing an article's topic accurately and completely all degrade the quality of keyword extraction. To address these problems, a semantics-based keyword extraction algorithm, CFCKE_SE, is proposed. Semantic relatedness and similarity are computed from the Tongyici Cilin (《同义词词林》) semantic dictionary together with statistical information, which yields a semantic extension degree and a method for calculating it. The lexical chain method is then combined with the semantic extension degree, and the text is processed in sequence: preprocessing, word sense disambiguation of polysemous words, synonym merging, lexical chain construction, effective feature selection, and comprehensive weight calculation. Keywords extracted in this way avoid redundant synonymous expressions while covering the topic of the text fully and accurately. Experimental analysis shows that, compared with the method based on term frequency-inverse document frequency (TFIDF) and the method based on lexical chains, the CFCKE_SE-based method achieves better extraction results and has considerable practical value.

Keywords: Tongyici Cilin (同义词词林); semantic extension degree; lexical chain; keyword extraction; semantic analysis

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
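
The abstract names synonym merging as one pipeline step and TFIDF as a baseline. The sketch below illustrates only how merging synonyms before TFIDF weighting removes redundant keyword candidates; it is not the authors' CFCKE_SE algorithm and does not compute the semantic extension degree or build lexical chains. The `SYNONYM_GROUPS` table, the function names, and the toy documents are all assumptions made for illustration, and the table is a hypothetical stand-in for a real Tongyici Cilin lookup.

```python
import math
from collections import Counter

# Hypothetical stand-in for a Tongyici Cilin lookup: maps a word to the
# representative of its synonym group. This is toy data, not real Cilin entries.
SYNONYM_GROUPS = {
    "电脑": "计算机",    # "computer" (colloquial) -> "computer" (formal)
    "计算机": "计算机",
}

def merge_synonyms(tokens, groups=SYNONYM_GROUPS):
    """Replace each token with its group representative (itself if unknown)."""
    return [groups.get(tok, tok) for tok in tokens]

def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by TF-IDF after synonym merging."""
    merged = [merge_synonyms(d) for d in docs]
    n_docs = len(merged)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for d in merged for word in set(d))
    tf = Counter(merged[doc_index])
    total = len(merged[doc_index])
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Highest score first; ties are broken by the word itself for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Toy corpus of pre-segmented documents (word segmentation is assumed done).
docs = [
    ["电脑", "计算机", "算法", "关键词"],
    ["算法", "方法", "实验"],
    ["实验", "文本", "主题"],
]

if __name__ == "__main__":
    for word, score in tfidf_keywords(docs, 0):
        print(word, round(score, 4))
```

In this toy corpus, 电脑 and 计算机 collapse into a single candidate, so the first document yields one merged keyword with a combined term frequency rather than two competing synonyms.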

 
