Agricultural Keyword Extraction Algorithm Combining New Word Discovery and Improved TextRank  Cited by: 1

Original title: 融合新词发现和改进TextRank算法的农业领域关键词提取算法


Authors: DI Xiaokang (邸小康); ZHANG Hui (张辉) [2]; QIN Xiaojing (秦晓婧); QI Shijie (齐世杰); WANG Caihong (王彩虹); CHENG Xu (程旭)

Affiliations: [1] Beijing Digital Agriculture and Rural Promotion Center, Beijing 100101, China; [2] Institute of Data Science and Agricultural Economics, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China

Source: Agricultural Engineering (《农业工程》), 2023, No. 6, pp. 21-25 (5 pages)

Abstract: To address the difficulty of extracting technical-term keywords from agricultural-domain text, an agricultural keyword extraction method is proposed that combines new word discovery with an improved TextRank algorithm. The algorithm uses information entropy to compute the word-formation probability of candidate words in the text, and thereby discovers domain-specific proper nouns and new words; after manual review, these are added to the word segmentation dictionary. Building on the expanded dictionary, the method changes how TextRank computes node values during word-graph construction by adding word-position and part-of-speech weights, and it extracts keywords using the combined weight of each word. Comparative experiments show that the F-value of the algorithm is on average 7.5% higher than that of the traditional TF-IDF algorithm and on average 9.8% higher than that of the TextRank algorithm, which indicates that the algorithm has practical value.
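The abstract does not give the paper's formulas, so the following is only a minimal Python sketch of the two ideas it names, under stated assumptions. `boundary_entropy` uses the entropy of the characters adjacent to a candidate string as a stand-in for the word-formation score; `weighted_textrank` biases the usual TextRank score transfer by a per-word `node_weight`, assumed here to be a positive number that already combines position and part-of-speech weights. Both function names, the weighting scheme, and all parameters are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def boundary_entropy(text, candidate):
    """Return (left, right) entropy of the characters adjacent to `candidate`.

    Assumption: high entropy on both sides suggests the candidate is used
    freely in many contexts, so it is more likely to be a standalone word.
    """
    left, right = Counter(), Counter()
    start = text.find(candidate)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1
        end = start + len(candidate)
        if end < len(text):
            right[text[end]] += 1
        start = text.find(candidate, start + 1)

    def entropy(counts):
        total = sum(counts.values())
        if total == 0:
            return 0.0
        return sum(c / total * math.log2(total / c) for c in counts.values())

    return entropy(left), entropy(right)


def weighted_textrank(words, node_weight, window=3, damping=0.85, iterations=50):
    """Rank words with TextRank, biasing score transfer by `node_weight`.

    `window` counts the current word itself, so window=2 links each word
    only to the next one. Words missing from `node_weight` default to 1.0;
    all weights are assumed to be positive.
    """
    # Build an undirected co-occurrence graph over the token sequence.
    edges = defaultdict(Counter)
    for i, w in enumerate(words):
        for u in words[i + 1 : i + window]:
            if u != w:
                edges[w][u] += 1
                edges[u][w] += 1

    scores = {w: 1.0 for w in edges}
    for _ in range(iterations):
        new_scores = {}
        for w in edges:
            incoming = 0.0
            for u, cooc in edges[w].items():
                # Neighbor u splits its score in proportion to
                # co-occurrence count times the target's node weight.
                out_total = sum(
                    c * node_weight.get(v, 1.0) for v, c in edges[u].items()
                )
                incoming += cooc * node_weight.get(w, 1.0) / out_total * scores[u]
            new_scores[w] = (1 - damping) + damping * incoming
        scores = new_scores
    return sorted(scores, key=scores.get, reverse=True)
```

In a pipeline of the kind the abstract describes, candidates with high boundary entropy would be passed to manual review before being added to the segmentation dictionary, and the first few entries of the list returned by `weighted_textrank` would be taken as the keywords of a segmented document.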

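Keywords: extraction; new word discovery; information entropy; TextRank algorithm

Classification code: S126 [Agricultural Sciences: Basic Agricultural Sciences]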

 
