Authors: DI Xiaokang, ZHANG Hui[2], QIN Xiaojing, QI Shijie, WANG Caihong, CHENG Xu
Affiliations: [1] Beijing Digital Agriculture and Rural Promotion Center, Beijing 100101, China; [2] Institute of Data Science and Agricultural Economics, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
Source: Agricultural Engineering (《农业工程》), 2023, No. 6, pp. 21-25 (5 pages)
Abstract: To address the difficulty of extracting technical-term keywords from agricultural domain text, this paper proposes an agricultural keyword extraction method that combines new word discovery with an improved TextRank algorithm. The method uses information entropy to compute the word-formation probability of candidate words in the text, which identifies domain-specific proper nouns and new words; after manual review, these are added to the word segmentation dictionary. Using the expanded dictionary, the method changes how TextRank computes node values when building the word graph: it adds word-position and part-of-speech weights, and extracts keywords according to the resulting comprehensive word weight. In comparative experiments, the F value of the proposed algorithm was on average 7.5% higher than that of the traditional TF-IDF algorithm and on average 9.8% higher than that of the TextRank algorithm, which indicates that the method has practical value.
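The abstract names the two stages but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the entropy signal is the left/right neighbor (branch) entropy of a candidate string, and that each word's position and part-of-speech weights have already been combined into a single node weight that biases how TextRank distributes score. The function names `branch_entropy` and `weighted_textrank`, the `node_weight` mapping, and the default window, damping factor, and iteration count are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def branch_entropy(text, candidate):
    """Return (left, right) neighbor entropy of `candidate` in `text`.

    Assumption: high entropy on both sides suggests the candidate appears
    in varied contexts and is therefore more likely to be a real word.
    """
    left, right = Counter(), Counter()
    start = text.find(candidate)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1
        end = start + len(candidate)
        if end < len(text):
            right[text[end]] += 1
        start = text.find(candidate, start + 1)

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in counter.values())

    return entropy(left), entropy(right)


def weighted_textrank(words, node_weight, window=2, damping=0.85, iterations=50):
    """Rank words with TextRank, biasing score transfer by node weights.

    `node_weight` maps a word to a hypothetical combined position and
    part-of-speech weight; words missing from it default to 1.0.
    """
    # Build an undirected co-occurrence graph over a sliding window.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)

    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for w in neighbors:
            rank = 0.0
            for v in neighbors[w]:
                # v shares its score among its neighbors in proportion
                # to their node weights.
                total = sum(node_weight.get(u, 1.0) for u in neighbors[v])
                rank += node_weight.get(w, 1.0) / total * scores[v]
            new_scores[w] = (1 - damping) + damping * rank
        scores = new_scores
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch, candidates whose left and right entropies both exceed a chosen threshold would go to manual review before being added to the segmentation dictionary, and the head of the list returned by `weighted_textrank` would be taken as the keywords. The threshold and the way the position and part-of-speech weights are combined are left open because the abstract does not specify them.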
Keywords: extraction; new word discovery; information entropy; TextRank algorithm
Classification number: S126 [Agricultural Sciences: Basic Agricultural Sciences]