Author: 常兵 CHANG Bing (Department of Information Engineering, Guiyang Institute of Information Science and Technology, Guiyang, Guizhou 550025, China)
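Affiliation: [1] Department of Information Engineering, Guiyang Institute of Information Science and Technology, Guiyang, Guizhou 550025, China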
Source: Automation Application (《自动化应用》), 2023, No. 8, pp. 159-162 (4 pages)
Abstract: New words in the government-affairs domain often carry temporal, spatial, and regional characteristics of the texts in which they appear. Accurately identifying new words in government-affairs texts is one of the key tasks in research on intelligent government services. Based on the characteristics of government-affairs corpora, this paper proposes a domain-specific new word discovery method that fuses multi-dimensional features. First, the corpus is collected and preprocessed. Second, the data are serialized and split into character sequences to obtain a candidate set and a seed set of new words, from which the new words are screened. Finally, the frequencies of the new words in the corpus are aligned and mapped against a general dictionary to produce a domain user dictionary. Experiments on real domain corpora verify the effectiveness of the method.
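The abstract does not name the features that are fused, so the sketch below is not the author's method. It substitutes a common, unsupervised new word discovery recipe to illustrate the three stages: (1) preprocessing, which here simply splits text on punctuation; (2) character n-gram candidate generation and screening by frequency, internal cohesion (the minimum pointwise mutual information over all binary splits), and boundary freedom (the smaller of the left and right neighbour-character entropies); and (3) filtering against a general dictionary and emitting `word frequency` lines for a user dictionary. The function names, thresholds (`min_freq`, `min_pmi`, `min_entropy`), and the output format are all hypothetical, and the paper's frequency alignment step is not reproduced.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Split raw text into runs of word characters (CJK, letters, digits), dropping punctuation."""
    return [seg for seg in re.split(r"\W+", text) if seg]


def entropy(counter):
    """Shannon entropy (natural log) of a neighbour-character distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log(c / total) for c in counter.values()) if total else 0.0


def discover_new_words(text, general_dict, max_len=4, min_freq=2,
                       min_pmi=1.0, min_entropy=0.5):
    """Return {candidate: frequency} for character n-grams that look like new words.

    Assumed features (not taken from the paper): raw frequency, minimum PMI over
    all binary splits, and min(left entropy, right entropy). Candidates already
    present in general_dict (a set or dict of known words) are discarded.
    """
    segments = preprocess(text)
    total = sum(len(s) for s in segments)
    counts = Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for seg in segments:
        for n in range(1, max_len + 1):
            for i in range(len(seg) - n + 1):
                w = seg[i:i + n]
                counts[w] += 1
                if n > 1 and i > 0:
                    left[w][seg[i - 1]] += 1       # character to the left of w
                if n > 1 and i + n < len(seg):
                    right[w][seg[i + n]] += 1      # character to the right of w

    def prob(s):
        # Simplification: every n-gram length is normalised by the total character count.
        return counts[s] / total

    found = {}
    for w, f in counts.items():
        if len(w) < 2 or f < min_freq or w in general_dict:
            continue
        cohesion = min(math.log(prob(w) / (prob(w[:k]) * prob(w[k:])))
                       for k in range(1, len(w)))
        boundary = min(entropy(left[w]), entropy(right[w]))
        if cohesion >= min_pmi and boundary >= min_entropy:
            found[w] = f
    return found


def to_user_dict(new_words):
    """Emit 'word frequency' lines, most frequent first (ties broken alphabetically)."""
    return [f"{w} {f}" for w, f in sorted(new_words.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Because neighbours are counted only inside a segment, a candidate that always sits at a punctuation boundary gets zero entropy on that side and is screened out; treating segment edges as a distinct neighbour symbol is one alternative. The thresholds would need tuning on a real government-affairs corpus.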
CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]