Specific Areas New Words Discovery Based on Multi-Dimensional Features (融合多维度特征的特定领域新词发现方法)  Cited by: 1


Author: CHANG Bing (常兵) (Department of Information Engineering, Guiyang Institute of Information Science and Technology, Guiyang, Guizhou 550025, China)

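Affiliation: [1] Department of Information Engineering, Guiyang Institute of Information Science and Technology, Guiyang, Guizhou 550025, China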

Source: Automation Application (《自动化应用》), 2023, No. 8, pp. 159-162 (4 pages)

Abstract: New words in the government affairs domain often carry textual characteristics tied to time, space, and region. Accurately identifying new words in government-domain text is one of the important tasks in research on intelligent government services. Targeting the characteristics of government corpora, this paper proposes a domain-specific new word discovery method that fuses multi-dimensional features. First, the corpus is acquired and preprocessed. Second, data serialization and character sequencing are performed to obtain a candidate set and a new-word seed dataset, completing the screening of new words. Finally, the word frequencies of the new words in the corpus are aligned and mapped against a general-purpose dictionary to obtain a domain user dictionary. The effectiveness of the method is verified through experiments and a real domain corpus.

Keywords: specific domain; new word discovery; multi-dimensional features; word frequency statistics

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
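The three stages outlined in the abstract (preprocessing, candidate screening, and checking against a general dictionary) can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify its multi-dimensional features, so raw frequency stands in as the only screening criterion here. All function names, the n-gram length limits, the `min_freq` threshold, and the toy corpus are assumptions made for illustration.

```python
import re
from collections import Counter

def preprocess(text):
    """Split text into runs of CJK characters; punctuation, digits and Latin text act as separators."""
    return re.findall(r"[\u4e00-\u9fff]+", text)

def candidate_counts(segments, min_len=2, max_len=4):
    """Count every character n-gram of length min_len..max_len inside each segment."""
    counts = Counter()
    for seg in segments:
        for n in range(min_len, max_len + 1):
            for i in range(len(seg) - n + 1):
                counts[seg[i:i + n]] += 1
    return counts

def build_user_dictionary(corpus, general_dictionary, min_freq=2):
    """Keep candidates seen at least min_freq times that the general dictionary lacks.

    Frequency is a stand-in (assumption) for the paper's unspecified features.
    Returns a mapping from candidate word to its corpus frequency.
    """
    counts = Counter()
    for doc in corpus:
        counts.update(candidate_counts(preprocess(doc)))
    return {
        word: freq
        for word, freq in counts.items()
        if freq >= min_freq and word not in general_dictionary
    }

# Hypothetical toy corpus and general dictionary, for illustration only.
corpus = ["一网通办上线。", "推进一网通办,优化服务。"]
general_dictionary = {"上线", "推进", "优化", "服务"}
user_dict = build_user_dictionary(corpus, general_dictionary)
```

On this toy input the candidate "一网通办" survives with frequency 2, but so do fragments such as "一网" and "通办". A frequency filter alone cannot tell a whole word from its pieces, which is the kind of noise that additional screening features are needed to remove.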

 
