Design of Part-of-Speech Tagging Set for Standard Text

Authors: Ma Xiaowen (马小雯), Yuan Man (袁满), Liu Yanlin (刘彦林), Li Zhen (李臻), Li Huijie (李慧杰)

Affiliations: [1] Zhejiang Lab [2] Zhejiang University

Source: Information Technology & Standardization (《信息技术与标准化》), 2022, No. 10, pp. 36-42 (7 pages)

Abstract: This paper addresses part-of-speech (POS) tagging of standard text. General-purpose POS tagging sets are poorly suited to annotating standard text. Based on the current state of research on the POS tagging task, the paper proposes a tagging set tailored to the characteristics of standard text. The proposed set removes POS categories that are rarely used in the context of standards and adds a dedicated design for proper nouns. The design is currently applied to tasks such as content understanding and knowledge extraction for standard text, where it improves the usability of annotation results and provides a data foundation for subsequent text-processing tasks.

Keywords: standard text; part-of-speech tagging; text preprocessing

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
