A Constituent Parser for Science and Technology Corpus (面向科技语料的短语结构句法分析器)

Cited by: 2

Authors: WANG YaNan (王亚楠)[1], MA ChunPeng (马春鹏)[1], CAO HaiLong (曹海龙)[1], ZHAO TieJun (赵铁军)[1]

Affiliation: [1] Machine Intelligence and Translation Laboratory, Harbin Institute of Technology, Harbin 150001, China

Source: Technology Intelligence Engineering (《情报工程》), 2017, No. 3, pp. 10-20 (11 pages)

Funding: National Natural Science Foundation of China (91520204; 61572154); 863 Program (2015AA015405); Microsoft Research Asia Collaborative Research Program

Abstract: This paper presents a constituent parser for science and technology corpora, designed and developed at Harbin Institute of Technology. Unlike traditional constituent parsers, it does not require the input corpus to be pre-processed. Given raw text as input, the parser jointly performs word segmentation, POS tagging, and constituent parsing, which can be regarded as an instance of multi-task learning. In addition, to suit the characteristics of science and technology corpora, we optimized the feature templates used by the parser and constructed a treebank of word-internal structures for the science and technology domain. Experimental results indicate that the parser performs well on both general-domain and science and technology test sets.

Keywords: constituent parsing; science and technology corpus; multi-task learning

Classification: G35 [Cultural Sciences - Information Science]; TP39 [Automation and Computer Technology - Computer Application Technology]
