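Authors: WU Cheng; WANG Chaokun [1]; WANG Muxian (School of Software, Tsinghua University, Beijing 100084, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Affiliations: [1] School of Software, Tsinghua University, Beijing 100084, China; [2] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Source: Computer Engineering and Applications (《计算机工程与应用》), 2020, No. 21, pp. 115-122 (8 pages)
Funding: National Natural Science Foundation of China (No. 61872207); National Key Research and Development Program of China (No. 2017YFC0820402).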
Abstract: This paper studies entity attribute extraction from unstructured Chinese text. Text simplification (TS) is introduced as a preprocessing step before extraction, to address the poor performance of traditional information extraction methods caused by long, difficult sentences and the diversity of natural-language expression. TS is modeled as a sequence-to-sequence (seq2seq) translation process and implemented with the seq2seq-RNN model from the machine translation field. To improve simplification quality, the model is optimized at several levels: using pre-trained word vectors, collecting a common-word vocabulary, introducing part-of-speech (POS) tagging, and designing a simplification scoring function. These optimizations let the model focus on learning the syntactic transformations involved in simplification. For the simplified text, a concise rule-based method extracts information tuples, and entity attributes are then extracted from those tuples. Experiments show that the improvements to seq2seq-RNN yield better text simplification, that more information is extracted from the simplified text than from the original text, and that the extracted information is also more accurate.
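To make the idea of rule-based extraction on simplified text concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not state the actual rules, so the single pattern used here ("<entity>的<attribute>是/为<value>"), the function name `extract_attribute`, and the sample sentence are all hypothetical. The sketch only shows why short, regular sentences are easy to handle with simple rules.

```python
import re
from typing import Optional, Tuple

# Hypothetical rule (an assumption, not taken from the paper): a simplified
# sentence of the form "<entity>的<attribute>是/为<value>", optionally
# ending with a full stop.
PATTERN = re.compile(
    r"^(?P<entity>.+?)的(?P<attribute>.+?)(?:是|为)(?P<value>.+?)[。.]?$"
)


def extract_attribute(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return an (entity, attribute, value) tuple, or None if the rule does not apply."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("entity"), match.group("attribute"), match.group("value")


if __name__ == "__main__":
    # A short sentence of the kind a simplification step might produce.
    print(extract_attribute("清华大学的校址是北京。"))
```

A long sentence with nested clauses would rarely match such a rigid pattern, which is the motivation the abstract gives for simplifying text before extraction.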