Study of Semantic Error Detecting Method for Chinese Text (中文文本语义错误侦测方法研究)

Cited by: 20

Source: Chinese Journal of Computers (《计算机学报》), 2017, Issue 4, pp. 911-924 (14 pages)

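Funding: Supported by the National Natural Science Foundation of China (61070119; 61370139) and the Beijing Municipal Universities Innovation Team Building and Teacher Career Development Program (IDHT20130519).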

Abstract: Semantic error detection has long been a difficult problem in automatic error detection for Chinese text. This paper proposes a semantic error detection model based on a semantic collocation knowledge base and evidence (D-S) theory, and discusses both the construction of a three-layer semantic collocation knowledge base and the detection algorithm built on that knowledge base and D-S theory. The knowledge base is constructed in two steps: (1) a word collocation rule set is built from the notional-word collocation frames in the Modern Chinese Dictionary of Notional Words Collocation (《现代汉语实词搭配词典》); word collocations are extracted from the training corpus according to this rule set and filtered by mutual information and co-occurrence frequency to form the word collocation knowledge base; (2) the sememe information of words is extracted from HowNet to generate the word-sememe and sememe-sememe collocation knowledge bases, which are filtered a second time by polymerization degree. On top of the three-layer knowledge base, a top-down search first identifies semantic collocations that may be erroneous. The mutual information (MI) and polymerization degree (PD) of each semantic collocation are then used as evidence: basic probability assignment functions are built statistically, and uncertainty reasoning that combines evidence conflict handling with a weighted-allocation D-S rule yields the association strength of the collocation, which determines whether a semantic error is present. Experimental results show that the F-Score of the proposed model and algorithm is 14.02% higher than the best value reported in other literature.
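The two numerical ingredients named in the abstract, filtering collocations by mutual information and co-occurrence frequency and fusing MI and PD as evidence under D-S theory, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the thresholds and the two-hypothesis frame {correct, error} are hypothetical. It uses the classical Dempster combination rule as a stand-in, because the abstract does not spell out the paper's conflict handling and weighted-allocation rule.

```python
import math
from itertools import product


def pmi(pair_count, left_count, right_count, total):
    """Pointwise mutual information (base 2) of a word pair from raw counts."""
    p_xy = pair_count / total
    p_x = left_count / total
    p_y = right_count / total
    return math.log2(p_xy / (p_x * p_y))


def filter_collocations(pair_counts, word_counts, total, min_mi, min_freq):
    """Keep pairs whose co-occurrence count and PMI both reach the thresholds.

    min_mi and min_freq are hypothetical thresholds, not values from the paper.
    """
    kept = {}
    for (w1, w2), count in pair_counts.items():
        if count < min_freq:
            continue
        score = pmi(count, word_counts[w1], word_counts[w2], total)
        if score >= min_mi:
            kept[(w1, w2)] = score
    return kept


# Assumed frame of discernment: a collocation is either correct or erroneous.
CORRECT = frozenset({"correct"})
ERROR = frozenset({"error"})
THETA = CORRECT | ERROR  # the whole frame, i.e. "unknown"


def dempster_combine(m1, m2):
    """Classical Dempster's rule for two mass functions keyed by frozenset."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("evidence is in total conflict and cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical mass functions derived from MI and PD for one collocation.
m_mi = {CORRECT: 0.6, ERROR: 0.1, THETA: 0.3}
m_pd = {CORRECT: 0.5, ERROR: 0.2, THETA: 0.3}
fused = dempster_combine(m_mi, m_pd)
is_error = fused[ERROR] > fused[CORRECT]
```

In this sketch the fused mass on the "correct" hypothesis plays the role of the association strength, and a collocation would be flagged when the mass on "error" dominates. The decision rule actually used in the paper may differ.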

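Keywords: semantic error; knowledge base; D-S theory; semantic collocation; error detection algorithm; natural language processing; social media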

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
