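Affiliation: [1] Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101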
Source: Chinese Journal of Computers (《计算机学报》), 2017, No. 4, pp. 911-924 (14 pages)
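Funding: Supported by the National Natural Science Foundation of China (61070119; 61370139) and the Beijing municipal universities innovation team building and teacher career development program (IDHT20130519).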
Abstract: Semantic error detection has long been a difficult problem in automatic error detection for Chinese text. This paper proposes a semantic error detection model based on a semantic collocation knowledge base and evidence (D-S) theory. It discusses the construction of a three-layer semantic collocation knowledge base and a semantic error detection algorithm built on that knowledge base and D-S theory. The knowledge base is constructed in two steps: (1) a word collocation rule set is built from the notional-word collocation frames in the Modern Chinese Dictionary of Notional Words Collocation (《现代汉语实词搭配词典》), and word collocations are extracted from the training corpus and filtered by mutual information and co-occurrence frequency to form the word collocation knowledge base; (2) sememe information for words is extracted from HowNet to generate the word-sememe and sememe-sememe collocation knowledge bases, which are filtered a second time by polymerization degree. On top of the three-layer knowledge base, a top-down search first identifies semantic collocations that may be erroneous. The mutual information (MI) and polymerization degree (PD) of each collocation then serve as evidence: basic probability assignment functions are estimated statistically, and uncertain reasoning that combines evidence conflict handling with the weighted-distribution D-S rule yields the association strength of the collocation, which determines whether a semantic error is present. Experimental results show that the F-score of the proposed model and algorithm is 14.02% higher than the best value reported in other literature.
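The abstract describes fusing two pieces of evidence (MI and PD) with a D-S combination rule. As a rough illustration only, the sketch below applies the classic Dempster rule of combination to two mass functions over the frame {ok, error}. It is not the paper's method: the paper uses a weighted-distribution rule with conflict handling, whose details are not given here. The function name `dempster_combine` and all mass values are assumptions made up for the example.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with the classic Dempster rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting")
    norm = 1.0 - conflict
    return {h: v / norm for h, v in combined.items()}


OK = frozenset({"ok"})
ERR = frozenset({"error"})
EITHER = OK | ERR  # mass left uncommitted

# Hypothetical masses, standing in for evidence derived from MI and PD.
m_mi = {OK: 0.6, ERR: 0.1, EITHER: 0.3}
m_pd = {OK: 0.5, ERR: 0.2, EITHER: 0.3}

fused = dempster_combine(m_mi, m_pd)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

With these made-up inputs the fused mass on "ok" exceeds the mass on "error", so a threshold on the fused values could serve as the error decision; how the paper sets that decision is not stated in the abstract.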
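Keywords: semantic error; knowledge base; D-S theory; semantic collocation; error detection algorithm; natural language processing; social media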
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]