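Authors: Yang Jincai (杨进才)[1], Guo Kaikai (郭凯凯)[1], Shen Xianjun (沈显君)[1], Hu Jinzhu (胡金柱)[1]
Source: Computer Science (《计算机科学》), 2015, Issue 7, pp. 291-294, F0003, 5 pages in total
Funding: Supported by the Ministry of Education Social Science Fund (13YJAZH117) and the National Social Science Fund (14BYY093)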
Abstract: The compound sentence is an important grammatical unit of Chinese. Automatic identification of relation words is the basis of compound sentence annotation, and it matters both for annotating compound sentences and for research on discourse. Based on an extensive analysis of a Chinese compound sentence corpus, this paper extracts features from the context in which relation words occur and from the way relation words collocate, and describes the extracted features formally. Mutual information combined with information gain is used to select features and eliminate redundant ones. A Bayesian model is trained and tested on the resulting feature set. The results of this statistical process are then converted into rules that form a rule base, and relation words are identified automatically according to those rules. The experimental results show that the method achieves a high identification accuracy, which demonstrates its feasibility and effectiveness.
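The feature-selection step named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the paper combines mutual information with information gain, so the code below only computes the two standard quantities on a toy data set; the function names, the binary feature encoding, and the sample data are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def pointwise_mutual_information(feature_values, labels, value, label):
    """log2 of P(value, label) / (P(value) * P(label)) for one value/label pair."""
    total = len(labels)
    pairs = list(zip(feature_values, labels))
    p_joint = sum(1 for v, y in pairs if v == value and y == label) / total
    p_value = sum(1 for v in feature_values if v == value) / total
    p_label = sum(1 for y in labels if y == label) / total
    if p_joint == 0:
        return float("-inf")  # the pair never co-occurs in the sample
    return math.log2(p_joint / (p_value * p_label))


# Hypothetical toy data: 1 = candidate word is a relation word, 0 = it is not.
labels = [1, 1, 0, 0]
informative_feature = [1, 1, 0, 0]  # assumed feature that tracks the label exactly
irrelevant_feature = [1, 0, 1, 0]   # assumed feature that is independent of the label

print(information_gain(informative_feature, labels))
print(information_gain(irrelevant_feature, labels))
print(pointwise_mutual_information(informative_feature, labels, 1, 1))
```

In a selection scheme of this kind, features whose scores fall below some threshold would be discarded before the Bayesian model is trained; the threshold and the exact scoring rule would have to come from the paper itself.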
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]