Automatic Identification and Rule Mining for Relation Words of Chinese Compound Sentences Based on Bayesian Model (基于贝叶斯模型的复句关系词自动识别与规则挖掘)  Cited by: 9

Authors: Yang Jincai (杨进才)[1], Guo Kaikai (郭凯凯)[1], Shen Xianjun (沈显君)[1], Hu Jinzhu (胡金柱)[1]

Affiliation: [1] School of Computer Science, Central China Normal University, Wuhan 430079

Source: Computer Science (《计算机科学》), 2015, No. 7, pp. 291-294, F0003 (5 pages)

Funding: Supported by the Ministry of Education Social Science Fund (13YJAZH117) and the National Social Science Fund (14BYY093)

Abstract: The compound sentence is an important grammatical unit of Chinese. Automatic identification of relation words is the basis of compound sentence annotation and matters both for annotation and for discourse research. Based on an extensive analysis of a Chinese compound sentence corpus, the paper extracts features of relation words from the context in which they occur and from their collocation patterns, and describes these features formally. Mutual information combined with information gain is used to select features and eliminate redundant ones. A Bayesian model is trained and tested on the resulting feature set. The statistical results are then converted into rules that form a rule base, and relation words are identified automatically according to those rules. The experimental results show that the method achieves a high identification accuracy, which indicates that it is feasible and effective.

Keywords: relation words of compound sentences; Bayesian; rules; automatic annotation

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]
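
The pipeline outlined in the abstract (feature selection followed by a Bayesian classifier) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature names, the toy samples, the threshold, and the choice of a Bernoulli naive Bayes model with Laplace smoothing are hypothetical. The abstract does not say how mutual information and information gain are combined, so the sketch scores features by information gain only, and it does not reproduce the paper's rule-mining step.

```python
import math
from collections import Counter

# Hypothetical toy data: each sample is a candidate word described by binary
# context features, labelled 1 if it acts as a relation word and 0 otherwise.
SAMPLES = [
    ({"at_clause_start": 1, "has_paired_word": 1, "followed_by_comma": 0}, 1),
    ({"at_clause_start": 1, "has_paired_word": 1, "followed_by_comma": 1}, 1),
    ({"at_clause_start": 0, "has_paired_word": 1, "followed_by_comma": 0}, 1),
    ({"at_clause_start": 1, "has_paired_word": 0, "followed_by_comma": 1}, 0),
    ({"at_clause_start": 0, "has_paired_word": 0, "followed_by_comma": 0}, 0),
    ({"at_clause_start": 0, "has_paired_word": 0, "followed_by_comma": 1}, 0),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, feature):
    """Reduction in label entropy obtained by splitting on one binary feature."""
    gain = entropy([y for _, y in samples])
    for value in (0, 1):
        subset = [y for x, y in samples if x[feature] == value]
        if subset:
            gain -= len(subset) / len(samples) * entropy(subset)
    return gain


def select_features(samples, threshold):
    """Keep the features whose information gain reaches the threshold."""
    return [f for f in samples[0][0] if information_gain(samples, f) >= threshold]


def train_naive_bayes(samples, features):
    """Estimate class priors and P(feature = 1 | class) with Laplace smoothing."""
    class_counts = Counter(y for _, y in samples)
    model = {"prior": {}, "likelihood": {}}
    for label, count in class_counts.items():
        model["prior"][label] = count / len(samples)
        for f in features:
            ones = sum(1 for x, y in samples if y == label and x[f] == 1)
            model["likelihood"][(label, f)] = (ones + 1) / (count + 2)
    return model


def predict(model, features, x):
    """Return the class with the highest log-posterior score for sample x."""
    scores = {}
    for label, prior in model["prior"].items():
        score = math.log(prior)
        for f in features:
            p = model["likelihood"][(label, f)]
            score += math.log(p if x[f] == 1 else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)


selected = select_features(SAMPLES, threshold=0.5)
model = train_naive_bayes(SAMPLES, selected)
```

On this toy data only the hypothetical `has_paired_word` feature passes the threshold, which loosely mirrors the abstract's point that collocation between relation words is informative. A real system would need the corpus-derived features and the rule base described in the paper.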

 
