Block and memory based part-of-speech tagging


Authors: SHI Jing (石晶)[1], DAI Guozhong (戴国忠)[1]

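Affiliation: [1] Laboratory of Human-Computer Interaction and Intelligent Information Processing, Institute of Software, Chinese Academy of Sciences, Beijing 100080, China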

Source: Journal of Jilin University: Engineering and Technology Edition, 2006, No. 4, pp. 560-563 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (60373056)

Abstract: Automatic part-of-speech tagging is widely used in natural language processing. The Block and Memory based Model (BMM) takes an approach markedly different from that of traditional models. It treats a whole Chinese sentence as the processing unit, starts from blocks, and considers each word individually within a richer, more informative context. The HowNet lexicon is used to memorize each word's set of possible tags, and, to improve efficiency, parameter values are obtained by incremental rote learning. The difficult sparse-data problem is handled simply by setting a smoothing constant, and a small number of handcrafted rules then correct the tagging results. Experiments show that accuracy is close to 99% in the closed test and above 95% in the open test.
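The abstract names three ingredients: a lexicon that memorizes each word's possible tags, counts that are learned incrementally, and a constant that smooths sparse data. It gives no formulas, so the following is only a minimal sketch of how such pieces might fit together, not the authors' BMM. The class `ToyTagger`, the constants `SMOOTH` and `DEFAULT_TAG`, the scoring formula, and the sample sentences are all hypothetical. The sketch also tags greedily from left to right rather than over the whole sentence, and it omits the block analysis and the handcrafted correction rules.

```python
SMOOTH = 0.5       # hypothetical smoothing constant; the abstract gives no value
DEFAULT_TAG = "n"  # hypothetical fallback tag for words missing from the lexicon


class ToyTagger:
    def __init__(self, lexicon):
        # word -> set of allowed tags (a stand-in for the memorized tag sets)
        self.lexicon = {w: set(tags) for w, tags in lexicon.items()}
        self.trans = {}  # (previous tag, tag) -> count
        self.emit = {}   # (word, tag) -> count

    def learn(self, tagged_sentence):
        # Incremental update: counts grow one tagged sentence at a time,
        # so no retraining over earlier data is needed.
        prev = "<s>"
        for word, tag in tagged_sentence:
            self.trans[(prev, tag)] = self.trans.get((prev, tag), 0) + 1
            self.emit[(word, tag)] = self.emit.get((word, tag), 0) + 1
            self.lexicon.setdefault(word, set()).add(tag)
            prev = tag

    def score(self, prev, word, tag):
        # Adding a constant keeps unseen events from zeroing the product.
        return ((self.trans.get((prev, tag), 0) + SMOOTH)
                * (self.emit.get((word, tag), 0) + SMOOTH))

    def tag(self, words):
        prev, result = "<s>", []
        for word in words:
            # Only tags listed in the lexicon are considered for a known word.
            candidates = sorted(self.lexicon.get(word, {DEFAULT_TAG}))
            best = max(candidates, key=lambda t: self.score(prev, word, t))
            result.append((word, best))
            prev = best
        return result


tagger = ToyTagger({"计划": {"n", "v"}})
tagger.learn([("我", "r"), ("计划", "v"), ("工作", "n")])
tagger.learn([("计划", "n"), ("很", "d"), ("好", "a")])
print(tagger.tag(["我", "计划"]))
print(tagger.tag(["计划", "很"]))
```

In this toy data the ambiguous word 计划 is resolved by its left context: after a pronoun the verb reading scores higher, while at the start of a sentence the noun reading does.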

Keywords: artificial intelligence; automatic part-of-speech tagging; block and memory based model (BMM); incremental learning

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]

 
