Automatic identification of transliterated name based on co-occurrence frequency statistics of words (基于用字共现频率统计的外国译名自动识别)  Cited by: 1


Authors: Chen Yang (陈阳)[1], Zhao Yuehua (赵跃华)[1], Cheng Xianyi (程显毅)[2]

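Affiliations: [1] School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212000; [2] School of Computer Science and Technology, Nantong University, Nantong, Jiangsu 226019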

Source: Computer Engineering and Design (《计算机工程与设计》), 2012, No. 1, pp. 362-366 (5 pages)

Funding: National Natural Science Foundation of China (60702056)

Abstract: To reduce the negative impact of word segmentation errors, this paper proposes a method for automatically identifying transliterated foreign names based on character co-occurrence frequency statistics. The character usage of transliterated names is analyzed statistically, and the concept of a transliterated-name co-occurrence string is introduced. A table of non-transliteration characters is derived from the table of transliteration characters and the table of common Chinese characters. On this basis, the boundary of a transliterated name is defined, and a method for adjusting segmentation errors is designed around that boundary definition. Tests on an open corpus show that, compared with the maximum word frequency segmentation algorithm, the proposed algorithm improves precision, recall, and F-value in transliterated name identification.
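
The abstract does not give the actual character tables, the boundary definition, or any thresholds, so the following is only a minimal Python sketch of the general idea under stated assumptions. It derives a non-transliteration character set as a set difference, counts adjacent character pairs in known names, and treats a maximal run of transliteration characters as a candidate name. The maximal-run rule is a simplification substituted here for the paper's boundary definition, and all function names and sample data are hypothetical.

```python
from collections import Counter

def build_non_translit_chars(common_chars, translit_chars):
    """Characters that are common in Chinese but never used in transliterated names."""
    return set(common_chars) - set(translit_chars)

def cooccurrence_counts(names):
    """Count adjacent character pairs across a list of known transliterated names."""
    counts = Counter()
    for name in names:
        for a, b in zip(name, name[1:]):
            counts[a + b] += 1
    return counts

def find_candidates(text, translit_chars, pair_counts, min_len=2):
    """Return (start, end, string) for each maximal run of transliteration
    characters that contains at least one attested co-occurring pair.
    ASSUMPTION: this run-based rule stands in for the paper's boundary definition."""
    candidates = []
    i, n = 0, len(text)
    while i < n:
        if text[i] not in translit_chars:
            i += 1
            continue
        j = i
        while j < n and text[j] in translit_chars:
            j += 1
        run = text[i:j]
        attested = any(pair_counts[run[k:k + 2]] > 0 for k in range(len(run) - 1))
        if len(run) >= min_len and attested:
            candidates.append((i, j, run))
        i = j
    return candidates

# Hypothetical toy data, not taken from the paper.
known_names = ["克林顿", "华盛顿", "斯大林"]
translit_chars = set("".join(known_names))
sentence = "美国总统克林顿昨天访问了北京"
non_translit_chars = build_non_translit_chars(set(sentence), translit_chars)
pair_counts = cooccurrence_counts(known_names)
candidates = find_candidates(sentence, translit_chars, pair_counts)
```

In this toy setting the characters outside the run act as the left and right boundaries of the candidate. A segmenter's output could then be adjusted by merging any tokens that fall inside a candidate span, which is one plausible reading of the adjustment step the abstract mentions.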

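Keywords: transliterated foreign names; word segmentation; co-occurrence string; frequency statistics; transliterated name boundary; natural language processing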

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
