基于模式分类的汉语时态确定方法研究 (Research on a Pattern-Classification-Based Method for Determining Tense in Chinese)  Cited by: 5

A Pattern-classification Based Solution for the Recognition of Tense of the Chinese Language


Authors: Lin Dazhen (林达真)[1], Li Shaozi (李绍滋)[1]

Affiliation: [1] Department of Computer Science, Xiamen University, Xiamen, Fujian 361005

Source: Journal of Chinese Information Processing (《中文信息学报》), 2006, No. 1, pp. 67-75 (9 pages)

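Funding: National 863 High-Tech Program (2001AA114110); Natural Science Foundation of Fujian Province (A0310009); Fujian Province Key Science and Technology Project (2001J005)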

Abstract: Determining tense is a difficult problem in Chinese information processing, because tense in Chinese is usually implied rather than overtly marked. Rule-based methods perform poorly on sentences that contain no tense-indicating feature words and on sentences that contain more than one. This paper takes a statistical approach and proposes a pattern-classification-based method that evaluates the contribution of every word in a sentence to the tense decision, so it can handle both kinds of sentences. Because the method uses a linear discriminant function, it can analyze multi-dimensional data and is fast in both training and classification. In an open test, the precision and recall of tense determination for single sentences are 79.8% and 95.3%, respectively.

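Keywords: computer applications; Chinese information processing; Chinese; tense; feature words; linear discriminant function; perceptron criterion function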

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
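
The abstract and keywords name a linear discriminant function trained with the perceptron criterion, where every word in a sentence contributes to the tense decision. The paper's actual features, tense classes, and training data are not given in this record, so the following is only a minimal sketch under stated assumptions: sentences are already segmented into words, each word is a binary feature, there are three hypothetical classes (`past`, `present`, `future`), and the six training sentences are invented toy data. All function names are illustrative.

```python
# Minimal multiclass perceptron over bag-of-words features.
# Assumptions (not from the paper): pre-segmented input, three tense
# classes, binary word features, and a tiny invented training set.

BIAS = "<bias>"  # hypothetical constant feature added to every sentence


def score(weights, tokens, label):
    """Linear discriminant g_label(x): sum of per-word weights plus bias."""
    w = weights[label]
    return sum(w.get(tok, 0.0) for tok in tokens + [BIAS])


def predict(weights, tokens):
    """Pick the class whose discriminant function scores highest."""
    return max(sorted(weights), key=lambda label: score(weights, tokens, label))


def train_perceptron(samples, labels, epochs=100):
    """Perceptron rule: on each mistake, reward the gold class and
    penalize the wrongly predicted class for every word in the sentence."""
    weights = {label: {} for label in set(labels)}
    for _ in range(epochs):
        errors = 0
        for tokens, gold in zip(samples, labels):
            pred = predict(weights, tokens)
            if pred != gold:
                errors += 1
                for tok in tokens + [BIAS]:
                    weights[gold][tok] = weights[gold].get(tok, 0.0) + 1.0
                    weights[pred][tok] = weights[pred].get(tok, 0.0) - 1.0
        if errors == 0:  # every training sentence classified correctly
            break
    return weights


# Invented toy data, for illustration only.
SAMPLES = [
    ["我", "昨天", "去", "了", "学校"],
    ["他", "已经", "吃", "过", "饭"],
    ["我", "正在", "看", "书"],
    ["她", "现在", "在", "开会"],
    ["我", "明天", "去", "北京"],
    ["他们", "将", "要", "出发"],
]
LABELS = ["past", "past", "present", "present", "future", "future"]

WEIGHTS = train_perceptron(SAMPLES, LABELS)

if __name__ == "__main__":
    for sentence in (["他们", "明天", "出发"], ["我", "看", "书"]):
        print("".join(sentence), "->", predict(WEIGHTS, sentence))
```

Because the score is a sum over all words, a sentence without an explicit tense word such as 明天 or 已经 still receives a decision from whatever weights its remaining words have learned, which is the property the abstract emphasizes over rule-based matching.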

 
