Source: Journal of Chinese Information Processing (《中文信息学报》), 2006, No. 1, pp. 67-75 (9 pages)
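Funding: National 863 High-Tech Program (2001AA114110); Natural Science Foundation of Fujian Province (A0310009); Fujian Province Key Science and Technology Project (2001J005)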
Abstract: Tense is a difficult problem in Chinese information processing, because tense in Chinese is usually implied rather than explicitly marked. Rule-based methods perform poorly on sentences that contain no tense-indicating feature words and on sentences that contain several of them. This paper takes a statistical view and proposes a tense-determination method based on pattern classification. The method jointly evaluates the contribution that every word in a sentence makes to determining its tense, so it can handle sentences with no tense feature words as well as sentences with several. Because it uses a linear discriminant function, it can analyze multi-dimensional data and is fast in both training and classification. In an open test, precision and recall for tense determination on single sentences were 79.8% and 95.3%, respectively.
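Keywords: computer application; Chinese information processing; Chinese language; tense; feature words; linear discriminant function; perceptron criterion function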
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
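The abstract describes scoring every word's contribution to the tense decision with a linear discriminant function, and the keywords name the perceptron criterion. The record gives no further detail, so the following is only a minimal sketch of that general idea, not the authors' implementation. It assumes bag-of-words features, three tense labels (past, present, future), a standard multi-class perceptron update, and a handful of hand-made, pre-segmented toy sentences; the names `score`, `predict`, `train_perceptron`, and `TOY_SAMPLES` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: pre-segmented sentences paired with a tense label.
TOY_SAMPLES = [
    (["我", "昨天", "去", "了", "学校"], "past"),
    (["他", "已经", "吃", "了", "饭"], "past"),
    (["我", "明天", "去", "学校"], "future"),
    (["他", "将", "参加", "会议"], "future"),
    (["我", "正在", "看", "书"], "present"),
    (["他", "现在", "吃", "饭"], "present"),
]


def score(weights, label, words):
    """Linear discriminant for one label: sum of per-word weights."""
    return sum(weights[label].get(word, 0.0) for word in words)


def predict(weights, words):
    """Return the label whose discriminant is largest (ties: alphabetical)."""
    return max(sorted(weights), key=lambda label: score(weights, label, words))


def train_perceptron(samples, epochs=20):
    """Multi-class perceptron: on a mistake, reward the gold label's word
    weights and penalise the wrongly predicted label's word weights."""
    labels = sorted({gold for _, gold in samples})
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        mistakes = 0
        for words, gold in samples:
            guess = predict(weights, words)
            if guess != gold:
                mistakes += 1
                for word in words:
                    weights[gold][word] += 1.0
                    weights[guess][word] -= 1.0
        if mistakes == 0:  # every training sentence is classified correctly
            break
    return weights


weights = train_perceptron(TOY_SAMPLES)
# A sentence not seen in training; it shares words with the "future" samples.
print(predict(weights, ["她", "明天", "参加", "会议"]))
```

Because every word carries a weight for every label, a sentence with no dedicated tense word still receives a score, and a sentence with several conflicting tense words is resolved by the sum, which is the property the abstract highlights over rule-based handling.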