检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:毛奇[1] 连乐新[1] 周文翠[1] 袁春风[1]
机构地区:[1]南京大学计算机软件新技术国家重点实验室,江苏南京210093
出 处:《中文信息学报》2007年第2期29-34,共6页Journal of Chinese Information Processing
基 金:国家863高技术项目资助(2002AA117010-10);十五攻关教育部科技基础条件平台建设项目资助
摘 要:目前大部分句法解析器都忽略标点符号这一重要的句法特征或者只进行非常简单的处理。本文根据标点符号的句法结构特性,提出单独解析块的概念,并且根据标点符号在句子中的特有特征和位置关系,给出了基于决策树算法(Id3)单独解析块识别方法,将标点融入汉语句法分析中。本文所用的实验数据(包括训练集和测试集)均来自中文宾州树库5.0。对句长大于40个词的汉语长句单独进行了实验,句法分析精度和召回率分别提高1.59%和0.93%,同时时间开销降低了近2/3。实验结果表明,标点对汉语长句句法分析非常有利,系统性能获得了较大提高。So far, most syntactic parsers neglect the punctuations or oversimplify their functions. However, it is actually very important information of syntactic characters. According to the features of punctuation in the syntactic structure, this paper proposes a kind of new concept of separate parsing phrase, and according to the typical character and the position of punctuation in a sentence, this paper also presents one way to identify the separate parsing phrase based on the decision tree algorithm (Id3). In this paper, the punctuation is integrated into syntactic analysis. All the experimental data sets, including the training data and test data, are derived from the Chinese Penn Tree Bank 5.0. The experiments have been done solely using the sentences, the length of which is over 40 Chinese words. The results indicate that the accuracy and the recall rate have been improved by 1.59% and 0.93% respectively, and the time expense has been reduced by nearly 66.6%. The results show that the punctuation is quite useful and effective to parse the long sentences in Chinese.
关 键 词:计算机应用 中文信息处理 句法解析器 单独解析块 决策树(Id3)
分 类 号:TP391.2[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.15