Chinese Syntactic Parsing Algorithm Based on Segmentation of Punctuation (基于标点符号分割的汉语句法分析算法)  Cited by: 7


Authors: Mao Qi (毛奇)[1], Lian Lexin (连乐新)[1], Zhou Wencui (周文翠)[1], Yuan Chunfeng (袁春风)[1]

Affiliation: [1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093

Source: Journal of Chinese Information Processing (《中文信息学报》), 2007, No. 2, pp. 29-34 (6 pages)

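Funding: National 863 High-Tech Program (2002AA117010-10); Tenth Five-Year Plan Key Project, Ministry of Education Science and Technology Infrastructure Platform Construction Program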

Abstract: Most current syntactic parsers either ignore punctuation, an important syntactic feature, or handle it only in a very simple way. Based on the syntactic-structure properties of punctuation, this paper proposes the concept of a separate parsing phrase (单独解析块) and, using the characteristic features and positions of punctuation marks within a sentence, presents a method for identifying separate parsing phrases based on the ID3 decision tree algorithm, thereby integrating punctuation into Chinese syntactic parsing. All experimental data, including the training and test sets, come from the Chinese Penn Treebank 5.0. In a separate experiment on long Chinese sentences of more than 40 words, parsing precision and recall improved by 1.59% and 0.93% respectively, while the time cost fell by nearly two-thirds. The results show that punctuation is highly beneficial for parsing long Chinese sentences and that system performance improves considerably.
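The abstract names ID3 as the classifier but does not list the features the authors used. As a rough illustration of the ID3 split criterion only, the sketch below computes information gain over a toy table with one row per punctuation mark. The attribute names (`mark`, `next_is_verb`), the data rows, and the yes/no labels ("does this mark end a separate parsing phrase?") are invented for the example and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """ID3 split criterion: choose the attribute with the highest gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical training examples: one row per punctuation mark.
rows = [
    {"mark": "comma", "next_is_verb": "yes"},
    {"mark": "comma", "next_is_verb": "no"},
    {"mark": "semicolon", "next_is_verb": "yes"},
    {"mark": "semicolon", "next_is_verb": "no"},
]
# Hypothetical labels: does the mark close a separate parsing phrase?
labels = ["no", "no", "yes", "yes"]

root = best_attribute(rows, labels, ["mark", "next_is_verb"])
print(root)  # mark
```

In this toy table the mark type separates the two classes perfectly (gain of 1 bit), while the second attribute carries no information (gain of 0), so ID3 would place `mark` at the root. A full ID3 tree would repeat the same selection recursively on each resulting subset.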

Keywords: computer application; Chinese information processing; syntactic parser; separate parsing phrase; decision tree (ID3)

Classification: TP391.2 [Automation and Computer Technology: Computer Application Technology]

 
