Annotation Scheme for Chinese Treebank (汉语句法树库标注体系)

Times cited: 90


Author: Zhou Qiang (周强) [1]

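Affiliation: [1] State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science, Tsinghua University, Beijing 100084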

Source: Journal of Chinese Information Processing (《中文信息学报》), 2004, No. 4, pp. 1-8 (8 pages)

Funding: National Natural Science Foundation of China (69903007; 60173008); National 973 Program (G1998030507; G1998030501A-03); National 863 Program (2001AA114040)

Abstract: Syntactic annotation of corpora is a frontier topic in corpus linguistics. Syntactically annotated corpora, commonly called treebanks, play an important role both in empirical linguistics and in machine-learning methods for natural language processing. After briefly reviewing treebank annotation practice for several languages, in China and abroad, this paper proposes a new syntactic annotation scheme for real-world Chinese text. Under this scheme, every Chinese sentence is annotated with a complete hierarchical parse tree in which each non-terminal constituent is assigned two tags, forming a dual-tag-set description of syntactic information. The syntactic constituent tag describes the constituent's external functional relation with other constituents in the parse tree; the grammatical relation tag describes the internal structural relation among its subcomponents. The two tag sets currently contain 16 and 27 tags, respectively, and together give an integrated top-down and bottom-up description of each constituent. Based on this scheme, we built TCT, a 1,000,000-word Chinese treebank covering a balanced collection of journalistic, literary, academic, and other documents. Annotation experiments on various complex linguistic phenomena show the usability and compatibility of the scheme.
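As a rough illustration of the dual-tag idea described in the abstract, the following Python sketch models a parse tree in which every non-terminal node carries both a constituent tag (external function) and a relation tag (internal structure), and prints it in a bracketed form. The class names, the `[constituent-relation ...]` output format, and all tag values (`dj`, `vp`, `ZW`, `SB`, and the part-of-speech tags `r`, `v`, `n`) are hypothetical placeholders chosen for illustration; they are not claimed to reproduce the actual TCT tag sets or file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """A terminal node: one word plus a part-of-speech tag (hypothetical tags)."""
    word: str
    pos: str


@dataclass
class Node:
    """A non-terminal node carrying the two tags of the dual-tag scheme."""
    constituent: str  # constituent tag: external function of this node in the tree
    relation: str     # relation tag: internal structural relation among its children
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def to_bracketed(t: Union[Node, Leaf]) -> str:
    """Render a tree as nested brackets, e.g. [constituent-relation child ...]."""
    if isinstance(t, Leaf):
        return f"{t.word}/{t.pos}"
    inner = " ".join(to_bracketed(c) for c in t.children)
    return f"[{t.constituent}-{t.relation} {inner}]"


def nonterminal_tag_pairs(t: Union[Node, Leaf]) -> List[Tuple[str, str]]:
    """Collect (constituent, relation) pairs in pre-order; leaves contribute none."""
    if isinstance(t, Leaf):
        return []
    pairs = [(t.constituent, t.relation)]
    for c in t.children:
        pairs.extend(nonterminal_tag_pairs(c))
    return pairs


# Hypothetical example: "我们 开发 树库" ("we build a treebank").
# Tag values are illustrative placeholders, not official TCT tags.
tree = Node("dj", "ZW", [
    Leaf("我们", "r"),
    Node("vp", "SB", [Leaf("开发", "v"), Leaf("树库", "n")]),
])

print(to_bracketed(tree))  # [dj-ZW 我们/r [vp-SB 开发/v 树库/n]]
```

Keeping the two tags as separate fields, rather than fusing them into one label, mirrors the scheme's point that the constituent tag and the relation tag describe a node from two different directions (top-down and bottom-up) and can be queried independently.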

Keywords: computer applications; Chinese information processing; syntactic treebank; annotation guidelines; corpus linguistics

Classification number: TP391 [Automation and Computer Technology / Computer Application Technology]
