Author: Zhou Qiang[1]
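Affiliation: [1] State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science, Tsinghua University, Beijing 100084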
Source: Journal of Chinese Information Processing, 2004, no. 4, pp. 1-8 (8 pages)
Funding: National Natural Science Foundation of China (69903007; 60173008); National 973 Program (G1998030507; G1998030501A-03); National 863 Program (2001AA114040)
Abstract: Syntactic annotation of corpora is a frontier topic in corpus linguistics. This paper reviews syntactic treebank annotation practice in China and abroad. On that basis, it proposes a syntactic tree annotation scheme for real-world Chinese text. The scheme is built on complete hierarchical parse trees and assigns two tags to every non-terminal node: a constituent tag and a relation tag. Together these form a dual-tag-set system for describing syntactic information. The two tag sets currently contain 16 and 27 tags respectively. They describe in detail the external functional distribution and the internal combinational properties of the different syntactic constructions in Chinese sentences. Using this scheme, we have built TCT, a 1-million-word Chinese syntactic treebank.
Syntactically annotated corpora, commonly called 'treebanks', play an important role in empirical linguistics and in machine-learning methods for natural language processing. After briefly summarizing treebank annotation schemes for several languages, we propose a new annotation scheme for a Chinese treebank. Under this scheme, every Chinese sentence is annotated with a complete parse tree in which each non-terminal constituent is assigned two tags. The first is the syntactic constituent tag, which describes the constituent's external functional relation with other constituents in the parse tree. The second is the grammatical relation tag, which describes the internal structural relation among the constituent's subcomponents. The two tag sets consist of 16 and 27 tags respectively. Through these top-down and bottom-up descriptions, they provide an integrated annotation of each syntactic constituent in a parse tree. Based on this scheme, we built a 1,000,000-word Chinese treebank covering a balanced collection of journalistic, literary, academic, and other documents. Annotation experiments on various kinds of complex linguistic phenomena show the applicability and compatibility of this annotation scheme.
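The abstract describes a dual-tag parse tree in which every non-terminal node carries two labels. The first label describes the node's external function, and the second describes how its children combine internally. The Python sketch below is only an illustration under stated assumptions. The class names, the hyphen-joined bracketed output format, and the tag labels `dj`, `vp`, `ZW` and `PO` are hypothetical placeholders. They are not the actual TCT tag sets or file format, which this record does not list.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    """A terminal node: one word and its part-of-speech tag."""
    word: str
    pos: str


@dataclass
class Node:
    """A non-terminal node carrying both tags of the dual-tag scheme."""
    constituent: str  # constituent tag: external functional distribution
    relation: str     # relation tag: internal structure among the children
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def to_bracketed(t: Union[Node, Leaf]) -> str:
    """Render a tree as [constituent-relation child ...].

    The hyphen-joined bracket format is a hypothetical choice for display.
    """
    if isinstance(t, Leaf):
        return f"{t.word}/{t.pos}"
    inner = " ".join(to_bracketed(c) for c in t.children)
    return f"[{t.constituent}-{t.relation} {inner}]"


def count_nonterminals(t: Union[Node, Leaf]) -> int:
    """Count non-terminal nodes, each of which carries two tags."""
    if isinstance(t, Leaf):
        return 0
    return 1 + sum(count_nonterminals(c) for c in t.children)


# "I like music". All tag labels below are hypothetical placeholders:
# dj = sentence, vp = verb phrase, ZW = subject-predicate, PO = verb-object.
sentence = Node("dj", "ZW", [
    Leaf("我", "r"),
    Node("vp", "PO", [Leaf("喜欢", "v"), Leaf("音乐", "n")]),
])

if __name__ == "__main__":
    print(to_bracketed(sentence))        # [dj-ZW 我/r [vp-PO 喜欢/v 音乐/n]]
    print(count_nonterminals(sentence))  # 2
```

Both tags are stored as separate fields on the node rather than as one fused label. As a result, each tag set can be validated or queried independently, which mirrors the two-tag-set design the abstract describes.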
Keywords: computer application; Chinese information processing; syntactic treebank; annotation specification; corpus linguistics
CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]