Building of the Extremely Large Scale Words Collocation Corpus and Its Composition Analysis  Cited by: 2


Authors: Xu Runhua [1], Chen Xiaohe [1]

Affiliation: [1] School of Chinese Language and Culture, Nanjing Normal University, Nanjing, Jiangsu 210097

Source: Journal of School of Chinese Language and Culture Nanjing Normal University (《南京师范大学文学院学报》), 2011, No. 3, pp. 56-61 (6 pages)

Funding: Jiangsu Provincial Philosophy and Social Science Fund, General Project (No. 10YYB007); National Social Science Fund, Youth Projects (Nos. 10CYY021 and 11CYY030)

Abstract: Large-scale word collocation databases are urgently needed in many areas of natural language processing. This paper uses three syntactic parsers, developed at Harbin Institute of Technology, UC Berkeley, and Stanford University respectively, to parse nine years of People's Daily text. Candidate collocations are obtained by merging and comparing the three sets of parsing results. Collocation precision is then improved further by selecting optimal parameters and collocation types. The result is a set of about 1.36 million collocation types (the source's English abstract gives 2.36 million) with associated statistics, from which a word collocation database is built. The database contains six common types of collocation and maintains a fairly good accuracy, so it can provide reliable data support for other related work.

Keywords: word collocation database; syntactic parsing; collocation types; optimal parameters

Classification: H03 [Language and Writing - Linguistics]
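
The abstract describes merging the output of three parsers into candidate collocations and then filtering by parameters and collocation types. The record does not give the actual merging rule or thresholds, so the following is only a minimal sketch under stated assumptions: each parser's output for a sentence is a list of (head, dependent, relation) triples already mapped to a shared label set, a triple becomes a candidate when at least `min_agree` parsers propose it, and candidates are kept when their corpus frequency reaches `min_freq`. The function names, parameters, and relation labels are all hypothetical.

```python
from collections import Counter


def merge_parses(parses, min_agree=2):
    """Return the triples proposed by at least `min_agree` parsers.

    `parses` holds one list of (head, dependent, relation) triples per
    parser for a single sentence. The voting rule is an assumption, not
    the paper's documented method.
    """
    votes = Counter()
    for triples in parses:
        votes.update(set(triples))  # each parser votes once per triple
    return {triple for triple, n in votes.items() if n >= min_agree}


def build_collocations(corpus_parses, allowed_types, min_freq=2, min_agree=2):
    """Count agreed triples over a corpus and keep the frequent ones.

    `allowed_types` and `min_freq` stand in for the "type and parameter
    selection" step; their values here are illustrative only.
    """
    counts = Counter()
    for parses in corpus_parses:
        for head, dep, rel in merge_parses(parses, min_agree):
            if rel in allowed_types:
                counts[(head, dep, rel)] += 1
    return {triple: n for triple, n in counts.items() if n >= min_freq}


if __name__ == "__main__":
    # Three hypothetical parser outputs for the same sentence.
    sentence = [
        [("eat", "rice", "VOB"), ("good", "very", "ADV")],
        [("eat", "rice", "VOB")],
        [("rice", "eat", "SBV")],
    ]
    print(build_collocations([sentence, sentence], allowed_types={"VOB"}))
```

Requiring agreement between parsers trades recall for precision, which matches the abstract's stated emphasis on accuracy; a stricter setting such as `min_agree=3` would keep only triples on which all three parsers agree.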

 
