Source: Journal of School of Chinese Language and Culture Nanjing Normal University (《南京师范大学文学院学报》), 2011, No. 3, pp. 56-61 (6 pages)
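Funding: Supported by the Jiangsu Provincial Philosophy and Social Science Fund General Project (No. 10YYB007) and the National Social Science Fund Youth Projects (Nos. 10CYY021 and 11CYY030).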
Abstract: Large-scale word collocation databases are urgently needed in many areas of natural language processing. This paper uses three syntactic parsers, developed at Harbin Institute of Technology, UC Berkeley, and Stanford University respectively, to parse nine years of People's Daily text, and obtains candidate collocations by merging and comparing the three sets of parse results. Parameter and type selection is then applied to further improve collocation precision, yielding about 1.36 million collocation types with associated statistics, from which a word collocation database is built. The database covers six common collocation types with good accuracy and can provide reliable data support for other related work.
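The merge-and-filter pipeline summarized in the abstract can be sketched roughly as follows. The abstract does not state the actual merging rule, so the agreement vote (`min_votes`), the frequency threshold (`min_freq`), the triple layout, and all function names here are hypothetical and serve only as an illustration, not as the authors' method.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Assumed layout: (head word, dependent word, collocation type)
Triple = Tuple[str, str, str]


def merge_candidates(parser_outputs: List[Iterable[Triple]],
                     min_votes: int = 2) -> Set[Triple]:
    """Keep triples proposed by at least `min_votes` parsers for one sentence."""
    votes: Counter = Counter()
    for output in parser_outputs:
        votes.update(set(output))  # each parser votes at most once per triple
    return {triple for triple, n in votes.items() if n >= min_votes}


def build_collocations(sentences: Iterable[List[Iterable[Triple]]],
                       allowed_types: Set[str],
                       min_freq: int = 2,
                       min_votes: int = 2) -> Dict[Triple, int]:
    """Count agreed triples over a corpus; keep frequent ones of allowed types."""
    freq: Counter = Counter()
    for parser_outputs in sentences:
        for triple in merge_candidates(parser_outputs, min_votes):
            if triple[2] in allowed_types:  # type selection (assumed filter)
                freq[triple] += 1
    return {triple: n for triple, n in freq.items() if n >= min_freq}
```

Each element of `sentences` holds the triples that the three parsers produced for one sentence. Raising `min_votes` to 3 would trade coverage for precision, which is one plausible reading of the parameter selection the abstract mentions.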