Authors: Deepak Sharma, Prakash R. Devale, Akhil K. Khare
Source: Computer Technology and Application (计算机技术与应用, English edition), 2011, Issue 8, pp. 663-666 (4 pages)
Abstract: In this paper, the authors present an approach for extracting multiword expressions (MWEs) from monolingual corpora. The approach both generates and validates multiword candidates. It produces a list of candidates that are extracted and filtered according to a number of criteria and a set of standard statistical association measures. Candidate generation is based on surface forms, while validation consists of a series of criteria for removing noise using language-independent association measures. For generating corpus counts, it provides a corpus indexation facility. The approach also allows easy integration with a machine learning tool for the creation and application of supervised multiword extraction models when annotated data is available. The authors demonstrate the approach in a standard configuration, extracting MWEs from a general-purpose English corpus.
Keywords: multiword candidates, association measures, surface forms, monolingual corpora
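The generate-then-validate pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not name the association measures or thresholds used, so everything specific here is an assumption made for illustration: the function name `extract_bigram_candidates`, the restriction to bigrams, the choice of pointwise mutual information (PMI) as the measure, and the `min_count` and `min_pmi` cut-offs. It is not the authors' implementation.

```python
import math
from collections import Counter


def extract_bigram_candidates(tokens, min_count=2, min_pmi=1.0):
    """Return ((w1, w2), pmi) pairs sorted by descending PMI.

    Hypothetical helper: bigrams, PMI, and both thresholds are
    illustrative assumptions, not details taken from the paper.
    """
    # Corpus counts: how often each surface form and each adjacent pair occurs.
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scored = []
    for (w1, w2), count in bigram_counts.items():
        # Validation step 1: drop rare candidates, which are mostly noise.
        if count < min_count:
            continue
        # Validation step 2: score with a language-independent measure (PMI).
        p_joint = count / n_bigrams
        p_w1 = unigram_counts[w1] / n_unigrams
        p_w2 = unigram_counts[w2] / n_unigrams
        pmi = math.log2(p_joint / (p_w1 * p_w2))
        if pmi >= min_pmi:
            scored.append(((w1, w2), pmi))

    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    text = "new york is big and new york is old but the city is new"
    for bigram, score in extract_bigram_candidates(text.split()):
        print(bigram, round(score, 2))
```

On a realistic corpus the frequency threshold would typically be higher, and PMI is often combined with other measures because it tends to favor rare pairs.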