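Affiliation: [1] China Center for Information Industry Development, Beijing 100044
Source: Computer Engineering and Applications (《计算机工程与应用》), 2010, Issue 31, pp. 130-134, 187 (6 pages)
Funding: National Natural Science Foundation of China, No. 60872118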
Abstract: Multiword expressions (MWEs) are used not only to improve the quality of current machine translation (MT) systems but also in other areas of natural language processing, such as cross-language information retrieval and data mining. This paper proposes a method that combines semantic templates with statistical tools to automatically extract native English MWEs from a three-tuple comparable corpus. Thesaurus-based and distributional methods are used to compute the similarity between words, which broadens MWE coverage. GIZA++ is run to align words at the sentence level and obtain Chinese MWE candidates. For each native English MWE, the Chinese candidates are collected and ranked by their co-occurrence affinity (translation probability), and only the top-ranked candidate is accepted as the translation of the English MWE. Experimental results show that the proposed method effectively improves the accuracy of MWE extraction and alignment.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
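
The candidate-selection step described in the abstract (collect the Chinese candidates for each English MWE, rank them, keep the top one) can be illustrated with a minimal sketch. The abstract does not define how co-occurrence affinity is computed, so this sketch assumes a simple relative frequency over aligned pairs. The function name `best_translations` and the sample pairs are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def best_translations(aligned_pairs):
    """For each English MWE, return (top Chinese candidate, relative frequency).

    aligned_pairs: iterable of (english_mwe, chinese_candidate) tuples,
    e.g. produced by a word aligner. Relative frequency is an ASSUMED
    stand-in for the paper's co-occurrence affinity.
    """
    counts = defaultdict(Counter)
    for en, zh in aligned_pairs:
        counts[en][zh] += 1

    best = {}
    for en, candidates in counts.items():
        total = sum(candidates.values())
        # Highest count wins; ties are broken by the candidate string
        # so the result is deterministic.
        zh, n = max(candidates.items(), key=lambda kv: (kv[1], kv[0]))
        best[en] = (zh, n / total)
    return best


# Hypothetical aligned pairs, for illustration only.
pairs = [
    ("take into account", "考虑"),
    ("take into account", "考虑"),
    ("take into account", "顾及"),
    ("by and large", "总的来说"),
]
result = best_translations(pairs)
```

Keeping only the top-ranked candidate favors precision over recall; a variant that keeps every candidate above a probability threshold would trade the other way.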