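Affiliations: [1] Department of Language Engineering, Luoyang Foreign Languages Institute (洛阳外国语学院), Luoyang 471003, Henan, China [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100049, China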
Source: Journal of Shandong University (Natural Science), 2015, No. 9, pp. 21-28 (8 pages)
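Funding: National Key Basic Research Program of China (973 Program) (2014CB340400; 2012CB316303); Key Program of the National Natural Science Foundation of China (61232010); General Program of the National Natural Science Foundation of China (61173064); National Key Technology R&D Program (2012BAH39B04)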
Abstract: In cross-language text analysis, a multi-word phrase is less ambiguous and semantically more precise than a single word, which helps improve the accuracy of text understanding. Existing methods mainly focus on cross-language alignment of single words. This paper combines multi-word phrase extraction with cross-language alignment and proposes a Chinese-Japanese multi-word phrase extraction and alignment method based on multi-strategy filtering. First, starting from one language, candidate phrases are extracted and filtered using repeated strings, left-right adjacent entropy, internal association strength, multi-word nesting, and stop words, yielding multi-word phrases with complete semantics. Second, a parallel corpus is used to compute the similarity between Chinese and Japanese multi-word phrases and thereby align them across languages. Throughout the process, the rules and characteristics of Japanese can be taken into account, and the filtering thresholds can be adjusted dynamically according to corpus size and domain, which improves the domain applicability of the extracted phrases. Experimental results show that the method effectively extracts Chinese-Japanese multi-word phrases and aligns them accurately; using multi-word phrases as the alignment unit gives more complete semantic expression and greater practical value.
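As an illustration of one of the filters named in the abstract, the sketch below computes left-right adjacent entropy for a candidate phrase over a tokenized corpus. It is a minimal sketch of the general technique, not the paper's implementation: the function names `adjacent_entropy` and `keep_candidate`, the rule of taking the smaller of the two entropies, and the default threshold of 1.0 are all assumptions made for this example.

```python
import math
from collections import Counter


def adjacent_entropy(tokens, phrase, side):
    """Shannon entropy (in bits) of the tokens next to `phrase` on one side.

    `side` is "left" or "right". Occurrences at the very start or end of
    `tokens` simply contribute no neighbour (an assumption of this sketch).
    """
    phrase = tuple(phrase)
    n = len(phrase)
    neighbours = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == phrase:
            j = i - 1 if side == "left" else i + n
            if 0 <= j < len(tokens):
                neighbours[tokens[j]] += 1
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbours.values())


def keep_candidate(tokens, phrase, threshold=1.0):
    """Keep a candidate only if both of its boundaries look 'free'.

    A low entropy on either side suggests the candidate is a fragment of a
    longer unit. The threshold is hypothetical; the abstract says thresholds
    are tuned to corpus size and domain.
    """
    left = adjacent_entropy(tokens, phrase, "left")
    right = adjacent_entropy(tokens, phrase, "right")
    return min(left, right) >= threshold


# Toy corpus: "x y" appears in varied contexts, while "x" is always followed by "y".
corpus = ["a", "x", "y", "b", "c", "x", "y", "d",
          "e", "x", "y", "f", "a", "x", "y", "b"]
print(keep_candidate(corpus, ("x", "y")))  # varied neighbours on both sides
print(keep_candidate(corpus, ("x",)))      # right neighbour is always "y"
```

In the toy corpus, the single token "x" is rejected because its right neighbour never varies, which is the behaviour the entropy filter is meant to capture for incomplete phrase fragments.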
Classification: TP393 [Automation and Computer Technology: Computer Application Technology]