Improve Low-resource Cross-language Information Retrieval by Constructing Pseudo Query Sentences Based on Word Mapping

Cited by: 8

Authors: LI Yan; GUO Junjun [1,2]; YU Zhengtao; GAO Shengxiang [1,2]

Affiliations: [1] School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, Yunnan, China; [2] Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650504, Yunnan, China

Source: Journal of Shanxi University (Natural Science Edition), 2022, No. 2, pp. 322-331 (10 pages)

Funding: National Natural Science Foundation of China (61761026; 61972186; 61866020); National Key Research and Development Program of China (2019QY1802).

Abstract: This paper uses word mapping to bridge languages and to mitigate the effects that scarce query-document alignment corpora and cross-lingual differences have on retrieval. It proposes a pseudo query sentence fusion method based on a bilingual interactive attention mechanism: pseudo query sentences are constructed through word mapping, and a cross-language feature representation is obtained through bilingual interactive attention to perform cross-language information retrieval (CLIR). The method has three parts: (1) Pseudo query sentences are constructed by word-level mapping with a pre-built bilingual mapping dictionary. (2) A shared Transformer produces contextual feature representations of the user query, the pseudo query sentences, and the foreign-language document; a bilingual interactive attention gating mechanism between the query and the pseudo query sentences narrows the semantic gap between the languages and yields the cross-language feature representation of the user query. (3) A bilingual interactive ranking module computes the matching score between the user query and the foreign-language document for cross-language information retrieval. Ablation and comparative experiments on a self-built Chinese-Vietnamese CLIR dataset were used to select the best model, which was then compared against baselines on two public low-resource CLIR datasets, English-Tagalog and English-Swahili. The results show that, compared with mainstream cross-language information retrieval methods, the MAP of the proposed method increased by 1.5% and 5.4%, respectively.
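The first step of the method, building a pseudo query sentence by word-level mapping through a bilingual dictionary, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `build_pseudo_query`, the `keep_unmapped` option, the choice of the first listed translation, and the toy dictionary with placeholder target tokens are all assumptions made for illustration. The abstract does not say how unmapped or ambiguous words are handled.

```python
from typing import Dict, List


def build_pseudo_query(
    query_tokens: List[str],
    bilingual_dict: Dict[str, List[str]],
    keep_unmapped: bool = True,
) -> List[str]:
    """Map each source-language token to a target-language token.

    Assumption: the first listed translation is used, and tokens with no
    dictionary entry are either copied through or dropped.
    """
    pseudo_query: List[str] = []
    for token in query_tokens:
        translations = bilingual_dict.get(token.lower())
        if translations:
            # Take the first candidate translation (illustrative choice).
            pseudo_query.append(translations[0])
        elif keep_unmapped:
            # No dictionary entry: keep the source token unchanged.
            pseudo_query.append(token)
    return pseudo_query


# Hypothetical toy dictionary; the target tokens are placeholders,
# not real translations.
toy_dict = {
    "weather": ["TGT_weather", "TGT_climate"],
    "forecast": ["TGT_forecast"],
}

pseudo = build_pseudo_query(["Weather", "forecast", "today"], toy_dict)
print(" ".join(pseudo))
```

In the described pipeline, a pseudo query built this way would be encoded together with the original query and the document by the shared Transformer; the sketch covers only the dictionary lookup.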

Keywords: cross-language information retrieval; word mapping; bilingual interactive attention; pseudo query sentence

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
