Authors: Liu Siyuan (刘思源) [1,2,3]; Mao Cunli (毛存礼); Zhang Yongbing (张勇丙)
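Affiliations: [1] South Asia and Southeast Asia Languages Voice Information Processing Engineering Research Center under the Ministry of Education, Kunming 650000, China; [2] School of Information and Automation, Kunming University of Science and Technology, Kunming 650000, China; [3] Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming 650000, China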
Source: Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), 2023, No. 4, pp. 610-619 (10 pages)
Funding: National Natural Science Foundation of China (62166023, 61866019); Key Project of the Yunnan Provincial Natural Science Foundation (2019FA023).
Abstract: Chinese-Vietnamese cross-border ethnic text retrieval is a domain-oriented cross-language retrieval task: given a query in one language, it retrieves documents in the other language about cross-border ethnic groups, covering topics such as ethnicity, religion, and cultural customs. The task involves many uncommon domain entities with varied surface forms, and Chinese and Vietnamese domain entities have no direct correspondence. This makes cross-language word alignment and semantic alignment in the domain difficult, which in turn degrades retrieval performance. To address this, the paper proposes a Chinese-Vietnamese cross-border ethnic text retrieval method based on a domain knowledge graph and contrastive learning. First, a multi-head attention mechanism integrates the Chinese-Vietnamese cross-border ethnic domain knowledge graph into queries and documents, enriching their information about uncommon domain entities. Then, contrastive learning is introduced to address the difficulty of aligning the semantic representations of cross-language queries and documents. Finally, the similarity between the knowledge-graph-enhanced query and document representations is used as the relevance score. Experiments show that the proposed method outperforms the baseline model by 4.1%.
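The three steps named in the abstract (knowledge-graph fusion through attention, contrastive alignment, similarity as relevance score) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single attention head without learned projections in place of the paper's multi-head attention, an in-batch InfoNCE loss as the contrastive objective, and cosine similarity as the relevance score. The function names `attention_fuse`, `info_nce`, and `cosine`, and the `temperature` parameter, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def attention_fuse(text_vec, entity_vecs):
    """Fuse knowledge-graph entity vectors into a text vector.

    Assumption: one head of scaled dot-product attention, no learned
    projections. The text vector is the query; entity vectors serve as
    both keys and values. The attended context is added residually.
    """
    if not entity_vecs:
        return list(text_vec)
    scale = math.sqrt(len(text_vec))
    scores = [sum(a * b for a, b in zip(text_vec, e)) / scale
              for e in entity_vecs]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    context = [sum(w * e[i] for w, e in zip(weights, entity_vecs))
               for i in range(len(text_vec))]
    return [t + c for t, c in zip(text_vec, context)]


def info_nce(queries, docs, temperature=0.1):
    """In-batch contrastive loss (assumed objective).

    docs[i] is the positive document for queries[i]; every other
    document in the batch acts as a negative.
    """
    loss = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in docs]
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(queries)


if __name__ == "__main__":
    # Toy vectors standing in for encoder outputs and entity embeddings.
    query = attention_fuse([1.0, 0.0], [[0.5, 0.5]])
    doc = attention_fuse([0.9, 0.1], [[0.5, 0.5], [0.0, 1.0]])
    print("relevance score:", cosine(query, doc))
    print("contrastive loss:", info_nce([query], [doc]))
```

In a trained system the vectors would come from a cross-language encoder and the loss would be minimized by gradient descent; the sketch only shows how the pieces connect.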
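Keywords: cross-border ethnic culture; cross-border ethnic knowledge graph; cross-language retrieval; contrastive learning; information retrieval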
Classification No.: TP301 [Automation and Computer Technology: Computer System Architecture]