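Affiliations: [1] School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024; [2] School of Public Administration and Law, Dalian University of Technology, Dalian, Liaoning 116024; [3] State Key Laboratory of Cognitive Intelligence (iFLYTEK), Hefei 230088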
Source: Chinese Journal of Computers (《计算机学报》), 2019, No. 10, pp. 2160-2174 (15 pages)
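Funding: National Natural Science Foundation of China (61632011, 61572102, 61602078, 61562080); Open Fund of the State Key Laboratory of Cognitive Intelligence (COGOS-20190001); China Postdoctoral Science Foundation General Program (2018M641691); Ministry of Education Humanities and Social Sciences Youth Fund (19YJCZH199); Fundamental Research Funds for the Central Universities (DUT18ZD102)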
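Abstract: In recent years, with the rapid development of biomedical research, the volume of biomedical literature has grown steadily, and it has become increasingly difficult for researchers to find the information they need in this mass of documents. Information retrieval technology can supply users with the information they need, but because the domain is highly specialized and its terminology is large and complex, retrieval techniques designed for the general domain often cannot handle this task well. At the same time, the biomedical domain has rich semantic resources that cover its specialized terminology and can assist and improve literature retrieval. To further improve biomedical literature retrieval, this paper starts from a term co-occurrence query expansion model and, taking the characteristics of the biomedical domain into account, uses the Medical Subject Headings (MeSH) thesaurus to measure the importance of expansion terms: high-quality expansion terms are selected by jointly weighing the co-occurrence relationship between an expansion term and the query terms and the distribution of the expansion term in MeSH. Building on this, the paper proposes a supervised query expansion method based on group ranking. The method labels the relevance of each candidate expansion term from two perspectives, namely the term's impact on retrieval performance and whether the term reflects the topic of the query; represents each term as a vector of contextual features and domain semantic features; and finally trains an expansion-term selection model with a group ranking method to complete query expansion. Experimental results on the TREC Genomics track datasets show that the method effectively improves query expansion: compared with a supervised query expansion method based on the learning-to-rank method ListMLE, it improves document mean average precision (Document MAP) by 4.41% and 11.35%, respectively, effectively improving the overall performance of biomedical literature retrieval.
In recent years, with rapid progress in biomedical research, the volume of biomedical literature has increased rapidly, making it difficult for researchers to find the information they need manually. Traditional information retrieval techniques rarely perform well on biomedical retrieval because of domain-specific characteristics, especially vocabulary mismatch on biomedical terminology. Query expansion can address this problem by adding relevant terms that help interpret the user's query and satisfy the information need. Because the biomedical domain has abundant semantic resources that contain a large amount of terminology and can assist the retrieval process, we first propose a query expansion model based on a term co-occurrence model and the MeSH thesaurus. The model selects useful expansion terms by balancing the co-occurrence of terms against the distribution of terms in MeSH. Furthermore, we use the MeSH-based method to obtain a large set of candidate expansion terms and propose selecting high-quality expansion terms with group ranking methods for supervised query expansion. Compared with unsupervised query expansion, supervised query expansion considers much more information about each candidate expansion term at once, which refines the set of expansion terms and improves the quality of the expanded queries. Specifically, we learn the term selection model with a group-based modification of ListMLE. ListMLE is a listwise ranking method based on the likelihood of the optimal (ground-truth) permutation under the ranking produced by the model, and group sampling further divides its sample space by taking one candidate term with a higher relevance label and several terms with lower relevance labels as a group. The modified ListMLE model can thus be learned with more focus on the expansion terms with higher relevance labels, which contribute substantially to the quality of the expanded queries. To give each candidate expansion term a ground-truth label, we not only consider the latent impact of the term on retriev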
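The abstract trains the expansion-term selector with a group-based variant of ListMLE. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a linear scoring function, assumes each group is one higher-labelled candidate followed by up to `max_lower` randomly sampled lower-labelled candidates (ordered by label), and applies the standard ListMLE (Plackett-Luce) negative log-likelihood, loss = sum_i [log sum_{j>=i} exp(s_j) - s_i], to each group. The helper names (`build_groups`, `train`), the learning rate, and the two toy features are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear scoring function score = w . x (an assumption, not the paper's model)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def listmle_loss_and_grad(scores: Sequence[float]) -> Tuple[float, List[float]]:
    """ListMLE negative log-likelihood for scores listed in ground-truth order (best first).

    loss = sum_i [ log(sum_{j >= i} exp(s_j)) - s_i ]
    Returns the loss and d(loss)/d(s_k) for every position k.
    """
    n = len(scores)
    loss = 0.0
    grad = [-1.0] * n  # each s_k appears exactly once as the "-s_i" term
    for i in range(n):
        tail = scores[i:]
        m = max(tail)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in tail]
        z = sum(exps)
        loss += m + math.log(z) - scores[i]
        for offset, e in enumerate(exps):
            grad[i + offset] += e / z  # softmax over the remaining positions i..n-1
    return loss, grad


def build_groups(labels: Sequence[int], max_lower: int, rng: random.Random) -> List[List[int]]:
    """Hypothetical group sampling: one candidate followed by up to `max_lower`
    randomly chosen candidates with strictly lower labels, sorted best first."""
    groups = []
    for top, top_label in enumerate(labels):
        lower = [j for j, lab in enumerate(labels) if lab < top_label]
        if not lower:
            continue  # the lowest-labelled candidates never head a group
        sampled = rng.sample(lower, min(max_lower, len(lower)))
        sampled.sort(key=lambda j: -labels[j])
        groups.append([top] + sampled)
    return groups


def train(features: List[Vector], labels: List[int], epochs: int = 200,
          lr: float = 0.1, max_lower: int = 3, seed: int = 0) -> Vector:
    """Stochastic gradient descent on the summed group-wise ListMLE loss."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        for group in build_groups(labels, max_lower, rng):
            scores = [dot(w, features[k]) for k in group]
            _, grad_s = listmle_loss_and_grad(scores)
            # Chain rule for a linear model: d(loss)/dw = sum_k grad_s[k] * x_k
            for k, g in zip(group, grad_s):
                for d, x_d in enumerate(features[k]):
                    w[d] -= lr * g * x_d
    return w


if __name__ == "__main__":
    # Toy candidates with two hypothetical features,
    # e.g. [co-occurrence score, MeSH-based score]; labels 2 > 1 > 0.
    feats = [[0.9, 0.8], [0.5, 0.4], [0.1, 0.2]]
    labs = [2, 1, 0]
    w = train(feats, labs)
    ranking = sorted(range(len(feats)), key=lambda k: -dot(w, feats[k]))
    print("weights:", w)
    print("ranking:", ranking)  # candidate 0 first, candidate 2 last
```

Sampling only a few lower-labelled partners per group keeps each ListMLE list short and concentrates the loss on placing the higher-labelled term first, which matches the motivation stated in the abstract; the paper's actual features, labelling scheme, and modification of ListMLE may differ from this sketch.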
Keywords: biomedical literature retrieval; Medical Subject Headings (MeSH); term co-occurrence model; query expansion; group ranking
Classification number (CLC): TP18 [Automation and Computer Technology: Control Theory and Control Engineering]