A Retrieval Model for Engineering Consulting Reports Based on Joint Semantic and Association Matching


Authors: ZHANG Le; DU Yifan; LYU Xueqiang[1]; LI Yelong; XIA Lei

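Affiliations: [1] Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; [2] Beijing Engineering Consulting Company, Beijing 100124, China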

Source: Journal of Beijing University of Posts and Telecommunications, 2024, No. 2, pp. 123-129 (7 pages)

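Funding: National Natural Science Foundation of China (62171043); Key Project of the State Language Commission (ZDI145-10); Scientific Research Program of the Beijing Municipal Education Commission (KM202311232001).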

Abstract: A text retrieval model for engineering consulting reports is proposed. It combines semantic matching and association matching to achieve accurate and efficient retrieval between titles and paragraphs, and can effectively assist the writing of engineering consulting reports. First, a contrastive learning model is fine-tuned on a text retrieval corpus of engineering consulting reports, and the vanilla bidirectional encoder representations from transformers (Vanilla BERT) model is initialized. The corpus text is then trained with the Vanilla BERT model and a linear layer to obtain the semantic matching score. In parallel, sememe (semantic primitive) word vector representations are built for the text and keyword information, and the association matching score is obtained through a deep text interaction model. The semantic matching score and the association matching score are normalized and then fused by weighting to give the final matching score, which completes the retrieval between titles and paragraphs. By combining contextual vector representations with text interaction matching, the proposed model improves the P@20 metric by 7.49% over the best baseline model, effectively improving text retrieval.
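The final fusion step described in the abstract (normalize both scores, then combine them with weights) and the P@k metric can be illustrated with a minimal sketch. The abstract does not state which normalization or which weights are used, so min-max normalization and the weight `alpha` below are assumptions, and all function names are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence, Set


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Scale scores to [0, 1]. Min-max is an assumption; the abstract only says 'normalized'."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates scored equally, so no ranking signal remains.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(semantic: Sequence[float],
                association: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted fusion of the two normalized scores; alpha is a hypothetical weight."""
    if len(semantic) != len(association):
        raise ValueError("both score lists must cover the same candidate paragraphs")
    sem = min_max_normalize(semantic)
    assoc = min_max_normalize(association)
    return [alpha * s + (1.0 - alpha) * a for s, a in zip(sem, assoc)]


def rank_paragraphs(fused: Sequence[float]) -> List[int]:
    """Return candidate indices sorted by fused score, best first."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


def precision_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """P@k: fraction of the top-k ranked candidates that are relevant."""
    top_k = ranked[:k]
    return sum(1 for i in top_k if i in relevant) / k


if __name__ == "__main__":
    # Toy scores for three candidate paragraphs of one title (made-up numbers).
    fused = fuse_scores([2.0, 0.5, 1.0], [10.0, 30.0, 20.0], alpha=0.7)
    ranking = rank_paragraphs(fused)
    print(ranking, precision_at_k(ranking, {0, 1}, k=2))
```

In this sketch a larger `alpha` gives the semantic matching score more influence on the final ranking; in practice such a weight would be tuned on held-out data.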

Keywords: text retrieval; joint ranking; word vector; character vector; sememe

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
