Authors: PENG Jun-Feng (彭俊峰); YU Kai (俞凯) [1]; LI Guo-Jing (李国靖)
Affiliation: [1] School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China
Source: Computer Systems & Applications (《计算机系统应用》), 2025, No. 2, pp. 135-144 (10 pages)
Abstract: Key sentence extraction uses artificial intelligence to automatically identify the core sentences of a long text. It can serve as a preprocessing step for information retrieval and is important for downstream tasks such as text classification and extractive summarization. Traditional unsupervised key sentence extraction techniques are mostly based on statistical and graph-model methods, which suffer from low accuracy and require a large-scale corpus to be built in advance. This study proposes T5KSEChinese, an unsupervised key sentence extraction method for Chinese text. The method uses an encoder-decoder architecture and relies on input and output prompts to sidestep the length mismatch between the target sentence and the original text, yielding more accurate results. The study also proposes a way of constructing positive samples for contrastive learning and combines it with contrastive learning to train the encoder part of the model in a semi-supervised manner, improving performance on downstream tasks. Using a lightweight model, the method scores higher on the unsupervised downstream task than large language models with tens of times as many parameters. The final experimental results demonstrate the accuracy and reliability of the proposed method.
Keywords: key sentence extraction; generative pre-trained language model; contrastive learning; positive sample; encoder-decoder
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]