Tibetan summarization algorithm combining extractive and abstractive methods

Authors: GAO Yiming; WEI Zhiheng; DUO La; WANG Wenqiang[4]; ZUO Xiangjian; JIA Xingxing[1,2,3]

Affiliations: [1] School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, China; [2] The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810000, China; [3] Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province, Xining 810000, China; [4] School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen, Guangdong 210000, China; [5] School of Cybersecurity and Information Law, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2024, No. 6, pp. 1215-1222 (8 pages)

Funding: National Natural Science Foundation of China (61902164); Research Project on Key Technologies for Tibetan Text Classification (2023-Z-004).

Abstract: To advance text summarization technology for the Tibetan language, this study employs a two-stage fine-tuning approach to build a Tibetan summarization model, BERT-ext-abs, that integrates extractive and abstractive techniques while preserving fluency and semantic consistency in the generated summaries. An extractive Tibetan summarization model, BERT-ext, is trained first; a second fine-tuning stage then yields the abstractive model BERT-ext-abs. Comparative experiments were designed from two perspectives, model structure and training data scale. The results show that, compared with BERT-abs, an abstractive Tibetan summarization model trained without the second fine-tuning stage, BERT-ext-abs improves the ROUGE-1 score by 3.23% and the BERTScore by 0.95%. In addition, BERT-ext-abs requires fewer model parameters and less training data than BERT-abs, generating fluent and semantically consistent summaries more efficiently.
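The two-stage recipe in the abstract (first fine-tune an extractive model, then fine-tune it again as an abstractive one) can be illustrated with a minimal Python sketch using Hugging Face transformers. The encoder checkpoint, class names, and warm-start details below are assumptions for illustration, not the authors' actual BERT-ext/BERT-ext-abs implementation:

    # Hedged sketch of two-stage fine-tuning; checkpoint name, heads,
    # and hyperparameters are illustrative assumptions.
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer, EncoderDecoderModel

    ENCODER = "bert-base-multilingual-cased"  # placeholder; a Tibetan BERT would be used in practice
    tokenizer = AutoTokenizer.from_pretrained(ENCODER)

    # Stage 1: extractive model (BERT-ext). Score each sentence for
    # inclusion in the summary with a binary head on the [CLS] vector.
    class ExtractiveScorer(nn.Module):
        def __init__(self, name):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(name)
            self.head = nn.Linear(self.encoder.config.hidden_size, 1)

        def forward(self, input_ids, attention_mask):
            hidden = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
            return self.head(hidden[:, 0]).squeeze(-1)  # one keep/drop logit per sentence

    ext_model = ExtractiveScorer(ENCODER)
    # ... stage-1 fine-tuning: BCEWithLogitsLoss on (sentence, keep/drop) labels ...

    # Stage 2: abstractive model (BERT-ext-abs). Warm-start a BERT2BERT
    # seq2seq model from the stage-1 encoder, then fine-tune it a second
    # time on (document, reference summary) pairs with cross-entropy loss.
    abs_model = EncoderDecoderModel.from_encoder_decoder_pretrained(ENCODER, ENCODER)
    abs_model.encoder.load_state_dict(ext_model.encoder.state_dict())
    abs_model.config.decoder_start_token_id = tokenizer.cls_token_id
    abs_model.config.pad_token_id = tokenizer.pad_token_id
    # ... stage-2 fine-tuning, then e.g.:
    # summary_ids = abs_model.generate(**tokenizer(doc, return_tensors="pt"), max_length=64)

One plausible reading of the abstract's efficiency claim, under this setup, is that the extractively fine-tuned encoder already captures sentence salience, so the abstractive stage can converge with less data.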

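For the reported metrics, the following sketch shows how ROUGE-1 and BERTScore are commonly computed. The rouge-score and bert-score packages and the scoring checkpoint are assumptions, not the paper's stated tooling; note also that rouge-score's default tokenizer targets Latin script, so real Tibetan evaluation would need a custom (e.g. syllable-level) tokenizer:

    # Hedged sketch of ROUGE-1 / BERTScore evaluation; library choices
    # and the scoring model are assumptions.
    from rouge_score import rouge_scorer
    from bert_score import score as bert_score

    generated = ["..."]   # model outputs (placeholders)
    references = ["..."]  # gold summaries (placeholders)

    # ROUGE-1: unigram-overlap F1 between candidate and reference.
    scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=False)
    rouge1 = sum(scorer.score(ref, gen)["rouge1"].fmeasure
                 for ref, gen in zip(references, generated)) / len(generated)

    # BERTScore: greedy token matching in contextual-embedding space.
    P, R, F1 = bert_score(generated, references,
                          model_type="bert-base-multilingual-cased")
    print(f"ROUGE-1 F1: {rouge1:.4f}  BERTScore F1: {F1.mean().item():.4f}")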
Keywords: extractive summarization; abstractive summarization; pre-trained model; bidirectional encoder representations (BERT); Tibetan

CLC number: TP391.1 (Automation and Computer Technology: Computer Application Technology); TN919 (Automation and Computer Technology: Computer Science and Technology)

 
