EXTRACTIVE SUMMARY OF KHMER SINGLE DOCUMENT BASED ON DEEP ACTIVE LEARNING
(基于深度主动学习的柬语单文档抽取式摘要方法)

Cited by: 1


Authors: Yu Bingbing (余兵兵), Yan Xin (严馨) [1,2], Zhou Feng (周枫) [1,2], Xu Guangyi (徐广义), Mo Yuanyuan (莫源源) [4,5]

Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China [2] Yunnan Provincial Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650040, Yunnan, China [3] Yunnan Nantian Electronic Information Industry Co., Ltd., Kunming 650040, Yunnan, China [4] School of Southeast & South Asia Languages and Culture, Yunnan Minzu University, Kunming 650500, Yunnan, China [5] Institute of Language Studies, Shanghai Normal University, Shanghai 200234, China

Source: Computer Applications and Software (《计算机应用与软件》), 2021, No. 4, pp. 165-170, 189 (7 pages in total)

Funding: National Natural Science Foundation of China (61562049, 61462055).

Abstract: Deep neural networks have achieved good results in document summarization, but their advantages only show on large datasets. To address the shortage of annotated corpora when deep learning is applied to extractive summarization of single Khmer documents, a method combining active learning with deep learning is proposed. An active learning sampling strategy selects a fixed number of documents, which are annotated by experts and then used to train an encoder-decoder model that extracts the summary. Experimental results show that, even when the annotated training corpus is significantly insufficient, the method effectively improves the quality of Khmer single-document summaries.
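The document-selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which sampling strategy the authors use, so this example assumes entropy-based uncertainty sampling, one common choice in active learning. The names `entropy`, `select_for_annotation`, `predict_proba`, and the toy scores are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(
    unlabeled_docs: List[str],
    predict_proba: Callable[[str], Sequence[float]],
    budget: int,
) -> List[str]:
    """Return the `budget` documents whose predictions are most uncertain.

    Assumption: uncertainty is measured by prediction entropy; the paper
    may use a different sampling criterion.
    """
    ranked = sorted(
        unlabeled_docs,
        key=lambda doc: entropy(predict_proba(doc)),
        reverse=True,  # highest entropy (least confident) first
    )
    return ranked[:budget]


# Toy stand-in for a trained model: a hypothetical probability
# distribution per document (not real experimental data).
toy_scores = {"doc_a": [0.5, 0.5], "doc_b": [0.9, 0.1], "doc_c": [0.6, 0.4]}
chosen = select_for_annotation(list(toy_scores), toy_scores.__getitem__, budget=2)
print(chosen)
```

In a full loop, the selected documents would go to expert annotators, be added to the labeled set, and the encoder-decoder model would be retrained before the next selection round.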

Keywords: Khmer; active learning; single-document summarization; deep learning

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
