Review of Technology Term Recognition Studies Based on Machine Learning  (Times cited: 15)

Authors: Hu Yamin, Wu Xiaoyan, Chen Fang [1,2]

Affiliations: [1] Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041, China; [2] Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China

Source: Data Analysis and Knowledge Discovery, 2022, No. 2, pp. 7-17 (11 pages)

Abstract: [Objective] This paper reviews the current status and future prospects of machine learning algorithms applied to technology term recognition. [Coverage] We searched the Web of Science Core Collection and CNKI using the query terms “technology term*recognition” and its Chinese equivalent “技术术语识别”, then extended our reading to the related algorithm literature. A total of 62 representative papers were selected for review. [Methods] By analogy with named entity recognition research, we summarized how machine learning is applied in technology term recognition and how that application differs from NER, and examined the field from four perspectives: algorithm classification, general workflow, existing problems, and downstream applications. Finally, we discussed future application prospects. [Results] The algorithms fall into three groups: purely statistical machine learning, purely deep learning, and hybrid algorithms that combine the two. Hybrid algorithms are the most widely used, with the BiLSTM-CRF model as their mainstream representative. Transfer learning is an important future research direction. [Limitations] Because deep learning is developing rapidly and new hybrid models keep emerging, this review covers only the more widely used algorithms rather than listing every model. [Conclusions] Existing methods still leave many problems open for further optimization. Future research should be strengthened in fine-grained entity recognition, feature representation, evaluation methods, and open-source toolkits.
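The abstract names BiLSTM-CRF as the representative hybrid model. As a minimal sketch (not the authors' implementation), technology term recognition can be cast, as in NER, as sequence labelling: a BiLSTM produces per-token tag scores ("emissions") and a CRF layer chooses the highest-scoring tag sequence with Viterbi decoding. The BIO tag scheme with a single hypothetical `TERM` type, the hand-set emission scores, and the transition penalty are all assumptions standing in for a trained BiLSTM's outputs and learned CRF parameters; only the CRF decoding step is shown.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set: BIO scheme with a single TERM type.
TAGS = ["O", "B-TERM", "I-TERM"]

# Hypothetical CRF transition scores. A trained CRF learns these; here only
# the BIO-invalid transition O -> I-TERM is hand-penalised.
TRANSITIONS: Dict[Tuple[str, str], float] = {(a, b): 0.0 for a in TAGS for b in TAGS}
TRANSITIONS[("O", "I-TERM")] = -10.0


def viterbi_decode(
    emissions: List[List[float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the tag sequence with the highest total score.

    emissions[t][j] scores TAGS[j] at token t (in a BiLSTM-CRF these would be
    the BiLSTM outputs); transitions[(prev, cur)] scores moving between tags.
    """
    n = len(TAGS)
    best = list(emissions[0])  # best path score ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n):
            # Best previous tag for current tag j: path score + transition.
            scores = [best[i] + transitions[(TAGS[i], TAGS[j])] for i in range(n)]
            i_best = max(range(n), key=scores.__getitem__)
            new_best.append(scores[i_best] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best-scoring final tag.
    j = max(range(n), key=best.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    return [TAGS[k] for k in reversed(path)]


def bio_to_terms(tokens: List[str], tags: List[str]) -> List[str]:
    """Group B-TERM/I-TERM runs into multi-word term strings."""
    terms: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-TERM":
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag == "I-TERM" and current:
            current.append(token)
        else:
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


# Hypothetical example: scores stand in for a trained BiLSTM's outputs.
TOKENS = ["using", "lithium", "ion", "batteries", "today"]
EMISSIONS = [  # columns follow TAGS: O, B-TERM, I-TERM
    [3.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 0.0, 3.0],
    [3.0, 0.0, 0.0],
]

if __name__ == "__main__":
    tags = viterbi_decode(EMISSIONS, TRANSITIONS)
    print(tags)                        # ['O', 'B-TERM', 'I-TERM', 'I-TERM', 'O']
    print(bio_to_terms(TOKENS, tags))  # ['lithium ion batteries']
```

In an actual BiLSTM-CRF the transition scores are learned jointly with the BiLSTM during training. Decoding over whole sequences is what lets the CRF layer discourage inconsistent label sequences (such as a term starting with `I-TERM`), which independent per-token classification cannot do.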

Keywords: technology term recognition; machine learning; deep learning

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]

 
