Deep learning Chinese input method with incremental vocabulary selection


Authors: REN Huajian (任华健), HAO Xiulan (郝秀兰)[1], XU Wenjing (徐稳静) (Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, Huzhou University, Huzhou 313000, China)

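Affiliation: [1] Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, Huzhou University, Huzhou, Zhejiang 313000, China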

Source: Telecommunications Science (《电信科学》), 2022, Issue 12, pp. 56-64 (9 pages)

Funding: Fund of the Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources (No. 2020E10017).

Abstract: The core task of an input method is to convert the keystroke sequence typed by the user into a sequence of Chinese characters. Input methods based on deep learning have advantages in learning long-range dependencies and in alleviating data sparsity. However, existing methods still have two shortcomings: first, their architecture separates pinyin segmentation from pinyin-to-character conversion, so segmentation errors propagate into the conversion stage; second, the models are too complex to meet the real-time requirements of an input method. To address these shortcomings, a deep-learning input method model incorporating an incremental vocabulary selection algorithm is proposed, and several softmax optimization methods are compared. Experiments on People's Daily data and Chinese Wikipedia data show that the model improves conversion accuracy by 15% over the current state-of-the-art model, and that incorporating incremental vocabulary selection makes the model 130 times faster without any loss of conversion accuracy.
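The abstract does not describe the selection algorithm itself, so the following is only a minimal sketch of the general idea behind output-vocabulary selection, under stated assumptions: the softmax is restricted to characters whose pinyin is compatible with the keys typed so far, and "incremental" is taken to mean that this candidate set is narrowed after each keystroke rather than recomputed from scratch. The lexicon, the weight values, and the names `IncrementalVocabSelector`, `restricted_softmax`, `PINYIN_LEXICON`, and `WEIGHTS` are hypothetical and are not taken from the paper. It also handles only a single syllable, whereas a real input method must cope with multi-syllable input.

```python
import math

# Hypothetical toy lexicon (not the paper's data): pinyin syllable -> characters.
PINYIN_LEXICON = {
    "zhong": ["中", "种", "重", "钟"],
    "zhu": ["主", "住"],
    "zi": ["字", "子", "自"],
    "guo": ["国", "过", "果"],
    "shu": ["输", "书", "数"],
    "ru": ["入", "如"],
}

# Hypothetical output-layer rows (one 2-d vector per character); a trained
# model would learn these, here they are arbitrary deterministic numbers.
WEIGHTS = {
    c: [(ord(c) % 7) / 7.0, (ord(c) % 5) / 5.0]
    for chars in PINYIN_LEXICON.values()
    for c in chars
}


class IncrementalVocabSelector:
    """Narrows the candidate characters one keystroke at a time.

    Assumption: a single syllable is being typed; real input methods must also
    handle multi-syllable input and segmentation ambiguity."""

    def __init__(self, lexicon):
        self.lexicon = lexicon
        self.prefix = ""
        self.active = list(lexicon)  # syllables still compatible with the prefix

    def push(self, key):
        """Append one key and return the updated candidate characters."""
        self.prefix += key
        # A syllable matching the longer prefix also matched the shorter one,
        # so only the previously active syllables need to be re-checked.
        self.active = [s for s in self.active if s.startswith(self.prefix)]
        return self.candidates()

    def candidates(self):
        """Deduplicated characters of all active syllables, in lexicon order."""
        seen, out = set(), []
        for syllable in self.active:
            for c in self.lexicon[syllable]:
                if c not in seen:
                    seen.add(c)
                    out.append(c)
        return out


def restricted_softmax(hidden, output_weights, candidates):
    """Softmax over the selected candidates only.

    Computes one dot product per candidate instead of one per vocabulary entry."""
    if not candidates:
        return {}
    logits = {
        c: sum(h * w for h, w in zip(hidden, output_weights[c])) for c in candidates
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


if __name__ == "__main__":
    selector = IncrementalVocabSelector(PINYIN_LEXICON)
    for key in "zho":
        cands = selector.push(key)
        # prints "z 9 of 17", then "zh 6 of 17", then "zho 4 of 17"
        print(selector.prefix, len(cands), "of", len(WEIGHTS))
    probs = restricted_softmax([1.0, 0.5], WEIGHTS, selector.candidates())
    for char, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(char, round(p, 3))
```

The point the sketch illustrates is where a speedup of this kind can come from: the output layer's cost scales with the size of the candidate set instead of the full character vocabulary, and the set only shrinks as more keys arrive. How the paper actually builds and updates its candidate set, and which softmax optimizations it compares, is not specified in the abstract.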

Keywords: Chinese input method; long short-term memory (LSTM); vocabulary selection

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
