Authors: 付尧明 (FU Yao-ming) [1]; 陈余杰 (CHEN Yu-jie); 侯宽新 (HOU Kuan-xin) [1]; 蒋正 (JIANG Zheng)
Affiliation: [1] College of Aviation Engineering, Civil Aviation Flight University of China, Guanghan, Sichuan 618300, China
Source: Computer Simulation (《计算机仿真》), 2025, No. 1, pp. 30-35, 125 (7 pages in total)
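Funding: National Natural Science Foundation of China (52105132); Young Scientists Fund of the National Natural Science Foundation of China (2022-01 to 2024-12); Fundamental Research Funds for the Central Universities (J2022-029).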
Abstract: Chinese word segmentation is a fundamental task in processing maintenance text data. Specialized-domain corpora usually contain more out-of-vocabulary (unregistered) words than general-domain corpora. For example, general aviation corpora contain many colloquial or artificially coined structure names, component names, fault names, and tool names that are out of vocabulary, and these are the main cause of low segmentation accuracy. To address this problem, a Chinese word segmentation model based on BERT-BiLSTM-CRF is proposed for the general aviation domain. First, the BERT (Bidirectional Encoder Representation from Transformers) pre-trained model is used to obtain semantic features of the input text. Second, a bidirectional long short-term memory (BiLSTM) network learns contextual feature information. Finally, a conditional random field (CRF) predicts the optimal tag sequence, which improves segmentation accuracy. Maintenance text data collected from the general aviation domain is processed and annotated to build a general aviation maintenance text corpus, and comparative experiments are conducted on it. Compared with traditional models such as BiLSTM and BiLSTM-CRF, the proposed method achieves an overall F1 score of 96.93%, which is 1.41% higher than that of BiLSTM-CRF. This verifies the effectiveness of the proposed method for segmenting maintenance text data in the general aviation domain.
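The final two stages described in the abstract, choosing the optimal tag sequence with a CRF and scoring the result with F1, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The BMES tagging scheme, the hand-written transition scores, and the function names `viterbi_decode`, `tags_to_words`, and `segmentation_f1` are all assumptions made for illustration; in a trained model the emission scores would come from the BERT and BiLSTM layers and the transition scores would be learned.

```python
from typing import List, Sequence, Set, Tuple

# Assumed BMES scheme (Begin, Middle, End, Single); the abstract does not name one.
TAGS = ["B", "M", "E", "S"]
NEG = -1e4  # large penalty standing in for a forbidden transition
LEGAL = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
         ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
# Hand-written transition scores; a trained CRF would learn these values.
TRANSITIONS = [[0.0 if (a, b) in LEGAL else NEG for b in TAGS] for a in TAGS]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j] is the score of tag j at position t;
    transitions[i][j] is the score of moving from tag i to tag j.
    Start and end constraints are not modelled in this sketch.
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


def tags_to_words(text: str, tags: Sequence[str]) -> List[str]:
    """Cut the text after every character tagged E or S."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # keep an unfinished trailing word
        words.append(current)
    return words


def _spans(words: Sequence[str]) -> Set[Tuple[int, int]]:
    """Convert a word list into a set of (start, end) character spans."""
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out


def segmentation_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Word-level F1: a predicted word counts only if both boundaries match."""
    g, p = _spans(gold), _spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)
```

For instance, `tags_to_words("检查发动机", ["B", "E", "B", "M", "E"])` yields the two words 检查 and 发动机. The transition table shows why a CRF layer helps: per-character scores that would individually pick an illegal pair such as B followed by S are overridden by the sequence-level score.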
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]