Design of a corpus translation data identification system based on multi-feature extraction  (Cited by: 1)

Authors: DU Qian (杜茜) [1]; SUN Hong-jian (孙洪建) [1]; REN Hai-tao (任海涛) [2]

Affiliations: [1] Shandong Police College, Jinan 250014, China; [2] Jinan Power Generation Equipment Factory, Jinan 250100, China

Source: Automation & Instrumentation (《自动化与仪器仪表》), 2023, No. 4, pp. 112-116 (5 pages)

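Funding: Shandong Police College institute-level research project "Research on the Subjectivity of Translators in External Publicity" (YSKYB201804).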

Abstract: The diversity and non-standard expression of online information make data mining, translation, and sentiment analysis considerably more difficult. Drawing on the characteristics of the corpus domain, this study uses candidate sentiment relevance to express the sentiment part of speech of sentences. It introduces the Att-BiLSTM algorithm to extract features from semantic text information and an improved K-SVD algorithm to extract word-segmentation features, builds multi-feature vectors for corpus sentences from word, part of speech, syllable, and position, and on this basis constructs a corpus translation data identification system. A performance analysis of the proposed multi-feature algorithm shows that its average accuracy on the sentiment part-of-speech datasets is above 75%, and that its positive/negative classification accuracy is 89.54% with two vectors and 80.15% with four vectors. It achieves an accuracy above 85% in Chinese corpus recognition and an F1 value of 85.47 on the English corpus, and its data processing error is lower than that of the other compared algorithms. The proposed algorithm effectively captures the relationships among sentence information in the corpus and identifies weights from both sentiment and feature values, providing a fairly comprehensive identification basis for data translation.
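The abstract names four per-token feature types (word, part of speech, syllable, position) but does not say how they are encoded. The sketch below is only an illustration of that idea, not the authors' method: the toy vocabularies `WORD_IDS` and `POS_IDS`, the vowel-group syllable heuristic, and the relative-position encoding are all assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy vocabularies; the paper's real feature inventories
# are not given in the abstract.
WORD_IDS: Dict[str, int] = {"<unk>": 0, "good": 1, "service": 2, "bad": 3}
POS_IDS: Dict[str, int] = {"<unk>": 0, "ADJ": 1, "NOUN": 2}
VOWELS = set("aeiouy")


def count_syllables(word: str) -> int:
    """Rough English syllable count: number of vowel groups, at least 1."""
    groups = 0
    prev_vowel = False
    for ch in word.lower():
        is_vowel = ch in VOWELS
        if is_vowel and not prev_vowel:
            groups += 1
        prev_vowel = is_vowel
    return max(groups, 1)


def token_features(tagged: List[Tuple[str, str]]) -> List[List[float]]:
    """Map each (word, pos) pair to
    [word_id, pos_id, syllable_count, relative_position]."""
    n = len(tagged)
    rows: List[List[float]] = []
    for i, (word, pos) in enumerate(tagged):
        rows.append([
            float(WORD_IDS.get(word.lower(), 0)),  # unknown words map to 0
            float(POS_IDS.get(pos, 0)),            # unknown tags map to 0
            float(count_syllables(word)),
            i / (n - 1) if n > 1 else 0.0,         # 0.0 = first token, 1.0 = last
        ])
    return rows


features = token_features([("Good", "ADJ"), ("service", "NOUN")])
```

Each row is one token's combined feature vector. In a fuller pipeline such rows would presumably feed a sequence model such as a BiLSTM with attention, but that step is outside what the abstract specifies.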

Keywords: multi-feature extraction; corpus; translation data; identification; Att-BiLSTM

Classification code: TP273 [Automation and Computer Technology: Detection Technology and Automatic Devices]

 
