Research on an Agricultural Question Classification Method Based on Unbalanced Short Text


Authors: 成继福 (CHEN Jifu); 郭晓娟 (GUO Xiaojuan) [2]; 周俊明 (ZHOU Junming)

Affiliations: [1] Principal's Office, Henan Institute of Science and Technology, Xinxiang 453003, Henan, China; [2] School of Computer Science and Technology, Henan Institute of Science and Technology, Xinxiang 453003, Henan, China; [3] School of Information Engineering, Henan Institute of Science and Technology, Xinxiang 453003, Henan, China

Source: Journal of Henan Institute of Science and Technology (Natural Science Edition), 2024, No. 6, pp. 38-48 (11 pages)

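Funding: Henan Province Science and Technology Research Project (222102210098, 222102210020, 212102210431); Key Scientific Research Project Plan of Higher Education Institutions of Henan Province (20A520013, 21A520001).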

Abstract: Objective: To enable rapid, automatic classification of agricultural questions posted in Q&A communities such as the China Agricultural Technology Extension Information Platform and the China Agricultural Information Network. Methods: The collected agricultural dataset consists of short texts whose samples are unevenly distributed across categories. To address both problems, a text semantic information expansion method is proposed. Tailored to the characteristics of agricultural questions, the method extracts keywords from each question with the TextRank algorithm, looks up near-synonyms of those keywords in a Word2Vec model, and replaces the keywords with them to generate new synonymous questions. The method is validated on five deep learning models: Bi-LSTM, Bi-GRU, Bi-LSTM with an attention mechanism (Bi-LSTM-Att), Bi-GRU with an attention mechanism (Bi-GRU-Att), and TextRCNN. Results: Comparative experiments show that the method improves Precision, Recall, and F1 score on all five models. On the Bi-LSTM-Att model in particular, accuracy (Acc) and average F1 increased by 0.8 and 2.5 percentage points, respectively. Conclusion: The method effectively mitigates the problems of short text length and unbalanced category distribution, and improves classification performance on unbalanced short texts.
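The augmentation step described in the abstract (TextRank keyword extraction, near-synonym lookup, keyword replacement) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It rests on several assumptions: a small hand-written embedding table (`TOY_VECTORS`) stands in for a trained Word2Vec model, English tokens stand in for segmented Chinese questions, and the function names, window size, damping factor and stop-word list are all hypothetical.

```python
import math
from collections import defaultdict

# Assumption: a tiny hand-made embedding table standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "wheat": (1.0, 0.0, 0.1),
    "barley": (0.9, 0.1, 0.1),
    "rust": (0.0, 1.0, 0.1),
    "mildew": (0.1, 0.9, 0.1),
    "treat": (0.1, 0.1, 1.0),
}


def textrank_keywords(tokens, top_k=2, window=2, damping=0.85, iters=50):
    """Rank tokens by TextRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, tok in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != tok:
                neighbors[tok].add(other)
                neighbors[other].add(tok)
    scores = {tok: 1.0 for tok in neighbors}
    for _ in range(iters):
        scores = {
            tok: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[tok])
            for tok in neighbors
        }
    # Highest score first; ties broken by first position in the question.
    ranked = sorted(scores, key=lambda t: (-scores[t], tokens.index(t)))
    return ranked[:top_k]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, vectors, top_n=1):
    """Return the top_n most similar vocabulary words, excluding the word itself."""
    if word not in vectors:
        return []
    sims = [(cosine(vectors[word], vec), other)
            for other, vec in vectors.items() if other != word]
    sims.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in sims[:top_n]]


def augment(tokens, vectors, stopwords=frozenset(), top_k=2, top_n=1):
    """Generate variants of a question by swapping each keyword for a near-synonym."""
    content = [t for t in tokens if t not in stopwords]
    keywords = [k for k in textrank_keywords(content, top_k) if k in vectors]
    variants = []
    for kw in keywords:
        for syn in nearest_neighbors(kw, vectors, top_n):
            variants.append([syn if t == kw else t for t in tokens])
    return variants


question = "how to treat wheat rust disease".split()
for variant in augment(question, TOY_VECTORS, stopwords={"how", "to"}):
    print(" ".join(variant))
```

Each variant keeps the original question's length and label while changing one keyword. One plausible use, which the abstract does not confirm, is to generate variants only for under-represented categories so that the class distribution becomes more even.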

Keywords: short text classification; unbalanced samples; semantic information expansion; agricultural questions

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
