Authors: LIU Xin, XU Hongzhen[1,2], LIU Aihua, DENG Dejun
Affiliations: [1] School of Information Engineering, East China University of Technology, Nanchang 330013, Jiangxi, China; [2] School of Software, East China University of Technology, Nanchang 330013, Jiangxi, China
Source: Journal of Zhengzhou University (Engineering Science), 2024, No. 3, pp. 89-95 (7 pages)
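Funding: National Natural Science Foundation of China (62066003); Science and Technology Program of the Jiangxi Provincial Department of Education (GJJ160554); Talent Program of Fuzhou City, Jiangxi Province (2021ED008); Open Project of the Jiangxi Key Laboratory of Cyberspace Security Intelligent Perception (JKLCIP202202).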
Abstract: The deep learning methods commonly used for geological named entity recognition are built on the BERT pre-trained model. They are character-based and therefore do not use word-level information. In addition, the dropout mechanism in neural networks causes an inconsistency between the training stage and the inference stage. To address these problems, a geological named entity recognition model, MBCR, based on MacBERT and R-Drop is proposed. First, MacBERT learns the text feature representation, making full use of both character and word information. Second, a BiGRU encodes contextual features to extract complete semantic information. Finally, a CRF captures the dependencies between labels and generates the optimal label sequence. R-Drop is also introduced during training to further improve the model's generalization ability. The results show that, compared with BiLSTM-CRF, BERT-BiLSTM-CRF, and other models, the proposed MBCR model improves the F1-score by 2.08 to 4.62 percentage points on the NERdata dataset and by 1.26 to 17.54 percentage points on the Boson dataset.
Keywords: named entity recognition; geology; MacBERT; BiGRU; R-Drop
Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]
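
Note: The abstract states that R-Drop is used during training to reduce the mismatch between training and inference caused by dropout. As a rough illustration only, the general R-Drop objective can be sketched as follows: the same sentence is passed through the network twice with different dropout masks, both passes are scored against the gold labels, and a symmetric KL term pulls the two predicted distributions together. This sketch assumes per-token softmax outputs; how the paper combines R-Drop with its CRF layer is not described in this record. The function names, the `alpha` weight, and the toy logits below are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's label logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same labels."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def r_drop_loss(logits_a, logits_b, gold, alpha=1.0):
    """Average R-Drop-style loss over the tokens of one sentence.

    logits_a, logits_b: per-token logits from two forward passes of the
    same input under different dropout masks (assumed to be given here).
    gold: gold label index for each token.
    alpha: hypothetical weight of the consistency (KL) term.
    """
    total = 0.0
    for la, lb, y in zip(logits_a, logits_b, gold):
        p, q = softmax(la), softmax(lb)
        # Negative log-likelihood of the gold label under both passes.
        nll = -math.log(p[y]) - math.log(q[y])
        # Symmetric KL between the two predicted distributions.
        sym_kl = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
        total += nll + alpha * sym_kl
    return total / len(gold)


if __name__ == "__main__":
    # Toy logits for a 2-token sentence with 3 labels (made-up numbers).
    pass_a = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
    pass_b = [[1.5, 0.7, -0.5], [0.0, 1.0, 0.8]]
    print(r_drop_loss(pass_a, pass_b, gold=[0, 1], alpha=1.0))
```

When the two passes agree exactly, the KL term vanishes and the loss reduces to twice the ordinary cross-entropy, so the extra term only penalizes disagreement introduced by dropout.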