Authors: MI Jianxia (米健霞); XIE Hongwei (谢红薇)[1] (School of Software, Taiyuan University of Technology, Taiyuan 030024, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), 2023, No. 2, pp. 314-320 (7 pages)
Funding: National Natural Science Foundation of China (61872262); Natural Science Foundation of Shanxi Province (201801D121143).
Abstract: In the bidding domain, different organizations write material data in different ways. Named entity recognition on material data makes it possible to standardize that data and provides a basis for subsequent material queries and analysis. Traditional named entity recognition methods for materials suffer from inaccurate word segmentation, cannot handle polysemy effectively, and ignore the glyph features unique to Chinese characters, all of which degrade recognition performance. To address these problems, a CB-BiLSTM-CRF model is proposed. It uses a convolutional neural network (CNN) to extract features from the Wubi encoding of each Chinese character and combines them with the character features obtained from BERT, strengthening the representation of syntactic and semantic information across different contexts. A BiLSTM model then performs deeper feature extraction on the combined representation, and a CRF model produces the optimal label sequence. Experimental results show that the model achieves an F1 score of 95.82% on the collected bidding-domain material data, outperforming other commonly used models. On this basis, an "Intelligent Material" online recognition web platform has also been built, allowing users to quickly extract useful information from large amounts of data.
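The abstract states that a CNN is applied to each character's Wubi encoding and the result is combined with BERT's character features. The sketch below illustrates one plausible reading of that step in plain Python; it is not the authors' implementation. It assumes that each Wubi code is embedded key by key (up to four keys drawn from A-Y), convolved with width-2 filters, passed through ReLU, and max-pooled, and that "combined" means simple concatenation. The names `make_embeddings`, `make_filters`, `wubi_cnn_feature`, and `fuse`, the random weights, and the input string `"abcd"` are hypothetical illustrations rather than a real Wubi lookup.

```python
import random
from typing import Dict, List

WUBI_KEYS = "abcdefghijklmnopqrstuvwxy"  # Wubi uses the 25 keys A-Y for components
MAX_CODE_LEN = 4                         # a full Wubi code has at most four keys
PAD = "_"                                # padding symbol for shorter codes

Vector = List[float]


def make_embeddings(dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Hypothetical key-embedding table; random here, learned in a trained model."""
    rng = random.Random(seed)
    table = {k: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for k in WUBI_KEYS}
    table[PAD] = [0.0] * dim
    return table


def make_filters(num_filters: int, width: int, dim: int, seed: int = 1) -> List[List[Vector]]:
    """Hypothetical convolution filters, each spanning `width` consecutive keys."""
    if not 1 <= width <= MAX_CODE_LEN:
        raise ValueError("filter width must be between 1 and MAX_CODE_LEN")
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(width)]
        for _ in range(num_filters)
    ]


def wubi_cnn_feature(code: str, emb: Dict[str, Vector], filters: List[List[Vector]]) -> Vector:
    """1D convolution over the key embeddings, ReLU, then max-pooling over positions."""
    keys = code.lower()[:MAX_CODE_LEN].ljust(MAX_CODE_LEN, PAD)
    seq = [emb[k] for k in keys]  # raises KeyError for characters outside WUBI_KEYS
    features = []
    for filt in filters:
        width = len(filt)
        activations = []
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            s = sum(w * x for w_row, x_row in zip(filt, window) for w, x in zip(w_row, x_row))
            activations.append(max(0.0, s))  # ReLU
        features.append(max(activations))  # max-pool over the code positions
    return features


def fuse(bert_vec: Vector, wubi_vec: Vector) -> Vector:
    """Combine the two views by concatenation (an assumption; the paper says 'combined')."""
    return list(bert_vec) + list(wubi_vec)


if __name__ == "__main__":
    emb = make_embeddings(dim=8)
    filters = make_filters(num_filters=16, width=2, dim=8)
    bert_vec = [0.0] * 768  # stand-in for one character's BERT-base hidden state
    wubi_vec = wubi_cnn_feature("abcd", emb, filters)  # illustrative key string only
    print(len(fuse(bert_vec, wubi_vec)))  # 784
```

Because each filter's activations are max-pooled over positions, the Wubi feature always has one value per filter regardless of code length. The fused vector therefore has a fixed size (768 + 16 = 784 in this sketch) that a BiLSTM can consume character by character.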
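According to the abstract, the CRF layer "obtains the optimal sequence." For a linear-chain CRF, this is typically done with Viterbi decoding over per-character emission scores (here assumed to come from the BiLSTM) and a learned tag-transition matrix. The following is a minimal plain-Python sketch under those assumptions. The BIO tag set (`O`, `B-MAT`, `I-MAT`), the `NEG_INF` penalty, and all scores are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

NEG_INF = -1e4  # large penalty used to forbid a transition


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Find the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]  : score of tag j at character t (e.g. a BiLSTM output)
    transitions[i][j]: score of moving from tag i to tag j
    Returns (best tag indices, total score of that path).
    """
    n = len(emissions)
    if n == 0:
        return [], 0.0
    num_tags = len(emissions[0])
    score = list(emissions[0])          # best score ending in each tag at t = 0
    backpointers: List[List[int]] = []  # backpointers[t-1][j] = best previous tag
    for t in range(1, n):
        new_score, ptrs = [], []
        for j in range(num_tags):
            # Best previous tag i for current tag j.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(ptrs)
    # Trace back from the best final tag.
    last = max(range(num_tags), key=lambda j: score[j])
    path = [last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    tags = ["O", "B-MAT", "I-MAT"]  # hypothetical BIO tag set
    # Forbid O -> I-MAT, which is invalid in the BIO scheme.
    transitions = [[0.0, 0.0, NEG_INF], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    emissions = [[0.1, 2.0, 0.5], [0.2, 0.3, 1.5], [1.0, 0.2, 0.9]]
    path, best = viterbi_decode(emissions, transitions)
    print([tags[i] for i in path], best)  # ['B-MAT', 'I-MAT', 'O'] 4.5
```

The transition matrix lets the CRF reject label sequences that per-character argmax would allow. For example, with emissions `[[2.0, 0.1, 0.1], [0.1, 0.2, 1.0]]`, greedy decoding would output `O, I-MAT`, but the penalty on `O -> I-MAT` makes Viterbi return `O, B-MAT` instead.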
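Keywords: named entity recognition; bidding material recognition; BERT pre-trained model; bidirectional long short-term memory network (BiLSTM); conditional random field (CRF)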
CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]