Author: ZHANG Pengxiang (张鹏翔) (Standards & Metrology Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China)
Affiliation: [1] Standards & Metrology Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China
Source: China Safety Science Journal (《中国安全科学学报》), 2022, No. 6, pp. 109-114 (6 pages)
Abstract: To address the difficulty of analyzing data in investigation reports of railway equipment accidents, an accident information extraction method based on multi-dimensional character feature representation is proposed. In the data preprocessing stage, a subject pattern matching method extracts the subject paragraphs to which named entities belong. For text feature representation, a multi-dimensional feature representation method converts the text into feature vectors. The named entity recognition model for railway equipment accidents is trained with a bidirectional long short-term memory (BiLSTM) network combined with a conditional random field (CRF). Railway equipment accident investigation reports are used for experimental verification. The results show that preprocessing with subject pattern matching improves the comprehensive evaluation index of the multi-dimensional character feature + BiLSTM + CRF model by 22.86%, and that, compared with word2vec feature representation, the multi-dimensional character feature representation improves the comprehensive evaluation index of the BiLSTM + CRF model by 4.89%.
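Keywords: multi-dimensional character features; railway equipment accidents; information extraction; subject pattern matching; named entity recognition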
Classification number: X928.02 [Environmental Science and Engineering: Safety Science]