Information extraction method for railway equipment accidents based on multi-dimensional character feature representation (多维字符特征表示的铁路设备事故信息抽取方法)  Cited by: 7


Author: ZHANG Pengxiang (Standards & Metrology Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China)

Affiliation: [1] Standards & Metrology Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081

Source: China Safety Science Journal (《中国安全科学学报》), 2022, Issue 6, pp. 109-114 (6 pages)

Abstract: To address the difficulty of analyzing data in railway equipment accident investigation reports, an accident information extraction method based on multi-dimensional character feature representation is proposed. In the data preprocessing stage, a subject pattern matching method extracts the subject paragraphs to which named entities belong. For text feature representation, a multi-dimensional feature representation method transforms text into feature vectors. A named entity recognition model for railway equipment accidents is then trained using a bidirectional long short-term memory (BiLSTM) network combined with conditional random fields (CRF). Railway equipment accident investigation reports are used for experimental verification. The results show that preprocessing with subject pattern matching improves the comprehensive evaluation index of the multi-dimensional character feature + BiLSTM + CRF model by 22.86%, and that, compared with word2vec feature representation, the multi-dimensional character feature representation improves the comprehensive evaluation index of the BiLSTM + CRF model by 4.89%.
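The subject pattern matching step in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual patterns, so the heading format (a Chinese numeral followed by "、"), the topic names in `TOPICS`, and the function `extract_topic_paragraphs` are all hypothetical; this shows only the general idea of keeping the paragraphs under headings of interest before named entity recognition, not the author's implementation.

```python
import re

# Assumption: section headings look like "一、事故概况" (numeral + "、" + title).
HEADING = re.compile(r"^[一二三四五六七八九十]+、\s*(.+)$")

# Assumption: hypothetical topics of interest, mapped to English keys.
TOPICS = {"事故概况": "overview", "事故原因": "cause"}


def extract_topic_paragraphs(report: str) -> dict[str, list[str]]:
    """Collect the paragraphs that fall under each heading listed in TOPICS."""
    sections: dict[str, list[str]] = {name: [] for name in TOPICS.values()}
    current = None  # key of the topic section we are currently inside, if any
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            # A new heading starts; unknown headings switch collection off.
            current = TOPICS.get(match.group(1).strip())
            continue
        if current is not None:
            sections[current].append(line)
    return sections


# Made-up sample report used only for illustration.
sample = (
    "一、事故概况\n某日某站发生设备故障。\n"
    "二、事故原因\n信号设备老化。\n"
    "三、处理意见\n加强检修。"
)
result = extract_topic_paragraphs(sample)
```

Paragraphs under headings that are not in `TOPICS` are dropped, so a downstream tagger would only see text from the sections where the target entities are expected to appear.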

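Keywords: multi-dimensional character features; railway equipment accidents; information extraction; subject pattern matching; named entity recognition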

Classification code: X928.02 [Environmental Science and Engineering: Safety Science]

 
