Research on Knowledge Extraction Method for High-speed Railway Signal Equipment Fault Based on Text  Cited by: 22

Authors: LI Xinqin; SHI Tianyun; LI Ping; DAI Mingrui; ZHANG Xiaodong

Affiliations: [1] Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China; [2] China Academy of Railway Sciences Corporation Limited, Beijing 100081, China

Source: Journal of the China Railway Society (铁道学报), 2021, No. 3, pp. 92-100 (9 pages)

Funding: Key Project of China State Railway Group Co., Ltd. (N2019S008).

Abstract: A pipeline knowledge extraction model that combines named entity recognition and entity relation extraction is proposed for fault text data of high-speed railway signal equipment. The model uses a unified labeling scheme and trains the named entity recognition and entity relation extraction components separately to extract signal equipment fault knowledge. The knowledge structure of signal equipment faults and the sample labeling method are defined, and a named entity feature representation based on multi-dimensional character features is proposed. BiLSTM+CRF is used for named entity recognition. An entity relation representation based on multi-dimensional word segmentation features is proposed, and a Transformer network built on these features performs entity relation extraction. Experiments on 10 years of high-speed railway signal switch machine fault data show that the named entity and relation extraction model scores well on the evaluation metrics and can be applied to text-based equipment fault knowledge extraction.

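Keywords: signal equipment fault; knowledge extraction; multi-dimensional character features; multi-dimensional word segmentation features; bidirectional long short-term memory + conditional random field (BiLSTM+CRF)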

Classification No.: U284.92 [Transportation Engineering: Traffic Information Engineering and Control]
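Illustrative note: the pipeline described in the abstract (entities first, relations second) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes character-level BIO tags as the output of the entity recognizer; the tag names `EQP` and `FLT`, the relation name `has_fault`, and the functions `decode_bio`, `extract_triples`, and `toy_classifier` are hypothetical. The rule-based `toy_classifier` only stands in for the trained Transformer relation model, and the tags are supplied by hand rather than predicted by a BiLSTM+CRF.

```python
from itertools import permutations


def decode_bio(chars, tags):
    """Collect (entity_type, text, start, end) spans from character-level BIO tags."""
    entities = []
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            entities.append((etype, "".join(chars[start:i]), start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def extract_triples(chars, tags, classify_relation):
    """Stage 2 of the pipeline: classify every ordered entity pair."""
    entities = decode_bio(chars, tags)
    triples = []
    for head, tail in permutations(entities, 2):
        relation = classify_relation(head, tail)
        if relation is not None:
            triples.append((head[1], relation, tail[1]))
    return triples


def toy_classifier(head, tail):
    """Hypothetical stand-in for a trained relation model."""
    if head[0] == "EQP" and tail[0] == "FLT":
        return "has_fault"
    return None


# Hand-labeled example: "转辙机" (switch machine) + "卡阻" (jamming).
chars = list("转辙机卡阻")
tags = ["B-EQP", "I-EQP", "I-EQP", "B-FLT", "I-FLT"]
entities = decode_bio(chars, tags)
triples = extract_triples(chars, tags, toy_classifier)
```

Because the two stages only share the decoded entity spans, each model can be trained and replaced separately, which is the property a pipeline design relies on.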

 
