Text Information Extraction Based on Conditional Random Fields with Long-Distance Dependencies

Cited by: 2


Authors: Zhu Daohui (朱道辉)[1], Xiao Jiyi (肖基毅)[1], Cheng Yang (程阳)[2], Wu Shixiang (吴诗祥)

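Affiliations: [1] School of Computer Science and Technology, University of South China, Hengyang, Hunan 421001; [2] College of Life Sciences, Guangxi Normal University, Guilin, Guangxi 541004; [3] Datian Township Central Primary School, Wugang, Hunan 422400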

Source: Computer Applications and Software (《计算机应用与软件》), 2011, No. 5, pp. 203-205 (3 pages)

Abstract: In information extraction, the same token may occur several times in a document, and its occurrences are usually far apart. Because of the Markov assumption, a traditional linear-chain CRF cannot represent long-distance dependencies among labels, so it labels each occurrence of the same token separately and loses global information. We propose a CRF model with long-distance dependencies that combines the features found at every occurrence of a token and labels all of its occurrences jointly. Because the long-distance dependencies make exact labeling intractable, we use the TRP approximation algorithm instead. Experiments show that the model extracts better than the linear-chain CRF; in particular, recall on the speaker field improves substantially.
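The graph structure described in the abstract can be illustrated with a minimal sketch. It assumes that a long-distance edge is added between every pair of positions holding the same token; the abstract does not give the exact pairing rule, so the function name `build_edges` and the `min_gap` parameter are hypothetical. The sketch builds only the edge sets. It does not implement the feature functions, training, or TRP inference.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(tokens, min_gap=2):
    """Return (chain_edges, long_edges) for one token sequence.

    chain_edges link adjacent positions, as in a linear-chain CRF.
    long_edges link every pair of positions that hold the same token
    and are at least `min_gap` apart (an illustrative rule, not taken
    from the paper).
    """
    chain_edges = [(i, i + 1) for i in range(len(tokens) - 1)]

    # Group positions by token so repeated tokens can be paired up.
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)

    long_edges = []
    for idxs in positions.values():
        for i, j in combinations(idxs, 2):
            if j - i >= min_gap:
                long_edges.append((i, j))
    return chain_edges, sorted(long_edges)


if __name__ == "__main__":
    sentence = ["Speaker", ":", "Smith", "will", "talk", ".", "Smith", "is"]
    chain, long_range = build_edges(sentence)
    print(chain)
    print(long_range)
```

In this example the two occurrences of "Smith" are joined by a single long-distance edge, so their labels can influence each other directly. Such an edge, together with the chain edges between the two positions, closes a cycle in the graph, which is why exact chain-style inference no longer applies and an approximate method is needed.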

Keywords: long-distance dependency; conditional random fields; linear chain; same token; text

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
