Named Entity Recognition for Short Text (面向短文本的命名实体识别)

Cited by: 18

Authors: Wang Dan (王丹)[1], Fan Xinghua (樊兴华)[1]

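Affiliation: [1] Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China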

Source: Journal of Computer Applications (《计算机应用》), 2009, No. 1, pp. 143-145, 171 (4 pages in total)

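Funding: National Natural Science Foundation of China (60703010); Natural Science Foundation of Chongqing (2006BB2374); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ070519); Ministry of Education Start-up Fund for Returned Overseas Scholars (教外司留 [2007] No. 1109)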

Abstract: To address the pressing need for named entity recognition in short text, this paper proposes a fast and effective recognition method. The method has three steps. First, because the non-standard expression typical of short text interferes with named entity recognition, the text is normalized by removing interfering characters and simplifying the text (化繁为简). Second, because short text is often semantically incomplete, a Hidden Markov Model (HMM) that uses part-of-speech tags as its observations performs a preliminary named entity recognition. Third, based on the preliminary results, a pinyin co-reference relation library is built to identify potential entities. Experiments on a test set of 8,464 short texts show that the method performs well on named entity recognition for short text.
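The second step of the abstract (an HMM whose observations are part-of-speech tags) can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: the entity labels, the tag names, all probabilities, and the function name `viterbi` are hypothetical values chosen only to make the example run.

```python
import math

# Hypothetical toy model (not from the paper): hidden states are entity
# labels, observations are part-of-speech tags. All numbers are made up.
STATES = ["O", "PER", "LOC"]
START = {"O": 0.8, "PER": 0.1, "LOC": 0.1}
TRANS = {
    "O":   {"O": 0.8, "PER": 0.1, "LOC": 0.1},
    "PER": {"O": 0.7, "PER": 0.2, "LOC": 0.1},
    "LOC": {"O": 0.7, "PER": 0.1, "LOC": 0.2},
}
EMIT = {
    "O":   {"v": 0.50, "n": 0.30, "p": 0.18, "nr": 0.01, "ns": 0.01},
    "PER": {"v": 0.02, "n": 0.10, "p": 0.02, "nr": 0.85, "ns": 0.01},
    "LOC": {"v": 0.02, "n": 0.10, "p": 0.02, "nr": 0.01, "ns": 0.85},
}
UNSEEN = 1e-6  # assumed floor probability for tags missing from EMIT


def viterbi(pos_tags):
    """Return the most probable entity-label sequence for a POS-tag sequence."""
    if not pos_tags:
        return []
    # scores[s] = best log-probability of any label path ending in state s
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s].get(pos_tags[0], UNSEEN))
        for s in STATES
    }
    backpointers = []
    for tag in pos_tags[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for s at this position
            prev = max(STATES, key=lambda r: scores[r] + math.log(TRANS[r][s]))
            new_scores[s] = (
                scores[prev]
                + math.log(TRANS[prev][s])
                + math.log(EMIT[s].get(tag, UNSEEN))
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


print(viterbi(["nr", "p", "ns", "v"]))
```

With these toy parameters, the tag sequence `["nr", "p", "ns", "v"]` decodes to `["PER", "O", "LOC", "O"]`. In practice the transition and emission probabilities would be estimated from an annotated corpus; the abstract does not state how the authors trained their model.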

Keywords: short text; Hidden Markov Model; named entity recognition; pinyin co-reference relation library; part of speech

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
