A Person Name Recognition Method Based on the Coordinate Ascent Algorithm  (times cited: 2)

NAME ENTITY RECOGNITION BASED ON COORDINATE ASCENT ALGORITHM


Authors: Dai Bo (戴播)[1], Mao Qi (毛奇)[1], Yuan Chunfeng (袁春风)[1]

Affiliation: [1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093, China

Source: Computer Applications and Software (《计算机应用与软件》), 2010, No. 4, pp. 7-9 and 22 (4 pages)

Funding: National 863 High-Tech Key Project (2006AA010109); National Natural Science Foundation of China (60673043)

Abstract: The coordinate ascent algorithm is an iterative optimization technique that can be used to guide the training of feature weights. This paper proposes a Chinese person name recognition method based on this algorithm. The method avoids the problem, found in some existing approaches, of assigning feature weights by hand, and better captures the hidden dependencies among features. It works as follows: a feature library and a character-for-name probability dictionary are first obtained from a base corpus; after the relevant features are extracted from the training corpus, the coordinate ascent learning algorithm is used to train the feature weights and the name threshold parameter; the learned parameters are then used to recognize Chinese person names in ordinary text. The method requires no manual annotation of the training corpus, and recognition requires neither word segmentation nor part-of-speech tagging, so it is low-cost, performs well, and is practical. On the open test set it achieves an F1 score of 93.02%.
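The abstract lists the ingredients (a character-for-name probability dictionary, features extracted from a training corpus, and coordinate ascent over the feature weights plus a name threshold) but not the concrete features, the training objective, or the step schedule. The following is a minimal Python sketch that fills those gaps with stated assumptions, not the authors' implementation. It assumes a linear score compared against the threshold, training-set F1 as the objective on candidate strings that already carry a name/non-name label (the abstract does not say how such labels are obtained without manual annotation), two illustrative features, and a fixed set of step sizes. All function names (`name_char_probability`, `candidate_features`, `f1`, `coordinate_ascent`) are hypothetical.

```python
# Hypothetical sketch, not the paper's implementation: linear score vs. threshold,
# training-set F1 as the objective, illustrative features and step sizes.
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# (feature vector, True if the candidate string is a person name)
Sample = Tuple[Sequence[float], bool]


def name_char_probability(name_chars: Iterable[str], all_chars: Iterable[str]) -> Dict[str, float]:
    """Estimate, per character, the fraction of its corpus occurrences that fall inside a name."""
    in_names = Counter(name_chars)
    total = Counter(all_chars)
    return {c: in_names[c] / n for c, n in total.items()}


def candidate_features(candidate: str, char_prob: Dict[str, float]) -> List[float]:
    """Hypothetical features for a non-empty candidate: probability of its first
    character, and mean probability of the remaining characters."""
    rest = candidate[1:]
    mean_rest = sum(char_prob.get(c, 0.0) for c in rest) / len(rest) if rest else 0.0
    return [char_prob.get(candidate[0], 0.0), mean_rest]


def f1(weights: Sequence[float], threshold: float, samples: Sequence[Sample]) -> float:
    """F1 of the rule 'weighted feature sum >= threshold' on labelled candidates."""
    tp = fp = fn = 0
    for feats, is_name in samples:
        predicted = sum(w * x for w, x in zip(weights, feats)) >= threshold
        if predicted and is_name:
            tp += 1
        elif predicted:
            fp += 1
        elif is_name:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def coordinate_ascent(
    samples: Sequence[Sample],
    n_features: int,
    deltas: Sequence[float] = (-1.0, -0.5, -0.1, 0.1, 0.5, 1.0),
    max_rounds: int = 50,
) -> Tuple[List[float], float, float]:
    """Tune one parameter at a time (each weight, then the threshold), others held fixed."""
    params = [1.0] * n_features + [0.0]  # last entry is the threshold

    def objective(p: List[float]) -> float:
        return f1(p[:-1], p[-1], samples)

    best = objective(params)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(params)):
            base = params[i]
            # Discrete line search along coordinate i: keep the step with the highest F1.
            for d in deltas:
                candidate = list(params)
                candidate[i] = base + d
                score = objective(candidate)
                if score > best:
                    params, best, improved = candidate, score, True
        if not improved:  # no single-coordinate step improves F1: stop
            break
    return params[:-1], params[-1], best


if __name__ == "__main__":
    # Toy data: feature 0 separates names from non-names; feature 1 is noise.
    toy = [([0.9, 0.2], True), ([0.8, 0.9], True), ([0.2, 0.7], False), ([0.1, 0.3], False)]
    weights, threshold, best_f1 = coordinate_ascent(toy, n_features=2)
    print("weights:", weights, "threshold:", threshold, "training F1:", best_f1)
```

One reason a derivative-free method fits here: F1 is piecewise constant in the weights and threshold, so gradients are zero almost everywhere, while coordinate ascent only needs to evaluate the objective at trial points.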

Keywords: coordinate ascent algorithm; person name recognition; feature weight training

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
