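Affiliation: [1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093, China
Source: Computer Applications and Software (《计算机应用与软件》), 2010, No. 4, pp. 7-9, 22 (4 pages in total)
Funding: National 863 High-Tech Key Project (2006AA010109); National Natural Science Foundation of China (60673043)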
Abstract: The coordinate ascent algorithm is an iterative optimization technique that can be used to guide the training of feature weights. This paper proposes a Chinese personal name recognition method based on this algorithm. The method avoids the manual assignment of feature weights found in some existing approaches and better reflects the implicit relationships among features. It first derives a feature library and a character-for-name probability dictionary from a base corpus. After extracting the corresponding features from a training corpus, it uses coordinate ascent learning to train the feature weights and the character-for-name threshold parameter, and then applies the learned parameters to recognize Chinese personal names in ordinary text. The method requires no manual annotation of the training corpus and no word segmentation or part-of-speech tagging at recognition time, so it is inexpensive, performs well, and is practical. On an open test set it achieves an F1 score of 93.02%.
Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
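
The abstract describes training feature weights and a threshold with coordinate ascent so as to maximize recognition quality. The following is a minimal sketch of that general idea, not the paper's implementation: it adjusts one parameter at a time and keeps a change only when the objective strictly improves. The two-feature toy data, the linear scoring rule in `predict`, the step sizes, and all function names are assumptions made for illustration; the paper's actual features and update schedule are not given in this record.

```python
from typing import Callable, List, Sequence, Tuple


def f1_score(preds: Sequence[bool], labels: Sequence[bool]) -> float:
    """F1 of boolean predictions against boolean gold labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def predict(params: Sequence[float], rows: Sequence[Sequence[float]]) -> List[bool]:
    """Hypothetical decision rule: weighted feature sum >= threshold.

    The last entry of `params` is the threshold, the rest are weights.
    """
    *weights, threshold = params
    return [sum(w * x for w, x in zip(weights, row)) >= threshold for row in rows]


def coordinate_ascent(
    objective: Callable[[List[float]], float],
    init: Sequence[float],
    steps: Sequence[float] = (1.0, 0.5, 0.25, 0.1),
    max_rounds: int = 50,
) -> Tuple[List[float], float]:
    """Maximize `objective` by changing one coordinate at a time."""
    params = list(init)
    best = objective(params)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(params)):          # visit each coordinate in turn
            for step in steps:                # coarse-to-fine step sizes
                for delta in (step, -step):   # try both directions
                    trial = params.copy()
                    trial[i] += delta
                    score = objective(trial)
                    if score > best:          # keep strict improvements only
                        params, best, improved = trial, score, True
        if not improved:                      # a full pass changed nothing
            break
    return params, best


# Toy data (invented): each row holds two feature values for a candidate string.
rows = [[0.9, 0.8], [0.7, 0.1], [0.2, 0.9], [0.1, 0.2], [0.8, 0.6], [0.3, 0.3]]
labels = [True, False, False, False, True, False]


def training_f1(params: List[float]) -> float:
    return f1_score(predict(params, rows), labels)


start = [1.0, 1.0, 0.0]                       # two weights, then the threshold
start_score = training_f1(start)
learned, score = coordinate_ascent(training_f1, start)
print(start_score, learned, score)
```

Because F1 is piecewise constant in the parameters, a search like this can stall on a plateau: starting from all-zero weights, for example, no single-coordinate move in this toy setup changes the score. The starting point and step sizes therefore matter in practice.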