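Authors: Wang Miping (王密平), Wang Hao (王昊)[1,2], Deng Sanhong (邓三鸿)[1,2], Wu Zhixiang (吴志祥)[1,2]
Affiliations: [1] School of Information Management, Nanjing University, Nanjing 210023 [2] Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023
Source: New Technology of Library and Information Service (《现代图书情报技术》), 2016, No. 6, pp. 28-36 (9 pages)
Funding: One of the research outputs of the Natural Science Foundation of Jiangsu Province project "Research on Chinese Ontology Learning for Patent Early Warning" (No. BK20130587) and the Jiangsu Province "333" Project "Research on Chinese Ontology Learning for Knowledge Services" (No. BRA2015401)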
Abstract: [Objective] To explore the optimal conditions for a Chinese patent term extraction model in the metallurgy domain, so that metallurgy patent terms can be extracted effectively. [Methods] Using a still-incomplete core corpus and no manual annotation, conditional random fields (CRFs) are used to build a character-role-labeling model for recognizing Chinese patent terms in the metallurgy domain. The construction of the model is described in detail, with a focus on comparing how individual CRF factors (feature combinations, character window length, etc.) affect recognition performance. [Results] Experiments show that combining character sequence, level features, domain features, and temperature features, with a character window length of 3, c = 1, and f = 1, yields a precision of 94.26%, a recall of 94.37%, and an F1 value of 94.5%. [Limitations] The core dictionary is incomplete, so some words are labeled inaccurately; the model is not compared in detail with other methods, and the reliability of CRFs is not discussed in detail. [Conclusions] With an appropriate combination of roles, features, and feature templates, CRFs can recognize Chinese patent terms in the metallurgy domain well.
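The character-level setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not give their role tag set, feature templates, or toolkit, so the B/I/O tags, the `<PAD>` symbol, the feature names, the helper functions, and the example sentence below are all hypothetical. The sketch also assumes that a window length of 3 means the current character plus one neighbor on each side.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def window_features(chars: List[str], window: int = 3) -> List[Dict[str, str]]:
    """Build context-window features for every character in a sequence.

    Assumption: a window of 3 covers offsets -1, 0, +1 around each character.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    features = []
    for i in range(len(chars)):
        feats = {}
        for offset in range(-half, half + 1):
            j = i + offset
            feats[f"char[{offset}]"] = chars[j] if 0 <= j < len(chars) else PAD
        features.append(feats)
    return features


def role_labels(chars: List[str], term_spans: List[Tuple[int, int]]) -> List[str]:
    """Tag each character as B (term start), I (inside a term), or O (outside).

    Spans are (start, end) index pairs with end exclusive. The B/I/O scheme is
    an illustrative stand-in for the paper's character roles.
    """
    labels = ["O"] * len(chars)
    for start, end in term_spans:
        labels[start] = "B"
        for k in range(start + 1, end):
            labels[k] = "I"
    return labels


# Hypothetical example: treat "高炉炼铁" (characters 2..5) as a metallurgy term.
sentence = list("采用高炉炼铁工艺")
feats = window_features(sentence, window=3)
labels = role_labels(sentence, [(2, 6)])
```

Each `(features, label)` pair produced this way is the kind of per-character training instance a CRF toolkit consumes; the level, domain, and temperature features named in the abstract would be additional entries in each feature dictionary.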