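Author: Jiang Chuntao (姜春涛) [1,2]
Affiliations: [1] Department of Computer Science and Technology, Nanjing University, Nanjing 210023 [2] Jiangsu Patent Information Service Center, Nanjing 210008
Source: New Technology of Library and Information Service (《现代图书情报技术》), 2015, No. 10, pp. 81-87 (7 pages)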
Abstract: [Objective] To automatically annotate four types of references embedded in Chinese patent documents: patents, standards, academic papers, and other monographs. [Methods] References to patents, standards, and other monographs are annotated with a pattern matching approach. References to academic papers are annotated with a two-phase machine learning approach, which first detects the sentences that contain citations and then extracts six categories of bibliographic features from them. [Results] Ten-fold cross-validation shows that precision and recall for patent references are both 100%; for standard references they are 92% and 94%, and for other monographs 80% and 71%, respectively. For academic paper references, precision and recall are 95.7% and 96.0% in phase one and 95.3% and 94.9% in phase two. [Limitations] The pattern matching approach requires manual analysis of a large number of patent documents, and the training data is relatively small. [Conclusions] Pattern matching annotates patent and standard references with performance above 92%, and the machine learning approach annotates academic paper references with an average performance of 95%.
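Keywords: patent citation extraction; patent annotation; pattern matching; conditional random fields; information extraction
Classification No.: TP391.1 [Automation and Computer Technology: Computer Application Technology]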
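The pattern matching step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual rules, so the two regular expressions, the `annotate` function, and the sample sentence below are all hypothetical and only show the general idea of tagging patent and standard references in Chinese text. ASCII lookarounds are used instead of `\b` because Python treats Chinese characters as word characters, so `\b` would not fire between a Chinese character and a Latin letter.

```python
import re

# Hypothetical patterns for illustration only; they are NOT the rules
# used in the paper, which the abstract does not describe.
# A patent publication number: country code, 6-12 digits, kind code (e.g. A, B2).
PATENT_RE = re.compile(
    r"(?<![A-Za-z0-9])(?:CN|US|EP|JP|WO)\d{6,12}[A-Z]\d?(?![A-Za-z0-9])"
)
# A standard number: issuing body, number, optional "-YYYY" year suffix.
STANDARD_RE = re.compile(
    r"(?<![A-Za-z0-9])(?:GB/T|GB|ISO|IEC)\s?\d+(?:\.\d+)*(?:-\d{4})?(?![A-Za-z0-9])"
)


def annotate(text):
    """Return (kind, matched_text, start_offset) tuples sorted by position."""
    found = []
    for kind, pattern in (("patent", PATENT_RE), ("standard", STANDARD_RE)):
        for match in pattern.finditer(text):
            found.append((kind, match.group(0), match.start()))
    return sorted(found, key=lambda item: item[2])


# Made-up sample sentence containing one patent and one standard reference.
sample = "如专利CN101234567A所述,并符合GB/T 7714-2005的规定。"
for kind, ref, start in annotate(sample):
    print(kind, ref, start)
```

A real rule set would need many more patterns (application numbers, Chinese-language cues such as "专利号", book titles in 《》), which is consistent with the limitation noted in the abstract that the rules require manual analysis of many patent documents.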