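Authors: Bai Guangzu (白光祖)[1,2], He Yuanbiao (何远标)[3,2], Ma Jianxia (马建霞)[1], Liu Jianhua (刘建华)[3,2], Zou Yimin (邹益民)[4]
Affiliations: [1] Lanzhou Literature and Information Center, Chinese Academy of Sciences, Lanzhou 730000; [2] University of Chinese Academy of Sciences, Beijing 100049; [3] National Science Library, Chinese Academy of Sciences, Beijing 100190; [4] College of Economics and Management, Zhejiang Normal University, Jinhua 321004
Source: New Technology of Library and Information Service (《现代图书情报技术》), 2014, No. 7, pp. 34-40 (7 pages)
Funding: One of the research outputs of the Chinese Academy of Sciences "Light of West China" (西部之光) joint scholar project "Research on technological innovation competition and development of strategic emerging industries in Gansu Province based on computational information analysis methods" (Project No. Y200201001)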
Abstract: [Objective] This study aims to identify the structure of academic abstracts automatically by classifying abstract sentences with machine learning algorithms trained on a small sample. [Methods] The paper designs several text representation features for academic abstracts, extracts them automatically with natural language processing techniques, uses them to train Naive Bayes and Support Vector Machine models, and applies the trained models to identify abstract structure automatically. [Results] Experiments show that, compared with similar methods, the approach achieves good recognition accuracy with a smaller training corpus. [Limitations] Because sentences in the "Method" category lack fixed category feature words and core verbs, recognition accuracy for that category is lower. [Conclusions] The proposed method is an effective approach to automatic recognition of academic abstract structure when only a small sample is available.
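The sentence-classification idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not list the features they designed, so the sketch assumes plain bag-of-words tokens, covers only a multinomial Naive Bayes classifier with add-one smoothing (the Support Vector Machine is omitted), and uses a hypothetical toy training set and hypothetical names (`TRAIN`, `tokenize`, `train`, `predict`).

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (abstract sentence, structural label).
TRAIN = [
    ("this study aims to identify abstract structure", "OBJECTIVE"),
    ("the purpose of this paper is to classify sentences", "OBJECTIVE"),
    ("we design features and train a classifier", "METHOD"),
    ("the model is trained on labeled sentences", "METHOD"),
    ("experiments show that accuracy improves", "RESULT"),
    ("results show higher accuracy with less data", "RESULT"),
    ("we conclude that the approach is effective", "CONCLUSION"),
    ("the approach is an effective solution", "CONCLUSION"),
]


def tokenize(sentence):
    """Assumed feature extractor: lowercase whitespace tokens."""
    return sentence.lower().split()


def train(samples):
    """Count labels and per-label token frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in samples:
        label_counts[label] += 1
        for tok in tokenize(sentence):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, sentence):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(sentence):
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "experiments show better accuracy"))
```

Labeling each sentence of an abstract this way and reading the labels in order gives a rough picture of the abstract's structure. With so few training sentences the predictions depend heavily on a handful of cue words, which is consistent with the limitation the abstract reports for "Method" sentences.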