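Author: Xue Chunxiang (薛春香) [1]
Affiliation: [1] Department of Information Management, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094
Source: 《情报科学》 (Information Science), 2013, No. 7, pp. 121-125 (5 pages)
Funding: Youth Project of the Humanities and Social Sciences Research Fund, Ministry of Education (09YJC870014); Youth Project of the Jiangsu Provincial Social Science Fund (09TQC011)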
Abstract: Subject indexing, classification indexing, and named entity extraction are the main forms of deep content processing for newspaper literature, and knowledge-base-driven automatic indexing is one way to automate that indexing. Building on a survey of current research on automatic indexing of newspaper literature, the paper distills a general workflow and argues that constructing the knowledge base is a prerequisite for automatic indexing. Given the characteristics of newspaper indexing, it proposes that the knowledge base comprise three parts (a subject indexing base, a classification knowledge base, and an entity indexing base), each made up of multiple vocabularies; the resulting knowledge base merges multiple vocabularies, is large in scale, extensible, and simple to apply. Finally, the paper discusses the key techniques in building the knowledge base: the subject authority list, the concordance table between class numbers and subject terms, and the rule base for named entity extraction.