Research on a Semantics-Based Algorithm for Extracting Thematic Words from Chinese Documents (cited by: 16)

Algorithm of Thematic Words Extraction from Chinese Texts Based on Semantic

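Affiliations: [1] School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012; [2] College of Computer Science and Technology, Jilin University, Changchun 130012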

Source: Journal of Jilin University (Information Science Edition), 2005, No. 5, pp. 535-540 (6 pages)

Funding: Supported by the Science and Technology Fund of the State Archives Administration of China

Abstract: To keep pace with the rapid development of the information age and to improve the accuracy of automatically extracting thematic words from Chinese documents, this paper presents an algorithm model for automatic thematic word extraction based on semantic understanding. The model takes Chinese text as its input and, drawing on domain background knowledge, builds a concept semantic network that serves as both the word segmentation dictionary and the knowledge base. It replaces traditional literal matching with links and matching between concepts, overcoming the limitation of working only on surface forms. It raises natural language processing from the current keyword level to the knowledge level, so that the subject of a document is understood at the concept level; this breaks through the limits of traditional keyword matching and, to some extent, resolves the vocabulary difference problem. The method can achieve a degree of semantic understanding of natural language and uses domain knowledge to produce standardized thematic word indexing. Experiments show that the method extracts thematic words from the test documents with an accuracy of up to 71.03%, which the authors report as an improvement of nearly 1.87 times over the traditional approach.
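The contrast the abstract draws between literal matching and concept matching can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes a hypothetical toy concept network (`CONCEPT_OF`) that maps surface terms to a canonical thematic word, uses forward maximum matching as a stand-in segmentation step with the network's terms as the dictionary, and ranks concepts by simple frequency. All names and the sample data are invented for illustration.

```python
from collections import Counter

# Hypothetical toy concept network: surface term -> canonical thematic word.
# A real system would hold a domain-specific network with typed relations.
CONCEPT_OF = {
    "计算机": "计算机",    # "computer" (canonical form)
    "电脑": "计算机",      # colloquial synonym of "computer"
    "微机": "计算机",      # "microcomputer"
    "档案": "档案",        # "archive" (canonical form)
    "卷宗": "档案",        # "dossier"
    "检索": "信息检索",    # "retrieval"
    "查询": "信息检索",    # "query"
}


def segment(text, lexicon):
    """Forward maximum matching, using the network's terms as the dictionary."""
    max_len = max(map(len, lexicon))
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                tokens.append(piece)
                i += size
                break
        else:
            i += 1  # no known term starts here; skip one character
    return tokens


def thematic_words(text, top_k=2):
    """Map each matched term to its concept and return the most frequent concepts."""
    tokens = segment(text, CONCEPT_OF)
    counts = Counter(CONCEPT_OF[token] for token in tokens)
    return [concept for concept, _ in counts.most_common(top_k)]


sample = "用电脑查询卷宗，再用微机检索，计算机很快"
literal_hits = sample.count("计算机")   # literal matching sees one occurrence
concept_hits = thematic_words(sample)   # concept matching merges the synonyms
```

In this sample the literal string 计算机 occurs only once, while the concept it names is mentioned three times through 电脑, 微机, and 计算机. Counting at the concept level is what lets the normalized thematic word rank first, which is the vocabulary difference problem the abstract refers to.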

Keywords: natural language processing; thematic word extraction; concept semantic network

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
