基于文本集密度的特征选择与权重计算方案被引量：8

Feature Selection and Weighting Scheme Based on Text Set Density

机构地区：[1]山东大学计算机科学与技术学院,山东济南250061 [2]山东轻工业学院物理系,山东济南250014

出　　处：《中文信息学报》2004年第1期42-47,共6页Journal of Chinese Information Processing

基　　金：山东省教育厅项目 (J0 0F0 4 )

摘　　要：在信息检索的向量空间模型中 ,文本被形式化表示为由词语权重组成的向量。因此如何让这种向量尽量准确的有效的表示出文本内容一直是该模型中的基础性问题。在这篇论文中 ,我们提出了一种基于文本集密度的特征词选择与权重计算方案的方法。它是一种使用词对文本集密度的贡献衡量该词的价值的方法。使用这种方法 ,我们能找出不损失文本有效信息的最小特征词语集 ,并且创造出更为合理权重计算方案。在文中还用了一种新的衡量权重好坏的标准———元打分法。In vector space model of information retrieval,a text is represented as a weighted vector which is composed of terms weighting of the text. And it is a fundamental issue to how to represent the content of a text as exactly and efficiently as possible. In this paper, we will propose a method of feature selection and weighting scheme based on text set density,which is a way of measure of contribution to the text set density about some word. By the means, we can find the set containing least elements, which can represent all valuable information of a text, and invent a more reasonable weighting scheme. And this paper presents a new measure standard of the sense of goodness of some weighting schemes: meta scoring. Through the criterion, it is proved that the approach helps.

关键词：计算机应用中文信息处理信息检索文本集密度权重计算方案元打分法

分类号：TP391[自动化与计算机技术—计算机应用技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于文本集密度的特征选择与权重计算方案被引量：8

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于文本集密度的特征选择与权重计算方案 被引量：8

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索

基于文本集密度的特征选择与权重计算方案被引量：8