Dimensionality reduction for text documents using a difference-similitude matrix (基于差异—相似矩阵的文本降维方法)

Cited by: 1

Source: Journal of Computer Applications (《计算机应用》), 2005, No. 8, pp. 1821-1823 (3 pages)

Funding: Supported by the National Natural Science Foundation of China (90204008)

Abstract: Because text collections contain many documents and a large vocabulary, the resulting document space is high-dimensional, and many automatic text categorization algorithms cannot work effectively on it directly. The difference-similitude matrix (DSM) method reduces the dimensionality of the document space to a great extent. A pre-classified collection is preprocessed and represented as a term-document matrix, which is then transformed into a difference-similitude matrix, in which documents in the same class are described by similitude items and documents in different classes by difference items. Processing the DSM yields a low-dimensional set of text features and, at the same time, a set of classification rules. Experiments indicate that, on large collections, the DSM method achieves high attribute reduction and sample reduction rates while maintaining good classification quality.
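The abstract does not give the exact matrix definition or reduction procedure, so the following is only a minimal sketch under stated assumptions. It assumes a binary term-document matrix, takes the "similitude" entry of a same-class document pair to be the set of terms on which the two documents agree, and takes the "difference" entry of a different-class pair to be the set of terms on which they disagree. The function names `build_dsm` and `greedy_term_subset`, the toy data, and the greedy covering step are all hypothetical illustrations, not the paper's algorithm.

```python
from itertools import combinations


def build_dsm(term_doc, labels):
    """Build a pairwise difference-similitude structure.

    term_doc: dict mapping term -> list of 0/1 values, one per document
              (assumed binary presence/absence encoding).
    labels:   list of class labels, one per document.
    Returns a dict mapping (i, j) with i < j to (kind, frozenset_of_terms).
    """
    terms = sorted(term_doc)
    dsm = {}
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            # Same class: record the terms on which the two documents agree.
            entry = frozenset(t for t in terms if term_doc[t][i] == term_doc[t][j])
            dsm[(i, j)] = ("similitude", entry)
        else:
            # Different classes: record the terms that tell them apart.
            entry = frozenset(t for t in terms if term_doc[t][i] != term_doc[t][j])
            dsm[(i, j)] = ("difference", entry)
    return dsm


def greedy_term_subset(dsm):
    """Greedy covering heuristic (an assumption, not the paper's procedure):
    repeatedly pick the term that appears in the most uncovered difference
    entries, until every different-class pair is separated by a chosen term."""
    uncovered = [e for kind, e in dsm.values() if kind == "difference" and e]
    selected = []
    while uncovered:
        counts = {}
        for entry in uncovered:
            for t in entry:
                counts[t] = counts.get(t, 0) + 1
        # Iterate terms in sorted order so ties break alphabetically.
        best = max(sorted(counts), key=counts.get)
        selected.append(best)
        uncovered = [e for e in uncovered if best not in e]
    return selected


# Toy pre-classified collection: four documents, four terms (made-up data).
term_doc = {
    "goal":  [1, 1, 0, 0],
    "match": [1, 1, 1, 0],
    "stock": [0, 0, 1, 1],
    "the":   [1, 1, 1, 1],
}
labels = ["sports", "sports", "finance", "finance"]

dsm = build_dsm(term_doc, labels)
selected = greedy_term_subset(dsm)
print(selected)
```

In this toy collection the term "the" appears in every document, so it never enters a difference entry and cannot be selected. That illustrates how terms that do not help separate classes drop out of the reduced feature set.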

Keywords: text categorization; dimensionality reduction; difference-similitude matrix

Classification code: TP391.3 [Automation and Computer Technology - Computer Application Technology]
