TEF-WA: A Weight Adjustment Technique Combined with Evaluation Functions in Text Categorization  Cited by: 26

A Weight Adjustment Technique with Feature Weight Function Named TEF-WA in Text Categorization

Authors: Tang Huanling (唐焕玲) [1,2], Sun Jiantao (孙建涛), Lu Yuchang (陆玉昌) [3]

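Affiliations: [1] Department of Computer and Information Engineering, Yantai Vocational College; [2] Department of Computer Science and Technology, Tsinghua University, Beijing 100084; [3] Department of Computer Science and Technology, Tsinghua University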

Source: Journal of Computer Research and Development (计算机研究与发展), 2005, No. 1, pp. 47-53 (7 pages)

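Funding: Major Program of the National Natural Science Foundation of China (79990584); National Key Basic Research and Development Program of China ("973" Program) (G1998030414)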

Abstract: Text categorization (TC) is an important research direction in text mining. It aims to assign one or more predefined category labels to a text document, and it provides efficient methods for document management and information retrieval. A major problem in automatic text categorization is how to select, from the original high-dimensional feature space, a feature subset that is effective for classification, so that the categorization algorithm runs efficiently and its precision improves. In this paper, feature selection and weight adjustment techniques are analyzed and compared with respect to their influence on classification precision and efficiency. On this basis, the TEF-WA (term evaluation function-weight adjustment) technique is proposed. It uses a new weight function that embeds a feature evaluation function into the term weight, so that each feature term's contribution in the classifier is adjusted according to its ability to discriminate between categories. To evaluate TEF-WA, experiments were carried out on training document collections of several different sizes, using various term evaluation functions (document frequency, information gain, expected cross entropy, CHI, and the weight of evidence for text) together with term frequency or document frequency formulas. The experimental results show that TEF-WA is effective both in improving classification precision and in reducing the time complexity of the algorithm.
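The abstract does not give the exact form of the TEF-WA weight function, so the following is only a minimal sketch of the general idea under stated assumptions: a term's weight is taken to be tf × idf multiplied by its evaluation score, and the evaluation score is assumed to be the CHI statistic (one of the functions listed above) maximised over categories. The function names `chi_square`, `term_scores` and `tef_weights`, the toy corpus, and the L2 normalisation are all hypothetical illustrations, not the authors' formula.

```python
import math
from collections import Counter


def chi_square(term, doc_sets, labels, category):
    """CHI statistic of `term` for `category`, from the 2x2 term/category contingency table."""
    n = len(doc_sets)
    a = b = c = 0
    for terms, label in zip(doc_sets, labels):
        if term in terms and label == category:
            a += 1  # term present, document in category
        elif term in terms:
            b += 1  # term present, document in another category
        elif label == category:
            c += 1  # term absent, document in category
    d = n - a - b - c  # term absent, document in another category
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def term_scores(docs, labels):
    """Assumed evaluation score per term: the maximum CHI value over all categories."""
    doc_sets = [set(doc) for doc in docs]
    vocab = set().union(*doc_sets)
    categories = set(labels)
    return {t: max(chi_square(t, doc_sets, labels, cat) for cat in categories) for t in vocab}


def tef_weights(doc, docs, scores):
    """Hypothetical TEF-style weights: tf * idf * evaluation score, L2-normalised."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    weights = {}
    for term, tf in Counter(doc).items():
        df = sum(term in s for s in doc_sets)
        idf = math.log(n / df) if df else 0.0
        # A term with zero evaluation score gets zero weight and so contributes nothing.
        weights[term] = tf * idf * scores.get(term, 0.0)
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights


# Toy usage: two "sports" and two "finance" documents (hypothetical data).
docs = [["ball", "goal", "team"], ["goal", "team", "win"],
        ["stock", "market", "price"], ["market", "price", "fund"]]
labels = ["sports", "sports", "finance", "finance"]
scores = term_scores(docs, labels)
weights = tef_weights(docs[0], docs, scores)
print(weights)
```

In this toy corpus, "ball" has a higher idf than "goal" because it occurs in fewer documents, but "goal" occurs in every sports document and in no finance document, so its CHI score is higher and it ends up with the larger weight. This illustrates the intent described in the abstract: the evaluation function shifts weight toward terms that discriminate between categories rather than toward terms that are merely rare.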

Keywords: vector space model (VSM); feature selection; weight adjustment; feature evaluation function; text categorization

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
