Affiliation: [1] Department of Computer and Information Science, Fujian University of Technology (福建工程学院), Fuzhou, Fujian 350014
Source: Computer Knowledge and Technology (《电脑知识与技术(过刊)》), 2007, No. 16, pp. 1125-1126, 1169 (3 pages)
Abstract: Centroid-based classification is an efficient family of text categorization algorithms, but it needs a large number of training (labeled) documents to produce a classifier that classifies accurately. Labeled documents are scarce in practice because labeling them manually is tedious and costly, while unlabeled documents are abundant on the web. To address this, the paper improves centroid-based classification and proposes the ONUC and OFFUC algorithms, which compensate for the sharp drop in its performance when training documents are insufficient. Because centroid-based classification is sensitive to outliers, training documents that lie far from the center of their category are excluded (edge removal). Experiments show that, when very few labeled documents are available, the proposed algorithms improve on the plain centroid-based classifier and outperform classic semi-supervised algorithms.
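The abstract does not describe how ONUC and OFFUC work, so the sketch below does not attempt to reproduce them. It only illustrates the two ideas the abstract names: a plain centroid-based classifier using cosine similarity, and edge removal that drops the training documents farthest from their class centroid before recomputing it. The function names, the `trim_fraction` parameter, and the use of cosine similarity are assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train_centroids(docs, labels, trim_fraction=0.0):
    """Compute one centroid per class.

    trim_fraction is a hypothetical knob: the share of each class's
    documents (those least similar to the initial centroid) that is
    dropped before the centroid is recomputed.
    """
    by_class = defaultdict(list)
    for vec, label in zip(docs, labels):
        by_class[label].append(normalize(vec))

    centroids = {}
    for label, vecs in by_class.items():
        centroid = mean(vecs)
        n_drop = int(len(vecs) * trim_fraction)
        if n_drop > 0:
            # Keep the documents most similar to the initial centroid.
            vecs = sorted(vecs, key=lambda v: cosine(v, centroid), reverse=True)
            vecs = vecs[: len(vecs) - n_drop]
            centroid = mean(vecs)
        centroids[label] = centroid
    return centroids


def classify(centroids, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

For example, with term-weight vectors `docs` and their `labels`, `train_centroids(docs, labels, trim_fraction=0.1)` would drop roughly the outermost 10% of each class before building its centroid, and `classify(centroids, new_vec)` assigns a new document to the nearest centroid.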
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]