检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:李荣陆[1] 王建会[1] 陈晓云[1] 陶晓鹏[1] 胡运发[1]
机构地区:[1]复旦大学计算机与信息技术系,上海200433
出 处:《计算机研究与发展》2005年第1期94-101,共8页Journal of Computer Research and Development
基 金:国家自然科学基金项目(60173027)
摘 要:随着WWW的迅猛发展,文本分类成为处理和组织大量文档数据的关键技术.由于最大熵模型可以综合观察到各种相关或不相关的概率知识,对许多问题的处理都可以达到较好的结果.但是,将最大熵模型应用在文本分类中的研究却非常少,而使用最大熵模型进行中文文本分类的研究尚未见到.使用最大熵模型进行了中文文本分类.通过实验比较和分析了不同的中文文本特征生成方法、不同的特征数目,以及在使用平滑技术的情况下,基于最大熵模型的分类器的分类性能.并且将其和Bayes,KNN,SVM三种典型的文本分类器进行了比较,结果显示它的分类性能胜于Bayes方法,与KNN和SVM方法相当,表明这是一种非常有前途的文本分类方法.With the rapid development of World Wide Web, text classification has become the key technology in organizing and processing large amount of document data. Maximum entropy model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and accommodable frame to combine diverse pieces of contextual information to estimate the probability of a certain linguistics phenomena. This approach for many tasks of NLP perform near state-of-the-art level, or outperform other competing probability methods when trained and tested under similar conditions. However, relatively little work has been done on applying maximum entropy model to text categorization problems. In addition, no previous work has focused on using maximum entropy model in classifying Chinese documents. Maximum entropy model is used for text categorization. Its categorization performance is compared and analyzed using different approaches for text feature generation, different number of feature and smoothing technique. Moreover, in experiments it is compared to Bayes, KNN and SVM, and it is shown that its performance is higher than Bayes and comparable with KNN and SVM. It is a promising technique for text categorization.
分 类 号:TP391[自动化与计算机技术—计算机应用技术] TP18[自动化与计算机技术—计算机科学与技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.229