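Authors: Su Guiyang (苏贵洋)[1], Li Jianhua (李建华)[1], Ma Yinghua (马颖华)[1], Li Shenghong (李生红)[1]
Source: Journal of Shanghai Jiaotong University (《上海交通大学学报》), 2004, Issue z1, pp. 76-79 (4 pages)
Funding: Supported by the National High Technology Research and Development (863) Program, grants 2001AA142160 and 2002AA145090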
Abstract: Filtering objectionable text is one of the most important research areas in network content security, and the goal is content-based filtering with high accuracy. Starting from Chinese erotic text, the most widespread kind of objectionable content, this paper uses the k-nearest neighbors (KNN) algorithm to build and compare four kinds of features for classifying such text: characters, words, punctuation marks, and parts of speech. The experiments show that the choice of features has a significant effect on classification precision, so appropriate features should be selected for each filtering application. The experiments also show that the resulting filter achieves highly accurate content-based filtering without sacrificing processing speed.
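Keywords: text filtering; information filtering; text representation; vector space model; feature selection
Classification number: TN915.08 [Electronics and Telecommunications: Communication and Information Systems]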
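
The record does not include the authors' implementation, so the following is only a minimal sketch of the general idea named in the abstract: a KNN classifier over a vector space model in which the feature extractor can be swapped (here characters versus punctuation marks). The function names, the use of raw counts with cosine similarity, the value of k, and the toy training data are all assumptions made for illustration and do not come from the paper.

```python
import math
from collections import Counter

# Assumed punctuation inventory (full-width and ASCII); not taken from the paper.
PUNCTUATION = set(",。!?、;:…,.!?;:")


def char_features(text):
    """Count each character, skipping whitespace and punctuation."""
    return Counter(ch for ch in text
                   if not ch.isspace() and ch not in PUNCTUATION)


def punct_features(text):
    """Count only the punctuation marks in the text."""
    return Counter(ch for ch in text if ch in PUNCTUATION)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, training, extract, k=3):
    """Label `text` by majority vote among its k most similar training docs.

    `training` is a list of (document, label) pairs and `extract` is the
    feature extractor to use, e.g. char_features or punct_features.
    """
    query = extract(text)
    scored = sorted(((cosine(query, extract(doc)), label)
                     for doc, label in training),
                    key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy placeholder data; real use would need labeled Chinese documents.
TRAINING = [
    ("aaa bbb", "block"),
    ("aab abb", "block"),
    ("xyz xyy", "allow"),
    ("zzy xxz", "allow"),
]

print(knn_classify("abab", TRAINING, char_features))
print(knn_classify("xyzz", TRAINING, char_features))
```

Passing a different `extract` function is the only change needed to compare feature types on the same training set. Word and part-of-speech features would additionally require a Chinese word segmenter and tagger, which this sketch leaves out.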