A KNN Algorithm on Chinese Erotic Text Filtering (用于中文色情文本过滤的近邻法构造算法)

Cited by: 6

Authors: Su Guiyang (苏贵洋)[1], Li Jianhua (李建华)[1], Ma Yinghua (马颖华)[1], Li Shenghong (李生红)[1]

Affiliation: [1] School of Information Security, Shanghai Jiao Tong University, Shanghai 200030

Source: Journal of Shanghai Jiaotong University (《上海交通大学学报》), 2004, Issue z1, pp. 76-79 (4 pages)

Funding: Supported by the National High Technology Research and Development Program (863 Program), grants 2001AA142160 and 2002AA145090

Abstract: Filtering objectionable text is one of the most important research areas in network content security, and the goal is content-based filtering with high accuracy. Starting from Chinese erotic text, the most widespread kind of objectionable content, this paper uses a K-Nearest Neighbors (KNN) algorithm to build and compare filters based on four kinds of features: characters, words, punctuation, and parts of speech. Experiments show that the choice of features strongly affects classification precision on Chinese erotic text, so appropriate features should be selected for each filtering application. The experiments also show that the resulting filter achieves high-accuracy content-based filtering without sacrificing processing speed.

Keywords: text filtering; information filtering; text representation; vector space model; feature selection

Classification code: TN915.08 [Electronics and Telecommunications: Communication and Information Systems]
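
The abstract describes comparing feature types under a KNN classifier in a vector space model, but it does not give the weighting scheme, the similarity measure, the value of k, or the corpus. The following is only a minimal sketch of the general idea, assuming raw term counts, cosine similarity, and majority voting. All names (`char_features`, `punct_features`, `knn_classify`, `TRAINING`) are hypothetical, and the training texts are neutral placeholders rather than data from the paper. Word and part-of-speech features would additionally need a segmenter and a tagger, which are not shown.

```python
import math
from collections import Counter

# Assumed punctuation inventory (full-width and ASCII); not taken from the paper.
PUNCTUATION = set(",。!?、;:,.!?;:")


def char_features(text):
    """Count single characters, skipping whitespace and punctuation."""
    return Counter(
        ch for ch in text if not ch.isspace() and ch not in PUNCTUATION
    )


def punct_features(text):
    """Count punctuation marks only."""
    return Counter(ch for ch in text if ch in PUNCTUATION)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training, extract, k=3):
    """Label `text` by majority vote among its k most similar training docs."""
    query = extract(text)
    scored = sorted(
        ((cosine(query, extract(doc)), label) for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Neutral placeholder data: "block" stands in for the filtered class.
TRAINING = [
    ("今天天气很好,适合散步。", "allow"),
    ("会议改到明天下午三点。", "allow"),
    ("限时优惠!!!立即点击!!!", "block"),
    ("点击领取优惠!!!限时!!!", "block"),
    ("请查收附件中的报告。", "allow"),
]

if __name__ == "__main__":
    for extract in (char_features, punct_features):
        label = knn_classify("优惠!!!点击领取", TRAINING, extract)
        print(extract.__name__, label)
```

Passing the feature extractor as a parameter keeps the classifier fixed while the feature type varies, which mirrors the kind of comparison the abstract reports.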

 
