Reactionary text filtering method based on n-gram and class-unbalanced SVM for Uyghur webpages

Cited by: 5

Authors: Ruxianguli Abudurexiti [1]; Yasen Aizezi [1]; Guo Wenqiang [2]

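Affiliations: [1] Department of Information Security Engineering, Xinjiang Police College, Urumqi 830013, China; [2] School of Computer Science and Engineering, Xinjiang University of Finance and Economics, Urumqi 830013, China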

Source: Application Research of Computers (《计算机应用研究》), 2019, No. 11, pp. 3410-3414 (5 pages)

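Funding: National Natural Science Foundation of China (61762086); Scientific Research Program of Colleges and Universities of the Xinjiang Uygur Autonomous Region (XJEDU2017M046); National Social Science Foundation of China (13CFX055)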

Abstract: This paper proposes a Uyghur text filtering method that combines an n-gram statistical model with a class-unbalanced support vector machine (SVM) classifier. First, the webpage text is preprocessed, and stems are initially extracted with the n-gram statistical model. Next, the stems undergo semantic analysis, and stems with similar meanings are merged into one class, which reduces the stem dimensionality. Finally, a parameter that controls the distance between the hyperplanes is introduced into the conventional SVM to build a class-unbalanced SVM that can classify Uyghur text that is not linearly separable and whose classes are imbalanced. The experimental results show that the method classifies objectionable text accurately and with a short classification time.
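The pipeline in the abstract can be illustrated with a minimal sketch, under clearly stated assumptions. The record gives no formulas, so the code below does not reproduce the paper's stem extraction, semantic clustering, or hyperplane-distance parameter. It substitutes two simpler stand-ins: character n-gram counts as features, and a linear SVM whose hinge loss weights the minority class more heavily, a common way to handle class imbalance. The names `char_ngrams`, `score`, `train_weighted_svm`, `predict`, and `pos_weight` are hypothetical, and the toy strings are placeholders rather than Uyghur data.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count the overlapping character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def score(w, b, x):
    """Linear decision value w.x + b for a sparse feature dict `x`."""
    return sum(w.get(f, 0.0) * v for f, v in x.items()) + b


def train_weighted_svm(samples, labels, pos_weight=3.0, lam=0.01,
                       lr=0.1, epochs=20):
    """Subgradient descent on an L2-regularized, class-weighted hinge loss.

    Assumption: this weighting is a stand-in for the paper's class-unbalanced
    SVM, not its actual formulation. Labels are +1 (minority, flagged) or -1.
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            cost = pos_weight if y == 1 else 1.0
            margin = y * score(w, b, x)
            # L2 regularization: shrink every weight slightly at each step.
            for f in w:
                w[f] -= lr * lam * w[f]
            # Hinge-loss subgradient step, scaled by the class cost.
            if margin < 1.0:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + lr * cost * y * v
                b += lr * cost * y
    return w, b


def predict(w, b, x):
    """Return +1 (flagged) or -1 (acceptable) for a feature dict `x`."""
    return 1 if score(w, b, x) >= 0 else -1


# Toy usage with placeholder strings: one flagged sample against two others.
texts = ["aaa", "bbb", "ccc"]
labels = [1, -1, -1]
features = [char_ngrams(t, n=2) for t in texts]
w, b = train_weighted_svm(features, labels)
print([predict(w, b, x) for x in features])
```

Raising `pos_weight` makes each misclassified minority sample move the decision boundary further, which trades more false alarms for fewer missed flagged texts. Whether that trade-off resembles the behavior of the paper's distance parameter cannot be determined from this record.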

Keywords: Uyghur webpages; objectionable text filtering; n-gram; stem extraction; class-unbalanced SVM

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

