Research of Online Writeprint Identification Based on Ensemble Learning and Genetic Algorithm
(Chinese title: 基于集成学习与遗传算法的网络书写纹识别研究)

Cited by: 2


Authors: Sun Jianwen (孙建文)[1], Yang Zongkai (杨宗凯)[1], Liu Sanya (刘三(女牙))[1], Wang Pei (王佩)[2]

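Affiliations: [1] National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079; [2] School of Information Management, Wuhan University, Wuhan 430072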

Source: Computer Science (《计算机科学》), 2011, No. 6, pp. 242-245 (4 pages)

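Funding: Supported by the National 863 Program of China (2008AA01Z131) and the Fundamental Research Funds for the Central Universities, Central China Normal University (CCNU09A02006)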

Abstract: Online writeprint identification is a technique for identifying individuals from the textual identity cues they leave behind in online messages. Previous research has shown character N-grams to be one of the most effective feature types for this task. Character N-gram features are high-dimensional and contain many redundant features but few irrelevant ones, so nearly every feature carries some value for identification. To address this, the study proposes an ensemble learning approach based on feature subspacing, that is, on partitioning the feature space. First, the initial feature set is split, at a chosen partition granularity, into equally sized, disjoint subsets. Then a base classifier (Multinomial Naive Bayes) is trained on each subset. Finally, the base classifiers are aggregated into an ensemble using a combination rule that is a simple average of the arithmetic mean and the geometric mean. A genetic algorithm is used to optimize the feature subspacing (i.e., the selection of feature subsets). An experiment on a real-world data set showed that the approach is effective and yields a considerable improvement in accuracy over the benchmark technique for writeprint identification (Support Vector Machine).
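The three steps named in the abstract (disjoint equal-size feature subsets, one Multinomial Naive Bayes base classifier per subset, fusion by averaging the arithmetic and geometric means) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Laplace smoothing constant `alpha`, the renormalization after fusion, and the toy count matrix are all assumptions made for illustration, and the abstract does not give the exact fusion formula, so `0.5 * (arithmetic + geometric)` is only one plausible reading. The genetic-algorithm search is not shown; `order` stands in for whatever feature ordering such a search would propose.

```python
import math

def partition(order, n_subsets):
    """Cut a permutation of feature indices into equally sized, disjoint subsets."""
    if len(order) % n_subsets:
        raise ValueError("feature count must be divisible by n_subsets")
    size = len(order) // n_subsets
    return [order[i * size:(i + 1) * size] for i in range(n_subsets)]

def train_mnb(X, y, cols, n_classes, alpha=1.0):
    """Fit Multinomial Naive Bayes on the columns in `cols` (assumed smoothing: alpha).
    Assumes every class occurs at least once in y."""
    counts = [[alpha] * len(cols) for _ in range(n_classes)]
    class_n = [0] * n_classes
    for row, label in zip(X, y):
        class_n[label] += 1
        for j, c in enumerate(cols):
            counts[label][j] += row[c]
    log_prior = [math.log(n / len(y)) for n in class_n]
    log_like = []
    for class_counts in counts:
        total = sum(class_counts)
        log_like.append([math.log(v / total) for v in class_counts])
    return log_prior, log_like

def predict_proba(model, row, cols):
    """Class posteriors from one base classifier, normalized in log space."""
    log_prior, log_like = model
    scores = [lp + sum(row[c] * ll[j] for j, c in enumerate(cols))
              for lp, ll in zip(log_prior, log_like)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse(probas):
    """Assumed fusion rule: average of arithmetic and geometric means, renormalized."""
    k = len(probas)
    fused = []
    for c in range(len(probas[0])):
        vals = [p[c] for p in probas]
        arith = sum(vals) / k
        geom = math.exp(sum(math.log(max(v, 1e-300)) for v in vals) / k)
        fused.append(0.5 * (arith + geom))
    z = sum(fused)
    return [f / z for f in fused]

def ensemble_predict(models, subsets, row):
    """Predict the class with the highest fused posterior."""
    fused = fuse([predict_proba(m, row, cols) for m, cols in zip(models, subsets)])
    return max(range(len(fused)), key=fused.__getitem__)

# Toy N-gram count matrix (hypothetical): 4 documents, 4 features, 2 authors.
X = [[3, 0, 2, 1], [2, 1, 3, 0], [0, 3, 1, 2], [1, 2, 0, 3]]
y = [0, 0, 1, 1]
order = [0, 1, 2, 3]  # a GA would search over orderings like this one
subsets = partition(order, n_subsets=2)
models = [train_mnb(X, y, cols, n_classes=2) for cols in subsets]
preds = [ensemble_predict(models, subsets, row) for row in X]
print(preds)  # [0, 0, 1, 1]
```

In this sketch, a candidate partition is represented as a permutation of feature indices cut into equal chunks, so a search procedure such as a genetic algorithm could score each permutation by the ensemble's accuracy on held-out data.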

Keywords: online writeprint; ensemble learning; genetic algorithm; feature subset

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
