符号序列多阶Markov分类  被引量:2

Classification of symbolic sequences with multi-order Markov model

在线阅读下载全文

作  者:程铃钫[1] 郭躬德[2] 陈黎飞[2] 

机构地区:[1]福建农林大学金山学院,福州350002 [2]福建师范大学数学与计算机科学学院,福州350117

出  处:《计算机应用》2017年第7期1977-1982,共6页journal of Computer Applications

基  金:国家自然科学基金资助项目(61672157)~~

摘  要:针对基于固定阶Markov链模型的方法不能充分利用不同阶次子序列结构特征的问题,提出一种基于多阶Markov模型的符号序列贝叶斯分类新方法。首先,建立了基于多阶次Markov模型的条件概率分布模型;其次,提出一种附后缀表的n-阶子序列后缀树结构和高效的树构造算法,该算法能够在扫描一遍序列集过程中建立多阶条件概率模型;最后,提出符号序列的贝叶斯分类器,其训练算法基于最大似然法学习不同阶次模型的权重,分类算法使用各阶次的加权条件概率进行贝叶斯分类预测。在三个应用领域实际序列集上进行了系列实验,结果表明:新分类器对模型阶数变化不敏感;与使用固定阶模型的支持向量机等现有方法相比,所提方法在基因序列与语音序列上可以取得40%以上的分类精度提升,且可输出符号序列Markov模型最优阶数参考值。To solve the problem that the existing methods based on the fixed-order Markov models cannot make full use of the structural features involved in the subsequences of different orders, a new Bayesian method based on the multi-order Markov model was proposed for symbohc sequences classification. First, a Conditional Probability Distribution (CPD) model was built based on the multi-order Markov model. Second, a suffix tree for n-order subsequences with efficient suffix-tables and its efficient construction algorithm were proposed, where the algorithm could be used to learn the multi-order CPD models by scanning once the sequence set. A Bayesian classifier was finally proposed for the classification task. The training algorithm was designed to learn the order-weights for the models of different orders based on the Maximum Likelihood (ML) method, while the classification algorithm was defined to carry out the Bayesian prediction using the weighted conditional probabilities of each order. A series of experiments were conducted on real-world sequence sets from three domains and the results demonstrate that the new classifier is insensitive to the predefined order change of the model. Compared with the existing methods such as the support vector machine using the fixed-order model, the proposed method can achieve more than 40% improvement on both gene sequences and speech sequences in terms of classification accuracy, yielding reference values for the optimal order of a Markov model on symbolic sequences.

关 键 词:符号序列 MARKOV链模型 多阶模型 贝叶斯分类 后缀树 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论] TP18[自动化与计算机技术—计算机科学与技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象