Affiliations: [1] Department of Mathematics, School of Science, Anhui Science and Technology University; [2] Machine Learning and Systems Biology Laboratory, Tongji University
Source: Tsinghua Science and Technology (Journal of Tsinghua University, Natural Science Edition, English edition), 2013, Issue 5, pp. 446-453, 8 pages
Funding: Supported by the Key Project of the Education Department of Anhui Province (No. KJ2013A076); the PhD Programs Foundation of the Ministry of Education of China (No. 20120072110040); the National Natural Science Foundation of China (Nos. 61133010, 31071168, and 61005010); and the China Postdoctoral Science Foundation (No. 2012T50582)
Abstract: Numerical characterizations of DNA sequences can facilitate the analysis of similar sequences. To visualize and compare different DNA sequences in less space, a novel descriptor extraction approach was proposed for the numerical characterization and similarity analysis of sequences. First, a transformation method was introduced to represent each DNA sequence by a dinucleotide physicochemical property matrix. Then, based on approximate joint diagonalization theory, an eigenvalue vector was extracted from each DNA sequence and used as that sequence's descriptor. Finally, similarity analyses were performed by calculating the pairwise distances among the obtained eigenvalue vectors. The results show that the proposed approach captures more sequence information and analyzes the information contained in all of the sequences jointly rather than separately. Its effectiveness was illustrated by a dendrogram constructed for 15 beta-globin gene sequences.
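The pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The hypothetical `dinucleotide_matrix` builds a symmetrized 4x4 matrix of dinucleotide frequencies, optionally scaled by a user-supplied `weights` table that stands in for the paper's physicochemical property values (not reproduced here). The hypothetical `joint_diagonalize` is a swapped-in Jacobi-rotation scheme in the style of Cardoso and Souloumiac for approximately diagonalizing a set of real symmetric matrices with one shared orthogonal basis. Because every sequence is rotated by the same `V`, the resulting eigenvalue vectors live in a common coordinate system and can be compared directly.

```python
import itertools

import numpy as np

BASES = "ACGT"


def dinucleotide_matrix(seq, weights=None):
    """Hypothetical 4x4 symmetric matrix of dinucleotide frequencies.

    `weights` is an optional 4x4 placeholder for per-dinucleotide property
    values; the paper's actual physicochemical table is NOT reproduced here.
    """
    idx = {b: i for i, b in enumerate(BASES)}
    counts = np.zeros((4, 4))
    for a, b in zip(seq, seq[1:]):
        if a in idx and b in idx:  # skip ambiguous bases such as N
            counts[idx[a], idx[b]] += 1
    total = counts.sum()
    freq = counts / total if total > 0 else counts
    if weights is not None:
        freq = freq * np.asarray(weights, dtype=float)
    return (freq + freq.T) / 2  # symmetrize so a real orthogonal basis applies


def joint_diagonalize(mats, tol=1e-12, max_sweeps=100):
    """Approximate joint diagonalization by Jacobi rotations (assumed scheme).

    mats: array of shape (K, n, n) holding symmetric matrices.
    Returns (V, D) with D[k] = V.T @ mats[k] @ V approximately diagonal.
    """
    D = np.array(mats, dtype=float)  # copy; the input is not modified
    n = D.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        rotated = False
        for p, q in itertools.combinations(range(n), 2):
            # One column per matrix: diagonal difference and off-diagonal sum.
            h = np.vstack([D[:, p, p] - D[:, q, q], D[:, p, q] + D[:, q, p]])
            G = h @ h.T
            ton, toff = G[0, 0] - G[1, 1], G[0, 1] + G[1, 0]
            # Angle that best reduces the (p, q) off-diagonal mass over all matrices.
            theta = 0.25 * np.arctan2(toff, ton)
            c, s = np.cos(theta), np.sin(theta)
            if abs(s) > tol:
                rotated = True
                R = np.eye(n)
                R[p, p], R[p, q], R[q, p], R[q, q] = c, -s, s, c
                D = R.T @ D @ R  # matmul broadcasts over the K matrices
                V = V @ R
        if not rotated:
            break
    return V, D


def eigenvalue_descriptors(seqs, weights=None):
    """One eigenvalue vector per sequence, all in the shared basis V."""
    mats = np.array([dinucleotide_matrix(s, weights) for s in seqs])
    _, D = joint_diagonalize(mats)
    return np.array([np.diag(d) for d in D])


def pairwise_distances(X):
    """Euclidean distance between every pair of descriptor rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

For example, with toy sequences, `X = eigenvalue_descriptors(["ATGGTGCAC", "ATGGTGCTG", "ATGCTGACT"])` followed by `pairwise_distances(X)` yields a symmetric distance matrix with a zero diagonal. A dendrogram could then be drawn with SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` after converting the matrix to condensed form with `scipy.spatial.distance.squareform`; the sketch itself depends only on NumPy.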
Keywords: descriptors; approximate joint diagonalization; dendrogram; physicochemical property; similarity analysis
CLC numbers: Q523 [Biology: Biochemistry]; TN911.7 [Electronics and Telecommunications: Communication and Information Systems]