Research and Implementation of DNA Molecular Sequence Pattern Matching Algorithm Based on Bioinformatics


Authors: 陈亭宇 (Chen Tingyu), 尹国才 (Yin Guocai)[1], 魏国晟 (Wei Guosheng)

Affiliation: [1] School of Computer Science, North China Institute of Aerospace Engineering, Langfang, Hebei

Source: Computer Science and Application (《计算机科学与应用》), 2023, Issue 2, pp. 236-250 (15 pages)

Abstract: Bioinformatics is an interdisciplinary science that combines biological science with computer technology, applying mathematics, information science, and computing to organize, sort, and summarize biological and medical information. DNA sequence alignment is one of the most important and fundamental research directions in bioinformatics and a key means of exploring the relationship between genes and disease. The main goal of this paper is to find, in uncertain molecular sequence data, all sequences that are identical to the target sequence and whose occurrence probability exceeds a given threshold, and to report the total number of such matches and the starting site of each. Two limitations motivate the work: existing “space-for-time” molecular sequence pattern matching algorithms only count occurrences, and the image stereo matching method based on the pairwise DNA sequence alignment algorithm from bioinformatics has limitations when the source data are uncertain. To address them, the paper proposes a DNA molecular sequence pattern matching algorithm based on a weighted suffix tree. The method uses the weighted suffix tree as its main data structure, improves matching accuracy on uncertain source data, and overcomes the restriction of the map data structure to occurrence counting. Experimental results show that the proposed algorithm improves matching speed and sensitivity to some extent.
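The matching task stated in the abstract can be illustrated with a brute-force baseline. This is a minimal sketch of the problem definition only, not the weighted suffix tree algorithm the paper proposes. It assumes an uncertain sequence is represented as a list of per-position nucleotide probability tables, that positions are independent so an occurrence probability is the product of the per-position probabilities, and that start positions are 0-based. The name `find_occurrences` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: one {nucleotide: probability} table per position.
WeightedSequence = List[Dict[str, float]]


def find_occurrences(
    seq: WeightedSequence, pattern: str, threshold: float
) -> List[Tuple[int, float]]:
    """Return (start, probability) for every start position where `pattern`
    occurs with probability strictly greater than `threshold`.

    Naive O(n * m) scan; the probability of an occurrence is taken to be the
    product of the per-position probabilities (independence assumption).
    """
    hits: List[Tuple[int, float]] = []
    m = len(pattern)
    if m == 0:
        return hits
    for start in range(len(seq) - m + 1):
        prob = 1.0
        for offset, base in enumerate(pattern):
            # A nucleotide missing from the table has probability 0.
            prob *= seq[start + offset].get(base, 0.0)
            # Probabilities never exceed 1, so the product can only shrink:
            # stop as soon as the threshold can no longer be beaten.
            if prob <= threshold:
                break
        else:
            hits.append((start, prob))
    return hits


# Hypothetical uncertain sequence of length 5.
sample: WeightedSequence = [
    {"A": 1.0},
    {"C": 0.5, "G": 0.5},
    {"A": 0.8, "T": 0.2},
    {"C": 1.0},
    {"A": 0.9, "G": 0.1},
]

matches = find_occurrences(sample, "CA", threshold=0.3)
print(len(matches), [start for start, _ in matches])
```

The result carries both quantities the abstract asks for: the number of qualifying occurrences (`len(matches)`) and the start position of each. An index structure such as the weighted suffix tree named in the abstract would aim to avoid rescanning the sequence for every query.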

Keywords: Bioinformatics; Molecular Sequence; Pattern Matching; Algorithm

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
