Authors: ZHU Bin; QI He-Yuan[3]; MA Jun-Cai[1,3]
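Affiliations: [1] Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China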
Source: Computer Systems & Applications (计算机系统应用), 2018, No. 9, pp. 163-169 (7 pages)
Funding: National High Technology Research and Development Program of China (863 Program) (2014AA021501)
Abstract: In species identification, the authoritative method is sequence alignment based on BLAST, but this algorithm is computationally heavy, slow, and resource-intensive. To address these problems, this study adopts the K-String compositional vector method from the classic literature to improve the vector space model (VSM) and applies the improved model to species identification based on 16S rRNA sequences. Working within the theory of Banach spaces, it replaces the genetic distance formula of the improved VSM algorithm with equivalent norm-based forms and gives the genetic distance formula corresponding to each norm, for other researchers' reference. The improved algorithm is evaluated on two criteria: computational efficiency and species identification performance. The conclusion is that the variant based on the inner-product norm of Euclidean space (the 2-norm) is significantly faster than the classic BLAST algorithm, while its classification results are consistent with the BLAST alignment results in terms of detection rate.
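The abstract does not spell out the improved VSM or its distance formulas, so the following is only a rough illustration under stated assumptions. It assumes the simplest form of the idea: each sequence becomes a vector of K-string (k-mer) relative frequencies over the 4^K possible strings, and two sequences are compared with a Minkowski L_p distance, where p = 2 is the Euclidean (inner-product-induced) norm that the abstract singles out. The Markov-model background subtraction used in the classic K-String composition vector method is deliberately omitted. The function names `kmer_vector`, `minkowski_distance`, and `nearest_reference` are hypothetical and are not the authors' code.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_vector(seq: str, k: int) -> list[float]:
    """Relative frequency of every length-k string over ACGT (4**k entries).

    Windows containing characters outside ACGT (e.g. 'N') are skipped.
    Assumption: plain frequencies, no Markov background subtraction.
    """
    seq = seq.upper()
    valid = set(ALPHABET)
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    # Fixed lexicographic order so vectors from different sequences align.
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    return [counts[w] / total if total else 0.0 for w in all_kmers]


def minkowski_distance(u: list[float], v: list[float], p: float) -> float:
    """L_p distance between two vectors; p = 2 is Euclidean, p = inf is Chebyshev."""
    if math.isinf(p):
        return max(abs(a - b) for a, b in zip(u, v))
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def nearest_reference(query: str, references: dict[str, str],
                      k: int = 4, p: float = 2.0) -> str:
    """Label of the reference sequence whose k-mer vector is closest to the query.

    Reference vectors are rebuilt on every call; a real tool would cache them.
    """
    q = kmer_vector(query, k)
    return min(
        references,
        key=lambda name: minkowski_distance(q, kmer_vector(references[name], k), p),
    )


if __name__ == "__main__":
    refs = {"A": "ACGTACGTACGT", "B": "GGGGCCCCGGGG"}
    print(nearest_reference("ACGTACGA", refs, k=2))  # prints "A"
```

Unlike BLAST, this needs no alignment: each sequence is scanned once to build its vector, and each comparison then costs O(4^K) regardless of sequence length. That is plausibly where the speedup reported in the abstract comes from, although the paper's actual vectors and distance formulas may differ from this sketch.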