Authors: 王占兵, 宋伟[1], 彭智勇[1], 杨先娣[1], 崔一辉[1], 申远 (WANG Zhan-bing; SONG Wei; PENG Zhi-yong; YANG Xian-di; CUI Yi-hui; SHEN Yuan) (School of Computer, Wuhan Universit)
Source: 《计算机科学》 (Computer Science), 2018, No. 6, pp. 51-56 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (61232002; 61572378)
Abstract: Precision medicine is a medical model that relies heavily on the analysis of patients' genomes, and subsequence (substring) search is an important method for performing that analysis. In recent years the volume of genomic data has grown dramatically, and its storage cost and processing complexity now far exceed what hospitals can bear. Outsourcing genomic data to a cloud service provider, with its cheap storage and powerful computing capacity, has therefore become a practical solution. Because the cloud service provider is not fully trusted, encrypting the data before uploading it is an effective way to protect the security and privacy of DNA sequence data. How to run sequence queries over the encrypted data then becomes a pressing problem. To address it, this paper surveys the fields of genomic data processing and ciphertext (full-text) retrieval, and proposes a scheme that uses q-gram mapping to build prefix signatures (indexes) over fixed-length windows of the genomic sequence and answers a query by performing a prefix query in every window: the query hits if the query sequence is a prefix of any window. Throughout the process, the cloud service provider stores the indexes and performs the subsequence query without obtaining the plaintext of the user's data. The paper also sets up a system model and several security assumptions and gives a security proof. Experiments on a public dataset show that the proposed scheme performs well in both time cost and storage cost; for example, when the window size w is 100 and q is 6, building the index for a sequence of length 100000 takes 15.06 s according to the Chinese abstract (the English abstract gives 15.60 s). Compared with GPSE, the proposed method executes queries more efficiently.
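The windowed q-gram idea in the abstract can be illustrated with a small sketch. This is not the paper's construction and carries none of its security guarantees. It assumes sliding windows that start at every position, overlapping q-grams tagged with their offset inside the window, and HMAC-SHA256 as the keyed mapping. The function names (`build_index`, `make_trapdoor`, `search`) are hypothetical. The data owner holds the key; the server sees only opaque tokens and tests whether the query's tokens are a subset of a window's tokens, which holds exactly when the query is a prefix of that window.

```python
import hashlib
import hmac


def _token(key: bytes, offset: int, gram: str) -> bytes:
    # Keyed, position-tagged token for one q-gram (HMAC is an assumption here).
    msg = f"{offset}:{gram}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


def qgram_tokens(key: bytes, text: str, q: int) -> set:
    # Tokens for all overlapping q-grams of `text`, tagged by offset in `text`.
    return {_token(key, i, text[i:i + q]) for i in range(len(text) - q + 1)}


def build_index(key: bytes, sequence: str, w: int, q: int) -> list:
    # Data-owner side: one token set per window; windows start at every
    # position (assumed step of 1) and are at most w characters long.
    index = []
    for start in range(len(sequence) - q + 1):
        window = sequence[start:start + w]
        index.append(qgram_tokens(key, window, q))
    return index


def make_trapdoor(key: bytes, query: str, q: int) -> set:
    # Data-owner side: the query is tokenized as if it were a window prefix.
    if len(query) < q:
        raise ValueError("query must contain at least q characters")
    return qgram_tokens(key, query, q)


def search(index: list, trapdoor: set) -> list:
    # Server side: return start positions of windows whose prefix matches.
    # Only token sets are compared; no plaintext is needed.
    return [start for start, tokens in enumerate(index) if trapdoor <= tokens]


if __name__ == "__main__":
    key = b"demo-key-not-for-real-use"
    index = build_index(key, "ACGTACGTTAGC", w=8, q=3)
    print(search(index, make_trapdoor(key, "ACGT", q=3)))
```

In this sketch a query longer than w never matches, because its tokens carry offsets that no window contains. Storing one token set per start position is also wasteful, so a practical scheme would need a more compact signature.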
Classification number: TP309.2 [Automation and Computer Technology: Computer System Architecture]