Method of civil aviation deduplication based on persistent memory


Authors: DING Jianli[1]; LI Hui[1] (Civil Aviation University of China, Tianjin 300300, China)

Affiliation: [1] Civil Aviation University of China, Tianjin 300300, China

Source: Modern Electronics Technique (《现代电子技术》), 2022, No. 10, pp. 131-136 (6 pages)

Funding: National Natural Science Foundation of China (U1833114).

Abstract: Storing civil aviation data and backing it up for disaster recovery involves large data volumes and long backup times. To address this, the paper improves on traditional deduplication methods built on mechanical hard disks and proposes a civil aviation deduplication method based on persistent memory (PM). Because civil aviation data records are short but numerous, the method adopts location-based content comparison. It first collects the fingerprints of a file's data blocks and extracts fingerprint samples; it then uses persistent memory to locate the file from the IDs of the fingerprint samples; next it matches the content to decide whether a secondary, finer-grained analysis is required; finally it either deduplicates the data or backs it up. Experimental results show that, in disaster recovery backup of a civil aviation database, the duplicate data the optimized method can remove accounts for about 98.08%, and deduplication time is 1/2 to 2/3 shorter than with the traditional method. The proposed method improves deduplication efficiency, reduces storage space overhead, and minimizes the bandwidth pressure of network transmission.
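The four steps in the abstract (fingerprint and sample, locate by sample, match content and decide on a secondary analysis, then deduplicate or back up) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the persistent-memory index is stood in for by an ordinary in-process `dict`, and the block size, sampling rate, threshold value, SHA-1 fingerprints, and all names (`DedupStore`, `backup`, `CHUNK_SIZE`, `SAMPLE_EVERY`, `DUP_THRESHOLD`) are assumptions chosen for the example.

```python
import hashlib

CHUNK_SIZE = 64       # assumed block size in bytes (records are short)
SAMPLE_EVERY = 4      # assumed: every 4th fingerprint is kept as a sample
DUP_THRESHOLD = 0.5   # assumed duplication-rate threshold


def split_blocks(data: bytes) -> list:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def fingerprint(block: bytes) -> str:
    """Fingerprint of one block; SHA-1 is an assumption of this sketch."""
    return hashlib.sha1(block).hexdigest()


class DedupStore:
    def __init__(self):
        # Plain dicts stand in for the PM-resident structures.
        self.sample_index = {}  # sampled fingerprint -> file ID
        self.recipes = {}       # file ID -> ordered list of block fingerprints
        self.blocks = {}        # fingerprint -> stored block bytes

    def backup(self, file_id: str, data: bytes) -> dict:
        # Step 1: fingerprint every block and extract the samples.
        parts = split_blocks(data)
        fps = [fingerprint(b) for b in parts]
        samples = fps[::SAMPLE_EVERY]

        # Step 2: use the sample index to locate a previously stored file.
        candidate = next(
            (self.sample_index[s] for s in samples if s in self.sample_index),
            None,
        )
        known = set(self.recipes[candidate]) if candidate is not None else set()

        # Step 3: measure overlap with the located file; a low rate triggers
        # the secondary pass, which checks against every stored block.
        dup_rate = sum(fp in known for fp in fps) / len(fps) if fps else 0.0
        refine = dup_rate < DUP_THRESHOLD

        # Step 4: skip blocks whose bytes match a stored block, write the rest.
        written = 0
        for fp, block in zip(fps, parts):
            in_scope = refine or fp in known
            if in_scope and self.blocks.get(fp) == block:
                continue  # duplicate confirmed by byte comparison
            self.blocks[fp] = block
            known.add(fp)
            written += 1

        # Register this file so later backups can be located through it.
        for s in samples:
            self.sample_index.setdefault(s, file_id)
        self.recipes[file_id] = fps
        return {"blocks": len(fps), "written": written,
                "dup_rate": dup_rate, "refined": refine}
```

In this sketch, backing up the same bytes twice under different IDs writes every distinct block on the first call and none on the second, while a copy with one changed block writes only that block. When the located file already covers most of the data, the loop consults only that file's fingerprints rather than the full block table, which is the sketch's stand-in for location-based comparison.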

Keywords: civil aviation data; data deduplication; persistent memory; file location; duplication-rate threshold; disaster recovery backup

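Classification: TN919-34 [Electronics and Telecommunications: Communication and Information Systems]; TP311 [Electronics and Telecommunications: Information and Communication Engineering]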

 
