Big Data Plain Text Watermarking Based on Orthogonal Coding  Cited by: 7


Authors: LI Zhao-can; WANG Li-ming [2]; GE Si-jiang; MA Duo-he [2]; QIN Bo [1]

Affiliations: [1] College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100, China [2] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China [3] College of Cyberspace Security, University of Chinese Academy of Sciences, Beijing 100049, China

Source: Computer Science (《计算机科学》), 2019, No. 12, pp. 148-154 (7 pages)

Funding: Supported by the National Key Research and Development Program of China (2017YFB1010000)

Abstract: Data leakage is one of the major challenges facing big data applications, and digital watermarking is an effective means of data tracking and copyright protection. Current digital watermarking methods mainly target multimedia files circulated among end users, such as images, audio, and video; there is little research on digital watermarking for protecting text data against leakage in big data environments. This paper proposes a plain text watermarking method for big data based on orthogonal coding. The method converts the plaintext watermark into a binary byte stream by encoding, and designs orthogonal coding watermarking schemes based on line hash values and on line-order permutation. First, the binary watermark string is divided into numbered segments. The number of the segment to embed is computed from the hash value of each line's content, and the corresponding segment is converted by a custom rule into an invisible character string that is appended to the end of the line. Then the line order is adjusted so that the hash values of the lines correspond to the binary watermark string with flag bits added, which embeds the watermark in the big data plain text. Watermark extraction is the inverse of embedding. The method resists damage to the watermark from operations such as complex line-order transformations in big data environments, and it also supports text tamper detection by embedding a fragile watermark. Based on the method, a big data plain text watermarking system was designed and implemented; it uses the Spark distributed processing framework to address the performance of watermark embedding and extraction on massive texts, enabling fast tracing of data leakage to its source and improving the security of big data. Experiments and theoretical analysis show that the method has good watermark capacity and concealment and can withstand multiple content attacks. Because plain text carries no formatting, format attacks are ineffective against the method, so it has good robustness.
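The line-hash half of the scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the hash function, the segment length, or the "custom rule" for invisible characters, so SHA-256, 8-bit segments, and a mapping of bits to two zero-width Unicode characters are all assumptions made here for illustration, and every function name is hypothetical. The line-order permutation step, the flag bits, and the Spark pipeline are not covered.

```python
import hashlib

# Assumed mapping (not from the paper): two zero-width characters encode bits 0 and 1.
ZERO, ONE = "\u200b", "\u200c"
SEG_BITS = 8  # assumed segment length in bits


def to_bits(text: str) -> str:
    """Encode the watermark text as a bit string via its UTF-8 bytes."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def segment_index(line: str, n_segments: int) -> int:
    """Derive a segment number from the hash of the visible line content."""
    digest = hashlib.sha256(line.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_segments


def embed(lines, watermark):
    """Append the hash-selected watermark segment, as invisible characters, to each line."""
    bits = to_bits(watermark)
    segments = [bits[i:i + SEG_BITS] for i in range(0, len(bits), SEG_BITS)]
    marked = []
    for line in lines:
        seg = segments[segment_index(line, len(segments))]
        marked.append(line + "".join(ONE if b == "1" else ZERO for b in seg))
    return marked


def extract(marked_lines, n_segments):
    """Rebuild the watermark; returns None if some segment was carried by no line."""
    found = [None] * n_segments
    for marked in marked_lines:
        visible = marked.rstrip(ZERO + ONE)
        tail = marked[len(visible):]
        if len(tail) != SEG_BITS:
            continue  # line carries no (or a damaged) segment
        idx = segment_index(visible, n_segments)
        found[idx] = "".join("1" if c == ONE else "0" for c in tail)
    if any(seg is None for seg in found):
        return None
    return bytes(int(seg, 2) for seg in found).decode("utf-8", errors="replace")
```

Because each line's segment number is recomputed from the line's own visible content, extraction in this sketch does not depend on line order: calling `extract` on a shuffled or reversed copy of the output of `embed` recovers the same watermark, provided enough lines survive for every segment to appear at least once.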

Keywords: digital watermarking; plain text; orthogonal; big data; source tracing

Classification code: TP309.2 [Automation and Computer Technology: Computer System Architecture]

 
