基于信息丰富度的切碎中文文档自动拼接复原  (Cited by: 5)

Automatic Reconstruction of Cross-Cut Chinese Documents Using Information Quantity


Authors: Zhao Bo (赵波)[1,2], Zhou Yu (周宇)[1], Zhang Zhengyu (张正宇)[1,3], Na Ying (那莹)[1], Ma Tinghuai (马廷淮)[1]

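Affiliations: [1] School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044; [2] Institute of Computer Science and Technology, Peking University, Beijing 100080; [3] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190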

Source: Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), 2015, No. 6, pp. 1039-1046 (8 pages)

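Funding: National Natural Science Foundation of China (61173143); Special Fund for Meteorological Scientific Research in the Public Interest (GYHY201506080)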

Abstract: Shape features of the shreds cannot be used when automatically reconstructing cross-cut Chinese documents. To address this, the paper proposes a reconstruction algorithm based on the information quantity of the shred content. It first analyzes how to describe shred features based on Chinese characters. On that basis, it proposes a method that estimates the feature matching degree between shreds in both the horizontal and vertical directions. Finally, information quantity determines the splicing order, and the shreds are spliced one at a time. Matching experiments with different numbers of shreds show that, compared with traditional methods, the horizontal and vertical matching-degree estimates improve accuracy by about 4.73% and 3.76%, respectively. Automatic reconstruction experiments show that, compared with traditional algorithms, the information-quantity-based algorithm reduces the error rate by about 18% and greatly reduces time complexity.
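The abstract gives no formulas, so the following is only a minimal sketch of the general idea (order the splicing by how much information an edge carries, then pick the best-matching neighbour). It is not the authors' method. Everything in it is an assumption made for illustration: shreds are binary pixel grids, the information measure `info` is the ink-pixel count on an edge, the matching degree `match` is the fraction of rows where two edges agree, and only the horizontal ordering of a single row of shreds is handled.

```python
from typing import List

Shred = List[List[int]]  # rows of 0/1 pixels, 1 = ink (assumed representation)


def column(shred: Shred, idx: int) -> List[int]:
    """Return one pixel column of a shred (idx 0 = left edge, -1 = right edge)."""
    return [row[idx] for row in shred]


def info(edge: List[int]) -> int:
    """Hypothetical information measure: number of ink pixels on an edge."""
    return sum(edge)


def match(right_edge: List[int], left_edge: List[int]) -> float:
    """Hypothetical matching degree: fraction of rows where the two edges agree."""
    agree = sum(1 for a, b in zip(right_edge, left_edge) if a == b)
    return agree / len(right_edge)


def splice_row(shreds: List[Shred]) -> List[int]:
    """Greedily order the shreds of one row; returns indices from left to right."""
    remaining = set(range(len(shreds)))
    # Seed with the shred whose two edges carry the most ink.
    seed = max(
        sorted(remaining),
        key=lambda i: info(column(shreds[i], 0)) + info(column(shreds[i], -1)),
    )
    chain = [seed]
    remaining.remove(seed)
    while remaining:
        left_edge = column(shreds[chain[0]], 0)
        right_edge = column(shreds[chain[-1]], -1)
        candidates = sorted(remaining)  # sorted for deterministic tie-breaking
        if info(right_edge) >= info(left_edge):
            # The right end is more informative: attach the best right neighbour.
            best = max(candidates, key=lambda i: match(right_edge, column(shreds[i], 0)))
            chain.append(best)
        else:
            # The left end is more informative: attach the best left neighbour.
            best = max(candidates, key=lambda i: match(column(shreds[i], -1), left_edge))
            chain.insert(0, best)
        remaining.remove(best)
    return chain
```

Extending the more informative end first means blank margin edges, which match any other blank edge equally well, are dealt with last in this sketch.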

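Keywords: document reconstruction; Chinese documents; shreds; matching-degree estimation; information quantity; automatic reconstruction algorithm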

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
