Study on Image Preprocessing Methods for Low-Quality Ancient Books

Authors: Gao Dingguo, Li Jingyi [1,2], Suolang Quzhen

Affiliations: [1] School of Information Science and Technology, Tibet University, Lhasa 850000, China; [2] Tibetan Information Technology Innovative Talent Cultivation Demonstration Base, Tibet University, Lhasa 850000, China

Source: Plateau Science Research (《高原科学研究》), 2024, No. 1, pp. 112-120 (9 pages)

Funding: National Natural Science Foundation of China (62166038); Sichuan Province Science and Technology Program (2023YFQ0044).

Abstract: Dunhuang Tibetan documents are a valuable source for studying the social history of the Tubo (Tibetan Empire) during the Tang Dynasty. In digitizing these documents, their great age, poor writing materials, and inadequate preservation conditions leave the document images with cluttered backgrounds and blurred, incomplete characters, which seriously degrades the accuracy and robustness of text recognition systems. To study how preprocessing low-quality ancient document images affects character recognition, this paper takes Dunhuang Tibetan documents, whose image quality is extremely poor, as its research object. The images are enhanced with traditional methods (logarithmic transformation, gamma transformation, median filtering, Gaussian filtering, and manual batch processing in Photoshop), with binarization using a global threshold, an adaptive threshold, and a custom threshold, and with a neural-network image enhancement method based on ViT (Vision Transformer). Comparative experiments show that preprocessing low-quality ancient document images has little overall effect on the recognition rate; however, Gaussian filtering, custom-threshold binarization, and neural-network-based image data enhancement improve it to some extent.

Keywords: ancient books; Dunhuang documents; low-quality documents; preprocessing

CLC number: TP391.41 [Automation and Computer Technology / Computer Application Technology]
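The abstract names several classical preprocessing operations without giving their parameters. The following is a minimal pure-Python sketch of what those operations compute on a grayscale page (rows of 0..255 integers), for illustration only. It is not the authors' code: all function names are hypothetical, the parameter values (gamma, 3x3 windows, the binomial Gaussian kernel, the offset `c`) are assumptions, and implementing the "global threshold" with Otsu's method is an assumed choice. Photoshop batch processing and the ViT-based enhancement are not sketched.

```python
import math
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0..255 ints


def clamp(v: float) -> int:
    """Round and clip a value to the 8-bit range."""
    return max(0, min(255, int(round(v))))


def log_transform(img: Image) -> Image:
    # s = c * log(1 + r); c is chosen so that r = 255 maps to s = 255.
    # Expands dark tones (faint ink, shadowed areas) and compresses bright ones.
    c = 255 / math.log(256)
    return [[clamp(c * math.log1p(p)) for p in row] for row in img]


def gamma_transform(img: Image, gamma: float) -> Image:
    # s = 255 * (r / 255) ** gamma; gamma < 1 brightens, gamma > 1 darkens.
    return [[clamp(255 * (p / 255) ** gamma) for p in row] for row in img]


def window(img: Image, y: int, x: int, r: int = 1) -> List[int]:
    # Values in the (2r+1) x (2r+1) window around (y, x); borders are
    # handled by replicating the nearest edge pixel.
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def median_filter(img: Image) -> Image:
    # 3x3 median: suppresses isolated specks (salt-and-pepper noise)
    # while keeping stroke edges sharper than linear smoothing does.
    return [[sorted(window(img, y, x))[4] for x in range(len(img[0]))]
            for y in range(len(img))]


GAUSS_3X3 = [1, 2, 1, 2, 4, 2, 1, 2, 1]  # binomial approximation of a Gaussian; sums to 16


def gaussian_filter(img: Image) -> Image:
    # Weighted 3x3 average that attenuates fine-grained background texture.
    return [[clamp(sum(k * v for k, v in zip(GAUSS_3X3, window(img, y, x))) / 16)
             for x in range(len(img[0]))] for y in range(len(img))]


def binarize(img: Image, t: int) -> Image:
    # Single global threshold: pixels above t become background (255),
    # the rest ink (0). Passing a hand-picked t gives a "custom" threshold.
    return [[255 if p > t else 0 for p in row] for row in img]


def otsu_threshold(img: Image) -> int:
    # Automatic global threshold (Otsu): the t that maximises the
    # between-class variance of the grey-level histogram.
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def adaptive_threshold(img: Image, r: int = 1, c: float = 5) -> Image:
    # Local-mean threshold: each pixel is compared with the mean of its
    # (2r+1) x (2r+1) neighbourhood minus c, which copes better with uneven
    # illumination and stains than one threshold for the whole page.
    n = (2 * r + 1) ** 2
    return [[255 if img[y][x] > sum(window(img, y, x, r)) / n - c else 0
             for x in range(len(img[0]))] for y in range(len(img))]
```

A chain in the spirit of the abstract's better-performing combination would smooth first with `gaussian_filter` and then call `binarize(img, t)` with a hand-tuned `t` (custom threshold) or with `t = otsu_threshold(img)`. Real pages would normally be processed with an image library rather than nested lists; the list form is used here only to keep the arithmetic of each operation visible.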

 
