An Oracle Bone Fragment Rejoining Algorithm Based on Sequence Similarity Computation  Cited by: 1

Oracle Bone Fragments Conjugation Based on Sequence Matching


Authors: 张重生 (ZHANG Chong-sheng) [1,2]; 王斌 (WANG Bin) [1] (Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan 475001, China; Laboratory of the Yellow River Heritage, Henan University, Kaifeng, Henan 475001, China)

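Affiliations: [1] Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan 475001, China; [2] Laboratory of the Yellow River Heritage, Henan University, Kaifeng, Henan 475001, China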

Source: Acta Electronica Sinica (《电子学报》), 2023, No. 4, pp. 860-869 (10 pages)

Funding: High-End Foreign Experts Project of the Ministry of Science and Technology (No. G2021026016L).

Abstract: Rejoining oracle bone fragments has long been one of the most urgent and fundamental tasks in oracle bone studies: by piecing fragments together, it restores more complete original material. Although several computer-aided rejoining methods have been proposed, their accuracy is insufficient, they have not been put into practical use, and they do not really help experts, so rejoining is still done manually and remains time-consuming and laborious.

To study the machine rejoining of oracle bone fragments, this paper uses OB-Rejoin, a relatively large-scale benchmark dataset that contains about one thousand oracle bone rubbing images and incorporates a large number of rejoinings already established by oracle bone scholars, which serve as ground truth for algorithm evaluation. On this dataset, the paper designs SUM (Slope United Sequence Matching for Oracle Bone Fragments Conjugation), a rejoining algorithm based on matching sequences of slope changes. The method converts the problem of matching images of fracture edges into numerical sequence data and sequence similarity comparison, thereby replacing a fracture-edge image matching problem, for which computer vision is not yet sufficiently precise, with the more mature data science problem of sequence similarity matching. SUM further converts the numerical fracture-edge sequences into slope-change sequences and then into character sequences, and finally matches fracture edges by fuzzy matching of the character sequences.

In the experiments, SUM is compared with classic sequence similarity methods in terms of precision, recall, and miss (mis-rejoin) rate, and with two recent deep learning-based sequence matching and shape matching algorithms. Overall, SUM achieves a Top-15 rejoining recall of 95.181% on OB-Rejoin, outperforming the compared algorithms. The accurate restoration of important unearthed documents is a major practical need in history and paleography. The results therefore not only help solve the machine rejoining of oracle bone fragments but also offer a useful reference for restoring other unearthed documents, in particular Qin and Han bamboo and wooden slips and Dunhuang manuscripts.
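The pipeline outlined in the abstract (edge points, then slope changes, then characters, then fuzzy string matching) can be illustrated with a minimal Python sketch. This is not the authors' SUM implementation: the abstract gives no details on sampling, quantization, or the string matcher, so the function names (`slope_changes`, `to_symbols`, `edge_similarity`, `top_k`), the 15-degree bin width, the 25-letter alphabet, and the use of `difflib.SequenceMatcher` as the fuzzy matcher are all assumptions made for illustration. The sketch also assumes both edges are sampled at the same spacing and traced in the same direction along the shared fracture.

```python
import math
from difflib import SequenceMatcher


def slope_changes(points):
    """Change in edge direction (radians) between consecutive segments."""
    angles = [
        math.atan2(y2 - y1, x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
    deltas = []
    for a, b in zip(angles, angles[1:]):
        # Wrap into [-pi, pi) so a turn across the +/-pi boundary stays small.
        deltas.append((b - a + math.pi) % (2 * math.pi) - math.pi)
    return deltas


def to_symbols(deltas, bin_width=math.radians(15)):
    """Quantize each direction change into one letter (hypothetical alphabet)."""
    alphabet = "abcdefghijklmnopqrstuvwxy"  # 25 bins; 'm' means "no turn"
    mid = len(alphabet) // 2
    symbols = []
    for d in deltas:
        idx = mid + round(d / bin_width)
        idx = max(0, min(len(alphabet) - 1, idx))  # clamp to the alphabet
        symbols.append(alphabet[idx])
    return "".join(symbols)


def edge_similarity(points_a, points_b):
    """Fuzzy similarity in [0, 1] between two edges via their symbol strings."""
    s_a = to_symbols(slope_changes(points_a))
    s_b = to_symbols(slope_changes(points_b))
    # SequenceMatcher stands in for whatever fuzzy matcher the paper uses.
    return SequenceMatcher(None, s_a, s_b, autojunk=False).ratio()


def top_k(query, candidates, k=15):
    """Rank candidate edges (name -> point list) by similarity to the query."""
    scored = [(name, edge_similarity(query, pts)) for name, pts in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Because only direction changes are encoded, the symbol string is unchanged when a fragment image is translated or rotated, which is one plausible reason for working with slope changes rather than raw coordinates. A Top-15 recall like the one reported would then count how often the true counterpart appears in the list returned by `top_k(..., k=15)`.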

Keywords: oracle bone inscriptions; oracle bone rejoining; sequence similarity computation; shape matching; edge matching

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
