Research on Cell Semantic Relation Recognition in Complex Table Digitization (复杂表格数据化中的单元格语义关系识别研究)

Authors: LIN Xin (林鑫) [1,2]; YU Huajuan (余华娟); YAN Yizhen (闫奕臻)

Affiliations: [1] School of Information Management, Central China Normal University, Wuhan 430079, P.R. China; [2] Research Center for Data Governance and Intelligent Decision Making of Hubei Province, Wuhan 430079, P.R. China

Source: Digital Library Forum (《数字图书馆论坛》), 2022, No. 9, pp. 28-35 (8 pages)

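Funding: Supported by the National Social Science Fund of China, Youth Project "Research on Knowledge Annotation Based on User Cognitive Structure in Social Networks" (No. 17CTQ024).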

Abstract: Complex tables describe data in a simple, intuitive way and are widely used across industries. However, their complex structures, diverse cell types, and inconsistent document layouts mean that they must be digitized before the data can be shared and reused. This paper therefore constructs an unsupervised cell semantic relation recognition model for complex table digitization. First, machine vision techniques are used to segment complex tables. Next, tables that share the same template are identified based on the similarity of their structure and content. On this basis, heuristic rules that combine the value and position characteristics of three cell types (header cells, explanatory cells, and body cells) are used to identify the semantic relations between cells. Finally, an empirical study verifies that the method achieves high accuracy and recall in complex table digitization and is therefore feasible.

Keywords: complex table; semantic relation; table digitization; machine vision

Classification code: G202 [Cultural Science: Communication Studies]
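
The abstract's second step, identifying same-template tables from the similarity of structure and content, can be illustrated with a minimal sketch. The record does not give the paper's actual similarity measure, so everything here is an assumption made for illustration: the `Cell` type, the use of Jaccard similarity, the equal weighting of structure and content, and the 0.7 threshold are all hypothetical and are not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell record: grid position, span, and text."""
    row: int
    col: int
    rowspan: int
    colspan: int
    text: str


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_number(text: str) -> bool:
    """True if the text parses as a number (thousands separators allowed)."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def structure_signature(cells: list[Cell]) -> set:
    """Layout only: position and span of every cell, ignoring its text."""
    return {(c.row, c.col, c.rowspan, c.colspan) for c in cells}


def content_signature(cells: list[Cell]) -> set:
    """Non-empty, non-numeric texts, which are more likely to be shared labels."""
    return {c.text.strip() for c in cells
            if c.text.strip() and not is_number(c.text)}


def template_similarity(a: list[Cell], b: list[Cell],
                        w_structure: float = 0.5) -> float:
    """Weighted mix of structural and content similarity (weights assumed)."""
    s = jaccard(structure_signature(a), structure_signature(b))
    c = jaccard(content_signature(a), content_signature(b))
    return w_structure * s + (1.0 - w_structure) * c


def same_template(a: list[Cell], b: list[Cell], threshold: float = 0.7) -> bool:
    """Treat two tables as same-template above an assumed threshold."""
    return template_similarity(a, b) >= threshold


# Two tables with the same layout and headers but different body values.
table_a = [Cell(0, 0, 1, 1, "Name"), Cell(0, 1, 1, 1, "Amount"),
           Cell(1, 0, 1, 1, "Alice"), Cell(1, 1, 1, 1, "100")]
table_b = [Cell(0, 0, 1, 1, "Name"), Cell(0, 1, 1, 1, "Amount"),
           Cell(1, 0, 1, 1, "Bob"), Cell(1, 1, 1, 1, "250")]
# A table with a merged title cell and different labels.
table_c = [Cell(0, 0, 1, 2, "Report"),
           Cell(1, 0, 1, 1, "Item"), Cell(1, 1, 1, 1, "Qty")]

print(same_template(table_a, table_b))
print(same_template(table_a, table_c))
```

In this sketch numeric cells are excluded from the content signature so that differing body values do not lower the score. A real system would likely need a more robust way to separate header, explanatory, and body cells, which is what the heuristic rules described in the abstract address.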
