Authors: LIN Xin (林鑫) [1,2]; YU HuaJuan (余华娟); YAN YiZhen (闫奕臻)
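Affiliations: [1] School of Information Management, Central China Normal University, Wuhan 430079, P.R. China; [2] Research Center for Data Governance and Intelligent Decision Making of Hubei Province, Wuhan 430079, P.R. China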
Source: Digital Library Forum (《数字图书馆论坛》), 2022, No. 9, pp. 28-35 (8 pages)
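Funding: Supported by the National Social Science Fund of China, Youth Project "Research on Knowledge Annotation Based on User Cognitive Structure in Social Networks" (No. 17CTQ024).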
Abstract: Complex tables describe data simply and intuitively and are widely used across industries. However, they have intricate structures, diverse cell types, and inconsistent document layouts, so their contents must be converted into structured data before they can be shared and reused. This paper therefore builds a cell semantic relationship recognition model based on unsupervised learning to convert complex tables into data. The method first segments complex tables using machine vision techniques, then identifies tables that share a template based on the similarity of their structure and content. On that basis, it applies heuristic rules that combine the value and position characteristics of three cell types (header cells, descriptive cells, and body cells) to recognize the semantic relationships between cells. An empirical study shows that the method achieves high accuracy and recall in converting complex tables to data, indicating that it is feasible.
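The abstract does not say how same-template tables are detected, so the following is only a minimal sketch of one plausible reading of "similarity of structure and content". It assumes a table is a list of rows of cell text, uses Jaccard similarity as a stand-in measure, and treats non-numeric cells as likely headers or labels. The function names, the `threshold` and `weight` parameters, and the sample tables are all hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# Assumption: a table is a list of rows, each a list of cell strings;
# an empty string marks an empty cell.
Table = List[List[str]]


def is_number(text: str) -> bool:
    """Return True if the cell text parses as a number (commas ignored)."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def structure_signature(table: Table) -> Set[Tuple[int, int]]:
    """Positions (row, column) of all non-empty cells."""
    return {
        (r, c)
        for r, row in enumerate(table)
        for c, text in enumerate(row)
        if text.strip()
    }


def content_tokens(table: Table) -> Set[str]:
    """Text of non-numeric cells, which tend to be headers or labels."""
    return {
        text.strip()
        for row in table
        for text in row
        if text.strip() and not is_number(text.strip())
    }


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def same_template(t1: Table, t2: Table,
                  threshold: float = 0.8, weight: float = 0.5) -> bool:
    """Hypothetical rule: weighted structure and content similarity."""
    structure = jaccard(structure_signature(t1), structure_signature(t2))
    content = jaccard(content_tokens(t1), content_tokens(t2))
    return weight * structure + (1 - weight) * content >= threshold


# Illustrative tables (made up for this sketch).
report_a = [["Item", "Q1", "Q2"], ["Sales", "10", "12"]]
report_b = [["Item", "Q1", "Q2"], ["Sales", "7", "9"]]
census = [["City", "Population"], ["Wuhan", ""]]

print(same_template(report_a, report_b))
print(same_template(report_a, census))
```

In this sketch the two report tables share both layout and label cells, so they would be grouped under one template, while the census table would not. A real system would also need to handle merged cells and the segmentation step, neither of which is modeled here.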