Hybrid similarity metric for instrument quotation spreadsheet structure recognition


Authors: XU Chuanyun (徐传运); MA Yingli (马莹丽); LI Gang (李刚); SHU Tao (舒涛); LI Xingguang (李星光) (School of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China; School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China)

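Affiliations: [1] Liangjiang School of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China; [2] School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China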

Source: Journal of Chongqing University of Technology (Natural Science), 2024, No. 1, pp. 150-159 (10 pages)

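Funding: Science and Technology Commission Project of Banan District, Chongqing (2020QC413); Chongqing Science and Technology Commission Projects (cstc2020jscx-msxmX0086, cstc2019jscx-zdztzx0043); Chongqing Municipal Education Commission Project (KJQN202001137); Graduate Innovation Project of Chongqing University of Technology (gzlcx20222137).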

Abstract: For instrumentation companies, responding to users' requests for quotation quickly, efficiently, and automatically, and thereby achieving unmanned quotation, is of great significance. However, the bill of materials spreadsheets provided by different users follow no unified, standardized format, so instrumentation companies receive only semi-structured quotation spreadsheets that an unmanned quotation system has difficulty analyzing and understanding. The key to building an unmanned quotation system is the accurate, automatic extraction of instrument parameters, and parameter extraction in turn presupposes a correct understanding of the spreadsheet structure. Therefore, with the goal of building an unmanned quotation system, this paper studies structure recognition for instrument quotation spreadsheets and proposes hybrid similarity metrics for table structure recognition (HSMTSR). The proposed method combines the Levenshtein distance, the Dice coefficient, and cell type similarity (TySim), and parses and recognizes the table structure from the similarity of cells and of row data. In addition, a flowmeter spreadsheet dataset (FSDS), comprising 714 spreadsheets with 8,574 rows of data, is built to study and analyze the structure of instrument quotation spreadsheets. Practical application shows that the proposed method accurately and efficiently recognizes instrument quotation spreadsheets with a variety of complex structures, and achieves good results on several evaluation metrics.
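The abstract names the three ingredients of the hybrid metric but gives no formula for combining them. The following is a minimal Python sketch of one way such a combination could look, not the authors' HSMTSR implementation. The weights `(0.4, 0.3, 0.3)`, the three type categories in `cell_type`, the use of character bigrams for the Dice coefficient, and the averaging in `row_sim` are all assumptions made for illustration, as are the function names themselves.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the standard dynamic-programming recurrence."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lev_sim(a: str, b: str) -> float:
    """Levenshtein distance normalized to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def dice_sim(a: str, b: str) -> float:
    """Dice coefficient over character-bigram sets (bigrams are an assumption)."""
    if a == b:
        return 1.0
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not bigrams_a or not bigrams_b:
        return 0.0
    return 2 * len(bigrams_a & bigrams_b) / (len(bigrams_a) + len(bigrams_b))


def cell_type(value: str) -> str:
    """Hypothetical coarse typing: 'empty', 'number' (parses as float), or 'text'."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        float(text)
        return "number"
    except ValueError:
        return "text"


def type_sim(a: str, b: str) -> float:
    """Stand-in for TySim: 1.0 when both cells share a coarse type, else 0.0."""
    return 1.0 if cell_type(a) == cell_type(b) else 0.0


def cell_sim(a: str, b: str,
             weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three similarities; the weights are illustrative."""
    w_lev, w_dice, w_type = weights
    return w_lev * lev_sim(a, b) + w_dice * dice_sim(a, b) + w_type * type_sim(a, b)


def row_sim(row_a: Sequence[str], row_b: Sequence[str]) -> float:
    """Mean cell similarity over aligned columns; shorter rows are padded with ''."""
    n = max(len(row_a), len(row_b))
    if n == 0:
        return 1.0
    padded_a = list(row_a) + [""] * (n - len(row_a))
    padded_b = list(row_b) + [""] * (n - len(row_b))
    return sum(cell_sim(x, y) for x, y in zip(padded_a, padded_b)) / n


header = ["Tag", "Medium", "Flow"]
data_1 = ["FT-101", "Water", "120"]
data_2 = ["FT-102", "Steam", "300"]
print(row_sim(data_1, data_2) > row_sim(header, data_1))
```

In this toy example two data rows score higher against each other than a header row scores against a data row, which is the kind of signal a row-similarity approach could use to separate header rows from data rows. The rows shown are invented for illustration and are not taken from FSDS.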

Keywords: spreadsheet; structure recognition; similarity metric; type similarity; instrument quotation

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
