Combinatorial Generalization Ability Evaluation Method of SQL-to-text Model


Authors: CHEN Lin; FAN Yuankai; HE Zhenying [1]; LIU Xiaoqing; YANG Yang; TANG Lumin

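Affiliations: [1] School of Computer Science, Fudan University, Shanghai 200433, China; [2] Transwarp Information Technology (Shanghai) Co., Ltd., Shanghai 200233, China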

Source: Computer Engineering (《计算机工程》), 2024, No. 3, pp. 326-335 (10 pages)

Abstract: Translating Structured Query Language (SQL) queries into natural language (SQL-to-text) can improve the usability of relational databases. Recent research in this area relies mainly on machine learning and has made some progress, but existing translation models are still not capable enough for practical use. Combinatorial generalization is a necessary ability for an SQL-to-text model to translate well in practical applications, yet this ability has not been studied for such models. This paper therefore proposes a method for evaluating the combinatorial generalization ability of SQL-to-text models. The method generates a large number of SQL queries and their natural language translations (SQL-natural language pairs) from an existing SQL-to-text dataset, then divides these pairs into training and test data according to the number of SQL clauses they contain, so that every SQL clause in the test data also appears in the training data, but in different combinations. The result is a new dataset that can be used to evaluate a model's combinatorial generalization ability. The evaluation results show that the method makes fuller use of query knowledge and divides the data more reasonably, and that the resulting dataset meets the requirements for evaluating combinatorial generalization, is close to the models' actual application scenarios, and is less constrained by the original dataset. The results also confirm that the combinatorial generalization ability of existing models still needs improvement. Among them, the relation-aware graph transformer model designed for the SQL-to-text task has the weakest combinatorial generalization ability, which indicates that the original SQL-to-text datasets fall short in testing this ability.
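The clause-count split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the clause inventory in `CLAUSE_KEYWORDS`, the naive regex-based `split_clauses` helper, the `max_train_clauses` threshold, and the toy queries are all assumptions made for illustration. The sketch places pairs with few clauses in the training data and keeps a longer pair for testing only if each of its clauses already occurs somewhere in the training data.

```python
import re

# Hypothetical clause inventory; the abstract does not list the clauses used.
CLAUSE_KEYWORDS = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]
_CLAUSE_RE = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in CLAUSE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def split_clauses(sql):
    """Naively split a flat (non-nested) SQL query into clause strings.

    Keywords inside string literals or subqueries are not handled.
    """
    matches = list(_CLAUSE_RE.finditer(sql))
    clauses = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(sql)
        # Collapse whitespace and drop a trailing semicolon.
        clause = " ".join(sql[m.start():end].split()).rstrip(";").strip()
        clauses.append(clause)
    return clauses


def compositional_split(pairs, max_train_clauses):
    """Split (sql, text) pairs by clause count (assumed threshold scheme).

    Pairs with at most `max_train_clauses` clauses go to training. A longer
    pair goes to testing only if every one of its clauses was seen in training,
    so each test query is a new combination of known clauses.
    """
    train, candidates = [], []
    for sql, text in pairs:
        clauses = split_clauses(sql)
        if len(clauses) <= max_train_clauses:
            train.append((sql, text))
        else:
            candidates.append((sql, text, clauses))

    seen = {c for sql, _ in train for c in split_clauses(sql)}
    test = [(sql, text) for sql, text, clauses in candidates
            if all(c in seen for c in clauses)]
    return train, test


# Toy data, invented for this sketch.
pairs = [
    ("SELECT name FROM singer", "List the names of singers."),
    ("SELECT name FROM singer WHERE age > 30", "List singers older than 30."),
    ("SELECT name FROM singer ORDER BY age", "List singers sorted by age."),
    ("SELECT name FROM singer WHERE age > 30 ORDER BY age",
     "List singers older than 30, sorted by age."),
    ("SELECT name FROM singer WHERE age > 30 LIMIT 1",
     "Give one singer older than 30."),
]
train, test = compositional_split(pairs, max_train_clauses=3)
```

In this toy run the fourth query lands in the test data because its `WHERE` and `ORDER BY` clauses each appear in training, though never together. The fifth is discarded because `LIMIT 1` never appears in training.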

Keywords: Structured Query Language; combinatorial generalization; machine translation; database; long short-term memory model

Classification No.: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
