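Affiliation: [1] Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China
Source: Journal of Railway Science and Engineering (《铁道科学与工程学报》), 2024, No. 9, pp. 3529-3539 (11 pages)
Funding: Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (K2022X028).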
Abstract: Querying railway passenger transport marketing data has a high barrier to entry, and the data are underused, because users often lack knowledge of the marketing business, are unfamiliar with the structure of the marketing database, and are not proficient in Structured Query Language (SQL). To address this, an intelligent interaction model for railway passenger transport marketing data based on NL2SQL is proposed. First, based on high-frequency query requirements, an experimental database was built that contains multiple tables and a large amount of domain-specific marketing data, and 2000 commonly used SQL query statements were manually labeled as experimental data. Next, using a collected corpus related to the railway passenger transport marketing business, the pre-trained model Chinese-RoBerta-wwm-ext was fine-tuned with the P-tuning method, yielding numerical representations of unstructured text and thus a dynamic word embedding model specialized in this domain. Then, following the structure of SQL syntax, seven prediction sub-models were built on bidirectional long short-term memory (BiLSTM) networks: keyword prediction, aggregation operator prediction, arithmetic operator prediction, logical operator prediction, sorting prediction, aggregation condition prediction, and column prediction. The seven sub-models were integrated into an SQL prediction model according to the relationships among the SQL modules. Finally, this SQL prediction model was used as the downstream task of the fine-tuned Chinese-RoBerta-wwm-ext model, giving an SQL prediction model based on dynamic word embedding and the SQL abstract syntax tree, which was trained and tested on a mixed dataset consisting of the labeled marketing data and the CSpider dataset. In comparative experiments, the model achieved a logical form accuracy of 68.4% and an execution accuracy of 75.9% on the labeled passenger transport marketing SQL data, and it could accurately predict the column names, table names, operators, and conditions to be queried. This is a considerable improvement over both a model based on fixed GloVe word embeddings (the baseline model) and a model based on fine-tuning of pooling-layer parameters (the comparison model). Applying the model can help further lower the barrier to database use and make data better serve decision-making.
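Keywords: intelligent interaction; railway passenger transport; marketing data; Chinese-RoBerta-wwm-ext
Classification No.: U29-39 [Transportation Engineering: Transportation Planning and Management]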
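Note on the two reported metrics: the abstract gives a logical form accuracy (68.4%) and an execution accuracy (75.9%) but does not define them. The sketch below follows the usual NL2SQL convention, which may differ from the paper's exact procedure: logical form accuracy counts predictions whose SQL text matches the labeled SQL after normalization, while execution accuracy counts predictions that return the same rows when run. The table `ticket_sales`, its columns, and the sample queries are hypothetical and exist only for illustration.

```python
import sqlite3


def normalize(sql: str) -> str:
    """Crude normalization: case-fold, drop semicolons, collapse whitespace.

    Note: this also folds the case of string literals; a real evaluator
    would compare parsed query components instead.
    """
    return " ".join(sql.casefold().replace(";", " ").split())


def logical_form_match(pred: str, gold: str) -> bool:
    """True when the predicted SQL text equals the gold SQL after normalization."""
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """True when both queries return the same rows (order-insensitive)."""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # an invalid prediction counts as wrong
    gold_rows = conn.execute(gold).fetchall()
    # A stricter check would preserve row order for ORDER BY queries.
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


def evaluate(conn, pairs):
    """Return (logical-form correct count, execution correct count)."""
    lf = sum(logical_form_match(p, g) for p, g in pairs)
    ex = sum(execution_match(conn, p, g) for p, g in pairs)
    return lf, ex


# Hypothetical toy database standing in for the marketing experiment library.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket_sales (train_no TEXT, sale_date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO ticket_sales VALUES (?, ?, ?)",
    [("G1", "2024-01-01", 100.0), ("G1", "2024-01-02", 150.0), ("G2", "2024-01-01", 80.0)],
)

# (predicted SQL, gold SQL) pairs
pairs = [
    # Same query, different casing: matches on both metrics.
    ("select sum(revenue) from ticket_sales where train_no = 'G1'",
     "SELECT SUM(revenue) FROM ticket_sales WHERE train_no = 'G1'"),
    # Different condition, same rows on this data: execution match only.
    ("SELECT train_no FROM ticket_sales WHERE revenue >= 100",
     "SELECT train_no FROM ticket_sales WHERE revenue > 90"),
    # Extra filter changes the result: wrong on both metrics.
    ("SELECT COUNT(*) FROM ticket_sales WHERE train_no = 'G2'",
     "SELECT COUNT(*) FROM ticket_sales"),
]

lf_correct, ex_correct = evaluate(conn, pairs)
print(f"logical form: {lf_correct}/{len(pairs)}, execution: {ex_correct}/{len(pairs)}")
```

The second pair shows why execution accuracy can exceed logical form accuracy: two differently written queries may return identical results on the data at hand.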