Research on Interactive Data Conversion Mapping Method with Example Tuples (带有范例元组的交互式数据转换映射方法研究)


Authors: Li Jing (李静), Li Gui (李贵)[1], Li Zhengyu (李征宇)[1], Han Ziyang (韩子扬)[1], Cao Keyan (曹科研)

Affiliation: [1] School of Information and Control Engineering, Shenyang Jianzhu University, Shenyang, Liaoning

Source: Hans Journal of Data Mining (《数据挖掘》), 2021, No. 2, pp. 84-99 (16 pages)

Abstract: Schema mapping is an important research topic in the integration of heterogeneous Web big data. It is usually studied at two levels, the instance level and the schema level; this paper focuses on the schema level. For non-expert users who are unfamiliar with the transformation semantics and languages involved in schema transformation, mastering and applying this technology in a short time is almost impossible. Building on existing research on data transformation, this paper therefore proposes an interactive schema mapping design framework for non-expert users. First, the incomplete and weakly expressive data transformation example tuples supplied by a non-expert user are preprocessed. Then, through simple user interaction, Boolean queries on the validity of the initial example tuples are posed recursively to obtain the final mapping rules. In addition, the paper proposes two strategies for exploring the space of all data transformation mappings that satisfy arbitrary user example tuples. During exploration, the system retains the rules that best fit the user's needs according to the interaction results and dynamically prunes the search space, which reduces the number of user interactions. Experiments on data integration and transformation using data from the China Land Market Network (中国土地市场网) verify the effectiveness of the method.

Keywords: Web big data; data integration; data transformation; schema mapping; Boolean query

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
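
Illustrative sketch: the abstract describes narrowing a space of candidate mapping rules through yes/no questions about example tuples. The Python below is a generic candidate-elimination sketch of that idea, not the algorithm from the paper. The function `refine`, the field names (`parcel_id`, `area_m2`, `use`) and the three candidate rules are all hypothetical, and the user is simulated by a function that knows the intended rule.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Rule = Callable[[Row], Row]


def refine(
    candidates: Dict[str, Rule],
    source_rows: List[Row],
    ask: Callable[[Row, Row], bool],
) -> Tuple[List[str], int]:
    """Keep only the candidate rules consistent with the user's yes/no answers.

    Returns the names of the surviving rules and the number of questions asked.
    """
    alive = dict(candidates)
    questions = 0
    for row in source_rows:
        # Group surviving rules by the target tuple they produce for this row.
        groups: Dict[tuple, List[str]] = {}
        for name, rule in alive.items():
            key = tuple(sorted(rule(row).items()))
            groups.setdefault(key, []).append(name)
        if len(groups) < 2:
            continue  # this row cannot distinguish the survivors; ask nothing
        for key, names in groups.items():
            questions += 1
            if ask(row, dict(key)):
                # "Yes": every rule producing a different tuple is pruned.
                alive = {n: alive[n] for n in names}
                break
            # "No": prune the rules that produced the rejected tuple.
            for n in names:
                del alive[n]
    return sorted(alive), questions


# Hypothetical source tuple and candidate mapping rules.
rows = [{"parcel_id": "A-01", "area_m2": "1200", "use": "residential"}]
candidates: Dict[str, Rule] = {
    "copy_area": lambda r: {"id": r["parcel_id"], "size": r["area_m2"]},
    "copy_use": lambda r: {"id": r["parcel_id"], "size": r["use"]},
    "swap": lambda r: {"id": r["area_m2"], "size": r["parcel_id"]},
}

# Simulated user who accepts exactly the output of the intended rule.
intended = candidates["copy_use"]
survivors, questions = refine(
    candidates, rows, lambda row, target: target == intended(row)
)
print(survivors, questions)  # ['copy_use'] 2
```

Grouping rules by the tuple they produce means one answer can eliminate several rules at once, which is one simple way to keep the number of questions low. The sketch asks about groups in insertion order and makes no attempt to choose the most informative question.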

 
