Crowd-Powered Database System: A Survey

Cited by: 8

Authors: CHAI Cheng-Liang; LI Guo-Liang [1]; ZHAO Tian-Yu; LUO Yu-Yu; YU Ming-He

Affiliations: [1] Department of Computer Science, Tsinghua University, Beijing 100084; [2] Software College of Northeastern University, Shenyang 110167

Source: Chinese Journal of Computers (《计算机学报》), 2020, No. 5, pp. 948-972 (25 pages)

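Funding: Supported by the National Natural Science Foundation of China (61632016, 61925205) and the National Basic Research Program of China (973 Program) (2015CB358700).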

Abstract: Nowadays, many data management and analysis tasks cannot be resolved satisfactorily by machine-based algorithms alone. Crowdsourcing has therefore attracted the interest of many researchers: it leverages the ability of the crowd to address problems that are hard for computers. Thanks to crowdsourcing platforms, e.g., Amazon Mechanical Turk, requesters can easily hire hundreds of thousands of workers to resolve these computer-hard tasks. The technical difficulty of crowdsourcing lies in the complexity of the interactions among its three components (the requester, the crowdsourcing platform, and the crowd workers), which makes it hard for requesters to use the platforms and manage their tasks. For example, it is inconvenient for a requester to interact with a crowdsourcing platform, which requires the requester to set many parameters and even write code to display the tasks. Inspired by traditional DBMSs, crowdsourcing database systems have been proposed to encapsulate the complexities of interacting with the crowd and to offer requesters a friendly API, so that they can interact with the platform through a simple SQL-like language. The challenges include how to make crowdsourcing platforms easy to use, how to design query optimization models that optimize crowdsourcing cost, quality, and latency, and how to support complex crowdsourcing operations. In this paper, we survey a wide spectrum of existing studies on crowdsourcing database systems. We first give an overview of crowdsourcing and then introduce the fundamental techniques in designing crowdsourcing databases, such as truth inference, task assignment, and cost control; in this part, we focus on reviewing sophisticated techniques for improving quality, reducing cost, and reducing latency. Next, we illustrate several popular crowd-powered database systems, including Deco, Qurk, CrowdDB, and CDB, and mainly discuss the query languages, query optimization models, and supported operations of these databases. Moreover, we review techniques for designing different operators, such as selection, join, and sort, focusing on how to optimize cost, quality, and latency for these operators. Finally, we discuss future research directions and challenges.

Keywords: database; crowdsourcing; query optimization; quality control; cost control

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
