Authors: 柴成亮 (CHAI Cheng-Liang); 李国良 (LI Guo-Liang)[1]; 赵天宇 (ZHAO Tian-Yu); 骆昱宇 (LUO Yu-Yu); 于明鹤 (YU Ming-He) (Department of Computer Science, Tsinghua University, Beijing 100084; Software College of Northeastern University, Shenyang 110167)
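Affiliations: [1] Department of Computer Science, Tsinghua University, Beijing 100084; [2] Software College, Northeastern University, Shenyang 110167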
Source: Chinese Journal of Computers (《计算机学报》), 2020, No. 5, pp. 948-972 (25 pages)
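Funding: Supported by the National Natural Science Foundation of China (61632016, 61925205) and the National Key Basic Research Program of China ("973" Program) (2015CB358700).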
Abstract (translated from the Chinese): Today, many data processing and analysis tasks cannot achieve satisfactory results with machine algorithms alone. Crowdsourcing has emerged in response: it uses the wisdom of the crowd to solve problems that are hard for computers. Crowdsourcing platforms such as Amazon Mechanical Turk provide strong support for applying crowdsourcing, with thousands upon thousands of online workers solving problems for task requesters. For requesters, however, interacting with a crowdsourcing platform is inconvenient, because the platform requires them to set many parameters and even write code. Researchers have therefore borrowed ideas from traditional databases and proposed crowdsourcing databases, which encapsulate the complex interactions among requesters, crowdsourcing platforms, and crowd workers and offer requesters a friendly API, so that requesters can interact with the platform through a simple SQL-like language. In this survey, we first introduce the concept of crowdsourcing, then the fundamental techniques to consider when designing a crowdsourcing database, such as truth inference, task assignment, and cost optimization, and then several mainstream crowdsourcing database systems. We also cover optimization techniques for different database operators, including selection, join, and sort. Finally, we discuss future research directions and challenges in this field.

English abstract: Nowadays, many data management tasks cannot be resolved by machine-based algorithms alone. Crowdsourcing, which leverages the ability of the crowd to address problems that are hard for computers, has therefore attracted the interest of many researchers. Thanks to crowdsourcing platforms, e.g., Amazon Mechanical Turk, we can easily hire hundreds of thousands of workers to resolve these computer-hard tasks. The technical difficulty of crowdsourcing is the complexity of the interactions among requesters, platforms, and workers, which makes it hard for requesters to use the platforms and manage their tasks. For example, it is inconvenient for a requester to interact with crowdsourcing platforms, which require requesters to set parameters and write code to display the tasks. Inspired by traditional DBMSs, crowdsourcing database systems have been proposed to encapsulate the complexities of interacting with the crowd. The challenges include how to make crowdsourcing platforms easy to use, how to design query optimization models that optimize crowdsourcing cost, quality, and latency, and how to support complex crowdsourcing operations. In this paper, we survey a wide spectrum of existing studies on crowdsourcing database systems. We first give an overview of crowdsourcing, and then introduce the fundamental techniques in designing crowdsourcing databases, including truth inference, task assignment, cost control, etc. In this part, we focus on reviewing sophisticated techniques for improving quality, reducing cost, and reducing latency. Next, we illustrate several popular crowd-powered database systems, including Deco, Qurk, CrowdDB, and CDB. We mainly discuss the query languages, query optimization models, and supported operations in these databases. Moreover, we review techniques for designing different operators, including selection, join, sort, etc. In this part, we mainly focus on how to optimize the cost, quality, and latency of these operators. Finally, we discuss future work and challenges.
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]