检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:吕鹤轩 黄山 艾力卡木·再比布拉 吴思衡 段晓东 Lv He-xuan;HUANG Shan;Alkam·Zabibul;WU Si-heng;DUAN Xiao-dong(College of Computer Science and Engineering,Dalian Minzu University,Dalian 116600;State Ethnic Affairs Commission Key Laboratory of Big Data Applied Technology,Dalian 116600;Dalian Key Laboratory of Digital Technology for National Culture,Dalian 116600,China)
机构地区:[1]大连民族大学计算机科学与工程学院,辽宁大连116600 [2]大数据应用技术国家民委重点实验室,辽宁大连116600 [3]大连市民族文化数字技术重点实验室,辽宁大连116600
出 处:《计算机工程与科学》2023年第2期237-245,共9页Computer Engineering & Science
基 金:国家重点研发计划(2018YFB1004402)。
摘 要:衡量大数据的数据挖掘性能有2个最重要的任务指标:一是实时性,二是准确性。流数据从数据产生到消息队列再通过数据源流入Flink进行计算,这个过程中因为网络传输速度不同,不同节点的计算性能不同等原因,流数据进入计算框架的先后顺序和数据产生的事件时间顺序会有局部乱序的现象。面对窗口作业的传统水位线机制在不确定乱序程度的流数据情况下无法同时兼顾作业结果的实时性和准确性。针对这个问题,建立了流数据微簇模型。通过局部乱序度算法,根据流数据微簇的流数据事件时间局部乱序程度计算出可以代表当前时刻流数据的乱序度。设计了水位线动态调整策略,使水位线根据流数据的乱序程度动态调整大小。最后,在Apache Flink框架中对基于事件时间窗口的水位线动态调整策略进行了实现。实验结果表明,弹性或不确定乱序流数据条件下,基于事件时间窗口的水位线动态调整策略可以有效地同时兼顾窗口作业的准确性和实时性。Two of the most important task metrics that measure data-mining performance specific to big data:one is real-time and the other is accuracy.The stream data flows from data generation to message queue and then into Flink through data source for calculation.In this process,due to different network transmission speed and different computing performance of different nodes,the sequence of stream data entering the computing framework and the time sequence of events generated by data will be partially out of order.The traditional watermark mechanism for window-facing operations cannot consider the real-time performance and accuracy of the operation results in the case of streaming data with uncertain out-of-order degree.To solve this problem,a stream data microcluster model is established.Based on the local out-of-order degree of stream data event time,the out-of-order degree of stream data representing the current moment is calculated by the local out-of-order degree algorithm.A dynamic watermark adjustment strategy is designed to adjust the watermark dynamically according to the degree of flow data disorder.Finally,the dynamic watermark adjustment strategy based on event time window is implemented in Apache Flink framework.Experimental results show that the dynamic watermark adjustment strategy based on event time window can effectively consider the accuracy and real-time performance of window operation under the condition of elastic or uncertain chaotic flow data.
关 键 词:Apache Flink 水位线 乱序流数据 事件时间
分 类 号:TP316.4[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:18.222.223.25