Authors: ZHAO Qing[1], QUAN Wen-li, CHEN Ya-rui[1,2], CUI Chen-zhou, FAN Dong-wei[3]
Affiliations: [1] College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China; [2] Engineering Research Center for Integration and Application of E-Learning Technology, Ministry of Education, Beijing 100039, China; [3] National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
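Source: Progress in Astronomy (《天文学进展》), 2024, No. 1, pp. 86-101 (16 pages)
Funding: National Natural Science Foundation of China (11803022, 12273077); National Key Research and Development Program of China (2022YFF0711500); Innovation Fund of the Engineering Research Center for Integration and Application of E-Learning Technology, Ministry of Education (1221004).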
Abstract: Time series reconstruction is a crucial data processing step in time domain astronomy and the foundation for fitting light curves and conducting time domain analysis. Many large-field time domain surveys need to complete this computation within a single exposure cycle. With the rapid growth of astronomical data, existing processing methods struggle to meet the accuracy and efficiency requirements of time series reconstruction at the same time. Spark, a general-purpose in-memory distributed computing framework, has the potential to improve the efficiency of this process, but applying it directly often runs into problems. MapReduce-style distributed models such as Hadoop and Spark work best when the tasks on different cluster nodes are relatively independent and little data has to be transferred between nodes during execution; otherwise, frequent communication becomes the efficiency bottleneck. Cross-matching, however, has a boundary problem: objects newly added at partition boundaries must be transmitted to other nodes, which severely restricts concurrency and reduces the speedup achieved in practice. We therefore propose a non-blocking asynchronous execution flow in which each distributed process continuously processes the data of its own independent sky region. The additional identification tasks that newly added objects at block edges create for other nodes are appended later in batches, and the appending method is chosen according to the relative progress of the processes. This ensures that no identification calculation is omitted, improving concurrent efficiency while preserving the accuracy of the algorithm. In addition, different join strategies between two tables are studied from both theoretical and experimental perspectives, and a join-free strategy is proposed. Finally, these results are validated through the design of an efficient time series reconstruction system based on the Spark distributed framework. Experiments show that, compared with previous results, the efficiency of the time series reconstruction algorithm improves markedly, laying a good foundation for the analysis of astronomical time series data in time domain astronomy.
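The deferred boundary handling described in the abstract can be illustrated with a toy sketch. This is not the authors' Spark implementation: it is a sequential, one-dimensional simplification in plain Python, and the names `Partition`, `inbox`, `drain_inbox`, and `MATCH_RADIUS` are hypothetical. It only shows the general idea that a new object found near a partition edge is queued for the neighbouring partition and appended there later in one batch, rather than forcing the two partitions to synchronise. The progress-dependent choice of appending method mentioned in the abstract is not modelled.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical matching radius, in the same (1-D) units as the positions.
MATCH_RADIUS = 1.0


@dataclass
class Partition:
    """One independent sky region, simplified to the 1-D interval [lo, hi)."""
    lo: float
    hi: float
    catalog: list = field(default_factory=list)  # positions of known objects
    inbox: deque = field(default_factory=deque)  # deferred objects from neighbours

    def drain_inbox(self):
        """Append all deferred boundary objects to the catalog in one batch."""
        added = 0
        while self.inbox:
            self.catalog.append(self.inbox.popleft())
            added += 1
        return added

    def process(self, detections, neighbours):
        """Cross-match one frame of detections; return (matched, new) counts."""
        self.drain_inbox()  # pick up work queued by neighbours since the last frame
        matched = new = 0
        for x in detections:
            if any(abs(x - known) <= MATCH_RADIUS for known in self.catalog):
                matched += 1
                continue
            # Unmatched detection: register it as a new object.
            self.catalog.append(x)
            new += 1
            # A new object close to a neighbour's edge may match detections that
            # the neighbour owns, so queue it there instead of waiting on it.
            for nb in neighbours:
                if abs(x - nb.lo) <= MATCH_RADIUS or abs(x - nb.hi) <= MATCH_RADIUS:
                    nb.inbox.append(x)
        return matched, new
```

With partitions `a = Partition(0.0, 10.0)` and `b = Partition(10.0, 20.0)`, calling `a.process([9.5, 3.0], [b])` registers two new objects and queues 9.5 for `b`. A later `b.process([10.2, 15.0], [a])` first appends the queued object and then matches 10.2 against it, so the detection just across the boundary is not reported as a spurious new object.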