Affiliations: [1] School of Data Science and Engineering, East China Normal University, Shanghai 200062, China; [2] College of Computer Science and Technology, Guizhou University, Guiyang 550025, China; [3] School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
Source: Chinese Journal of Computers, 2025, No. 3, pp. 632-649 (18 pages)
Funding: National Natural Science Foundation of China (62137001); Shanghai Municipal Education Commission Digital Transformation Innovation Research Project (40400-22201).
Abstract: Benchmarking refers to the quantitative and comparable evaluation of specific performance metrics for a category of test subjects, achieved through scientifically designed test methods, tools, and systems. With the advent of the artificial intelligence era, new AI benchmarking datasets such as ImageNet and DataPerf have gradually become consensus standards in both academia and industry. Currently, research on the open-source ecosystem largely focuses on specific research points and lacks a comprehensive framework of open-source ecosystem benchmarks. Data consumers in the open-source domain urgently need foundational metrics and evaluations, such as a project's development stage, an enterprise's open-source program office capabilities within the industry, developer activity, and project influence. To address the "data-rich but benchmark-poor" situation in the open-source field, this paper proposes a data science benchmark system for the sustainable development of the open-source ecosystem, termed OpenPerf. Organized bottom-up, the system comprises data science task-based benchmarks, index-based benchmarks, and benchmark-based references, aiming to provide diverse benchmark references for academia and industry. This paper defines nine task-based data science benchmarks: open-source behavior data completion and prediction, automated open-source bot identification and classification, sentiment classification of open-source comment texts, risk prediction in open-source software supply chains, open-source project influence ranking, prediction of archived projects, open-source network influence index prediction, anomaly detection in open-source communities, and open-source project recommendation based on link prediction. We present results for three representative task-based benchmarks (open-source behavior data completion and prediction, automated open-source bot identification and classification, and open-source project recommendation based on link prediction), two index-based benchmarks (influence and activity), and one benchmark-based reference. Notably, the two index-based benchmarks have been adopted by the China Electronics Standardization Institute as evaluation standards for open-source community governance. Task-based benchmarks mainly serve academia, providing researchers in different directions with reference baselines in their own areas of expertise. Index-based benchmarks mainly target industry: through influence and activity data, an enterprise can gauge where its open-source program office capabilities stand in the industry and where its open-source projects stand in their development. A benchmark-based reference is a measurable best-in-industry result used as a comparison yardstick. Finally, three real-world applications at well-known Chinese companies and universities, including Alibaba, Ant Group, and East China Normal University, validate the key role OpenPerf plays in promoting the sustainable development of the open-source ecosystem.
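One of the task-based benchmarks named above is open-source project recommendation via link prediction. As a minimal illustrative sketch (not the paper's implementation, and using entirely hypothetical developers and projects), such a task can be framed as scoring unseen projects for a developer on a developer-project interaction graph, e.g. with a simple common-neighbor heuristic:

```python
# Illustrative sketch only: link-prediction-style project recommendation
# on a developer-project bipartite graph, scored by shared contributors.
# All developer/project names here are hypothetical.
from collections import defaultdict

# Hypothetical interaction data: (developer, project) pairs.
interactions = [
    ("alice", "projA"), ("alice", "projB"),
    ("bob", "projA"), ("bob", "projC"),
    ("carol", "projB"), ("carol", "projC"),
]

dev_projects = defaultdict(set)   # developer -> projects they touched
proj_devs = defaultdict(set)      # project -> developers who touched it
for dev, proj in interactions:
    dev_projects[dev].add(proj)
    proj_devs[proj].add(dev)

def recommend(dev, top_k=3):
    """Score each unseen project by how many of the developer's
    co-contributors also contributed to it (common-neighbor heuristic)."""
    seen = dev_projects[dev]
    # Peers: developers who share at least one project with `dev`.
    peers = {d for p in seen for d in proj_devs[p]} - {dev}
    scores = {}
    for proj, devs in proj_devs.items():
        if proj in seen:
            continue
        scores[proj] = len(peers & devs)
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(recommend("alice"))  # projC is backed by both of alice's peers
```

Real benchmark submissions would replace this heuristic with learned link-prediction models evaluated on held-out edges; the sketch only shows the task framing.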
Keywords: benchmarking; open-source ecosystem; sustainable development; benchmark tasks; application cases
CLC Number: TP391 [Automation and Computer Technology / Computer Application Technology]