Authors: LIU Yao (刘耀) [1], TONG Xin (童昕), CHEN Yifeng (陈一风)
Affiliations: [1] Information Technology Support Center, Institute of Scientific and Technical Information of China, Beijing 100038, China; [2] School of Software and Microelectronics, Peking University, Beijing 102600, China
Source: Journal of Computer Applications (《计算机应用》), 2023, No. 6, pp. 1768-1778 (11 pages)
Funding: National Social Science Fund of China (21BTQ011).
Abstract: As a way of implementing automated machine learning, algorithm platforms have attracted wide attention in recent years. However, the business processes on these platforms must be built manually, model calling is inflexible, and the platforms cannot automatically construct algorithms customized for specific business requirements. To address these problems, an algorithm path self-assembling model oriented to business requirements was proposed. Firstly, the sequence features and structural features of code were modeled simultaneously, based on a Graph Convolutional Network (GCN) and word2vec representations. Secondly, the functions present in the algorithm set were discovered through a clustering model, and the resulting function subsets were used to prepare for path discovery among algorithm components across subsets. Finally, a relationship discovery model and a ranking model were trained on prior knowledge and used to mine the self-assembled paths of candidate code components, thereby realizing self-assembly of algorithm code. In a comparison using the proposed evaluation indicators, the best result of the proposed model is 0.8, whereas that of the Okapi BM25+word2vec baseline is 0.21. To some extent, the proposed model solves the problem of missing code structure and semantic information in traditional code representation methods, and it lays a foundation for research on fine-grained self-organization of algorithm processes and automatic construction of algorithm pipelines.
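The abstract names a GCN over code structure combined with word2vec token vectors but gives no architectural detail. As a rough illustration only, the sketch below applies the standard GCN propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), to a toy graph and mean-pools the result into one vector. The graph, the node features (stand-ins for word2vec vectors), the identity weight matrix, and the names gcn_layer and mean_pool are all assumptions made for this example, not the paper's implementation.

```python
import math


def gcn_layer(adj, feats, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    d_in, d_out = len(feats[0]), len(weight[0])
    # Aggregate neighbor features: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n))
            for c in range(d_in)] for i in range(n)]
    # Linear transform followed by ReLU: max(0, agg @ weight).
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)] for i in range(n)]


def mean_pool(node_vecs):
    """Average node vectors into a single graph-level vector."""
    n = len(node_vecs)
    return [sum(col) / n for col in zip(*node_vecs)]


# Hypothetical 3-node path graph 0-1-2 (e.g. three linked syntax-tree nodes).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# Hypothetical 2-d node features standing in for word2vec token vectors.
feats = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
# Identity weights keep the example easy to check by hand.
weight = [[1.0, 0.0],
          [0.0, 1.0]]

hidden = gcn_layer(adj, feats, weight)
code_vec = mean_pool(hidden)
print(code_vec)
```

In a trained model the weight matrix would be learned and several layers stacked; the pooled vector is the kind of fixed-length code representation that a downstream clustering or ranking step could consume.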
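Keywords: natural language processing; learning to rank; code parsing; code resource structuring; code representation
Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]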