Authors: SUN Xuekai (孙雪凯), JIANG Liehui (蒋烈辉) [1] (State Key Laboratory of Mathematical Engineering and Advanced Computing, PLA Information Engineering University, Zhengzhou 450001, China)
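Affiliation: [1] State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001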
Source: Computer Science (《计算机科学》), 2023, No. 5, pp. 64-71 (8 pages)
Abstract: Code analysis has many applications, such as code plagiarism detection and software vulnerability search. With the development of artificial intelligence, neural networks have been widely applied to code analysis. However, existing methods either treat code simply as ordinary natural language or sample code using overly complex rules. The former easily loses key information in the code, while the latter makes the algorithm overly complex and the model slow to train. Alon et al. proposed Code2vec, an algorithm that uses a simple yet effective code representation and has significant advantages over earlier code analysis methods, but it still has some limitations. Building on Code2vec, this paper proposes a neural-network-based code embedding method whose main idea is to represent a code function as an embedding vector. A code function is first decomposed into a series of abstract syntax tree (AST) paths; a neural network then learns a representation for each path; finally, all paths are aggregated into a single embedding vector that represents the function. A prototype system based on this method is implemented. Experimental results show that, compared with Code2vec, the proposed algorithm has a simpler structure and trains faster.
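The three-stage pipeline in the abstract (decompose a function into AST paths, embed each path, aggregate the paths into one vector) can be illustrated with a small Python sketch. This is not the authors' system, and the abstract does not specify their network or aggregation scheme. Assumptions: Python's standard `ast` module stands in for whatever parser the paper uses; paths are code2vec-style leaf-to-leaf path contexts `(token, path, token)`; learned embedding tables are replaced by deterministic pseudo-random vectors (hypothetical helper `_vec`); the learned fully connected layer is replaced by an element-wise `tanh` of the summed vectors; and aggregation uses code2vec-style softmax attention against a fixed, untrained attention vector. All function names are hypothetical.

```python
import ast
import math
import random
from itertools import combinations


def _leaves(tree):
    """Collect (token, root-to-leaf node list) for identifier/argument/constant leaves."""
    found = []

    def visit(node, ancestors):
        path = ancestors + [node]
        token = None
        if isinstance(node, ast.Name):
            token = node.id
        elif isinstance(node, ast.arg):
            token = node.arg
        elif isinstance(node, ast.Constant):
            token = repr(node.value)
        if token is not None:
            found.append((token, path))
        for child in ast.iter_child_nodes(node):
            visit(child, path)

    visit(tree, [])
    return found


def path_contexts(source, max_pairs=200):
    """Return (token_a, ast_path, token_b) triples for pairs of leaves (code2vec-style)."""
    leaves = _leaves(ast.parse(source))
    contexts = []
    for (tok_a, path_a), (tok_b, path_b) in combinations(leaves, 2):
        # Length of the shared root prefix; the last shared node is the lowest common ancestor.
        i = 0
        while i < min(len(path_a), len(path_b)) and path_a[i] is path_b[i]:
            i += 1
        lca = type(path_a[i - 1]).__name__
        up = [type(n).__name__ for n in reversed(path_a[i:])]   # leaf_a up to the LCA
        down = [type(n).__name__ for n in path_b[i:]]           # LCA down to leaf_b
        path = "^".join(up + [lca]) + "".join("_" + d for d in down)
        contexts.append((tok_a, path, tok_b))
        if len(contexts) >= max_pairs:
            break
    return contexts


def _vec(key, dim):
    """Deterministic pseudo-random vector: a stand-in for a learned embedding row."""
    rng = random.Random(key)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def code_vector(source, dim=16):
    """Embed each path context, then aggregate with softmax attention into one vector."""
    contexts = path_contexts(source)
    if not contexts:
        return [0.0] * dim
    attention = _vec("<attention>", dim)  # fixed, untrained attention vector (assumption)
    ctx_vecs = []
    for tok_a, path, tok_b in contexts:
        a, p, b = _vec("tok:" + tok_a, dim), _vec("path:" + path, dim), _vec("tok:" + tok_b, dim)
        # Simplification of a learned dense layer: tanh of the summed component vectors.
        ctx_vecs.append([math.tanh(x + y + z) for x, y, z in zip(a, p, b)])
    scores = [sum(c * w for c, w in zip(v, attention)) for v in ctx_vecs]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of context vectors = the function's code vector.
    return [sum(w * v[k] for w, v in zip(weights, ctx_vecs)) for k in range(dim)]


if __name__ == "__main__":
    src = "def add(a, b):\n    return a + b\n"
    print(path_contexts(src)[0])   # ('a', 'arg^arguments_arg', 'b')
    print(len(code_vector(src)))   # 16
```

In a trained model the token, path, and attention vectors would be learned parameters optimized for a downstream task such as code classification; here they are fixed only so the data flow from AST paths to a single function vector is visible.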
Keywords: neural network; code embedding; code analysis; abstract syntax tree; code classification
CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]