Authors: ZHANG Jin; HU Zi-da; LU Wen-bing; YANG Ding-kang; LI Qiang; LUO Yuan-sheng[2]
Affiliations: [1] School of Information Science and Engineering, Hunan Normal University, Changsha 410006, Hunan, China; [2] School of Computer and Communication Engineering, Changsha University of Science & Technology, Changsha 410006, Hunan, China
Source: Computer Technology and Development (《计算机技术与发展》), 2023, No. 10, pp. 143-149 (7 pages)
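Funding: Fund of the National Defense Science and Technology Key Laboratory (2021-KJWPDL-17); National Defense Basic Scientific Research Program of the State Administration of Science, Technology and Industry for National Defense (WDZC20205500119); Natural Science Foundation of Hunan Province (2021JJ30456).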
Abstract: Scratch, a popular course in graphical programming, has attracted large numbers of primary and secondary school students. Teachers usually assess how a student's project differs from the standard project by manual comparison, which is laborious and time-consuming. Automatically identifying the similarity between Scratch projects can therefore help teachers check student work quickly and improve teaching efficiency. To this end, a Siamese-BERT model is proposed to measure the similarity between two Scratch projects. First, the Scratch source file is parsed to extract the original block sequence. A block reconstruction algorithm, based on the logical characteristics of the blocks, then orders that sequence into a token sequence, which serves as the input text for pre-training a CBOW (Continuous Bag of Words) model, yielding a word-vector model for Scratch. Next, a Siamese neural network framework is trained jointly with a BERT (Bidirectional Encoder Representations from Transformers) model, and the outputs are fed to a cosine similarity function to compute the similarity score. The dataset consists of training projects from Scratch training institutions in Changsha and students' practice projects. On this dataset, the Siamese-BERT model reaches an accuracy of 0.82 and is more accurate than the other text similarity models it was compared with for Scratch project similarity detection.
Keywords: Scratch graphical programming; Siamese-BERT model; continuous bag-of-words (CBOW) model; Siamese neural network; BERT model; cosine similarity
Classification No.: TP399 [Automation and Computer Technology: Computer Application Technology]
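
The last step described in the abstract, scoring two projects with cosine similarity, can be illustrated with a minimal sketch. It is not the authors' implementation: the Siamese-BERT encoder is replaced here by simple mean pooling over token vectors, and the three-dimensional `toy_embeddings` table, the `embed` helper, and the choice of Scratch opcodes as tokens are all assumptions made for illustration only.

```python
import math

# Hypothetical toy embeddings keyed by Scratch block opcode.
# In the paper these vectors would come from a trained model, not a fixed table.
toy_embeddings = {
    "event_whenflagclicked": [1.0, 0.0, 0.0],
    "motion_movesteps": [0.0, 1.0, 0.0],
    "control_repeat": [0.0, 0.0, 1.0],
}


def mean_pool(token_vectors):
    """Average equal-length token vectors into a single project vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def embed(tokens):
    """Stand-in encoder (assumption): look up each token and mean-pool."""
    return mean_pool([toy_embeddings[tok] for tok in tokens])


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


reference = embed(["event_whenflagclicked", "motion_movesteps"])
student_same = embed(["event_whenflagclicked", "motion_movesteps"])
student_other = embed(["control_repeat"])

score_same = cosine_similarity(reference, student_same)
score_other = cosine_similarity(reference, student_other)
```

With these toy vectors, a student project that uses the same blocks as the reference scores close to 1, while one with no blocks in common scores 0. Mean pooling ignores block order, which is one reason a sequence model such as the one the abstract describes may be preferable in practice.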