Research on Similarity Detection of Project Based on Scratch

Authors: ZHANG Jin; HU Zi-da; LU Wen-bing; YANG Ding-kang; LI Qiang; LUO Yuan-sheng [2]

Affiliations: [1] School of Information Science and Engineering, Hunan Normal University, Changsha 410006, Hunan, China; [2] School of Computer and Communication Engineering, Changsha University of Science & Technology, Changsha 410006, Hunan, China

Source: Computer Technology and Development (《计算机技术与发展》), 2023, No. 10, pp. 143-149 (7 pages)

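Funding: National Defense Science and Technology Key Laboratory Fund Project (2021-KJWPDL-17); National Defense Basic Scientific Research Program of the State Administration of Science, Technology and Industry for National Defense (WDZC20205500119); Natural Science Foundation of Hunan Province (2021JJ30456).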

Abstract: Scratch is a popular course in graphical programming that attracts many primary and secondary school students. Assessing how a student's project differs from the standard project usually relies on a teacher comparing the two by hand, which is both labor-intensive and time-consuming. Automatically identifying the similarity between Scratch projects can therefore help teachers check student work quickly and improve teaching efficiency. To address this problem, a Siamese-BERT model is proposed to detect the similarity between two Scratch projects. First, the Scratch source file is parsed to extract the original block sequence. A block reconstruction algorithm, based on the logical characteristics of the blocks, then reorders the original block sequence into a Token sequence, which serves as the input text for pre-training a CBOW (Continuous Bag of Words) model, yielding a word vector model for Scratch. Next, a Siamese neural network framework is trained in combination with a BERT (Bidirectional Encoder Representation from Transformers) model, and the outputs are passed to a cosine similarity function to compute the similarity. The data set consists of training projects from a Scratch training institution in Changsha and students' practice projects. On this data set, the Siamese-BERT model reaches an accuracy of 0.82 and is more accurate than the other text similarity models it was compared with for Scratch project similarity detection.
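The first and last stages of the pipeline in the abstract (extracting a block sequence, then scoring two sequences with cosine similarity) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the input is a Scratch 3 `project.json` already loaded as a dictionary; the traversal is a plain depth-first walk over `next` and `inputs` links rather than the authors' block reconstruction algorithm; and a bag-of-opcodes count vector stands in for the CBOW word vectors and the Siamese-BERT encoder. The function names `block_tokens` and `cosine_similarity` and the two toy projects are hypothetical.

```python
import math
from collections import Counter


def _walk(blocks, block_id, tokens):
    """Follow a script from block_id, appending opcodes depth-first."""
    while block_id is not None:
        block = blocks.get(block_id)
        if not isinstance(block, dict):  # skip non-block entries
            break
        tokens.append(block["opcode"])
        # Nested blocks (e.g. a loop body) appear in inputs as [kind, block_id, ...].
        for value in block.get("inputs", {}).values():
            child = value[1] if len(value) > 1 else None
            if isinstance(child, str) and child in blocks:
                _walk(blocks, child, tokens)
        block_id = block.get("next")


def block_tokens(project):
    """Flatten a Scratch 3 project dict into a list of opcode tokens.

    Assumption: simple traversal order, not the paper's reconstruction algorithm.
    """
    tokens = []
    for target in project.get("targets", []):
        blocks = target.get("blocks", {})
        # Top-level blocks start scripts; sort ids for a deterministic order.
        starts = sorted(
            bid for bid, b in blocks.items()
            if isinstance(b, dict) and b.get("topLevel")
        )
        for bid in starts:
            _walk(blocks, bid, tokens)
    return tokens


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of opcode count vectors (stand-in for learned embeddings)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def _toy_project(last_opcode):
    """Build a tiny hypothetical project: flag clicked -> repeat -> one motion block."""
    return {"targets": [{"blocks": {
        "a": {"opcode": "event_whenflagclicked", "next": "b",
              "topLevel": True, "inputs": {}},
        "b": {"opcode": "control_repeat", "next": None, "topLevel": False,
              "inputs": {"TIMES": [1, [6, "10"]], "SUBSTACK": [2, "c"]}},
        "c": {"opcode": last_opcode, "next": None, "topLevel": False,
              "inputs": {"STEPS": [1, [4, "10"]]}},
    }}]}


reference = _toy_project("motion_movesteps")
student = _toy_project("motion_turnright")
similarity = cosine_similarity(block_tokens(reference), block_tokens(student))
print(round(similarity, 3))
```

In the toy data the two projects share two of their three blocks, so the count vectors overlap only partially. A learned encoder such as the one the abstract describes would additionally take block order and context into account, which this count-based stand-in ignores.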

Keywords: Scratch graphical programming; Siamese-BERT model; continuous bag-of-words model; Siamese neural network; BERT model; cosine similarity

Classification number: TP399 [Automation and Computer Technology: Computer Application Technology]
