Authors: 魏芳芳 (WEI Fangfang), 魏顺平 (WEI Shunping) [1], 睢世杰 (SUI Shijie)
Affiliations: [1] The Open University of China, Beijing 100039; [2] Fujian Nanwei Software Co., Ltd, Fuzhou, Fujian 350000
Source: 《天津电大学报》 (Journal of Tianjin Radio and Television University), 2019, No. 2, pp. 1-5 (5 pages)
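Funding: Output of the Open University of China research project "Construction and Application of a Learning Analytics Cloud Platform for Online Education" (project approval no. G18F0023Y)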
Abstract: Text accounts for 80% of stored information, so identifying duplicate data in text is particularly important. How to detect duplicate text records, and whether plagiarism exists between texts, has become a hot research topic in natural language processing. Based on forum-post duplicate detection data from the Moodle learning platform of the Open University of China, this paper studies a method for detecting duplicate text records. In a test on a sample of Toutiao (今日头条) news articles, the algorithm achieved an accuracy of 93.1% and a recall of 95.9%, which verifies the feasibility of the method. The method was then applied to Moodle forum-post data to detect duplicates both within the platform and between the platform and external data. It effectively identifies duplicate posts and provides valuable feedback for administrators and teachers.