Authors: 王晓笛[1] 祝娜[1] 白如江[1] 王效岳[1]
Affiliation: [1] Institute of Scientific and Technical Information, Shandong University of Technology (山东理工大学科技信息研究所)
Source: 《图书情报工作》 (Library and Information Service), 2014, No. 12, pp. 130-135 (6 pages)
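Funding: National Social Science Fund of China project "Research on Detecting Semantic Plagiarism ('意抄') in Academic Literature" (No. 12CTQ032); one of the research outcomes of the Shandong University of Technology Humanities and Social Sciences Development Fund project "Web Information Retrieval and Intelligent Mining"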
Abstract: In recent years, several cases of academic misconduct have drawn the attention of the academic community and the relevant authorities, making similarity detection an active research topic. To cope with semantic plagiarism, researchers have begun to exploit semantic information. This paper proposes a semantic similarity detection method for literature based on semantic role labeling (SRL). Each paper is first annotated with an SRL tool, and the sentence is taken as the smallest unit of comparison. Hypernyms of all words in a paper are extracted using a semantic dictionary, and each paper is represented as a sentence-term-semantic role-hypernym 4-partite graph. Semantically similar sentences are identified by comparing against this graph, and the Jaccard coefficient of the similar sentences of two papers is taken as their semantic similarity. Experiments show that the detected semantic similarity is 13% higher than that of the character-granularity Jaccard, word-granularity Jaccard, and Winnowing Jaccard coefficient methods. The results are nonetheless constrained by the corpus and by the SRL tools, and the method leaves considerable room for improvement.
Classification: TP391.1 [Automation and Computer Technology / Computer Application Technology]
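The final step described in the abstract, taking the Jaccard coefficient over the similar sentences of two papers, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each sentence has already been reduced to a set of (semantic role, hypernym) pairs, that two sentences count as similar when the Jaccard coefficient of their feature sets reaches a threshold, and that sentences are matched greedily one to one. The names `jaccard`, `document_similarity`, and the `threshold` value are hypothetical; the record does not state how the paper decides sentence similarity.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a sentence is a set of (semantic role, hypernym) pairs.
Feature = Tuple[str, str]
Sentence = FrozenSet[Feature]


def jaccard(a: Sentence, b: Sentence) -> float:
    """Jaccard coefficient |a & b| / |a | b| of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def document_similarity(doc_a: List[Sentence], doc_b: List[Sentence],
                        threshold: float = 0.5) -> float:
    """Sentence-level Jaccard coefficient of two documents.

    Assumption: sentences are "similar" when their feature-set Jaccard
    coefficient is at least `threshold`; each sentence is matched at most once.
    """
    unmatched_b = list(range(len(doc_b)))
    matched = 0
    for sent_a in doc_a:
        for j in unmatched_b:
            if jaccard(sent_a, doc_b[j]) >= threshold:
                unmatched_b.remove(j)  # safe: we leave the loop immediately
                matched += 1
                break
    # |A union B| counts each matched pair once.
    total = len(doc_a) + len(doc_b) - matched
    return matched / total if total else 0.0


if __name__ == "__main__":
    doc_a = [frozenset({("A0", "person"), ("V", "write"), ("A1", "document")}),
             frozenset({("A0", "animal"), ("V", "eat")})]
    doc_b = [frozenset({("A0", "person"), ("V", "write"), ("A1", "document")}),
             frozenset({("A0", "vehicle"), ("V", "move")})]
    print(document_similarity(doc_a, doc_b))
```

Comparing hypernyms rather than surface words is what lets two sentences with different wording but the same role structure count as a match; the threshold and the greedy matching are simplifications chosen for brevity.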