Authors: Zhang Junsheng [1], Sun Xiaoping [2], Liu Zhihui [1]
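Affiliations: [1] Institute of Scientific and Technical Information of China, Beijing 100038; [2] KL-IIP, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190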
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2023, No. 1, pp. 59-73 (15 pages)
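Funding: Innovation Research Fund of the Institute of Scientific and Technical Information of China, projects "Research on Methods for Identifying False Scientific and Technical Information on the Internet" (MS2021-05) and "Research on Methods for Evaluating the Originality and Novelty of Scientific Papers" (MS2022-05).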
Abstract: As misinformation proliferates on the Internet, identifying it automatically has become an urgent need for Internet information governance. New events continually give rise to new misinformation, so supervised statistical machine learning models for misinformation identification must be updated iteratively. Each update requires a new training set in which the newly emerged misinformation is represented. This study therefore proposes a misinformation identification method that dynamically and iteratively updates the training set used to build the machine learning model, and designs an iterative clustering method based on kernel density estimation to cluster the misinformation data set. Within each automatically obtained cluster, training and test samples are selected in proportion to construct the classifier's training and test sets, so that samples from newly emerging events are represented in the training set. The results show that a misinformation classifier trained on data sets partitioned by the kernel-density-estimation-based iterative clustering method achieves significantly higher classification accuracy than one trained on randomly partitioned data sets.
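The proportional, per-cluster selection of training and test samples described in the abstract can be illustrated with a minimal sketch. It assumes that cluster labels are already available (for example from a kernel-density-based clustering step, which is not reproduced here). The function name `split_by_cluster`, the 0.8 ratio, and the toy labels are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_labels, train_ratio=0.8, seed=0):
    """Split sample indices into train/test, drawing the same share from each cluster.

    cluster_labels[i] is the (assumed, precomputed) cluster id of sample i.
    """
    rng = random.Random(seed)

    # Group sample indices by their cluster id.
    members = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        members[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(members):
        idxs = members[label][:]
        rng.shuffle(idxs)
        # Keep at least one training sample per cluster so that even a small
        # cluster (e.g. one formed by a new event) is represented in training.
        n_train = max(1, round(len(idxs) * train_ratio))
        train_idx.extend(idxs[:n_train])
        test_idx.extend(idxs[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Toy example: three clusters of sizes 10, 5 and 1.
labels = [0] * 10 + [1] * 5 + [2] * 1
train, test = split_by_cluster(labels, train_ratio=0.8)
```

Unlike a purely random split, this guarantees that no cluster is left out of the training set. How the paper itself chooses the ratio or handles very small clusters is not stated in the abstract.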