Authors: Hong Liangyi (洪良怡), Zhu Songlin (朱松林), Wang Yijun (王轶骏)[1], Xue Zhi (薛质)[1]
Affiliations: [1] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; [2] Nantong Public Security Bureau, Nantong 226001, Jiangsu, China
Source: Computer Applications and Software (《计算机应用与软件》), 2023, No. 2, pp. 320-325, 330 (7 pages)
Funding: National Key Research and Development Program of China, "Cyberspace Security" Key Special Project (2016QY01W0202).
Abstract: Screening sensitive-topic content out of massive volumes of darknet webpages is of great importance to law enforcement agencies. Based on an in-depth analysis of the textual characteristics and categories of webpages on Freenet and other darknets, a TextCNN-based topic classification model for darknet webpages is proposed. The model preprocesses the data according to the non-standardized language of darknet webpages, represents webpage content with pretrained word vectors, applies convolution kernels of different sizes to obtain feature maps, and uses max pooling to obtain the final feature vector. The convolutional network is regularized, and a softmax function predicts the class probabilities. Experimental results show that the method achieves a precision of 86.01%, a recall of 78.97%, and a Macro-F1 of 82.33%, higher than the machine learning models used for comparison, and that it can effectively address the darknet webpage classification problem.
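The pipeline named in the abstract (word vectors, convolution kernels of several sizes, max pooling, softmax) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the function names, vocabulary size, embedding dimension, kernel sizes (3, 4, 5), filter count, and number of classes are all assumptions chosen for illustration, the weights are random stand-ins rather than pretrained vectors, and the preprocessing and regularization steps are left out because the abstract does not specify them.

```python
import math
import random


def init_params(vocab_size, dim, kernel_sizes, n_filters, n_classes, seed=0):
    """Random stand-in weights; a real model would load pretrained vectors and train the rest."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "embedding": [vec(dim) for _ in range(vocab_size)],
        # n_filters kernels per size; each kernel spans k tokens by dim values
        "kernels": [[vec(dim) for _ in range(k)]
                    for k in kernel_sizes for _ in range(n_filters)],
        "fc_w": [vec(len(kernel_sizes) * n_filters) for _ in range(n_classes)],
        "fc_b": [0.0] * n_classes,
    }


def conv_relu_maxpool(embedded, kernel):
    """Slide one kernel over the token sequence, apply ReLU, keep the maximum response."""
    k = len(kernel)
    best = 0.0  # ReLU floors every response at 0, so 0 is a safe starting maximum
    for start in range(len(embedded) - k + 1):
        response = sum(e * w
                       for row, krow in zip(embedded[start:start + k], kernel)
                       for e, w in zip(row, krow))
        best = max(best, response)
    return best


def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(token_ids, params):
    """Embed tokens, pool one feature per kernel, then map features to class probabilities."""
    embedded = [params["embedding"][t] for t in token_ids]
    features = [conv_relu_maxpool(embedded, kern) for kern in params["kernels"]]
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(params["fc_w"], params["fc_b"])]
    return softmax(logits)


# Hypothetical usage: a 7-token page, 4 made-up topic classes
params = init_params(vocab_size=50, dim=8, kernel_sizes=(3, 4, 5), n_filters=2, n_classes=4)
probs = textcnn_forward([3, 17, 42, 8, 8, 21, 5], params)
print(probs)
```

Each kernel contributes exactly one pooled value, so the feature vector has a fixed length regardless of how long the page is. That property is what lets a single fully connected layer sit on top of pages of varying length.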
Classification No.: TP3 [Automation and Computer Technology - Computer Science and Technology]