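Affiliation: [1] School of Computer and Communication, Hunan University of Technology, Zhuzhou, Hunan 412008
Source: Computer Technology and Development (《计算机技术与发展》), 2013, No. 4, pp. 135-138 (4 pages)
Funding: Hunan Provincial International Cooperation Fund Project (2011WK3032)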
Abstract: To improve the efficiency of text information filtering, a three-layer filtering system based on text information is proposed. The system is organized into two horizontal parts and three vertical layers. The first layer filters by IP and URL address. The second layer computes keyword frequency and weight statistics, calculated separately for the title, the keywords, and the body text. The third layer filters by content feature analysis, combining word segmentation, keyword weight calculation, the vector space model (VSM), and topic tendency analysis so that objectionable information is identified efficiently and accurately. Experiments show that the system filters well, with recall and precision clearly better than those of the KNN method, and that in real-time filtering it can block the spread of objectionable information promptly.
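The three layers named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the blocklist, keyword weights, field weights, thresholds, topic profile, and every function name below are hypothetical, a simple regex tokenizer stands in for real word segmentation, plain term counts stand in for the paper's keyword weighting, topic tendency analysis is omitted, and the assumption that any single layer may block a message on its own is ours.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# All values below are illustrative assumptions, not taken from the paper.
BLOCKED_HOSTS = {"bad.example.com", "203.0.113.7"}
KEYWORD_WEIGHTS = {"casino": 2.0, "jackpot": 1.0}
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}
KEYWORD_THRESHOLD = 4.0
TOPIC_PROFILE = Counter({"casino": 3, "jackpot": 2, "bet": 2, "win": 1})
SIMILARITY_THRESHOLD = 0.4


def tokenize(text):
    """Regex tokenizer standing in for a real word segmenter."""
    return re.findall(r"[a-z0-9]+", text.lower())


def layer1_address(url):
    """Layer 1: is the URL's host name or IP address on the blocklist?"""
    return urlparse(url).hostname in BLOCKED_HOSTS


def layer2_keyword_score(doc):
    """Layer 2: weighted keyword frequency over title, keywords, and body."""
    score = 0.0
    for field, field_weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(doc.get(field, "")))
        hits = sum(KEYWORD_WEIGHTS.get(term, 0.0) * n
                   for term, n in counts.items())
        score += field_weight * hits
    return score


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def layer3_similarity(doc):
    """Layer 3: VSM similarity between the document and a topic profile."""
    text = " ".join(doc.get(field, "") for field in FIELD_WEIGHTS)
    return cosine(Counter(tokenize(text)), TOPIC_PROFILE)


def classify(url, doc):
    """Run the layers in order; the first layer that fires decides."""
    if layer1_address(url):
        return "blocked:address"
    if layer2_keyword_score(doc) >= KEYWORD_THRESHOLD:
        return "blocked:keywords"
    if layer3_similarity(doc) >= SIMILARITY_THRESHOLD:
        return "blocked:content"
    return "allowed"
```

Ordering the checks from cheapest to most expensive means the address lookup and keyword count can reject a message before the vector comparison runs, which is one plausible reading of how a layered design improves filtering efficiency.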
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]