Authors: Wang Yihao, Ding Hongwei[1], Wang Liqing[1], Li Bo[1], Li Hao[2]
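Affiliations: [1] School of Information Science and Engineering, Yunnan University, Kunming 650500, Yunnan, China; [2] Office of Science and Technology, Yunnan University, Kunming 650500, Yunnan, China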
Source: Computer Applications and Software, 2022, No. 7, pp. 241-246, 274 (7 pages)
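Funding: National Natural Science Foundation of China (61862064, 61461053, 61461054); Yunnan University "Serving Yunnan" Action Plan (C176240501007); Industrialization Support Project of the Provincial Department of Education (2016CYH03).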
Abstract: Based on an analysis of the characteristics of the Lao language, this paper proposes a Lao sensitive-information filtering algorithm built on a deterministic finite automaton (DFA) and a decision tree. Lao text is divided into lexical units and encoded. This resolves the differences between Lao and Chinese writing, and it also prevents the garbled characters that appear when computers read and store Lao text. Drawing on the properties of decision trees, the authors construct a decision tree of Lao sensitive information that does not depend on a dictionary and can be updated in real time. On the basis of the DFA model, Lao sensitive information is detected and filtered, and real-time alarms are raised. Experiments show that the algorithm filters Lao text with relatively high efficiency and achieves good recall and precision.
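The abstract names two cooperating structures: a tree that stores sensitive terms, and a DFA that scans text against that tree. This record does not give the paper's exact construction, so what follows is only a minimal Python sketch under several assumptions. First, the "decision tree" is modeled as a character-level trie of nested dictionaries, which is the structure commonly used by DFA sensitive-word filters. Second, each trie node serves as a DFA state. Third, text is handled as Unicode code points (Lao occupies the block U+0E80 to U+0EFF), standing in for the paper's unspecified encoding step. The names `build_tree`, `add_term`, `find_matches`, `filter_text`, the `END` marker, and the `alarm` callback are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical marker key flagging the end of a term; it is longer than one
# character, so it can never collide with a single character of the input text.
END = "__end__"

Tree = Dict[str, Any]


def add_term(root: Tree, term: str) -> None:
    """Insert one sensitive term into a live tree (real-time update, no rebuild)."""
    node = root
    for ch in term:  # iterates Unicode code points, including Lao vowel/tone marks
        node = node.setdefault(ch, {})
    node[END] = True


def build_tree(terms: List[str]) -> Tree:
    """Build the term tree from an initial list of terms (hypothetical helper)."""
    root: Tree = {}
    for term in terms:
        add_term(root, term)
    return root


def find_matches(root: Tree, text: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of the longest sensitive term found at each position.

    Each tree node acts as a DFA state and each character as a transition;
    a missing transition ends the attempt that started at position i.
    """
    spans: List[Tuple[int, int]] = []
    i, n = 0, len(text)
    while i < n:
        node, j, last_end = root, i, -1
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                last_end = j  # longest accepted prefix seen so far
        if last_end != -1:
            spans.append((i, last_end))
            i = last_end  # resume scanning after the matched term
        else:
            i += 1
    return spans


def filter_text(root: Tree, text: str, mask: str = "*",
                alarm: Optional[Callable[[str], None]] = None) -> str:
    """Mask every matched term; call `alarm` with each hit as a stand-in for real-time alarms."""
    chars = list(text)
    for start, end in find_matches(root, text):
        if alarm is not None:
            alarm(text[start:end])
        for k in range(start, end):
            chars[k] = mask
    return "".join(chars)


if __name__ == "__main__":
    tree = build_tree(["bad", "badword"])  # placeholder terms, not real data
    print(filter_text(tree, "a badword here"))
    lao_term = "\u0e81\u0eb2"  # arbitrary Lao code points, used only as a placeholder
    add_term(tree, lao_term)   # runtime update without rebuilding the tree
    print(filter_text(tree, "x" + lao_term + "y", alarm=print))
```

Lao is written without spaces between words. For that reason the scan tries a match at every character position rather than splitting on whitespace. `add_term` mutates the tree in place, so a new term takes effect on the next scan. This loosely mirrors the abstract's claim of real-time updating.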
Keywords: deterministic finite automaton; decision tree; sensitive information filtering; Lao text filtering; online public opinion
CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]