Lao Sensitive Information Filtering Algorithm Based on Decision Tree and DFA  Cited by: 4

Authors: Wang Yihao, Ding Hongwei[1], Wang Liqing[1], Li Bo[1], Li Hao[2]

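Affiliations: [1] School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China; [2] Office of Science and Technology, Yunnan University, Kunming, Yunnan 650500, China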

Source: Computer Applications and Software, 2022, Issue 7, pp. 241-246 and 274 (7 pages in total)

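Funding: National Natural Science Foundation of China (61862064, 61461053, 61461054); Yunnan University "Serving Yunnan" Action Plan (C176240501007); Industrialization Support Project of the Provincial Department of Education (2016CYH03).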

Abstract: Based on an analysis of the characteristics of the Lao language, this paper proposes a Lao sensitive-information filtering algorithm built on a deterministic finite automaton (DFA) and a decision tree. Lao text is divided into lexical units and encoded. This addresses both the differences between Lao and Chinese writing and the garbled characters that appear when computers read and store Lao text. Drawing on the properties of decision trees, the authors construct a decision tree of Lao sensitive information. The tree does not depend on a dictionary and can be updated in real time. The DFA model then detects and filters Lao sensitive information and raises alarms in real time. Experiments show that the algorithm filters Lao text with relatively high efficiency and achieves good recall and precision.
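The abstract pairs a tree of sensitive terms with DFA-based scanning. The Python sketch below shows one common way to realize that pairing, under several assumptions. The "decision tree" is modeled as a character trie whose nodes double as DFA states. Input is NFC-normalized Unicode. At each start position, matching keeps the longest term. The class name `SensitiveFilter`, the `alarm` callback, and the sample terms are all hypothetical. This illustrates the general technique and is not the authors' implementation. Because the scan works one code point at a time, it needs no separate word segmentation. That matters here because Lao is usually written without spaces between words.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Match = Tuple[int, int, str]  # (start, end exclusive, matched term)


@dataclass
class Node:
    """One DFA state: transitions keyed by a single code point."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    terminal: bool = False  # True if a sensitive term ends in this state


def normalize(text: str) -> str:
    # Assumption: NFC gives each Lao string one canonical code-point sequence,
    # so stored terms and scanned text compare equal.
    return unicodedata.normalize("NFC", text)


class SensitiveFilter:  # hypothetical name
    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = Node()
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        """Insert one term in place; the trie is never rebuilt (live updates)."""
        node = self.root
        for ch in normalize(word):
            node = node.children.setdefault(ch, Node())
        node.terminal = True

    def find(self, text: str) -> List[Match]:
        """Scan left to right, keeping the longest match at each start position."""
        text = normalize(text)
        matches: List[Match] = []
        i = 0
        while i < len(text):
            node, j, last_end = self.root, i, -1
            # Follow DFA transitions until no transition exists for text[j].
            while j < len(text) and text[j] in node.children:
                node = node.children[text[j]]
                j += 1
                if node.terminal:
                    last_end = j
            if last_end != -1:
                matches.append((i, last_end, text[i:last_end]))
                i = last_end  # resume after the match: results do not overlap
            else:
                i += 1
        return matches

    def mask(self, text: str,
             alarm: Optional[Callable[[List[Match]], None]] = None) -> str:
        """Replace each matched code point with '*'; call `alarm` if anything matched."""
        text = normalize(text)
        matches = self.find(text)
        if matches and alarm is not None:
            alarm(matches)  # hypothetical hook standing in for a real-time alarm
        chars = list(text)
        for start, end, _ in matches:
            chars[start:end] = "*" * (end - start)
        return "".join(chars)


if __name__ == "__main__":
    f = SensitiveFilter(["abc", "abcd", "xyz"])  # placeholder terms
    print(f.find("..abcd..xyz"))  # [(2, 6, 'abcd'), (8, 11, 'xyz')]
    print(f.mask("..abcd..xyz"))  # ..****..***
    f.add("ເງິນ")  # placeholder Lao term, added without rebuilding
    print(f.mask("ໃຫ້ເງິນ", alarm=lambda m: print("ALERT", m)))
```

Adding a term only extends the trie, so the new term takes effect immediately. This reflects the abstract's claim that the tree supports real-time updates. Restarting the scan at every position costs O(n·L), where n is the text length and L is the length of the longest term. If that cost matters, an Aho-Corasick automaton can be built on the same trie by adding failure links, which makes the scan linear in the text length.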

Keywords: deterministic finite automaton; decision tree; sensitive information filtering; Lao text filtering; online public opinion

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
