一种基于状态预测的多线程数据过滤算法  

An accelerated regular expression matching algorithm based on multi-threading and state prediction

在线阅读下载全文

作  者:杨嘉佳 李正 郑儿 姚旺君 赵静 关健 Yang Jiajia;Li Zheng;Zheng Er;Yao Wangjun;Zhao Jing;Guan Jian(The Sixth Research Institute of China Electronics Corporation,Beijing 100083,China)

机构地区:[1]中国电子信息产业集团有限公司第六研究所,北京100083

出  处:《电子技术应用》2024年第12期87-91,共5页Application of Electronic Technique

摘  要:数据过滤算法在大数据处理领域有着重要的作用。基于正则表达式匹配技术的数据过滤算法凭借强大的特征表达能力适合于处理大规模复杂数据。然而,传统的正则表达式匹配过程为串行匹配,造成性能低,无法满足现代数据处理的需求。针对传统正则表达式匹配性能低的问题,提出一种基于多线程和状态预测的正则表达式加速匹配算法,称之为μFA:基于向量指令执行字符值比较,获取可直接跳过的信任字符数。同时,基于多线程加速和状态猜测技术,实现字符串的分段匹配处理,通过圈定字符危险区域,研判各分段最终匹配结果的正确性。实验结果表明,μFA算法的吞吐率是原始DFA算法的10.12~91.36倍、?FA算法的1.08~2.97倍。Data filtering algorithms play a crucial role in the field of big data processing.Data filtering algorithms based on regu-lar expression matching technology are suitable for processing large-scale complex data due to their powerful feature expression capabilities.However,the traditional regular expression matching process is serial matching,resulting in low performance that cannot meet the needs of modern data processing.To address the issue of low performance in traditional regular expression match-ing,an accelerated regular expression matching algorithm based on multithreading and state prediction is proposed,namedμFA.This algorithm performs character value comparison based on vector instructions to obtain the number of trusted characters that can be skipped directly.Simultaneously,it utilizes multithreading acceleration and state prediction techniques to achieve seg-mented matching processing of strings.By delimiting dangerous character regions,it determines the correctness of the final matching results for each segment.Experimental results show that the throughput is 10.12 to 91.36 times higher than the original DFA algorithm and 1.08 to 2.97 times higher than the BFA algorithm.

关 键 词:正则表达式匹配 状态预测 数据过滤 

分 类 号:TP391.1[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象