ßFA: A High-Performance Data Processing Algorithm Based on a Vector Instruction Set  (Cited by: 2)


Authors: Yang Jiajia; Guan Jian; Li Zheng; Yu Zengming; Yao Wangjun (The Sixth Research Institute of China Electronics Corporation, Beijing 100083, China)

Affiliation: [1] The Sixth Research Institute of China Electronics Corporation, Beijing 100083, China

Source: Application of Electronic Technique (《电子技术应用》), 2024, No. 11, pp. 85-88 (4 pages)

Abstract: Regular expression matching plays a significant role in data processing tasks such as data cleaning, parsing, and extraction. However, strong data dependencies and unpredictable memory accesses during matching keep matching performance low. To address this problem, this paper proposes a high-performance regular expression data processing algorithm based on a vector instruction set, called ßFA. ßFA uses vector instructions to read several consecutive characters from memory at once and matches them, as a vector, against the non-trusted character set of the most frequently accessed state. A built-in function then locates the position of the first non-trusted character, which gives the number of characters that can be skipped directly and thereby accelerates matching. Experimental results show that the throughput of ßFA exceeds that of both the original DFA algorithm and the αFA algorithm: it is 4.67 to 60 times that of the original DFA algorithm and 4.37 to 7.82 times that of the αFA algorithm.
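The skipping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it emulates one vector compare and one "find first set bit" built-in in pure Python, and the toy DFA (which counts occurrences of `ab`), the 16-byte block width, and all names (`skip_count`, `step`, `HOT_STATE`, `UNTRUSTED`) are assumptions made for illustration only.

```python
BLOCK = 16            # assumed vector width in bytes (one 128-bit register)
HOT_STATE = 0         # assumed most frequently visited state of the toy DFA
ACCEPT = 2            # accepting state of the toy DFA
UNTRUSTED = b"a"      # bytes that can move the toy DFA out of HOT_STATE


def skip_count(block: bytes, untrusted: bytes) -> int:
    """Return how many leading bytes of `block` are trusted (can be skipped).

    Emulates a vector compare: bit i of `mask` is set when block[i] equals
    any untrusted byte. The lowest set bit marks the first untrusted byte.
    """
    mask = 0
    for c in untrusted:                  # one emulated vector compare per byte value
        for i, b in enumerate(block):
            if b == c:
                mask |= 1 << i
    if mask == 0:
        return len(block)                # whole block is trusted
    return (mask & -mask).bit_length() - 1   # index of the lowest set bit


def step(state: int, byte: int) -> int:
    """Transition function of a toy DFA that recognises the literal 'ab'."""
    if byte == ord("a"):
        return 1
    if state == 1 and byte == ord("b"):
        return ACCEPT
    return 0


def count_matches_plain(data: bytes) -> int:
    """Baseline: one table-style transition per input byte."""
    state, hits = 0, 0
    for byte in data:
        state = step(state, byte)
        if state == ACCEPT:
            hits += 1
    return hits


def count_matches_skipping(data: bytes) -> int:
    """Same DFA, but trusted bytes are skipped block-wise in HOT_STATE."""
    state, pos, hits = 0, 0, 0
    while pos < len(data):
        if state == HOT_STATE:
            pos += skip_count(data[pos:pos + BLOCK], UNTRUSTED)
            if pos >= len(data):
                break
        state = step(state, data[pos])
        if state == ACCEPT:
            hits += 1
        pos += 1
    return hits


if __name__ == "__main__":
    sample = b"x" * 4 + b"ab" + b"x" * 20 + b"abab"
    print(count_matches_plain(sample), count_matches_skipping(sample))
```

In this sketch the skip is only attempted in the hot state, where trusted bytes leave the state unchanged, so skipping them cannot alter the match result. A native version would replace the inner loops of `skip_count` with actual vector compare and bit-scan instructions; which instructions the paper uses is not stated in the abstract.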

Keywords: regular expression matching; vector instruction set; high-performance data processing

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
