Unstructured Big Data Threat Intelligence Parallel Mining Algorithm  

在线阅读下载全文

作  者:Zhihua Li Xinye Yu Tao Wei Junhao Qian 

机构地区:[1]School of Artificial Intelligence and Computer Science,Jiangnan University,Wuxi 214122,China [2]School of IoT Engineering,Jiangnan University,Wuxi 214122,China

出  处:《Big Data Mining and Analytics》2024年第2期531-546,共16页大数据挖掘与分析(英文)

摘  要:To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web,we have developed the Parallel Deep Forest-based Multi-Label Classification(PDFMLC)algorithm.Initially,open-source cybersecurity analysis reports are collected and converted into a standardized text format.Subsequently,five tactics category labels are annotated,creating a multi-label dataset for tactics classification.Addressing the limitations of low execution efficiency and scalability in the sequential deep forest algorithm,our PDFMLC algorithm employs broadcast variables and the Lempel-Ziv-Welch(LZW)algorithm,significantly enhancing its acceleration ratio.Furthermore,our proposed PDFMLC algorithm incorporates label mutual information from the established dataset as input features.This captures latent label associations,significantly improving classification accuracy.Finally,we present the PDFMLC-based Threat Intelligence Mining(PDFMLC-TIM)method.Experimental results demonstrate that the PDFMLC algorithm exhibits exceptional node scalability and execution efficiency.Simultaneously,the PDFMLC-TIM method proficiently conducts text classification on cybersecurity analysis reports,extracting tactics entities to construct comprehensive threat intelligence.As a result,successfully formatted STIX2.1 threat intelligence is established.

关 键 词:unstructured big data mining parallel deep forest multi-label classification algorithm threat intelligence 

分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象