Intelligent and Adaptive Web Data Extraction System Using Convolutional and Long Short-Term Memory Deep Learning Networks  Cited by: 4

Authors: Sudhir Kumar Patnaik, C. Narendra Babu, Mukul Bhave

Affiliations: [1] Department of Computer Science and Engineering, M. S. Ramaiah University of Applied Sciences, Bangalore 560054, India; [2] Gibraltar India Solutions LLP, Bangalore 560103, India; [3] Department of Computer Science and Engineering, M. S. Ramaiah University of Applied Sciences, Bangalore 560054, India.

Source: Big Data Mining and Analytics (English-language journal), 2021, Issue 4, pp. 279-297 (19 pages)

Abstract: Data, which are collected using advanced web scraping technologies, are crucial to the growth of e-commerce in today's world of highly demanding, hyper-personalized consumer experiences. However, core data extraction engines fail because they cannot adapt to dynamic changes in website content. This study investigates an intelligent and adaptive web data extraction system built on convolutional and Long Short-Term Memory (LSTM) networks. The system uses the You Only Look Once (YOLO) algorithm for automated web page detection and Tesseract LSTM to extract product details, which are detected as images from web pages. This state-of-the-art system does not need a core data extraction engine and can therefore adapt to dynamic changes in website layout. Experiments conducted on real-world retail cases demonstrate an image detection precision of 97% and a character extraction precision of 99%. In addition, a mean average precision of 74% is obtained with an input dataset of 45 objects or images.
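The abstract reports detection precision but does not say how it was computed. As a minimal sketch, assuming the common convention that a predicted box counts as a true positive when its intersection over union (IoU) with an unmatched ground-truth box is at least 0.5, precision could be calculated as below. The function names, the box layout, and the threshold are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_precision(predicted: List[Box], truth: List[Box],
                        threshold: float = 0.5) -> float:
    """Precision = true positives / all predictions.

    Each ground-truth box can be matched by at most one prediction
    (greedy matching; an assumed convention, not the paper's stated method).
    """
    unmatched = list(truth)
    true_positives = 0
    for box in predicted:
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)
    return true_positives / len(predicted) if predicted else 0.0
```

For example, two predicted regions of which only one overlaps a labeled product region would give a precision of 0.5 under these assumptions.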

Keywords: adaptive web scraping; deep learning; Long Short-Term Memory (LSTM); web data extraction; You Only Look Once (YOLO)

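Classification codes: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP391.41 [Automation and Computer Technology: Control Science and Engineering]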

 
