Application Research of Web Page Purification Based on DOM and Neural Network (基于DOM和神经网络的网页净化应用)

Cited by: 2

Author: Li Jian (李剑)

Affiliation: [1] Combat Laboratory, Nanchang Army Academy (南昌陆军学院), Nanchang, Jiangxi 330103, China

Source: Electronic Science and Technology (《电子科技》), 2012, No. 1, pp. 105-107 (3 pages)

Abstract: To filter noise out of web pages efficiently, this paper proposes a web page purification method based on an improved DOM tree and a BP (back-propagation) neural network. Using the features of the DOM tree and of the page content, a content block tree is built with HTMLParser, and the page content is split into sub-blocks according to their relatedness, so that processing the whole page reduces to processing the individual sub-blocks. Statistics show that the content of the sub-blocks has distinct numerical features, which can serve as the training input for the BP neural network. Page purification thus becomes a problem of learning a filtering model. Experimental results show that the method achieves satisfactory results on Chinese web pages that have a main topic.
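The pipeline outlined in the abstract (split the page into content blocks, compute numerical features per block, train a BP network to separate content from noise) might be sketched as follows. This is an illustrative sketch only, not the paper's implementation: it uses Python's standard `html.parser` as a stand-in for the HTMLParser library named in the abstract, and the block-splitting rule (`BLOCK_TAGS`), the three features, the `TinyBP` class, and the two-block training set are all assumptions introduced here, since the abstract does not specify them.

```python
import math
import random
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "td", "p", "ul"}      # assumed block-level containers
PUNCT = set(",.!?;:,。!?;:、")        # assumed punctuation set

class BlockExtractor(HTMLParser):
    """Groups text under its innermost block-level element."""
    def __init__(self):
        super().__init__()
        self.stack, self.blocks, self.in_link = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append({"text": "", "link_text": ""})
        elif tag == "a":
            self.in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.stack:
            block = self.stack.pop()
            if block["text"]:                # keep only blocks that hold text
                self.blocks.append(block)
        elif tag == "a" and self.in_link:
            self.in_link -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.stack and text:
            self.stack[-1]["text"] += text
            if self.in_link:
                self.stack[-1]["link_text"] += text

def features(block):
    """Assumed features in [0, 1]: link-text ratio, punctuation density, capped length."""
    text = block["text"]
    n = len(text)
    return [len(block["link_text"]) / n,
            sum(ch in PUNCT for ch in text) / n,
            min(n, 200) / 200]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyBP:
    """One hidden layer of sigmoid units, trained by plain backpropagation."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train(self, samples, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, target in samples:
                h, y = self.forward(x)
                dy = (y - target) * y * (1 - y)               # output-layer delta
                dh = [dy * self.w2[j] * h[j] * (1 - h[j])     # hidden-layer deltas
                      for j in range(len(h))]
                for j, v in enumerate(h + [1.0]):
                    self.w2[j] -= lr * dy * v
                for j, row in enumerate(self.w1):
                    for i, v in enumerate(x + [1.0]):
                        row[i] -= lr * dh[j] * v

page = """
<div id="nav"><a href="/">Home</a><a href="/news">News</a></div>
<div id="main"><p>The method splits a page into blocks, then scores each block.</p></div>
"""
extractor = BlockExtractor()
extractor.feed(page)
extractor.close()
blocks = extractor.blocks
samples = [(features(blocks[0]), 0.0),   # navigation block, labelled noise
           (features(blocks[1]), 1.0)]   # paragraph block, labelled content
net = TinyBP(n_in=3, n_hidden=3)
net.train(samples)
for block, (x, _) in zip(blocks, samples):
    print(block["text"][:30], round(net.forward(x)[1], 3))
```

In this sketch a block whose score is above a chosen threshold (for example 0.5) would be kept as content and the rest discarded. A realistic model would need many labelled blocks from many pages rather than the two toy samples used here.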

Keywords: web page purification; DOM tree; content block; neural network

Classification code: TP393.07 [Automation and Computer Technology: Computer Application Technology]

 
