Research on State-Based Extraction of Ajax Dynamic Webpages in Search Engines  (Times cited: 9)

ON EXTRACTING STATE-BASED AJAX DYNAMIC WEBPAGE IN SEARCH ENGINE

Authors: Chen Lili[1], Zhang Li[1], Liu Zhenglong[1]

Affiliation: [1] Department of Computer Science, Sichuan TOP IT Vocational Institute, Chengdu, Sichuan 611743, China

Source: Computer Applications and Software (《计算机应用与软件》), 2013, No. 7, pp. 217-220 (4 pages)

Abstract: Extracting Ajax (Asynchronous JavaScript and XML) dynamic webpages is currently both a focus and a major difficulty of search engine research. After analysing the limitations of existing Ajax dynamic webpage extraction methods, we address the wasted storage space and information loss of the most widely used approach, which is based on the DOM (Document Object Model) tree. We introduce a formal definition of a state S and propose AjaxCrawling, a state-based algorithm that extracts the binding relations among page elements, events, and functions. We also explain why the resource library produced by the algorithm is effective in a search engine. Comparative experiments show that AjaxCrawling preserves the completeness of the extracted information while reducing storage space.
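The abstract does not give the formal definition of S or the steps of AjaxCrawling, so the following is only a minimal Python sketch of the general idea of state-based extraction, not the authors' algorithm. It assumes that a state is the set of (element id, event type, handler function name) bindings visible on the page, and it uses a hypothetical `fire` callback in place of a real browser executing a handler. All names (`State`, `StateRepository`, `crawl`, `fire`, and the toy handlers) are hypothetical. Each distinct state is stored once, and repeated visits only add a transition record.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical binding: (element id, event type, handler function name).
Binding = Tuple[str, str, str]


@dataclass(frozen=True)
class State:
    """Hypothetical state S: the set of element/event/function bindings on the page."""
    bindings: FrozenSet[Binding]


class StateRepository:
    """Stores each distinct state once, plus the transitions between states."""

    def __init__(self) -> None:
        self.states: Dict[State, int] = {}
        self.transitions: List[Tuple[int, Binding, int]] = []

    def add_state(self, state: State) -> int:
        # An identical state reached again reuses its existing id instead of
        # being stored a second time.
        if state not in self.states:
            self.states[state] = len(self.states)
        return self.states[state]

    def record(self, src: State, trigger: Binding, dst: State) -> None:
        self.transitions.append((self.add_state(src), trigger, self.add_state(dst)))


def crawl(initial: State, fire: Callable[[State, Binding], State]) -> StateRepository:
    """Breadth-first exploration: fire every binding of every newly discovered state."""
    repo = StateRepository()
    repo.add_state(initial)
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for binding in sorted(state.bindings):
            nxt = fire(state, binding)  # stand-in for running the handler in a browser
            repo.record(state, binding, nxt)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return repo


# Toy page model (hypothetical): three states linked by click handlers.
home = State(frozenset({("tab1", "click", "loadNews"), ("tab2", "click", "loadSports")}))
news = State(frozenset({("back", "click", "loadHome")}))
sports = State(frozenset({("back", "click", "loadHome"), ("more", "click", "loadSports")}))
table = {
    (home, ("tab1", "click", "loadNews")): news,
    (home, ("tab2", "click", "loadSports")): sports,
    (news, ("back", "click", "loadHome")): home,
    (sports, ("back", "click", "loadHome")): home,
    (sports, ("more", "click", "loadSports")): sports,
}

repo = crawl(home, lambda s, b: table[(s, b)])
print(len(repo.states), len(repo.transitions))  # 3 5
```

In this toy model, five event firings produce only three stored states: the home state is reached twice more but kept once. This illustrates the kind of storage saving the abstract attributes to a state-based representation over per-event DOM snapshots. It does not reproduce the paper's experiments.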

Keywords: Ajax technology; dynamic webpage; extraction; DOM tree; state

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
