Research on State-Based Extraction of Ajax Dynamic Webpages in Search Engines (cited by: 9)

ON EXTRACTING STATE-BASED AJAX DYNAMIC WEBPAGE IN SEARCH ENGINE

Source: Computer Applications and Software (《计算机应用与软件》), 2013, No. 7, pp. 217-220 (4 pages)

Abstract: The extraction of Ajax (Asynchronous JavaScript and XML) dynamic webpages is currently both a focus and a difficulty in search engine research. After analysing the limitations of existing Ajax dynamic webpage extraction methods, and targeting the space waste and information loss of the most widely used extraction method, which is based on the DOM (Document Object Model) tree, we introduce a formal definition of state S and propose AjaxCrawling, a state-based algorithm for extracting the binding relations among page elements, events, and functions. The paper also explains the effectiveness, in a search engine, of the resource library extracted by the algorithm. Comparison experiments show that AjaxCrawling ensures the completeness of the extracted information while saving storage space.

Keywords: Ajax technology, dynamic webpage extraction, DOM tree, state

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
