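Affiliation: [1] Department of Computer Science, Sichuan TOP Information Technology Vocational Institute (四川托普信息技术职业学院), Chengdu 611743, Sichuan, China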
Source: Computer Applications and Software (《计算机应用与软件》), 2013, No. 7, pp. 217-220 (4 pages)
Abstract: Extracting Ajax (Asynchronous JavaScript and XML) dynamic web pages is currently both a focus and a difficulty in search engine research. After analysing the limitations of existing Ajax dynamic web page extraction methods, and targeting the space waste and information loss of the most widely used approach, extraction based on the DOM (Document Object Model) tree, we introduce a formal definition of state S and propose AjaxCrawling, a state-based algorithm for extracting the binding relations between page elements, events, and functions. We also explain the effectiveness, in search engines, of the resource library that the algorithm extracts. Comparison experiments show that AjaxCrawling preserves the completeness of the extracted information and saves storage space.
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]