Deep Web new data discovery strategy based on the graph model of data attribute value lists (cited by: 3)

Authors: Xian Xuefeng [1,2,3], Cui Zhiming [1,2], Zhao Pengpeng [2], Fang Ligang [1,3], Yang Yuanfeng [1,3], Gu Caidong [1,3]

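Affiliations: [1] Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou 215104, Jiangsu, China; [2] Institute of Intelligent Information Processing and Application, Soochow University, Suzhou 215006, Jiangsu, China; [3] School of Computer Engineering, Suzhou Vocational University, Suzhou 215104, Jiangsu, China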

Source: Journal on Communications (《通信学报》), 2016, No. 3, pp. 20-32 (13 pages)

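Funding: National Natural Science Foundation of China (No. 61440053; No. 61472268; No. 41201338); Natural Science Foundation of Jiangsu Province (No. BK2012164); Suzhou Science and Technology Plan Foundation (No. SYG201342; No. SYG201343; No. SS201344)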

Abstract: To address the incremental crawling of newly generated data records in a data source, a deep Web new data discovery strategy is proposed. The strategy represents a deep Web data source with a new attribute value list graph model, which turns new data discovery into a traversal of that graph. Because the model depends only on the data, it is more adaptable and more deterministic than existing query-related graph models, and it applies to deep Web data sources that offer only a simple query interface. On the basis of this model, growth nodes are identified and their ability to discover new data is predicted; mutual information is used to compute the dependencies between nodes, so that query selection reduces the negative effect of query dependence as far as possible. The strategy improves the efficiency of new data crawling. Experimental results show that, under the same resource constraints, it keeps local data maximally synchronized with remote data.
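The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that nodes are attribute values, that an edge links two values appearing in the same record, and that the dependency between two nodes is the mutual information of their binary occurrence indicators. The function names, the `growth` scores (standing in for a node's predicted new-data yield), and the greedy discounting rule are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_value_graph(records):
    """Build a graph whose nodes are attribute values (assumed representation).

    Returns (node_records, edges): the ids of the records containing each
    value, and the set of values that co-occur with each value.
    """
    node_records = defaultdict(set)
    edges = defaultdict(set)
    for rid, record in enumerate(records):
        values = set(record)
        for value in values:
            node_records[value].add(rid)
        for a, b in combinations(sorted(values), 2):
            edges[a].add(b)
            edges[b].add(a)
    return node_records, edges


def mutual_information(a, b, node_records, total):
    """Mutual information (in bits) of the occurrence indicators of a and b."""
    ra, rb = node_records[a], node_records[b]
    n11 = len(ra & rb)            # records containing both values
    n10 = len(ra) - n11           # records containing only a
    n01 = len(rb) - n11           # records containing only b
    n00 = total - n11 - n10 - n01  # records containing neither
    cells = (
        (n11, len(ra), len(rb)),
        (n10, len(ra), total - len(rb)),
        (n01, total - len(ra), len(rb)),
        (n00, total - len(ra), total - len(rb)),
    )
    mi = 0.0
    for joint, count_x, count_y in cells:
        if joint > 0:  # empty cells contribute nothing
            mi += (joint / total) * math.log2(joint * total / (count_x * count_y))
    return mi


def select_queries(growth, node_records, total, budget):
    """Greedily pick up to `budget` values to query.

    Each candidate's hypothetical growth score is discounted by its strongest
    dependency on an already selected value (mutual information of two binary
    indicators is at most 1 bit, so the discount factor stays in [0, 1]).
    """
    selected = []
    candidates = set(growth)
    while candidates and len(selected) < budget:
        def score(value):
            dependency = max(
                (mutual_information(value, s, node_records, total) for s in selected),
                default=0.0,
            )
            return growth[value] * (1.0 - dependency)

        best = max(sorted(candidates), key=score)  # sorted for determinism
        selected.append(best)
        candidates.remove(best)
    return selected
```

In this sketch, a value that always co-occurs with an already selected value is discounted to zero, because querying it would mostly return records that are already known. Values with identical text under different attributes are merged here; keying nodes by (attribute, value) pairs would avoid that.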

Keywords: deep Web; new data discovery; data acquisition

Classification: TP392 [Automation and Computer Technology: Computer Application Technology]
