A Lattice-Space-Based Data Extraction Algorithm for the Restricted Deep Web  (cited by: 3)

Data Extraction from Limited Deep Web Based on Latticial Space


Authors: Zhang Zhuo (张卓)[1], Li Shijun (李石君)[1], Zhang Naizhou (张乃洲)[1,2], Tian Jianwei (田建伟)[1]

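Affiliations: [1] School of Computer, Wuhan University, Wuhan 430079; [2] Department of Computer Science, Zhixing College of Hubei University, Wuhan 430072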

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2011, No. 1, pp. 130-137 (8 pages)

Funding: National Natural Science Foundation of China (No. 60970018)

Abstract: Crawling a Deep Web database that caps the number of results returned per query requires predicting each query's result size; this can be modeled as a set covering problem with a bound on set size, which the paper recasts as a concept covering problem. First, the relation among the set partitions generated by attributes and attribute combinations (each pairing a query with its result) is proved to be a tolerance relation. Second, these partitions are proved to form a complete lattice that is homomorphic to the concept lattice derived from the same source. The partial order between concepts can therefore describe the correlation between attributes, and hence between queries: the intent of a concept serves as the query, and the cardinality of its extent predicts the result size. On this basis a lattice-based algorithm, Ladeldew, is proposed for data extraction from result-limited Deep Web databases. Ladeldew uses as its search space the semi-lattice obtained by pruning the concept lattice on extent cardinality, and iteratively regenerates that search space from newly extracted data until nothing more can be extracted. Controlled experiments and real-application experiments verify the algorithm's theoretical correctness and its feasibility and effectiveness in practice.
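
The idea of treating a concept's intent as a query and the size of its extent as the predicted result count can be illustrated with a small sketch. This is not the Ladeldew algorithm itself, whose details the abstract does not give. It is a toy example that assumes the records are already known locally, enumerates formal concepts by brute force, keeps only concepts whose extent fits under a result limit, and covers the records greedily. All names (`records`, `extent`, `intent_of`, `concepts`, `greedy_cover`) and the sample data are hypothetical.

```python
from itertools import combinations


def extent(records, intent):
    """Ids of records matching every (attribute, value) pair in the intent."""
    return frozenset(
        rid for rid, rec in records.items()
        if all(rec.get(a) == v for a, v in intent)
    )


def intent_of(records, rids):
    """(attribute, value) pairs shared by all records in the non-empty rids."""
    rids = list(rids)
    common = set(records[rids[0]].items())
    for rid in rids[1:]:
        common &= set(records[rid].items())
    return frozenset(common)


def concepts(records):
    """Enumerate formal concepts (extent, intent) with non-empty extent.

    Brute force over all record subsets: exponential, toy-sized input only.
    """
    found = set()
    ids = list(records)
    for r in range(1, len(ids) + 1):
        for subset in combinations(ids, r):
            i = intent_of(records, subset)
            found.add((extent(records, i), i))
    return found


def greedy_cover(records, limit):
    """Pick queries (intents) whose predicted size is within the limit.

    Greedy set cover: repeatedly take the concept covering the most
    still-uncovered records. Returns (plan, uncovered record ids).
    """
    candidates = [(e, i) for e, i in concepts(records) if i and len(e) <= limit]
    remaining = set(records)
    plan = []
    while remaining and candidates:
        best = max(candidates, key=lambda c: len(c[0] & remaining))
        if not best[0] & remaining:
            break  # no candidate query reaches the remaining records
        plan.append(best)
        remaining -= best[0]
    return plan, remaining


# Hypothetical sample: five records, source returns at most 2 per query.
records = {
    1: {"brand": "A", "color": "red"},
    2: {"brand": "A", "color": "blue"},
    3: {"brand": "B", "color": "red"},
    4: {"brand": "B", "color": "blue"},
    5: {"brand": "A", "color": "red"},
}
plan, remaining = greedy_cover(records, limit=2)
for ext, query in plan:
    print(sorted(query), "-> predicted size", len(ext))
```

In this sample the single-attribute queries `brand=A` and `color=red` each match three records and are pruned, since a limit of 2 would truncate their results. A real crawler would not see the records in advance; per the abstract, the search space is rebuilt iteratively from the data extracted so far.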

Keywords: data extraction; tolerance relation; formal concept analysis; concept lattice

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
