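Authors: Zhang Zhuo [1], Li Shijun [1], Zhang Naizhou [1,2], Tian Jianwei [1]
Affiliations: [1] School of Computer, Wuhan University, Wuhan 430079; [2] Department of Computer Science, Zhixing College of Hubei University, Wuhan 430072
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2011, No. 1, pp. 130-137 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 60970018)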
Abstract: When crawling a Deep Web data source that limits the number of results returned per query, the problem of predicting query result sizes and extracting the data can be modeled as a set covering problem with bounded set size. This paper recasts it as a concept covering problem. First, the relation among all pairs consisting of a query and its result is proved to be a tolerance relation. Second, the set of these pairs is proved to form a complete lattice that is homomorphic to the concept lattice of the same source. The partial order between concepts can therefore be used to describe the correlation between queries. The intent of a concept serves as a query, and the cardinality of the concept's extent predicts the size of the query result. On this basis, a lattice-based algorithm for data extraction from result-limited Deep Web databases, called Ladeldew, is proposed. Ladeldew uses the semi-lattice obtained by pruning on extent cardinality as its search space, and iteratively regenerates the search space from newly extracted data until nothing more can be extracted. Both controlled experiments and real-world application experiments are conducted to evaluate Ladeldew, and the results support its theoretical correctness and its feasibility and effectiveness in practice.
Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
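Illustrative sketch: the abstract's central idea (a concept's intent acts as a query, the size of its extent predicts how many records the query returns, and concepts whose extent exceeds the source's result limit are pruned) can be shown on a toy formal context. This is not the authors' Ladeldew algorithm. The record ids, attribute names, the limit of 2, and every function name below are hypothetical, and the concepts are enumerated by brute force rather than by any method from the paper.

```python
from itertools import combinations

def extent(context, attrs):
    """Records that carry every attribute in attrs (the query result)."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def intent(context, objs, all_attrs):
    """Attributes shared by every record in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def concepts(context):
    """Brute-force enumeration of formal concepts as (extent, intent) pairs."""
    all_attrs = frozenset().union(*context.values())
    found = set()
    for r in range(len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), r):
            ext = extent(context, frozenset(combo))
            # Closing the extent back to its intent yields a formal concept.
            found.add((ext, intent(context, ext, all_attrs)))
    return found

def answerable(context, limit):
    """Keep concepts whose predicted result size fits within the limit."""
    return {(e, i) for e, i in concepts(context) if 0 < len(e) <= limit}

# Hypothetical toy context: record id -> attribute values it matches.
context = {
    "r1": {"a", "b"},
    "r2": {"a", "c"},
    "r3": {"b", "c"},
    "r4": {"a", "b", "c"},
}

LIMIT = 2  # assumed cap on results returned per query
kept = answerable(context, LIMIT)

# Union of the extents of the kept concepts: records reachable
# by queries that stay within the limit.
covered = frozenset().union(*(e for e, _ in kept))

for e, i in sorted(kept, key=lambda c: sorted(c[1])):
    print(sorted(i), "->", sorted(e))
```

In this toy context the single-attribute queries each match three records and so exceed the assumed limit, while the conjunctive queries stay within it and together still reach every record. That is the covering view the abstract describes.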