Authors: Ji Guang (季光), Wang Guiling, Han Yanbo
Affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences; [2] Graduate University of Chinese Academy of Sciences; [3] Research Center for Cloud Computing, North China University of Technology
Source: High Technology Letters (高技术通讯(英文版)), 2013, No. 2, pp. 203-207 (5 pages)
Funding: Supported by the National High Technology Research and Development Programme of China (No. 2009AA01Z141); the National Natural Science Foundation of China (No. 60573117); Beijing Natural Science Foundation (No. 4131001)
Abstract: To extract structured data from a web page with customized requirements, a user labels some DOM elements on the page with attribute names. The common features of the labeled elements are used both to guide the user through the labeling process, minimizing user effort, and to retrieve attribute values. To turn the attribute values into a structured result, the attribute pattern needs to be induced. For this purpose, a space-optimized suffix tree called an attribute tree is built to transform the document object model (DOM) tree into a simpler form while preserving its useful properties, such as attribute sequence order. The pattern is induced bottom-up on the attribute tree and is then used to build the structured result. Experiments show high performance of the approach in terms of precision, recall and structural correctness.
Keywords: web data extraction; structured data; user labeling; customization; data service
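Classification: TP393.092 [Automation and Computer Technology: Computer Application Technology]; P315.69 [Automation and Computer Technology: Computer Science and Technology]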
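Note: The abstract says that features shared by the user-labeled DOM elements are used to retrieve the remaining attribute values. The record does not say which features the paper uses, so the following is only a minimal sketch under stated assumptions: the "common features" are taken to be the tag path from the root and the `class` attribute, and the sample page and the helper names `index_paths`, `common_features`, and `retrieve` are hypothetical. It does not reproduce the attribute tree or the pattern induction step.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page (well-formed so the stdlib XML parser can read it).
PAGE = """
<html><body>
  <ul>
    <li><span class="title">Book A</span><span class="price">10</span></li>
    <li><span class="title">Book B</span><span class="price">12</span></li>
    <li><span class="title">Book C</span><span class="price">9</span></li>
  </ul>
</body></html>
"""

def index_paths(root):
    """Map every element to its tag path from the root, e.g. 'html/body/ul/li/span'."""
    paths = {}

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths[node] = path
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths

def common_features(labeled, paths):
    """Features shared by all labeled elements (assumed feature set: tag path, class)."""
    tag_paths = {paths[e] for e in labeled}
    classes = {e.get("class") for e in labeled}
    return {
        "path": tag_paths.pop() if len(tag_paths) == 1 else None,
        "class": classes.pop() if len(classes) == 1 else None,
    }

def retrieve(root, paths, features):
    """Text of all elements matching every shared feature, in document order."""
    values = []
    for e in root.iter():
        if features["path"] is not None and paths[e] != features["path"]:
            continue
        if features["class"] is not None and e.get("class") != features["class"]:
            continue
        values.append(e.text)
    return values

root = ET.fromstring(PAGE.strip())
paths = index_paths(root)
spans = root.findall(".//span")

# The user labels only two elements with the attribute name "title".
labeled_titles = [spans[0], spans[2]]
features = common_features(labeled_titles, paths)
titles = retrieve(root, paths, features)
print(features)
print(titles)
```

In this sketch, labeling two of the three title elements is enough to retrieve all three, which illustrates how shared features could reduce the amount of labeling a user has to do.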