检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]浙江财经学院信息学院,杭州310018 [2]复旦大学计算机与信息技术系
出 处:《计算机研究与发展》2005年第12期2184-2191,共8页Journal of Computer Research and Development
基 金:国家"八六三"高技术研究发展计划基金项目(2002AA231011);上海市重大科技基金项目(02DJ14013)
摘 要:从异构生物数据源抽取数据,建立查询分析平台是目前研究的热点,而抽取过程会涉及大量相互依赖的元数据,充分利用这种依赖关系可降低维护工作量·基于正则表达式(RE)提出了ReDE抽取方法:通过围绕RE组建立分析树,设计了基于RE的关系数据库模式生成算法和通用抽取与组装算法,其特点是:RE是惟一的元数据,易于管理和维护·该方法奠定了生物数据库辅助设计工具和高自动化抽取工具的基础,已用于构建国内第1个整合的生物信息在线数据仓库·Extracting data from heterogeneous biological data sources to build a query and analysis platform for biological scientists is currently a hot research topic. In general, data extraction process concerns many interdependent metadata. Making full use of dependencies among metadata to generate one metadata from another can reduce metadata maintenance overhead. However, many data extraction methods overlook these dependencies and require much effort to construct and maintain many metadata. In this paper, a regular expression (RE) based method named as ReDE is proposed to avoid this drawback: by building a parse tree for RE groups, an RE-based algorithm for generating relational database scheme and a general data extraction and assembling algorithm are designed. The novelty is that the RE is the only necessary metadata whose management and maintenance are relatively easy. This method can serve as the basis for building a biological database design-aiding tool and a high automatic tool for data extraction, and has been applied to extract data for the first online integrated biological data warehouse of China.
关 键 词:生物数据源 数据抽取 元数据 正则表达式 抽取算法
分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.200