Research on a Web Information Extraction System Based on B2B Vertical Search

Affiliations: [1] South China Sea Fleet Headquarters, Zhanjiang, Guangdong 524001; [2] China Institute of Industrial Relations, Beijing 100048

Source: Computer Technology and Development, 2013, No. 2, pp. 153-156, 161 (5 pages)

Funding: Fundamental Research Funds for the Central Universities (12zy019)

Abstract: Accurately extracting product information from web pages is a key problem for B2B vertical search engines. Taking the site tree as the model, this paper first analyzes the structural characteristics of enterprise websites and, on that basis, builds a web information extraction system for B2B vertical search engines. The system uses the site tree to identify product pages among the large number of pages on an enterprise site and removes noise from them, then applies a rule-based method to extract the product description and parameter information contained in those pages. The product information extracted by the system is reasonably accurate, and extraction efficiency improves markedly, which makes the system suitable for describing, classifying, and searching products in a B2B vertical search engine.

Keywords: B2B; vertical search; web information extraction; enterprise site tree; denoising

Classification: TP393.09 [Automation and Computer Technology - Computer Application Technology]
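
The abstract gives only the outline of the pipeline: a site tree, product-page identification, then rule-based extraction. The sketch below is a minimal illustration of that outline and is not the authors' implementation. The function names, the `min_siblings` threshold, the "many leaf pages under one parent" heuristic, and the `name: value` rule are all assumptions introduced here for illustration. The denoising step is omitted because the abstract does not say how it is done.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def build_site_tree(urls):
    """Map each parent path to the set of child paths found under it."""
    tree = defaultdict(set)
    for url in urls:
        parts = [p for p in urlparse(url).path.split("/") if p]
        parent = "/"
        for part in parts:
            child = parent.rstrip("/") + "/" + part
            tree[parent].add(child)
            parent = child
    return tree


def find_product_pages(tree, min_siblings=3):
    """Assumed heuristic: leaf pages that share a parent with many other leaves."""
    pages = []
    for parent, children in tree.items():
        # A child that never appears as a parent key is a leaf page.
        leaves = sorted(c for c in children if c not in tree)
        if len(leaves) >= min_siblings:
            pages.extend(leaves)
    return sorted(pages)


# Assumed rule: a parameter line looks like "name: value" (ASCII or full-width colon).
PARAM_RULE = re.compile(r"^\s*([^:：]{1,30})[:：]\s*(.+?)\s*$")


def extract_parameters(text_lines):
    """Rule-based extraction: keep only lines shaped like 'name: value'."""
    params = {}
    for line in text_lines:
        match = PARAM_RULE.match(line)
        if match:
            params[match.group(1).strip()] = match.group(2)
    return params


if __name__ == "__main__":
    # Hypothetical enterprise site used only for this example.
    urls = [
        "http://example.com/about.html",
        "http://example.com/contact.html",
        "http://example.com/products/a.html",
        "http://example.com/products/b.html",
        "http://example.com/products/c.html",
    ]
    print(find_product_pages(build_site_tree(urls)))
    print(extract_parameters(["Model: X-200", "Welcome to our site", "Weight: 1.5 kg"]))
```

Keeping the tree as a plain parent-to-children mapping makes the sibling-count heuristic a single pass over the mapping. A real system would presumably combine it with link structure and page content rather than URL paths alone.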
