Financial topical crawler based on SVM prediction (基于SVM预测的金融主题爬虫)

Cited by: 7

Authors: Chen Li[1], Li Zhishu[1], Ju Shenggen[1], Tang Xiaopeng[1], Liang Shimu[1], Han Guohui[1]

Source: Journal of Sichuan University (Natural Science Edition), 2010, No. 3, pp. 493-497 (5 pages)

Funding: Public Welfare Research Program of the Sichuan Provincial Department of Science and Technology (2008SZ0049)

Abstract: With the explosive growth of information on the Internet, it is becoming increasingly difficult to retrieve user-relevant information with general-purpose search engines, and topical crawlers have become important tools for gathering topic-relevant information on the Web. Most current topical crawlers based on classifier prediction are trained on the content of web pages from different categories, but during actual crawling the classifier can predict only from the link information available in the parent page. This mismatch between training data and prediction data lowers the crawler's prediction accuracy. This paper trains an SVM classifier on category-labeled URLs together with their surrounding context and anchor text, applies two feature selection methods, document frequency (DF) and information gain, compares experimentally the factors that affect the classifier, and evaluates the classifier in online experiments. The experiments show that the approach achieves high accuracy in actual prediction.
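The feature selection step described in the abstract can be illustrated with a minimal sketch. The paper's implementation, data, and thresholds are not given on this page, so the toy samples, the function names, and the parameters `min_df` and `k` below are assumptions for illustration only. The sketch covers document frequency (DF) thresholding and information gain ranking over tokens taken from a link's URL, anchor text, and context; it does not include the SVM training itself.

```python
import math
from collections import Counter

# Hypothetical toy data, NOT from the paper: each sample is
# (tokens from URL + anchor text + surrounding context, label),
# where label 1 = finance-related link and 0 = unrelated link.
SAMPLES = [
    (["finance", "stock", "market", "news"], 1),
    (["stock", "fund", "bank", "news"], 1),
    (["bank", "loan", "rate", "finance"], 1),
    (["sports", "football", "news"], 0),
    (["movie", "music", "news"], 0),
    (["football", "music", "video"], 0),
]


def document_frequency(samples):
    """Count in how many samples each term occurs at least once."""
    df = Counter()
    for tokens, _ in samples:
        df.update(set(tokens))
    return df


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(samples, term):
    """IG(term) = H(class) - H(class | term present or absent)."""
    with_term = Counter(lbl for toks, lbl in samples if term in toks)
    without_term = Counter(lbl for toks, lbl in samples if term not in toks)
    overall = with_term + without_term
    n = len(samples)
    conditional = sum(
        sum(part.values()) / n * entropy(part.values())
        for part in (with_term, without_term)
    )
    return entropy(overall.values()) - conditional


def select_by_df(samples, min_df):
    """Keep terms that occur in at least min_df samples."""
    df = document_frequency(samples)
    return sorted(term for term, count in df.items() if count >= min_df)


def select_by_ig(samples, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted(document_frequency(samples))
    ranked = sorted(vocab, key=lambda t: information_gain(samples, t),
                    reverse=True)
    return ranked[:k]
```

In this toy set, a term such as `news` appears equally often in both classes, so it passes a DF threshold but carries no information gain. This shows why the two criteria can select different features and why comparing them experimentally, as the abstract describes, is worthwhile.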

Keywords: topical crawler; classifier; support vector machine; feature selection; finance

Classification code: TP391.12 [Automation and Computer Technology: Computer Application Technology]
