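Affiliations: [1] School of Computers, Guangdong University of Technology, Guangzhou 510006 [2] Department of Computer Science, College of Information Science and Technology, Jinan University, Guangzhou 510632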
Source: Computer Science (《计算机科学》), 2017, No. 10, pp. 193-202 (10 pages)
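Funding: Natural Science Foundation of Guangdong Province (2016A030313084; 2016A030313700; 2014A030313374); Fundamental Research Funds for the Central Universities (21615438); Science and Technology Planning Project of Guangdong Province (2015B010128007)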
Abstract: Predicting stock prices and their trends is a popular topic in financial intelligence research. Many information sources have been tried for stock price prediction, including fundamental economic features, technical indicators, Internet public opinion, financial announcements, financial news, and financial research reports. However, most previous studies use only one or two sources, and very few use three or more. More sources offer richer information content and more distinct levels of information. But because the sources differ in nature and affect the stock market to different degrees, fusing several of them for stock price prediction is not easy. Multiple sources also increase the risk of the curse of dimensionality. Based on the idea of information fusion, this paper attempts to use three sources simultaneously to predict stock price movement: fundamental economic features, technical indicators, and Internet public opinion. The method first applies source-specific preprocessing to each type of data so that they form a unified data set, and then builds prediction models with an SVM classifier. Experimental results show that, when a linear kernel is chosen for the SVM classifier and non-trading-day data are included, the prediction model based on all three sources performs better than models that use a single source or any pair of sources. In addition, during data collection the authors found that the volume of Internet public opinion rose sharply on non-trading days (for example, weekends or suspension periods), although no transactions took place. They therefore added sentiment data from non-trading days to the experimental data, and classification accuracy improved. The study shows that although stock price prediction based on multi-source fusion is difficult, it can achieve good prediction performance when features are selected appropriately and each source is preprocessed in a targeted way.
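The abstract describes forming a unified data set from the three sources and adding non-trading-day sentiment, but it does not say how the alignment is done. The following is only a minimal sketch under stated assumptions: the function name `fuse_sources`, the dictionary layout, and the rule of averaging weekend or suspension-day sentiment into the next trading day are all hypothetical and are not taken from the paper.

```python
from datetime import date


def fuse_sources(fundamentals, technicals, sentiment, trading_days):
    """Build one feature row per trading day from three sources.

    Assumed (hypothetical) layout:
      fundamentals, technicals: dict of date -> list of floats, trading days only
      sentiment: dict of date -> float, any calendar day
      trading_days: sorted list of dates
    """
    rows = {}
    pending = sorted(sentiment)  # calendar days with sentiment, oldest first
    i = 0
    for day in trading_days:
        # Gather sentiment from all calendar days not yet assigned, up to and
        # including this trading day. Non-trading days fall into the next one.
        scores = []
        while i < len(pending) and pending[i] <= day:
            scores.append(sentiment[pending[i]])
            i += 1
        mean_sentiment = sum(scores) / len(scores) if scores else 0.0
        # Concatenate the three sources into one row; the count of sentiment
        # days is kept as an extra feature.
        rows[day] = (
            fundamentals[day] + technicals[day] + [mean_sentiment, float(len(scores))]
        )
    return rows


# Toy data (made up for illustration): Friday and Monday are trading days,
# Saturday and Sunday carry sentiment only.
fri, sat, sun, mon = (date(2017, 1, d) for d in (6, 7, 8, 9))
rows = fuse_sources(
    fundamentals={fri: [12.5], mon: [12.4]},
    technicals={fri: [0.8], mon: [0.6]},
    sentiment={fri: 0.2, sat: -0.4, sun: -0.6, mon: 0.2},
    trading_days=[fri, mon],
)
```

Rows built this way could then be passed, together with up/down labels, to a linear-kernel SVM such as scikit-learn's `SVC(kernel="linear")`; that step is omitted here to keep the sketch free of third-party dependencies.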
Classification No.: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]