Application of Thematic Concept Stock Detecting System Based on Hadoop Platform


Author: DING Jun [1] (College of Computer and Art, Anhui Technical College of Industry and Economy, Hefei, Anhui 230051, China)

Affiliation: [1] College of Computer and Art, Anhui Technical College of Industry and Economy, Hefei, Anhui 230051, China

Source: Journal of Xichang University (Natural Science Edition), 2021, No. 2, pp. 82-88 (7 pages)

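Funding: Key Natural Science Research Project of Anhui Higher Education Institutions (KJ2019A1049); 2020 Anhui Provincial Quality Offline Open Course "WEB Programming (JSP)" (2020kfkc130).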

Abstract: To meet the capital market's need to quickly identify stocks related to a given thematic concept, this paper proposes a new approach that analyzes and calculates a theme-concept correlation index from three kinds of data on listed companies: core concepts, main business income, and capital operations. The index then serves as the criterion for recommending stocks related to the theme. A data capture program and a Web application were developed to implement the approach. The data capture program uses the scheduling component Quartz to crawl the publicly disclosed basic information of all listed companies from major financial websites and stores it in the distributed file system HDFS. The Web application accepts a combination of query keywords from the user; the system then uses the captured data set to calculate, for each company, a correlation index with the keyword combination from three aspects (income, investment, and core concept) and aggregates these into a total correlation index. The higher a company's total correlation index, the more relevant it is and the more likely it is to be a thematic concept company the user is looking for. Because the three aspects together cover both the company's past and its future, and both qualitative and quantitative evidence, the resulting correlation is more reliable and accurate.
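The abstract does not give the formulas behind the three sub-indices, so the following is only a minimal Java sketch of the aggregation and ranking step it describes. Everything in it is an assumption made for illustration: the `Company` fields, the keyword-matching rules inside `conceptIndex`, `incomeIndex`, and `investmentIndex`, the equal-weight sum in `totalIndex`, and the sample data. None of it should be read as the paper's actual method.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class ThemeRelevance {

    /** Hypothetical shape of the crawled data for one listed company. */
    static class Company {
        final String name;
        final String coreConcept;               // free-text core concept description
        final Map<String, Double> revenueShare; // business segment -> share of main income (0..1)
        final List<String> investments;         // descriptions of capital operations

        Company(String name, String coreConcept,
                Map<String, Double> revenueShare, List<String> investments) {
            this.name = name;
            this.coreConcept = coreConcept;
            this.revenueShare = revenueShare;
            this.investments = investments;
        }
    }

    /** A company name paired with its total correlation index. */
    static class Scored {
        final String name;
        final double total;

        Scored(String name, double total) {
            this.name = name;
            this.total = total;
        }
    }

    /** True if the text contains at least one keyword (case-sensitive substring match). */
    static boolean mentions(String text, List<String> keywords) {
        for (String k : keywords) {
            if (text.contains(k)) return true;
        }
        return false;
    }

    /** Assumed qualitative sub-index: fraction of keywords found in the core concept text. */
    static double conceptIndex(Company c, List<String> keywords) {
        if (keywords.isEmpty()) return 0.0;
        long hits = keywords.stream().filter(c.coreConcept::contains).count();
        return (double) hits / keywords.size();
    }

    /** Assumed quantitative sub-index: summed income share of segments that mention a keyword. */
    static double incomeIndex(Company c, List<String> keywords) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : c.revenueShare.entrySet()) {
            if (mentions(e.getKey(), keywords)) sum += e.getValue();
        }
        return sum;
    }

    /** Assumed forward-looking sub-index: fraction of investments that mention a keyword. */
    static double investmentIndex(Company c, List<String> keywords) {
        if (c.investments.isEmpty()) return 0.0;
        long hits = c.investments.stream().filter(d -> mentions(d, keywords)).count();
        return (double) hits / c.investments.size();
    }

    /** Total correlation index; the unweighted sum is an assumption. */
    static double totalIndex(Company c, List<String> keywords) {
        return conceptIndex(c, keywords) + incomeIndex(c, keywords) + investmentIndex(c, keywords);
    }

    /** Scores every company and sorts by total index, highest first. */
    static List<Scored> rank(List<Company> companies, List<String> keywords) {
        List<Scored> out = new ArrayList<>();
        for (Company c : companies) out.add(new Scored(c.name, totalIndex(c, keywords)));
        out.sort(Comparator.comparingDouble((Scored s) -> s.total).reversed());
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the paper.
        List<Company> companies = List.of(
            new Company("Company B", "liquor brand", Map.of("liquor", 1.0), List.of()),
            new Company("Company A", "lithium battery materials",
                Map.of("lithium salts", 0.6, "chemicals", 0.4),
                List.of("acquired battery recycling plant", "built office")));
        for (Scored s : rank(companies, List.of("lithium", "battery"))) {
            System.out.println(s.name + " " + s.total);
        }
    }
}
```

In this sketch each sub-index is normalized to the range 0 to 1 so that no single aspect dominates the sum. A real system would likely need weights, word segmentation for Chinese text, and a more careful treatment of partial matches.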

Keywords: data capture; Hadoop; thematic concept; stock mining; correlation index

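Classification codes: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; F831.51 [Automation and Computer Technology: Computer Science and Technology]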

 
