Design of a Standardized Storage System for Internet Public Opinion Data in Power Enterprises  Cited by: 1

Authors: HAN Wei (韩维) [1]; SUN Lin-tan (孙林檀); LV Jing-xian (吕静贤); CHEN Long (陈龙); PENG Bo (彭渤); PAN Bao-yu (潘宝玉)

Affiliations: [1] State Grid Customer Service Center, Tianjin 300000, China; [2] Tianjin Richsoft Electric Power Information Technology Co., Ltd., Tianjin 300000, China

Source: Information Technology (《信息技术》), 2023, No. 8, pp. 160-164 (5 pages)

Abstract: To keep public opinion hotspots from damaging the image of power enterprises, a standardized storage system for Internet public opinion data in power enterprises is designed that collects, processes, and stores public opinion data in real time. A focused web crawler based on an improved TF-IDF algorithm crawls public opinion data from the raw data layer, and a semantic model described by regular expressions is constructed. A storage load balancing mechanism, built on a load-weight-based load balancing algorithm, computes the probability that each storage node executes a task from the load difference and then updates the node loads, so that load is balanced across the storage nodes. Experimental results show a mean read time of 72.9 ms and a mean write time of 425.3 ms: data read/write efficiency is high, the storage nodes are evenly loaded, and processing and storage performance on large-scale data is better.

Keywords: power enterprise; standardized storage; web crawler; load balancing

Classification code: TN711 [Electronics and Telecommunications: Circuits and Systems]
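
Illustrative note: the abstract says the storage load balancing mechanism derives each node's task probability from a load difference and then updates the node's load, but this record does not give the formula. The Python sketch below is one possible reading, not the authors' algorithm. It assumes the difference is measured against the most heavily loaded node, and that a task adds a fixed cost to the chosen node. The names `pick_probabilities`, `assign_task`, `cost`, and `eps` are hypothetical.

```python
import random


def pick_probabilities(loads, eps=1e-9):
    """Map node loads to selection probabilities via load differences.

    Assumption (not from the paper): the load difference is taken against
    the most heavily loaded node, so lighter nodes get a larger share.
    """
    heaviest = max(loads.values())
    # eps keeps every node selectable when all loads are equal.
    diffs = {node: heaviest - load + eps for node, load in loads.items()}
    total = sum(diffs.values())
    return {node: diff / total for node, diff in diffs.items()}


def assign_task(loads, cost, rng=random):
    """Choose a node by weighted random draw, then update its load."""
    probs = pick_probabilities(loads)
    nodes = list(probs)
    weights = [probs[node] for node in nodes]
    chosen = rng.choices(nodes, weights=weights, k=1)[0]
    loads[chosen] += cost  # hypothetical fixed cost per task
    return chosen


if __name__ == "__main__":
    # Made-up starting loads, expressed as fractions of node capacity.
    loads = {"node-a": 0.80, "node-b": 0.20, "node-c": 0.50}
    rng = random.Random(0)
    for _ in range(200):
        assign_task(loads, cost=0.01, rng=rng)
    print(loads)
```

Under these assumptions the lightest node is always the most likely target, so repeated assignments pull the node loads toward one another. A real system would also have to decay loads as tasks complete, which this sketch leaves out.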

 
