Research on XML-Based Intelligent Extraction Method for Power Marketing Data  (Cited by: 3)

Author: YU Xiangqian (余向前) (State Grid Gansu Electric Power Company, Lanzhou 730030, China)

Affiliation: [1] State Grid Gansu Electric Power Company, Lanzhou 730030, Gansu, China

Source: Process Automation Instrumentation (《自动化仪表》), 2023, No. 1, pp. 92-95, 100 (5 pages)

Abstract: The growth of power informatization has steadily increased the volume of data in power marketing systems. As a result, data conversion during extraction performs poorly, which leads to a high recall rate in the extraction results. To address this, a new intelligent extraction method for power marketing data is designed around the conversion capability of extensible markup language (XML). The power marketing data are normalized into small-range data chains, and the hyperlink-induced topic search (HITS) algorithm is applied to obtain the data sources. An XML data conversion tool is configured, and XML location descriptors are used to locate data regions. With the extraction rules and extraction content defined, data mapping is then used to extract the power marketing data. Performance was tested in two environments: stable operation and data intrusion. The comparison shows that the XML-based method keeps the recall rate below 7% and the extraction time below 800 ms, both better than the traditional method, which demonstrates the effectiveness of the method.

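Keywords: extensible markup language; power marketing data; information security; data extraction; data conversion; data region location; extraction rules; data mapping; recall rate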

Classification: TH-9 [Mechanical Engineering]
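
Illustrative note: the abstract describes locating data regions with XML location descriptors and then applying extraction rules and data mapping. The record does not give the paper's actual rules or schema, so the following is only a minimal sketch of that general idea, using Python's standard xml.etree.ElementTree module. The sample XML, the element names, the RULES table, and the extract_records helper are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are illustrative only.
SAMPLE_XML = """
<marketing>
  <customer id="C001">
    <name>Example User</name>
    <billing>
      <kwh>320.5</kwh>
      <amount currency="CNY">176.28</amount>
    </billing>
  </customer>
  <customer id="C002">
    <name>Another User</name>
    <billing>
      <kwh>88.0</kwh>
    </billing>
  </customer>
</marketing>
"""

# Assumed extraction rules: target field -> (locator path relative to the
# record element, converter applied to the located text).
RULES = {
    "customer_name": ("name", str),
    "energy_kwh": ("billing/kwh", float),
    "amount_due": ("billing/amount", float),
}


def extract_records(xml_text, record_path="customer", rules=RULES):
    """Locate each record region, then map located values to target fields."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.findall(record_path):  # data region location
        row = {"customer_id": node.get("id")}
        for field, (locator, convert) in rules.items():
            text = node.findtext(locator)  # None when the path is absent
            row[field] = convert(text.strip()) if text is not None else None
        records.append(row)
    return records


records = extract_records(SAMPLE_XML)
for row in records:
    print(row)
```

Keeping the locators in a separate table means new fields can be added without changing the extraction loop. A missing element is mapped to None rather than raising an error, so incomplete records remain visible in the output.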

 
