Research on a Big Data Mining Method Based on Attribute Deviation Control  Cited by: 1

Authors: 王茜[1], 孟翔鹏, 李文举, 张云鹏[1]

Affiliation: [1] Shenyang Aircraft Industry (Group) Co., Ltd. (沈阳飞机工业(集团)有限公司), Shenyang 110000

Source: 《科技创新与应用》 (Technology Innovation and Application), 2023, No. 12, pp. 63-66 (4 pages)

Abstract: The information age gives people a wealth of data to choose from and use in production and daily life. However, the sheer volume of data makes mining difficult, consumes more time, and lowers work efficiency. To address this, this paper builds on the traditional decision-tree data mining method and replaces the use of information entropy to judge attribute differences with the use of the deviation in entropy increase or decrease. This processing retains only the attributes that change in the same direction as the target attribute, which reduces the participation of ineffective attributes. Mining experiments are carried out on a passenger aircraft data set, using attributes such as flight records, engine condition, and passenger capacity category. The experimental results show that, compared with the traditional decision-tree data mining algorithm, the decision tree built with the proposed method is more compact, mines more efficiently, and executes faster.
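The abstract does not give the paper's formula for the entropy deviation, so the following is only a minimal Python sketch of the general idea, not the authors' method. It computes the standard ID3-style information gain of each attribute and, as a stand-in for the deviation check, keeps only the attributes that actually reduce the entropy of the target. The function names, the `min_gain` threshold, and the toy rows (`engine`, `capacity`, `status`) are all hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def select_attributes(rows, attributes, target, min_gain=0.0):
    """Keep attributes whose gain exceeds min_gain (an assumed stand-in filter)."""
    gains = {a: information_gain(rows, a, target) for a in attributes}
    return {a: g for a, g in gains.items() if g > min_gain}


# Hypothetical toy records; not data from the paper.
rows = [
    {"engine": "good", "capacity": "large", "status": "ok"},
    {"engine": "good", "capacity": "small", "status": "ok"},
    {"engine": "worn", "capacity": "large", "status": "check"},
    {"engine": "worn", "capacity": "small", "status": "check"},
]

selected = select_attributes(rows, ["engine", "capacity"], "status")
print(selected)
```

In this toy data `engine` fully determines `status` while `capacity` carries no information about it, so only `engine` survives the filter. Dropping such attributes before tree construction is what would make the resulting tree smaller and faster to build.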

Keywords: data mining; decision tree; deviation control; passenger aircraft; multi-attribute decision-making

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
