High-dimensional Spectral with Outlier Data Mining based on Representation Learning (Times cited: 1)


Authors: 李林睿 (LI Lin-rui), 常舒予 (CHANG Shu-yu), 乔一鸣 (QIAO Yi-ming) (Nanjing University of Posts and Telecommunications, Nanjing 210023, China)

Affiliation: [1] Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023

Source: Computer Knowledge and Technology (《电脑知识与技术》), 2021, No. 22, pp. 90-93 (4 pages)

Funding: Jiangsu Province College Students' Innovation and Entrepreneurship Training Program (201910293065Y, SYB2019015).

Abstract: LAMOST (Large Sky Area Multi-Object Fiber Spectroscopic Telescope, also known as the Guo Shoujing Telescope) provides a large volume of astronomical spectral data, and the classification of celestial objects is a problem that has received wide attention in astronomy. Because the number of objects is large and the data are high-dimensional, processing spectra with machine learning methods has become a research hotspot in recent years. To address celestial object classification, this paper proposes HSODM (High-dimensional Spectral with Outlier Data Mining), an improved method for identifying high-dimensional outliers. HSODM is unsupervised: it uses random distances to pick out the very small number of unknown objects or outliers hidden in large volumes of high-dimensional spectral data, which facilitates subsequent object classification, outlier mining, and related processing. The project covers data preprocessing, dimensionality reduction by principal component analysis (PCA), construction and training of a long short-term memory (LSTM) neural network, parameter tuning, and prediction and analysis of results; the model is then evaluated and presented through evaluation metrics and data visualization. The proposed improvements and the optimized neural network shorten training time and improve prediction accuracy. Experiments show that the improved method increases the area under the ROC (receiver operating characteristic) curve, the area under the P-R (precision-recall) curve, the F1 score, and the G-mean score.
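The abstract does not specify HSODM's LSTM representation, its exact random-distance score, or its parameters, so the following is only a minimal sketch of the generic building blocks it names, under stated assumptions. It reduces dimensionality with PCA (via SVD), scores each point by its nearest-neighbour distance to small random subsamples averaged over an ensemble (a commonly used random-distance scheme swapped in here, not the authors' method), and computes the ROC AUC, F1 and G-mean metrics. The function names, `n_subsets`, `subset_size`, `k`, and the synthetic data are hypothetical; the P-R curve area is omitted for brevity.

```python
import numpy as np


def pca_reduce(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # rows of Vt are principal directions


def random_distance_scores(Z, n_subsets=50, subset_size=8, seed=0):
    """Hypothetical random-distance score: mean (over n_subsets random draws) of
    each point's distance to its nearest neighbour in a random subsample."""
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    scores = np.zeros(n)
    for _ in range(n_subsets):
        idx = rng.choice(n, size=subset_size, replace=False)
        d = np.linalg.norm(Z[:, None, :] - Z[None, idx, :], axis=2)  # shape (n, subset_size)
        d[idx, np.arange(subset_size)] = np.inf  # a sampled point is not its own neighbour
        scores += d.min(axis=1)
    return scores / n_subsets


def roc_auc(y_true, scores):
    """ROC AUC = P(random outlier scores higher than random inlier); ties count 1/2."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def f1_and_gmean(y_true, y_pred):
    """F1 = harmonic mean of precision and recall; G-mean = sqrt(recall * specificity)."""
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    tp, fp = np.sum(t & p), np.sum(~t & p)
    fn, tn = np.sum(t & ~p), np.sum(~t & ~p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(f1), float(np.sqrt(recall * specificity))


if __name__ == "__main__":
    # Synthetic stand-in for spectra: 200 inliers plus 5 shifted outliers, 100 "flux bins"
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 100)), rng.normal(6.0, 1.0, (5, 100))])
    y = np.r_[np.zeros(200, dtype=bool), np.ones(5, dtype=bool)]
    scores = random_distance_scores(pca_reduce(X, k=10))
    pred = scores >= np.sort(scores)[-5]         # flag the 5 highest-scoring points
    print("ROC AUC:", roc_auc(y, scores))
    print("F1, G-mean:", f1_and_gmean(y, pred))
```

Averaging over many small random subsamples is what makes the score "random-distance" based: an isolated spectrum tends to lie far from every subsample, whereas a spectrum inside a dense group usually finds a close neighbour. G-mean is included alongside F1 because, with very few outliers, it also rewards correctly keeping the many inliers unflagged.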

Keywords: representation learning; high-dimensional spectra; outlier detection; data mining; classification

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
