An efficient similarity measurement method and its classification effect  (Cited by: 7)

Original title: 一种高效的相似性度量方法及其分类效果研究

Authors: YUAN Hui (袁慧); TAN ZhangLu (谭章禄) [1]; WANG FuHao (王福浩)

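Affiliations: [1] School of Management, China University of Mining and Technology (Beijing), Beijing 100083, China; [2] School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China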

Source: Scientia Sinica (Technologica) (《中国科学:技术科学》), 2022, No. 7, pp. 1096-1110 (15 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant No. 61471362).

Abstract: High-dimensional data classification is important in statistical analysis. However, classification methods often perform poorly because the distance measures they rely on are sensitive to noise, computationally expensive, and low in accuracy. To address the limited accuracy and efficiency of similarity measurement for high-dimensional time series, we propose an improved similarity measure based on Euclidean distance and use it to improve classification. First, we use the discrete wavelet transform (DWT) to decompose and reconstruct each sequence, and propose a local high-frequency DWT method to reduce dimensionality and noise. We then combine the distance function with the concepts of amplitude and rank correlation coefficient, improving it in terms of relative deviation and consistency of fluctuation trend. Using the 1-nearest neighbor (1-NN) technique, we compare the proposed method with the dynamic time warping (DTW), FastDTW, and longest common subsequence (LCSS) measures. Experiments on 40 UCR time-series datasets show that 1-NN classification accuracy under the proposed method is superior to that under DTW, FastDTW, and LCSS, with a confidence level of at least 85%, and confirm that the proposed similarity search method improves significantly in both accuracy and speed. These findings add to the theoretical basis of similarity measurement and serve as a valuable reference for data mining applications in intelligent system management and time-series statistics.
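As a rough illustration of the general pipeline the abstract describes (wavelet-based dimensionality reduction followed by Euclidean 1-NN classification), the following is a minimal sketch only. It uses a plain Haar approximation rather than the paper's local high-frequency DWT, and an unmodified Euclidean distance rather than the paper's improved measure with amplitude and rank correlation terms, since the record gives neither. The function names `haar_approx`, `euclidean`, and `one_nn`, the `levels` parameter, and the toy data are all hypothetical.

```python
import math

def haar_approx(series, levels=1):
    """Keep only Haar DWT approximation coefficients.

    Each level halves the length, which reduces dimensionality and
    discards the high-frequency detail coefficients (a crude denoising).
    This is a standard Haar step, NOT the paper's local high-frequency DWT.
    """
    out = list(series)
    for _ in range(levels):
        if len(out) < 2:
            break
        if len(out) % 2:
            out.append(out[-1])  # pad odd lengths by repeating the last value
        out = [(out[i] + out[i + 1]) / math.sqrt(2)
               for i in range(0, len(out), 2)]
    return out

def euclidean(a, b):
    """Plain Euclidean distance; assumes both sequences have equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def one_nn(train, query, levels=1):
    """Return the label of the training series closest to the query.

    `train` is a list of (series, label) pairs of equal-length series.
    """
    q = haar_approx(query, levels)
    best_label, best_dist = None, math.inf
    for series, label in train:
        d = euclidean(haar_approx(series, levels), q)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Toy data, invented for illustration only.
train = [
    ([0.0, 0.1, 0.0, -0.1, 0.0, 0.1, 0.0, -0.1], "flat"),
    ([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], "rising"),
]
print(one_nn(train, [0.2, 0.9, 2.1, 2.8, 4.2, 5.1, 5.9, 7.2]))  # rising
```

Comparing in the reduced space means each distance evaluation touches half as many points per level, which is one plausible source of the speed gain the abstract reports. The accuracy gain presumably depends on the modified distance, which this sketch does not attempt to reproduce.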

Keywords: time series analysis; similarity measurement; discrete wavelet transform; K-NN classification; data mining

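Classification codes: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; O211.61 [Automation and Computer Technology: Computer Science and Technology]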

 
