RQIC:一种高效时序相似搜索算法  被引量:1

RQIC:An Efficient Similarity Searching Algorithm on Time Series

在线阅读下载全文

作  者:蒋涛[1] 冯玉才[1] 朱虹[1] 李国徽[1] 

机构地区:[1]华中科技大学计算机科学与技术学院,武汉430074

出  处:《计算机研究与发展》2009年第5期770-778,共9页Journal of Computer Research and Development

基  金:国家"八六三"高技术研究发展计划基金项目(2006AA01Z430;2007AA01Z309)~~

摘  要:索引大规模时序数据库是高效时序搜索中的关键问题.提出了一种新颖的索引方案RQI,它包括3种过滤策略:即first-k过滤、索引低边界和上边界以及三角不等式修剪.基本的思想为首先运用Haar小波变换计算每个时序的小波系数,利用前面的k个小波系数形成一个最小边界矩阵,以利用点过滤方法;然后将预先计算每个时序的低边界特征和上边界特征存放到索引当中;最后采用三角不等式来修剪不相似的序列并确保没有漏报.同时提出了一种新的低边界距离函数SLBS和聚类算法CSA.通过CSA可保持索引良好的聚类特征以提高点过滤方法的效率,从而引入了一种更好的算法RQIC.在合成数据集和实时数据集的大量对比实验表明,RQIC是有效的且具备较高的查询效率.Indexing large time series databases is crucial for efficient search of time series queries. An index scheme RQI (range query based on index) is introduced, which includes three filtering methods: first-k filtering, indexing lower bounding and upper bounding as well as triangle inequality pruning.The basic idea is calculating wavelet coefficients for each time series based on Haar wavelet transform.The first k coefficients are used to form a MBR (minimal bounding rectangle). Thus, the point filtering method can be used. In addition, lower bounding and upper bounding features of each time series are calculated in advance, which are stored into index structure. Finally, triangle inequality pruning method is adopted by calculating the distance between time series beforehand. Moreover, a new lower bounding distance function SLBS (symmetrical Iower bounding based on segment) and a novel clustering algorithm CSA (clustering based on segment approximation) are proposed. The aim is to further improve the search efficiency of point filtering method by keeping a good clustering trait of index structure, thereby obtaining a better algorithm RQIC (range query based on Index and clustering). A lot of compared experiments are conducted over both synthetic and real data sets for RQIC. The results show that RQIC provides perfect pruning ability and high search efficiency.

关 键 词:数据挖掘 算法 索引 聚类 时间序列 相似搜索 

分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象