Improved Data Preprocessing Algorithm and Its Application (改进的数据预处理算法及其应用)  Cited by: 6

Authors: Xu Bixiao (许必宵), Chen Shengbo (陈升波), Han Chongyang (韩重阳)[1], Ma Menghuan (马梦环), Gong Jing (宫婧)[1]

Affiliation: [1] School of Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210000, China

Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 12, pp. 143-146, 151 (5 pages)

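Funding: National Natural Science Foundation of China (61373135); Major Natural Science Research Project of Jiangsu Higher Education Institutions (12KJA52003); Nanjing University of Posts and Telecommunications Students' Science and Technology Innovation Training Program (STITP) (201410293023Z)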

Abstract: Cluster analysis is an important topic in data mining, and preprocessing duplicate records and isolated points (outliers) can improve clustering results. For duplicate data, the paper extends the traditional duplicate-detection algorithm SNM with an elastic (variable-size) window and a variable window movement speed, improving both the accuracy and the efficiency of the search. For isolated data, the paper proposes a cluster-wise search algorithm based on hierarchical clustering: hierarchical clustering first divides the data into independent clusters, and each cluster is then searched in turn for isolated points, which speeds up the search. A recovery-check step then restores non-isolated points that were deleted by mistake, which improves the accuracy of the search. In the simulation experiments, a sample of the data is first used to verify the accuracy of the improved preprocessing algorithms. The algorithms are then applied to mobile-user consumption data, and the cleaned data are clustered in order to identify customers' home-location information. The experimental results indicate that the proposed preprocessing algorithms are accurate and efficient.
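The abstract does not give the details of the improved SNM, so the following is only a minimal Python sketch of the general idea of a sorted-neighborhood search whose window grows when a duplicate is found. It is not the authors' algorithm: the function names `similarity` and `snm_adaptive`, the parameters `base_window`, `max_window` and `threshold`, the "grow by one on a hit" rule, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration. The variable movement speed mentioned in the abstract is not modeled here.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def snm_adaptive(records, key=lambda r: r, base_window=2, max_window=6,
                 threshold=0.8):
    """Sorted-neighborhood duplicate search with a window that grows on a hit.

    A window of size w compares a record with the next w - 1 records in
    sorted order. Returns a sorted list of (i, j) index pairs into
    `records`, with i < j, that are judged to be duplicates.
    """
    # Sort record indices by their key so likely duplicates become neighbors.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        window = base_window  # every record starts with the small window
        offset = 1
        while offset < window and pos + offset < len(order):
            j = order[pos + offset]
            if similarity(key(records[i]), key(records[j])) >= threshold:
                pairs.add((min(i, j), max(i, j)))
                # Hypothetical expansion rule: a hit suggests that more
                # duplicates may follow, so look one record further.
                window = min(window + 1, max_window)
            offset += 1
    return sorted(pairs)


if __name__ == "__main__":
    names = ["john smith", "jon smith", "mary jones",
             "john smyth", "mary jonas", "zed"]
    for i, j in snm_adaptive(names):
        print(names[i], "<->", names[j])
```

Setting `max_window` equal to `base_window` turns the sketch back into a fixed-window SNM, which makes it easy to compare what the elastic window adds on a given data set.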

Keywords: data preprocessing; SNM algorithm; hierarchical clustering; cluster analysis

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
