A Cleaning Algorithm for Approximately Duplicated Records in Customer Relationship Database  (Cited by: 3)


Author: Guo Wenlong (郭文龙) [1]

Affiliation: [1] College of Electronic Information Science, Fujian Jiangxia University, Fuzhou, Fujian 350108

Source: Journal of Hengshui University (《衡水学院学报》), 2014, No. 1, pp. 15-17 (3 pages)

Funding: Class A Science and Technology Project of the Fujian Provincial Department of Education (JA12335)

Abstract: Customer relationship databases hold large numbers of customer records, many of which are approximate duplicates of one another. Detecting, cleaning, and then merging these approximately duplicated records improves storage utilization and speeds up record queries. Based on a study of customer records, this paper proposes a cleaning algorithm for approximately duplicated records in customer relationship databases. The algorithm first sorts the records and sets the attribute weights and a record similarity threshold; it then computes the similarity between adjacent records to decide whether they are approximate duplicates; finally, it cleans and merges the detected duplicates.
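The abstract names the steps (sort, weight attributes, compare adjacent records against a threshold, merge) but not the concrete similarity measure, weights, threshold, or merge rule. The following Python sketch is therefore only an illustration under stated assumptions, not the paper's implementation: the attribute names, the `WEIGHTS` and `THRESHOLD` values, the use of `difflib.SequenceMatcher` as the per-attribute similarity, and the "keep the longer value" merge policy are all hypothetical choices.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights (summing to 1) and similarity threshold;
# the abstract does not give concrete values.
WEIGHTS = {"name": 0.5, "phone": 0.3, "address": 0.2}
THRESHOLD = 0.85


def field_similarity(a: str, b: str) -> float:
    """Similarity of two attribute values in [0, 1].

    SequenceMatcher is a stand-in; the paper's measure is not stated.
    """
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1: dict, r2: dict) -> float:
    """Weighted sum of the per-attribute similarities."""
    return sum(
        w * field_similarity(r1.get(k, ""), r2.get(k, ""))
        for k, w in WEIGHTS.items()
    )


def merge(r1: dict, r2: dict) -> dict:
    """Assumed merge policy: keep the longer value of each attribute."""
    return {k: max(r1.get(k, ""), r2.get(k, ""), key=len) for k in WEIGHTS}


def clean(records: list, sort_key: str = "name") -> list:
    """Sort records, then merge each record into its predecessor when similar."""
    ordered = sorted(records, key=lambda r: r.get(sort_key, "").lower())
    cleaned = []
    for rec in ordered:
        if cleaned and record_similarity(cleaned[-1], rec) >= THRESHOLD:
            cleaned[-1] = merge(cleaned[-1], rec)
        else:
            cleaned.append(dict(rec))
    return cleaned


if __name__ == "__main__":
    sample = [
        {"name": "Zhang Wei", "phone": "13800001111", "address": "Fuzhou"},
        {"name": "Li Na", "phone": "13900002222", "address": "Xiamen"},
        {"name": "Zhang Wei", "phone": "13800001111", "address": "Fuzhou, Fujian"},
    ]
    for row in clean(sample):
        print(row)
```

Sorting first is what makes the adjacent-record comparison sufficient: likely duplicates end up next to each other, so the pass over the data is linear after the sort. The choice of sort key matters, since duplicates that differ in the leading characters of that key would not become neighbours.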

Keywords: customer relationship; approximately duplicated records; cleaning; merging

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
