Application of Markov Logic Networks in De-duplication (cited by: 3)

Markov Logic Networks with its application in De-duplication

Authors: Zhang Yufang (张玉芳)[1], Huang Tao (黄涛)[1], Ai Dongmei (艾东梅)[1], Xiong Zhongyang (熊忠阳)[1], Tang Rongjun (唐蓉君)[2]

Affiliations: [1] College of Computer Science, Chongqing University, Chongqing 400044; [2] Network Center, Chongqing University, Chongqing 400044

Source: Journal of Chongqing University (Natural Science Edition), 2010, Issue 8, pp. 36-41 (6 pages)

Funding: Natural Science Foundation of Chongqing (CSTC2008BB2021); China Postdoctoral Science Foundation (20070420711)

Abstract: Most existing de-duplication methods target a specific domain and address only one aspect of the problem in isolation. To overcome these limitations, a statistical relational learning (SRL) approach based on Markov logic networks (MLNs) is proposed. Because an MLN defines a probability distribution over possible worlds that can be used for inference, the de-duplication problem can be formalized within it. Discriminative training is used to learn the formula weights, and the MC-SAT algorithm is used for inference. The paper details how a small number of predicate formulas capture the essential features of different aspects of de-duplication, and how these aspects, expressed in Markov logic, are combined to form various models. Experimental results show that the MLN-based method not only subsumes the classical Fellegi-Sunter model but also outperforms traditional methods based on clustering algorithms and similarity measures, providing an effective way to apply Markov logic networks to practical problems.
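To make the idea of a probability distribution over worlds concrete, the following is a minimal sketch rather than the authors' model. It assumes a toy domain of three records, a hypothetical evidence predicate SimilarName, a query predicate Match, and hand-picked weights (the paper learns weights discriminatively). It also swaps MC-SAT for exhaustive enumeration, which is only feasible at this tiny scale. Each world's probability is proportional to exp of the summed weights of the ground formulas it satisfies.

```python
import itertools
import math

RECORDS = ["r1", "r2", "r3"]
PAIRS = list(itertools.combinations(RECORDS, 2))  # unordered record pairs

# Hypothetical evidence: which pairs have similar names.
SIMILAR = {("r1", "r2"): True, ("r2", "r3"): True, ("r1", "r3"): False}

# Hypothetical, hand-picked formula weights (not learned).
W_SIM = 1.5    # SimilarName(a, b) => Match(a, b)
W_TRANS = 2.0  # Match(a, b) ^ Match(b, c) => Match(a, c)
W_PRIOR = 0.5  # !Match(a, b): most pairs are not duplicates


def match(world, a, b):
    """Look up Match for an unordered pair of records."""
    return world[tuple(sorted((a, b)))]


def log_weight(world):
    """Sum the weights of all ground formulas that hold in this world."""
    total = 0.0
    for pair in PAIRS:
        if (not SIMILAR[pair]) or world[pair]:
            total += W_SIM
        if not world[pair]:
            total += W_PRIOR
    for a, b, c in itertools.permutations(RECORDS, 3):
        antecedent = match(world, a, b) and match(world, b, c)
        if (not antecedent) or match(world, a, c):
            total += W_TRANS
    return total


def conditional_distribution():
    """Return (world, probability) for every assignment of Match atoms."""
    worlds = [dict(zip(PAIRS, values))
              for values in itertools.product([False, True], repeat=len(PAIRS))]
    weights = [math.exp(log_weight(w)) for w in worlds]
    z = sum(weights)  # partition function, conditioned on the evidence
    return [(w, wt / z) for w, wt in zip(worlds, weights)]


def marginal(pair):
    """P(Match(pair) | evidence), by summing over worlds where it holds."""
    return sum(p for w, p in conditional_distribution() if w[pair])


if __name__ == "__main__":
    for pair in PAIRS:
        print(pair, round(marginal(pair), 3))
```

In this sketch the name-similarity rule plays the role of a pairwise comparison feature of the kind used in Fellegi-Sunter-style matching, while the transitivity rule couples the pairwise decisions, which is something an independent pairwise model cannot express.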

Keywords: de-duplication; Markov logic networks; Markov networks; statistical relational learning; machine learning

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
