Classification in Networked Data Based on the Probability Generative Model  (Cited by: 2)

Authors: WANG Zhenwen[1], XIAO Weidong[1], TAN Wentang[1]

Affiliation: [1] Key Laboratory of Information System Engineering, National University of Defense Technology, Changsha 410073

Source: Journal of Computer Research and Development, 2013, No. 12, pp. 2642-2650 (9 pages)

Funding: National Natural Science Foundation of China (61302144; 61303062)

Abstract: Classification in networked data, which classifies entities based on the relationships among them, is an important research topic in data mining. Existing methods generally assign a class to a node based on the classes of its neighboring nodes. These methods achieve high classification accuracy in networks with a high degree of homophily. However, many real-world networks have low homophily. In such networks, most connected nodes belong to different classes, so existing methods have difficulty predicting node classes correctly. This paper therefore proposes a novel method for classification in networked data. Its main idea is to build a probability generative model that describes the network, in which the edges of the network are observed variables and the classes of the nodes whose classes are unknown are latent variables. The model is solved by Gibbs sampling, which yields the values of the latent variables and hence the classes of the unlabeled nodes. Comparative experiments on real datasets show that the proposed method achieves better classification performance than previous methods in networks with low homophily.
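The abstract does not spell out the exact form of the generative model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a stochastic-block-style model in which an edge between nodes i and j appears with probability eta[z_i][z_j], where each block density has a Beta(a, b) prior that is integrated out (a collapsed Gibbs sampler). Edges are observed, labels of known nodes stay fixed, and the classes of unlabeled nodes are resampled. It also includes an edge-homophily ratio (fraction of edges joining same-class nodes, one common definition) and a simple neighbor-majority vote as a stand-in for the "existing methods" the abstract mentions. All function names (`edge_homophily`, `neighbor_vote`, `gibbs_classify`) and parameters are hypothetical.

```python
import math
import random


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def neighbor_vote(n, edges, known, num_classes):
    """Baseline (hypothetical stand-in): majority class among labeled neighbors."""
    counts = {i: [0] * num_classes for i in range(n) if i not in known}
    for u, v in edges:
        for x, y in ((u, v), (v, u)):
            if x in counts and y in known:
                counts[x][known[y]] += 1
    return {i: max(range(num_classes), key=c.__getitem__) for i, c in counts.items()}


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def gibbs_classify(n, edges, known, num_classes, iters=200, a=1.0, b=1.0, seed=0):
    """Collapsed Gibbs sampler for an ASSUMED block model: P(edge i-j) = eta[z_i][z_j],
    eta ~ Beta(a, b) per class pair (integrated out). Known labels stay fixed."""
    K = num_classes
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    z = [known[i] if i in known else rng.randrange(K) for i in range(n)]
    unknown = [i for i in range(n) if i not in known]
    votes = {i: [0] * K for i in unknown}
    for sweep in range(iters):
        for i in unknown:
            # Class sizes and per-block edge counts over all nodes except i.
            size = [0] * K
            for j in range(n):
                if j != i:
                    size[z[j]] += 1
            E = [[0] * K for _ in range(K)]
            for u, v in edges:
                if i not in (u, v):
                    p, q = sorted((z[u], z[v]))
                    E[p][q] += 1
            # Number of edges from node i to each class.
            e = [0] * K
            for j in adj[i]:
                e[z[j]] += 1
            # log P(z_i = k | rest): Beta-Bernoulli predictive of i's edges and
            # non-edges towards each class l, added to block {k, l}.
            logp = []
            for k in range(K):
                total = 0.0
                for l in range(K):
                    p, q = min(k, l), max(k, l)
                    pairs = size[p] * size[q] if p != q else size[p] * (size[p] - 1) // 2
                    non = pairs - E[p][q]
                    m = size[l] - e[l]
                    total += (log_beta(a + E[p][q] + e[l], b + non + m)
                              - log_beta(a + E[p][q], b + non))
                logp.append(total)
            top = max(logp)
            weights = [math.exp(x - top) for x in logp]
            z[i] = rng.choices(range(K), weights=weights)[0]
            if sweep >= iters // 2:  # collect samples after burn-in
                votes[i][z[i]] += 1
    return {i: max(range(K), key=v.__getitem__) for i, v in votes.items()}


if __name__ == "__main__":
    # Toy low-homophily graph: complete bipartite K_{10,10}, classes = the two sides.
    truth = [0] * 10 + [1] * 10
    edges = [(u, v) for u in range(10) for v in range(10, 20)]
    known = {i: truth[i] for i in list(range(5)) + list(range(10, 15))}
    print("edge homophily:", edge_homophily(edges, truth))  # 0.0
    for name, pred in (("neighbor vote", neighbor_vote(20, edges, known, 2)),
                       ("block-model Gibbs", gibbs_classify(20, edges, known, 2))):
        acc = sum(pred[i] == truth[i] for i in pred) / len(pred)
        print(name, "accuracy:", acc)
```

On this toy graph every edge joins nodes of different classes, so the neighbor vote labels every unlabeled node with the wrong class (accuracy 0.0), while the block model can learn that edges concentrate between the two classes rather than within them. The design choice mirrors the abstract's framing: edges are the observed variables, unknown labels are latent, and the class-pair densities are integrated out so only labels need sampling. The real model in the paper may differ.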

Keywords: networked data; classification in networked data; node classification; probability generative model; homophily

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
