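Authors: Chen Fuzan (陈富赞)[1], Liu Qing (刘青)[1], Li Minqiang (李敏强)[1], Kou Jisong (寇纪淞)[1]
Source: Journal of Systems Engineering (《系统工程学报》), 2012, No. 1, pp. 129-136 (8 pages)
Funding: National Natural Science Foundation of China (71101103; 70925005; 61074152; 70771074); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20100032120086; 20090032110065; 20100032110036)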
Abstract: Web usage mining, an important task in data mining, helps characterize user groups so that personalized services can be provided to them. This paper proposes a Web usage mining algorithm based on user session clustering. First, Web logs are preprocessed with a time-window-based user session identification method. Sessions are represented with a 3-tuple structure, and on that basis a session processing method based on the semantic similarity of Web pages is given, which effectively reduces session dimensionality while preserving the user's interests. Second, a user session similarity measure based on both time and frequency is proposed. Finally, a two-stage PS-KM session clustering algorithm is designed: PSO first performs a global search, after which the algorithm switches to a local clustering process based on K-means. Simulations demonstrate the effectiveness of the algorithm.
Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
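The abstract mentions time-window-based user session identification but gives no details. The following is a minimal sketch of one common reading of that idea, not the paper's actual method: a user's requests are split into a new session whenever the gap between consecutive requests exceeds a threshold. The function name `identify_sessions`, the `(user_id, timestamp, url)` record layout, and the 1800-second threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def identify_sessions(records, timeout=1800):
    """Split each user's page requests into sessions using a time window.

    records: iterable of (user_id, timestamp_seconds, url) tuples
             (an assumed layout, not taken from the paper).
    timeout: largest allowed gap, in seconds, between consecutive requests
             of one session (1800 is an illustrative value only).
    """
    # Group requests by user.
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            # A gap longer than the window closes the current session.
            if hit[0] - prev[0] > timeout:
                sessions.append((user, current))
                current = []
            current.append(hit)
        sessions.append((user, current))
    return sessions


log = [
    ("u1", 0, "/a"),
    ("u1", 100, "/b"),
    ("u1", 5000, "/c"),
    ("u2", 50, "/a"),
]
sessions = identify_sessions(log)
```

With this sample log, user `u1` ends up with two sessions because the request at 5000 s arrives more than 1800 s after the previous one, while `u2` has a single session.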