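Affiliation: [1] School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China
Source: Journal of Northeastern University (Natural Science) (《东北大学学报(自然科学版)》), 2010, No. 6, pp. 782-785 (4 pages)
Funding: National Natural Science Foundation of China (60773218); National High Technology Research and Development Program of China (2009AA01Z122); Liaoning Provincial Science and Technology Foundation (20072031)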
Abstract: Blog clustering is an effective way to process blog information. This paper proposes a blog page clustering algorithm based on revision by comments. The information hierarchy of a blog is analyzed first, and a blog attribute model is then built from the general attributes of blog pages. Blog pages are clustered on the basis of this model, and after the initial clustering the comments on each post are used to revise the clustering result. The results are evaluated with the commonly used entropy and purity measures. Two experimental schemes were designed according to how the comments are used: in one, the comments take part in clustering directly; in the other, the comments serve as a revision step after clustering. A comparison of the experimental results shows that, in most cases, using comments as a post-clustering revision step gives better clustering than using them directly in the clustering process.
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
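The abstract names entropy and purity as the measures used to evaluate the clustering results but does not give their formulas. The sketch below uses the common textbook definitions (size-weighted per-cluster entropy, and the fraction of items that belong to the majority class of their cluster); whether the paper uses exactly these forms is an assumption. The function names `purity` and `cluster_entropy` and the sample data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def _class_counts(clusters, labels):
    """Group true class labels by cluster id and count them."""
    counts = {}
    for cluster_id, label in zip(clusters, labels):
        counts.setdefault(cluster_id, Counter())[label] += 1
    return counts


def purity(clusters, labels):
    """Fraction of items whose class is the majority class of their cluster.

    Higher is better; 1.0 means every cluster holds a single class.
    """
    counts = _class_counts(clusters, labels)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def cluster_entropy(clusters, labels):
    """Size-weighted average of the class entropy (in bits) of each cluster.

    Lower is better; 0.0 means every cluster holds a single class.
    """
    counts = _class_counts(clusters, labels)
    n = len(labels)
    total = 0.0
    for c in counts.values():
        size = sum(c.values())
        h = -sum((k / size) * math.log2(k / size) for k in c.values())
        total += (size / n) * h
    return total


# Hypothetical example: six blog pages, two clusters, two true topics.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["tech", "tech", "travel", "travel", "travel", "travel"]
print(purity(clusters, labels))
print(cluster_entropy(clusters, labels))
```

Under these definitions, a revision step that moves a misplaced page into the cluster dominated by its own topic raises purity and lowers entropy, which is the direction of improvement the abstract reports for comment-based revision in most cases.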