Table Semantic Summarization Algorithms Based on Hierarchical Space Clustering (Cited by: 2)

Clustering-based Algorithms to Semantic Summarizing the Table with Multi-attributes' Hierarchical Structures

Source: Computer Science (《计算机科学》), 2012, No. 3, pp. 163-169 (7 pages)

Funding: Supported by the National "Eleventh Five-Year Plan" Ministerial Pre-research Fund (513150402)

Abstract: Table semantic summarization uses data generalization to summarize a data table: a small number of tuples with abstract semantics, constructed over the concept hierarchies of the attribute values of multiple attributes, replace a large number of original tuples with detailed semantics. Given a raw data table and the concept hierarchies of its attribute values, the goal is to produce a summary table that meets a specified compression ratio while retaining as much semantic information as possible. Existing algorithms search the set of generalized tuples for the best combination and thereby convert the task into a set-covering problem. Although they apply several optimizations (such as preprocessing and level-by-level processing) to improve efficiency, they still suffer from high conversion cost, a complex algorithmic framework, and poor scalability to high-dimensional attributes. This paper defines a metric space over the multi-attribute hierarchical structure, converts the task into a clustering problem in a multi-dimensional hierarchical space, and introduces Dewey encoding to make the conversion more efficient. It proposes two clustering algorithms to build the semantic summary table efficiently: one based on fast-converging hierarchical agglomeration and the other based on adjusting the resolution of the hierarchical space. Experiments on a real-world dataset show that the new algorithms outperform the existing method in both running time and summary quality.

Keywords: data generalization; concept hierarchy; semantic summarization; hierarchical clustering

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
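
The abstract names the ingredients (Dewey-encoded concept hierarchies, a metric over hierarchical tuples, agglomerative clustering) but not their exact definitions. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the edge-count distance, the greedy closest-pair merge, the function names, and the sample hierarchy are all illustrative choices. Each attribute value is written as a Dewey code, that is, the path of child indexes from the hierarchy root, so the longest common prefix of two codes is their lowest common ancestor.

```python
from itertools import combinations

def common_prefix(a, b):
    """Dewey code of the lowest common ancestor of two hierarchy nodes."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]

def node_distance(a, b):
    """Hierarchy edges between two nodes (an assumed metric, not the paper's)."""
    return len(a) + len(b) - 2 * len(common_prefix(a, b))

def tuple_distance(s, t):
    """Sum of per-attribute hierarchy distances between two tuples."""
    return sum(node_distance(a, b) for a, b in zip(s, t))

def generalize(s, t):
    """Most specific tuple that covers both s and t on every attribute."""
    return tuple(common_prefix(a, b) for a, b in zip(s, t))

def summarize(rows, k):
    """Greedily merge the closest pair until at most k general tuples remain."""
    clusters = [(row, 1) for row in rows]  # (general tuple, rows covered)
    while len(clusters) > max(k, 1):
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: tuple_distance(clusters[p[0]][0], clusters[p[1]][0]),
        )
        (s, m), (t, n) = clusters[i], clusters[j]
        rest = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters = rest + [(generalize(s, t), m + n)]
    return clusters

# Hypothetical hierarchies. Location: (0,) Asia, (0,0) Beijing, (0,1) Shanghai,
# (1,) Europe, (1,0) Paris, (1,1) Lyon. Product: (0,0) Apple, (1,0) Phone.
rows = [
    ((0, 0), (0, 0)),  # Beijing, Apple
    ((0, 1), (0, 0)),  # Shanghai, Apple
    ((1, 0), (1, 0)),  # Paris, Phone
    ((1, 1), (1, 0)),  # Lyon, Phone
]
summary = summarize(rows, 2)
print(summary)  # [(((0,), (0, 0)), 2), (((1,), (1, 0)), 2)]
```

With a target of two tuples, the four rows collapse to (Asia, Apple) and (Europe, Phone), each covering two original rows. This greedy pairwise search is quadratic per merge; the fast-converging agglomeration and resolution-adjustment strategies mentioned in the abstract are presumably aimed at avoiding that cost, which this sketch does not attempt.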
