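Affiliation: [1] School of Computer Science and Technology, Fudan University, Shanghai 201203
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2014, No. 7, pp. 1567-1572 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61170007)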
Abstract: The Web contains not only a vast amount of text but also a large amount of tabular data. Compared with free-format text, the information in Web tables is more concise and structured, which makes it easier to mine, and Web table mining has become an active research topic. To mine the semantics implied by the visual features of Web tables, including row and column structure, cell background and color, and text font and size, a graph model and a method for constructing it are proposed. Based on this graph model, visually parallel relationships in Web tables are formally defined, and an algorithm that extracts them automatically is presented. Experiments show a significant positive correlation between the extracted visually parallel relationships and semantic relatedness, which suggests that extracting visually parallel relationships from Web tables can support other semantic analysis work.
Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]