A Study of Visually Parallel Relationships in Web Tables Based on Graph Models (基于图模型的Web表格中视觉并列关系的研究)

Authors: Li Wenqin (李雯琴)[1], Xie Zhipeng (谢志鹏)[1]

Affiliation: [1] School of Computer Science, Fudan University, Shanghai 201203

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2014, No. 7, pp. 1567-1572 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (61170007)

Abstract: The Web contains not only a vast amount of text but also a large amount of tabular data. Compared with free-format text, the information in Web tables is more concise and more structured, which makes it easier to mine, and Web table mining has become an active research topic. To mine the semantics implied by the visual information of Web tables, including row and column structure, cell background and color, and text font and size, the paper proposes a graph model together with a method for constructing it. Based on the graph model, visually parallel relationships in Web tables are formally defined, and an algorithm that extracts them automatically is presented. Experiments show a significant positive correlation between the extracted visually parallel relationships and semantic relatedness, which suggests that extracting these relationships can support other semantic analysis tasks.
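
Illustrative sketch (not from the paper): the abstract does not give the graph model's definition or the extraction algorithm, so the following Python is only a guess at the general idea, under stated assumptions. It treats each cell as a node, links adjacent cells whose visual style is identical, and reports connected components as candidate "visually parallel" groups. The `Cell` class, its style fields, the adjacency rule, and the function names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical table cell: grid position, text, and a few visual features."""
    row: int
    col: int
    text: str
    background: str = "white"
    color: str = "black"
    font: str = "regular"
    size: int = 12

    def style(self):
        # Assumption: two cells "look alike" when all visual features match exactly.
        return (self.background, self.color, self.font, self.size)


def build_graph(cells):
    """Link each cell to its right and lower neighbour when both share one style."""
    by_pos = {(c.row, c.col): c for c in cells}
    edges = defaultdict(set)
    for (r, c), cell in by_pos.items():
        for pos in ((r, c + 1), (r + 1, c)):
            other = by_pos.get(pos)
            if other is not None and other.style() == cell.style():
                edges[(r, c)].add(pos)
                edges[pos].add((r, c))
    return by_pos, edges


def parallel_groups(cells):
    """Return connected components with at least two cells, as lists of cell texts."""
    by_pos, edges = build_graph(cells)
    seen, groups = set(), []
    for start in sorted(by_pos):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            pos = stack.pop()
            component.append(pos)
            for nxt in edges.get(pos, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) > 1:
            groups.append([by_pos[p].text for p in sorted(component)])
    return groups


# Toy table: a bold gray header row, italic row labels, plain data cells.
table = [
    Cell(0, 0, "Fruit", background="gray", font="bold"),
    Cell(0, 1, "Price", background="gray", font="bold"),
    Cell(1, 0, "Apple", font="italic"),
    Cell(1, 1, "1.20"),
    Cell(2, 0, "Pear", font="italic"),
    Cell(2, 1, "0.90"),
]

print(parallel_groups(table))
```

On this toy table the sketch groups the two header cells, the two row labels, and the two prices. Exact style matching is the simplest possible choice; the method described in the abstract may weight or combine visual features differently.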

Keywords: Web table mining; visual features; graph model; visually parallel relationship

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
