Text extraction based on clustering colors at sensible points and clustering text-lines for text-selection

Times cited: 3

Authors: LIU Qiong[1], ZHOU Huican[1], WANG Yaonan[2]

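Affiliations: [1] College of Computer Science and Technology, Hunan University of Arts and Science, Changde, Hunan 415000, China; [2] College of Electrical and Information Engineering, Hunan University, Changsha 410082, China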

Source: Journal of Computer Applications (《计算机应用》), 2010, No. 2, pp. 449-452 (4 pages)

Abstract: Existing text extraction algorithms cannot adapt to complex variation in the background or to variation in the shape of the characters themselves. To address this, a method based on two-level color clustering of sensible points and text-line clustering for text selection is proposed. Because human vision is more sensitive to large changes in color, the new method uses only the main colors at sensible points as the basis for cluster analysis. This overcomes the problem that existing threshold-based and clustering-based methods are strongly affected by variation in background color. On this basis, text lines are selected according to their spatial arrangement, since characters in the same text line are always aligned with one another; this removes the sensitivity of common methods to variation in character shape and size. Experimental results show that the new method adapts well both to complex background changes and to texts of different sizes and shapes.
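The core idea of the color step, clustering only the colors found where color changes sharply rather than all pixels, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define "sensible points" precisely, so here they are assumed (hypothetically) to be the two pixels on either side of any horizontal or vertical neighbour pair whose RGB distance reaches a threshold. Only a single K-means level is shown; the paper's second clustering level is not described in the abstract and is not reproduced. The names `sensible_points`, `kmeans`, `main_colors` and the `threshold` parameter are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]
Image = List[List[Color]]


def color_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two RGB colors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sensible_points(image: Image, threshold: float) -> List[Tuple[int, int]]:
    """Hypothetical definition: pixels on either side of a large color jump.

    A right or lower neighbour pair whose color distance is at least
    `threshold` marks both pixels of the pair as sensible points.
    """
    h, w = len(image), len(image[0])
    marked = set()
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < h and nc < w and color_distance(image[r][c], image[nr][nc]) >= threshold:
                    marked.add((r, c))
                    marked.add((nr, nc))
    return sorted(marked)


def kmeans(colors: List[Color], k: int, iters: int = 20, seed: int = 0) -> List[Color]:
    """Plain Lloyd-style K-means on RGB colors; returns the cluster centers."""
    if not colors:
        return []
    rng = random.Random(seed)
    centers = rng.sample(colors, min(k, len(colors)))
    for _ in range(iters):
        groups: List[List[Color]] = [[] for _ in centers]
        for col in colors:
            nearest = min(range(len(centers)), key=lambda i: color_distance(col, centers[i]))
            groups[nearest].append(col)
        # Recompute each center as its group's mean; empty groups keep the old center.
        new_centers = [
            tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def main_colors(image: Image, points: List[Tuple[int, int]], k: int = 2) -> List[Color]:
    """Cluster only the colors at sensible points (one clustering level)."""
    return kmeans([image[r][c] for r, c in points], k)
```

On a white image containing a small black block, the sensible points are the block's pixels plus the white pixels bordering it, and `main_colors(img, sensible_points(img, 100.0), k=2)` recovers black and white as the two main colors. Uniform background regions never contribute to the clustering, which is the property the abstract credits for robustness to background variation.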

Keywords: text extraction; K-means clustering; edge density; text-line clustering

CLC number: TP391.41 [Automation and Computer Technology / Computer Application Technology]

 
