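Affiliation: [1] School of Information and Control Engineering, Shenyang Jianzhu University, Shenyang 110168
Source: Application Research of Computers (《计算机应用研究》), 2007, No. 8, pp. 242-245 (4 pages)
Funding: National Key Promotion Program for Scientific and Technological Achievements (2004EC000096)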
Abstract: To separate text from graphics correctly, this paper proposes a separation algorithm based on the group feature of characters, namely that characters usually occur as strings. Starting from the short line segments produced by line recognition (vectorization), the method extracts the outer contours of connected regions, subject to a contour-length limit. Size and density criteria then select candidate characters from the connected regions that satisfy the contour-length condition. Finally, the group feature of characters occurring in strings is used to absorb missed characters and symbols, so that all kinds of text, including annotations, headings, and the characters in the title block and parts list, are separated from the graphics bitmap. Experiments show that the algorithm is highly adaptable and judges characters and symbols more reliably, especially those that are hard to identify from the mathematical features of a single connected region, such as "I", "i", and "1", while keeping strings intact.
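The last two steps of the abstract (size and density screening, then absorption of missed characters through their string neighbors) can be illustrated with a minimal sketch. It is not the paper's implementation: the contour extraction from short line segments is skipped, connected regions are reduced to bounding boxes with a pixel count, and every name and threshold (`Component`, `is_candidate`, `near`, `separate`, `min_size`, `max_size`, `min_density`, `max_density`, `max_gap`) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A connected region, reduced to its bounding box and pixel count."""
    x: int
    y: int
    w: int
    h: int
    pixels: int

    @property
    def density(self) -> float:
        # Fraction of the bounding box covered by foreground pixels.
        return self.pixels / (self.w * self.h)


def is_candidate(c, min_size=8, max_size=40, min_density=0.15, max_density=0.9):
    # Assumed thresholds: a character is neither tiny nor huge, and its
    # bounding box is neither nearly empty nor completely filled.
    return (min_size <= max(c.w, c.h) <= max_size
            and min_density <= c.density <= max_density)


def near(a, b, max_gap):
    # Horizontal gap between the boxes (negative when they overlap) and
    # vertical overlap, used as a crude "same text line" test.
    gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)
    overlap = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return gap <= max_gap and overlap > 0


def separate(components, max_size=40, max_gap=10):
    """Split components into (text, graphics) lists."""
    text = [c for c in components if is_candidate(c, max_size=max_size)]
    rest = [c for c in components if not is_candidate(c, max_size=max_size)]
    changed = True
    while changed:  # repeat until no more neighbors can be absorbed
        changed = False
        for c in list(rest):
            # Absorb a rejected region if it is no larger than a character
            # and sits next to something already accepted as text.
            if max(c.w, c.h) <= max_size and any(near(c, t, max_gap) for t in text):
                text.append(c)
                rest.remove(c)
                changed = True
    return text, rest


a = Component(0, 0, 10, 12, 60)         # ordinary glyph, passes both criteria
i = Component(12, 0, 2, 12, 24)         # thin bar like "I": density 1.0, rejected
dot = Component(15, 10, 2, 2, 4)        # punctuation: too small, rejected
line = Component(0, 100, 300, 2, 600)   # long drawing line
blob = Component(200, 200, 2, 2, 4)     # isolated speck far from any text
text, graphics = separate([a, i, dot, line, blob])
```

In this toy input the bar and the dot fail the per-region criteria but are recovered because they sit next to an accepted character, whereas the long line and the isolated speck remain graphics. This is the kind of case the abstract describes for "I", "i", and "1".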
Classification: TP391.41 [Automation and Computer Technology - Computer Application Technology]