Contract Review and Labeling Method Based on Deep Learning and OCR Recognition Technology (基于深度学习与OCR识别技术的合同审核与标注方法)


Author: HU Changsheng (胡长生) (School of Intelligent Industry, Fuzhou Software Technology Vocational College, Fuzhou, Fujian 350211, China)

Affiliation: [1] School of Intelligent Industry, Fuzhou Software Technology Vocational College, Fuzhou, Fujian 350211, China

Source: Journal of Fujian Polytechnic Normal University (《福建技术师范学院学报》), 2024, No. 5, pp. 30-37 (8 pages)

Abstract: Current contract review methods cannot recognize contract content with high accuracy and take a long time to label. To address this, a contract review and labeling method based on deep learning and OCR recognition technology is proposed. A contract text recognition model is built on OCR: an OCR engine converts the text of paper documents into black-and-white images, the contract text images are then binarized as preprocessing, and image similarity is computed. Based on the similarity gradient, local standard-deviation contrast and value assignment are applied to the image to separate the character foreground from the page background, which completes the contract review. A deep-learning-based target labeling model is then built: the feature vector of each contract paragraph is determined, the classification of paragraph feature vectors is recast as a quadratic function optimization problem, and the feature classification of paragraph images is optimized. Regression theory is introduced to modify the loss function of the labeling model so as to reduce the error between the model's output and the predicted results, which completes the contract labeling. The case analysis shows that the method can obtain a detailed list of differences by comparing the finalized document with the sealed (stamped) document, that contract labeling is fast, and that the correct recognition rate of contract text is high.
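The abstract describes the foreground/background separation only in outline (a local standard-deviation contrast followed by value assignment). As a rough illustration of that kind of step, the sketch below uses Niblack-style local thresholding, a standard technique substituted here as an assumption and not the paper's own formula. The function name `local_std_binarize`, the 3x3 window, and `k = -0.2` are all hypothetical choices made for the example.

```python
import math


def local_std_binarize(gray, window=3, k=-0.2):
    """Niblack-style thresholding (an assumed stand-in, not the paper's method).

    A pixel is marked foreground (1) when it is darker than
    local_mean + k * local_std of its neighbourhood, else background (0).
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            values = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            threshold = mean + k * math.sqrt(variance)
            mask[y][x] = 1 if gray[y][x] < threshold else 0
    return mask


# Toy 5x5 "page": light background (200) with one dark character pixel (20).
page = [[200] * 5 for _ in range(5)]
page[2][2] = 20
mask = local_std_binarize(page)
```

In flat regions the local standard deviation is zero, so nothing is marked as foreground; only pixels that are clearly darker than their surroundings are assigned to the character layer. A real pipeline would use a larger window and an optimized library routine in place of these nested loops.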

Keywords: deep learning; OCR recognition technology; contract review; contract labeling

Classification No.: O29 [Science: Applied Mathematics]; TP335 [Science: Mathematics]

 
