Research on Post-Processing of Test Sheet Character Recognition Based on the Fusion Language Model


Authors: Zhang Yunan, Lü Xueqiang[1], Huang Qinghao, You Xindong, He Jian, Dong Zhian[2], Huang Yue

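Affiliations: [1] Beijing Key Laboratory of Internet Culture Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China; [2] Beijing Lawke Intelligent Medical Technology Co., Ltd., Beijing 100015, China; [3] Institute of Internet Industry, Tsinghua University, Beijing 100084, China; [4] Xuanwu Hospital, Capital Medical University, Beijing 100053, China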

Source: Computer Applications and Software, 2023, Issue 10, pp. 179-184, 221 (7 pages)

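Funding: National Natural Science Foundation of China (61671070); Beijing Information Science and Technology University Project for Promoting the Connotative Development of Universities and Improving Research Level (2019KYNH226); Beijing Information Science and Technology University "Qin Xin Talents" Cultivation Program (QXTCP B201908); Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016003); Open Fund of the Beijing Key Laboratory of Internet Culture and Digital Dissemination (icdd201905).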

Abstract: To address the character confusion that commonly arises when recognizing laboratory test sheets in natural scenes, this paper proposes a post-processing correction method that integrates a language model. A statistical language model computes conditional probabilities over the recognition region matrix to predict the recognition result that best conforms to a medical lexicon. Test item names are then corrected with a method that fuses edit distance and the longest common subsequence, and the remaining indicators are corrected according to their correspondence with the test items. Compared with recognition results without post-processing, the method improves accuracy, recall, and F1 score to varying degrees on the medical laboratory test sheet recognition task. Comparative experiments show that the method further improves the recognition accuracy of text in text boxes and lays a foundation for the later interpretation of laboratory test sheets.
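The abstract does not give the fusion formula, so the following Python sketch is only one plausible reading of "fusing edit distance and the longest common subsequence" for test item name correction. The weight `alpha`, the acceptance `threshold`, the normalization by the longer string, and the sample lexicon are all illustrative assumptions, not values taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def fused_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Weighted mix of two normalized similarities (alpha is an assumed weight)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    ed_sim = 1 - edit_distance(a, b) / longest
    lcs_sim = lcs_length(a, b) / longest
    return alpha * ed_sim + (1 - alpha) * lcs_sim


def correct_item_name(ocr_text: str, lexicon: list[str], threshold: float = 0.6) -> str:
    """Return the closest lexicon entry, or the OCR text if nothing is close enough."""
    best = max(lexicon, key=lambda term: fused_similarity(ocr_text, term))
    return best if fused_similarity(ocr_text, best) >= threshold else ocr_text


# Hypothetical lexicon of test item names, for illustration only.
lexicon = ["hemoglobin", "hematocrit", "platelet count"]
corrected = correct_item_name("hemog1obin", lexicon)
```

Here an OCR output such as `hemog1obin` differs from `hemoglobin` by a single substitution, so both component similarities are high and the lexicon entry is selected. A string that resembles no lexicon entry falls below the threshold and is left unchanged.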

Keywords: laboratory test sheet; character recognition; language model; edit distance; longest common subsequence

Classification: TP319 [Automation and Computer Technology: Computer Software and Theory]

 
