Authors: WANG Juan (王娟), LI Ning (李宁) [1], JIANG Yu-tong (姜雨彤) [1], TIAN Ying-ai (田英爱) [1]
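Affiliation: [1] Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China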
Source: Science Technology and Engineering (《科学技术与工程》), 2021, No. 17, pp. 7208-7216 (9 pages)
Funding: National Natural Science Foundation of China (61672105).
Abstract: Structure recognition for stream (flow-layout) documents plays an important role in automatic document layout and optimization, information retrieval, and related fields. Previous work on stream document structure recognition has focused mainly on academic papers, with comparatively little research on other document types such as official documents and reports. To address this gap, documents are first classified using a clustering method. On that basis, a document structure recognition method based on a bidirectional gated recurrent unit-conditional random field (BIGRU-CRF), applied separately to each document class, is proposed to solve the problem of multi-type document structure recognition. Experimental results show that the method not only improves structure recognition for academic papers but also recognizes the structure of other document types well.
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]