Authors: Peng Yufang; Chen Jianghao [3]; He Zhiqiang
Affiliations: [1] School of Economics & Management, Nanjing Institute of Technology, Nanjing 211167, Jiangsu, China; [2] Department of Information Management, Nanjing University, Nanjing 210046, Jiangsu, China; [3] School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, Anhui, China; [4] Suzhou Research Institute, University of Science and Technology of China, Suzhou 215123, Jiangsu, China
Source: Journal of Modern Information (《现代情报》), 2022, No. 2, pp. 55-69 (15 pages)
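Funding: Major Project of the National Social Science Fund of China, "Research on Knowledge Discovery and the Construction of a Rights-Protection Evidence Chain in the Collation of South China Sea Frontier Documents" (Grant No. 19ZDA347); Nanjing University 2015 Graduate Innovation Project, "Interdisciplinary Research Innovation Fund" project "Research on Cultural Telegrams and Reports Concerning the South China Sea Region in Republican-Era Archival Documents" (Grant No. 2015CW04); Jiangsu Province Graduate Training Innovation Project, "Research on the Evidence Chain of the South China Sea Issue Based on Automatic Association Technology" (Grant No. KYLX15_0025).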
Abstract: [Purpose/Significance] This study attempts fine-grained extraction of South China Sea evidence data, moving from the document carrier, to document content (full-text search), to the data level. The work is intended, first, to improve retrieval performance for digital South China Sea literature resources; second, to supply professionals with sufficient evidentiary material; and third, to lay a foundation for building an evidence-chain association model for South China Sea rights protection. [Method/Process] Extraction rules were formulated according to the characteristics of South China Sea rights-protection evidence. Unstructured data were converted into structured data through text cleaning, splitting text into paragraphs, splitting paragraphs into sentences, and word segmentation. The evidence-data extraction performance of Naive Bayes, SVM, Random Forest, DNN, TextCNN, Bi-LSTM, LightGBM, and XGBoost was then compared. Finally, to further improve extraction accuracy, "5W" rule filtering and manual verification were added. [Result/Conclusion] The experiments showed that a DNN model built on the TensorFlow deep learning framework performed comparatively well, with an accuracy of 0.88. Further combining "5W" rule filtering with manual verification significantly improved the accuracy of South China Sea evidence-data extraction, indicating that the proposed method is feasible to a certain degree.
Keywords: evidence data extraction; TensorFlow; machine learning algorithms; deep learning algorithms; "5W" rule
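
Illustrative sketch: the abstract names a preprocessing chain (text cleaning, paragraph splitting, sentence splitting) followed by a "5W" rule filter, but it does not give the rules themselves. The Python below is only a minimal sketch of how such a chain might look. The function names, the cue patterns in `CUES`, and the choice to require "when" and "where" cues are all assumptions made for illustration, not the authors' rules. Word segmentation and the classifiers compared in the paper are left out.

```python
import re

# Hypothetical cue patterns for a "5W"-style check. The record does not list
# the actual rules, so these are placeholders for illustration only.
CUES = {
    "when": re.compile(r"\d{4}年|\d{1,2}月|\d{1,2}日"),
    "where": re.compile(r"岛|礁|海"),
}


def clean(text):
    """Normalise line endings and drop spaces, tabs and ideographic spaces."""
    text = text.replace("\r\n", "\n")
    return re.sub(r"[ \t\u3000]+", "", text)


def split_paragraphs(text):
    """Treat each non-empty line as one paragraph."""
    return [p for p in text.split("\n") if p]


def split_sentences(paragraph):
    """Split on Chinese sentence-ending punctuation, keeping the punctuation."""
    return re.findall(r"[^。!?]+[。!?]?", paragraph)


def passes_5w(sentence, required=("when", "where")):
    """Keep a sentence only if every required cue type appears in it."""
    return all(CUES[key].search(sentence) for key in required)


def extract_candidates(raw):
    """Run cleaning, paragraph and sentence splitting, then the rule filter."""
    candidates = []
    for paragraph in split_paragraphs(clean(raw)):
        for sentence in split_sentences(paragraph):
            if passes_5w(sentence):
                candidates.append(sentence)
    return candidates
```

In a fuller pipeline, the sentences kept here would go on to word segmentation and a trained classifier, with the rule filter and manual verification applied to the classifier's output, as the abstract describes.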