Research on Knowledge Unit Representation and Extraction for Unstructured Abstracts of Chinese Scientific and Technical Literature: Based on Knowledge Unit Ontology Theory

Cited by: 18


Authors: Zheng Mengyue (郑梦悦), Qin Chunxiu (秦春秀)[1], Ma Xubu (马续补)[1]

Affiliation: [1] School of Economics and Management, Xidian University

Source: 《情报理论与实践》 (Information Studies: Theory & Application), 2020, No. 2, pp. 157-163 (7 pages)

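Funding: An outcome of the National Natural Science Foundation of China project "Resource Semantic Space and Its Retrieval in Knowledge Communities", project no. 71573199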

Abstract:
[Purpose/Significance] Scientific and technical literature has grown explosively in recent years, yet a large share of it still carries unstructured abstracts. Such abstracts are harder for scholars to read and understand, and they hinder both automatic knowledge extraction from the abstract and retrieval over its contents. A knowledge representation model for unstructured abstracts, together with a method for extracting that knowledge automatically, is therefore valuable both for fast reading by scholars and for automated machine processing.
[Method/Process] Starting from an analysis of how unstructured abstracts of scientific and technical literature are organized, and drawing on knowledge unit ontology theory, the paper builds a knowledge unit ontology model for such abstracts. By analyzing their writing characteristics, it classifies the text at the sentence level into three elements: purpose, method, and result or conclusion. It then tallies the clue words, sentence patterns, and positions of the sentences belonging to each element, builds a rule base from these statistics, and constructs an extraction algorithm from the ontology model and the rule base. Finally, a sample of articles downloaded from Computer Technology and Development (《计算机技术与发展》) is used for the experiment.
[Result/Conclusion] Adding a sentence pattern set and a clue word set refines the elements of unstructured abstracts and yields a knowledge unit ontology model for them. The experimental results show that the proposed model can effectively extract the knowledge units in unstructured abstracts.
[Limitations] The sentence patterns and clue words in the abstracts still have to be summarized manually.
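The abstract describes the extraction step only in outline: each sentence is assigned to one of three elements using clue words, sentence patterns, and sentence position. The Python sketch below illustrates that general idea and is not the authors' algorithm. The clue words, the regular expressions, the position heuristic, and the scoring weights (1 per clue word, 2 per pattern, 1 for position) are all hypothetical choices made for illustration, since the abstract does not publish the actual rule base.

```python
import re

# Element labels follow the abstract: purpose, method, result or conclusion.
ELEMENTS = ("purpose", "method", "result_or_conclusion")

# Hypothetical clue words (an assumption, not the paper's rule base).
CLUE_WORDS = {
    "purpose": ("为了", "针对", "旨在"),
    "method": ("提出", "采用", "基于", "利用"),
    "result_or_conclusion": ("结果表明", "结果显示", "证明"),
}

# Hypothetical sentence patterns (an assumption, not the paper's rule base).
SENTENCE_PATTERNS = {
    "purpose": (re.compile(r"^(为了|针对)"),),
    "method": (re.compile(r"(提出|设计|构建)了?.*(方法|模型|算法)"),),
    "result_or_conclusion": (re.compile(r"结果(表明|显示)"),),
}


def split_sentences(abstract):
    """Split a Chinese abstract into sentences on full-width end punctuation."""
    parts = re.findall(r"[^。!?;]+[。!?;]?", abstract)
    return [p.strip() for p in parts if p.strip()]


def position_prior(index, total):
    """Guess an element from position alone: first, last, or in between."""
    if index == 0:
        return "purpose"
    if index == total - 1:
        return "result_or_conclusion"
    return "method"


def score_sentence(sentence, index, total):
    """Score one sentence against every element (weights are illustrative)."""
    scores = {}
    for element in ELEMENTS:
        clue_hits = sum(1 for word in CLUE_WORDS[element] if word in sentence)
        pattern_hits = sum(
            1 for pattern in SENTENCE_PATTERNS[element] if pattern.search(sentence)
        )
        prior = 1 if position_prior(index, total) == element else 0
        scores[element] = clue_hits + 2 * pattern_hits + prior
    return scores


def extract_knowledge_units(abstract):
    """Assign each sentence to its best-scoring element."""
    sentences = split_sentences(abstract)
    units = {element: [] for element in ELEMENTS}
    for index, sentence in enumerate(sentences):
        scores = score_sentence(sentence, index, len(sentences))
        # Ties go to the element listed first in ELEMENTS.
        best = max(ELEMENTS, key=lambda element: scores[element])
        units[best].append(sentence)
    return units


if __name__ == "__main__":
    sample = (
        "针对科技文献摘要难以自动处理的问题,开展了研究。"
        "提出了一种基于规则的抽取方法。"
        "实验结果表明,该方法能够有效抽取知识元。"
    )
    for element, sentences in extract_knowledge_units(sample).items():
        print(element, sentences)
```

Keeping the rules in plain dictionaries mirrors the limitation the abstract reports: the clue words and sentence patterns have to be curated by hand, so they live in data rather than in the control flow.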

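Keywords: scientific and technical literature; unstructured abstract; knowledge representation; knowledge extraction; knowledge unit; ontology model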

Classification No.: G254 [Cultural Science: Library Science]

 
