Information extraction of history evolution based on Wikipedia

Cited by: 5

Affiliation: [1] College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022

Source: Journal of Computer Applications, 2015, No. 4, pp. 1021-1025, 1044 (6 pages)

Funding: Natural Science Foundation of Inner Mongolia (2013MS0912)

Abstract: In software engineering teaching, domain concepts are numerous and evolve quickly, which makes them hard for students to understand and remember. To address this, a method is proposed that builds a knowledge base by extracting information on the historical evolution of the software engineering domain. First, Natural Language Processing (NLP) and Web information extraction techniques are combined to extract entities and entity relations from the free text of Wikipedia, forming a candidate set. Next, the keyword extraction method TextRank is used to select from the candidate set the entity relations most closely related to historical evolution. Finally, with each key entity relation as the core, the neighboring time entities and concept entities are extracted to form quintuples, from which the knowledge base is built. During extraction, the TextRank algorithm is improved by incorporating the semantic information of the text, which raises extraction accuracy. Experimental results show that the knowledge base organizes software engineering concepts according to their temporal order, verifying the effectiveness of the proposed method.
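The TextRank selection step described in the abstract can be illustrated with a minimal sketch. This is the standard unweighted TextRank over a co-occurrence graph, not the semantically improved variant the paper proposes, whose details the abstract does not give. The function name `textrank`, its parameters, and the sample tokens are assumptions made for illustration only.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank tokens with plain (unweighted) TextRank.

    Hypothetical helper for illustration; not the paper's improved algorithm.
    """
    # Build an undirected co-occurrence graph: two distinct tokens are
    # linked when they appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != tok:
                neighbors[tok].add(tokens[j])
                neighbors[tokens[j]].add(tok)

    # Iterate the PageRank-style update: each node shares its score
    # equally among its neighbors, damped by `damping`.
    scores = {tok: 1.0 for tok in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for tok in neighbors:
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[tok])
            new_scores[tok] = (1 - damping) + damping * rank
        scores = new_scores

    # Return tokens ordered from highest to lowest score.
    return sorted(scores, key=scores.get, reverse=True)


# Example input: candidate relation words (made-up sample data).
candidates = ["developed", "proposed", "developed", "released",
              "developed", "replaced"]
ranking = textrank(candidates, window=1)
```

In this sample, "developed" co-occurs with every other candidate, so it receives the highest score. In the pipeline the abstract describes, the top-ranked relations would then serve as the core around which time and concept entities are collected.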

Keywords: software engineering; historical evolution; information extraction; keyword extraction; TextRank

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
