The Structure Function Recognition of Academic Text: Paragraph-based Recognition

Cited by: 39

Authors: Huang Yong[1], Lu Wei[1], Cheng Qikai[1], Gui Sisi[1]

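Affiliation: [1] Institute for Information Retrieval and Knowledge Mining, School of Information Management, Wuhan University, Wuhan 430072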

Source: Journal of the China Society for Scientific and Technical Information (情报学报), 2016, Issue 5, pp. 530-538 (9 pages)

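Funding: This paper is one of the research outcomes of the National Natural Science Foundation of China General Program project "Semantic Recognition of Academic Text Oriented to Word Function and Knowledge Graph Construction" (Grant No. 71473183) and the Ministry of Education Key Research Base for Humanities and Social Sciences major project "Research on Fine-grained Web Information Retrieval Models and Framework Construction" (Grant No. 10JJD630014).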

Abstract: Structure function recognition of academic text is a section-level text classification problem whose essence is identifying the structure function of each section. This paper divides paragraph-based structure function recognition into two subtasks: recognizing paragraph position, and recognizing a section's structure function by majority voting over the paragraphs it contains. Experiments on a large, automatically constructed dataset show that, although paragraph-based recognition performs worse than recognition based on the full content of a section, it still achieves good results, which demonstrates that recognizing structure function from paragraphs is feasible. Drawing on these results, the paper focuses on two important factors that affect paragraph-based recognition: paragraph length and the number of paragraphs in a section. Finally, it summarizes the three levels of structure function recognition of academic text, points out open issues and directions for further research, and recommends some potential applications.
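The second subtask, section-level voting, can be pictured with a minimal Python sketch. It assumes each paragraph has already been given a predicted structure-function label by some paragraph-level classifier (not shown; the abstract does not say which classifier the authors used) and takes the most frequent label as the section's function. The function name `vote_section_function`, the tie-breaking rule, and the example labels are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from typing import Sequence


def vote_section_function(paragraph_labels: Sequence[str]) -> str:
    """Return the structure function of a section by majority vote.

    `paragraph_labels` holds one predicted label per paragraph, in the
    order the paragraphs appear in the section (hypothetical labels such
    as "method" or "experiment").

    Ties go to the label encountered first: Counter.most_common orders
    elements with equal counts by first appearance.
    """
    if not paragraph_labels:
        raise ValueError("a section needs at least one paragraph to vote")
    counts = Counter(paragraph_labels)
    label, _count = counts.most_common(1)[0]
    return label


if __name__ == "__main__":
    # Hypothetical per-paragraph predictions for one section.
    predictions = ["method", "experiment", "method"]
    print(vote_section_function(predictions))
```

Here `vote_section_function(["method", "experiment", "method"])` returns `"method"`. The sketch also suggests why the number of paragraphs in a section plausibly matters: a section with only one or two paragraphs gives the vote very little evidence, so a single misclassified paragraph can decide the outcome.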

Keywords: structure function; text classification; text mining

CLC number: G353.1 [Culture Science / Information Science]

 
