A word ranking method considering the text space structure for a single document (cited by: 2)


Authors: WEI Wei [1,2]; MENG Xiangzhu [3]; GUO Chonghui

Affiliations: [1] Center for Energy, Environment & Economy Research, Zhengzhou University, Zhengzhou 450001, China; [2] Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China; [3] School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China

Source: Systems Engineering-Theory & Practice (《系统工程理论与实践》), 2020, no. 5, pp. 1293-1303 (11 pages)

Funding: National Natural Science Foundation of China (71771034); Jieyang Science and Technology Plan Project (2017xm041).

Abstract: Feature selection is a fundamental task in text mining. It supplies the data-processing methods and technical support on which subsequent text mining tasks depend, and feature word ranking is its key step. Combining textual statistics, structural information, and the idea of manifold ranking, this paper proposes a new feature word ranking method. From the original text we construct a conditional co-occurrence degree word network, which captures the latent semantic and structural information of the text, and treat it as the manifold structure underlying the feature words. Each word's term frequency serves as its initial weight; similarity learning among words, based on manifold ranking and graph learning theory, then re-evaluates these weights so that the words can be ranked by importance. Numerical experiments on public corpora and on a supplementary corpus, comparing the method with several other feature word ranking methods, verify its effectiveness. The method broadens the application of manifold ranking and graph learning theory in text mining and offers a new strategy for feature word ranking in a single document.
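The pipeline in the abstract (word graph as manifold, term frequency as initial weights, manifold ranking to re-score words) can be illustrated with a minimal sketch. It assumes the standard manifold ranking iteration of Zhou et al., F ← αSF + (1−α)Y with S = D^(−1/2) W D^(−1/2). The abstract does not define the conditional co-occurrence degree, so the edge weight below, w(a, b) = (c(a, b)/tf(a) + c(a, b)/tf(b)) / 2 over a sliding window, is a hypothetical stand-in, as are the function names and the parameters `alpha`, `window`, `iters`, and `tol`. This is not the authors' implementation.

```python
import math
from collections import Counter


def cooccurrence_weights(tokens, window=2):
    """Return term frequencies and symmetric edge weights for a word graph.

    HYPOTHETICAL stand-in for the paper's conditional co-occurrence degree:
    c(a, b) counts how often a and b appear within `window` tokens of each
    other, and w(a, b) averages the two conditional ratios c/tf(a), c/tf(b).
    """
    tf = Counter(tokens)
    pair_counts = Counter()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + 1 + window]:
            if a != b:
                pair_counts[frozenset((a, b))] += 1
    weights = {}
    for pair, c in pair_counts.items():
        a, b = tuple(pair)
        w = (c / tf[a] + c / tf[b]) / 2
        weights[(a, b)] = weights[(b, a)] = w  # store both directions
    return tf, weights


def manifold_rank(tokens, alpha=0.85, window=2, iters=200, tol=1e-10):
    """Rank words by manifold ranking, seeded with normalized term frequency."""
    tf, w = cooccurrence_weights(tokens, window)
    vocab = sorted(tf)
    # Node degree d(a): sum of the weights of edges incident to a.
    degree = {v: 0.0 for v in vocab}
    for (a, _b), weight in w.items():
        degree[a] += weight
    # Initial score vector Y: term frequency scaled to sum to 1.
    total = sum(tf.values())
    y = {v: tf[v] / total for v in vocab}
    f = dict(y)
    for _ in range(iters):
        # Spread scores along edges: (S F)_a = sum_b w_ab / sqrt(d_a d_b) * F_b.
        spread = {v: 0.0 for v in vocab}
        for (a, b), weight in w.items():
            spread[a] += weight / math.sqrt(degree[a] * degree[b]) * f[b]
        new_f = {v: alpha * spread[v] + (1 - alpha) * y[v] for v in vocab}
        delta = max(abs(new_f[v] - f[v]) for v in vocab)
        f = new_f
        if delta < tol:
            break
    # Highest score first; ties broken alphabetically.
    return sorted(vocab, key=lambda v: (-f[v], v)), f


tokens = "graph learning manifold ranking graph ranking word graph".split()
ranking, scores = manifold_rank(tokens)
print(ranking)
```

Because the iteration is linear in Y, rescaling the term frequencies does not change the final order, which is why normalizing them here is harmless. Isolated words with no co-occurrence edges simply keep the damped score (1−α)·Y.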

Keywords: feature selection; feature word ranking; term frequency; manifold ranking; graph learning; conditional co-occurrence degree

CLC number: P181 [Astronomy and Earth Sciences / Astronomy]
