Research on Term Weight Calculation Based on the PageRank Algorithm


Author: Wang Qingfu [1]

Affiliation: [1] Liaoning Administration College, Shenyang 110161

Source: Network New Media Technology, 2015, No. 3, pp. 37-41 (5 pages)

Abstract: Most term-weighting methods treat each term as an independent unit and ignore the correlations among terms. Starting from those correlations, this paper proposes that a term's weight receives contributions from other terms. Taking the iterative computation of web page weights in the PageRank algorithm as its theoretical basis, it presents an iterative weight-calculation model based on mutual voting among terms. Each term is abstracted as a node in the model, and its initial weight is obtained with the classic TF-IDF method. Experiments on the Reuters21578 Top10 and 20Newsgroup datasets show that the improved method differentiates term weights fairly clearly, which helps distinguish the importance of terms within a text.
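The abstract does not give the update formula or say how the voting graph is built, so the following is only a minimal sketch of the general idea and not the paper's actual model. It assumes that two terms vote for each other when they co-occur in the same document, that votes are split in proportion to co-occurrence counts, and that the damping factor is 0.85; the function names `tfidf_init`, `cooccurrence_edges`, and `vote_weights` are hypothetical. The TF-IDF initial weights serve both as the starting vector and as the teleport distribution.

```python
import math
from collections import defaultdict
from itertools import combinations


def tfidf_init(docs):
    """Corpus-level TF-IDF score per term, normalised to sum to 1.

    Assumes `docs` is a non-empty list of token lists with at least one term.
    """
    n_docs = len(docs)
    df = defaultdict(int)
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    score = defaultdict(float)
    for doc in docs:
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            # Smoothed IDF (an assumption; the paper only says "classic TF-IDF").
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            score[term] += tf * idf
    total = sum(score.values())
    return {t: s / total for t, s in score.items()}


def cooccurrence_edges(docs):
    """Symmetric edge weights: number of documents in which two terms co-occur."""
    edges = defaultdict(dict)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            edges[a][b] = edges[a].get(b, 0.0) + 1.0
            edges[b][a] = edges[b].get(a, 0.0) + 1.0
    return edges


def vote_weights(docs, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank-style iteration in which terms vote for their neighbours."""
    base = tfidf_init(docs)
    edges = cooccurrence_edges(docs)
    neighbours = {t: edges.get(t, {}) for t in base}
    out_sum = {t: sum(neighbours[t].values()) for t in base}
    weight = dict(base)
    for _ in range(max_iter):
        # Terms with no neighbours hand their weight back via the base vector.
        dangling = sum(weight[t] for t in base if out_sum[t] == 0.0)
        updated = {}
        for t in base:
            votes = sum(weight[u] * c / out_sum[u]
                        for u, c in neighbours[t].items())
            updated[t] = ((1 - damping) * base[t]
                          + damping * (votes + dangling * base[t]))
        delta = sum(abs(updated[t] - weight[t]) for t in base)
        weight = updated
        if delta < tol:  # stop once the weights have converged
            break
    return weight


if __name__ == "__main__":
    toy_docs = [
        ["page", "rank", "graph"],
        ["rank", "term", "weight"],
        ["term", "weight", "vote", "rank"],
    ]
    for term, w in sorted(vote_weights(toy_docs).items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{w:.4f}")
```

Because the teleport term uses the TF-IDF vector rather than a uniform one, a term's final weight reflects both its own TF-IDF score and the weight of the terms it co-occurs with, which is one plausible reading of the "mutual contribution" described above.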

Keywords: term weight; voting model; iterative convergence; weight differentiation; feature term discrimination

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
