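Author: Wang Qingfu (王庆福) [1]
Affiliation: [1] Liaoning Administration College (辽宁行政学院), Shenyang 110161, China
Source: Network New Media Technology (《网络新媒体技术》), 2015, No. 3, pp. 37-41 (5 pages)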
Abstract: Most term-weighting methods treat each keyword as an independent unit and ignore the correlations among keywords. Starting from these correlations, this paper proposes that a keyword's weight receives contributions from the other keywords. Taking the iterative computation of web page weights in the PageRank algorithm as its theoretical basis, it presents an iterative weight-calculation model based on mutual voting among keywords. Each keyword is abstracted as a node in the model, and its initial weight is given by the classic TF-IDF method. Experiments on the Reuters21578 Top10 and 20Newsgroup datasets show that the improved algorithm separates keyword weights markedly, which makes it possible to distinguish how important each keyword is within a text.
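Keywords: term weight; voting model; iterative convergence; weight differentiation; feature term discrimination
Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]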
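The abstract names the ingredients of the model (keywords as nodes, TF-IDF initial weights, PageRank-style iteration through mutual voting) but this record does not give the update formula. The sketch below is one plausible reading, not the paper's actual method. Its assumptions are: votes travel along document-level co-occurrence links, the damping factor is the conventional PageRank value of 0.85, the IDF is smoothed so that no prior is zero, and a keyword with no co-occurring neighbours votes for itself. The function names `tfidf_prior`, `cooccurrence`, and `vote_weights` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tfidf_prior(docs):
    """Corpus-level TF-IDF per term, normalised to sum to 1.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    """
    n = len(docs)
    tf = Counter(t for d in docs for t in d)
    df = Counter(t for d in docs for t in set(d))
    raw = {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in tf}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}


def cooccurrence(docs):
    """Symmetric link counts: number of documents in which two terms co-occur."""
    links = defaultdict(Counter)
    for d in docs:
        for a, b in combinations(sorted(set(d)), 2):
            links[a][b] += 1
            links[b][a] += 1
    return links


def vote_weights(docs, damping=0.85, tol=1e-9, max_iter=200):
    """Iterate term weights until the total change drops below tol.

    Each term keeps a (1 - damping) share of its TF-IDF prior and receives
    votes from co-occurring terms in proportion to the link counts.
    """
    prior = tfidf_prior(docs)
    links = cooccurrence(docs)
    w = dict(prior)
    for _ in range(max_iter):
        new = {t: (1.0 - damping) * prior[t] for t in prior}
        for u, wu in w.items():
            out = links.get(u)
            if not out:
                # Assumption: an isolated term votes for itself.
                new[u] += damping * wu
                continue
            total = sum(out.values())
            for v, c in out.items():
                new[v] += damping * wu * c / total
        delta = sum(abs(new[t] - w[t]) for t in w)
        w = new
        if delta < tol:
            break
    return w
```

Calling `vote_weights([["a", "b"], ["b", "c"], ["b", "d"]])` returns a dictionary of weights that sum to 1. In that toy corpus the term `b` co-occurs with every other term, so it collects the most votes. Whether the paper uses co-occurrence or another measure of correlation between keywords cannot be determined from the abstract alone.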