Text term weighting approach based on latent semantic indexing (original title: 基于潜在语义索引的文本特征词权重计算方法)  Cited by: 17


Authors: LI Yuanyuan (李媛媛)[1], MA Yongqiang (马永强)[1]

Affiliation: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China

Source: Journal of Computer Applications (《计算机应用》), 2008, No. 6, pp. 1460-1462, 1466 (4 pages)

Abstract: Latent semantic indexing (LSI) is a document retrieval model developed over the last ten years. It is readily computable and requires little human intervention. Term weighting, an important optimization step in LSI, is analyzed in depth. The most widely used term weighting method, TF-IDF, relies on linear processing, which is unreasonable, and it struggles to emphasize the features that are key to a text's content. To address these shortcomings, a new weighting scheme based on the Sigmoid function and a location factor is proposed. The scheme highlights the differing importance of the terms in a text and is more favorable to constructing the latent semantic space. Tests on the experimental platform "Chinese LSI Retrieval Analysis System" show that the new weighting method improves the performance of LSI-based retrieval.
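The abstract names the two ingredients of the scheme (a Sigmoid function and a location factor) but does not state the formula, so the following is only a minimal sketch of how such a weight might be assembled, not the authors' method. Everything specific here is an assumption made for illustration: the function names, the factor values in `LOCATION_FACTOR`, applying the Sigmoid to a location-adjusted term frequency, and the smoothed IDF form.

```python
import math
from collections import Counter

# Hypothetical location factors: the abstract gives no values, so these
# numbers are illustrative assumptions only.
LOCATION_FACTOR = {"title": 2.0, "body": 1.0}


def sigmoid(x):
    """Standard logistic function, mapping any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def idf(term, docs):
    """Smoothed inverse document frequency (an assumed, common form)."""
    df = sum(1 for d in docs if any(term in tokens for tokens in d.values()))
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def term_weights(doc, docs):
    """Weight each term of `doc`; a document maps zone name -> token list."""
    # Location-adjusted frequency: each occurrence counts by its zone factor.
    adjusted = Counter()
    for zone, tokens in doc.items():
        for tok in tokens:
            adjusted[tok] += LOCATION_FACTOR.get(zone, 1.0)
    # The Sigmoid replaces linear growth in frequency with a saturating curve.
    return {t: sigmoid(f) * idf(t, docs) for t, f in adjusted.items()}


# Toy corpus (made-up tokens) with two zones per document.
docs = [
    {"title": ["semantic", "indexing"], "body": ["semantic", "term", "weight"]},
    {"title": ["retrieval"], "body": ["term", "weight", "term"]},
]
w = term_weights(docs[0], docs)
```

In this toy corpus, "semantic" (title and body) receives a larger weight than "indexing" (title only), which in turn outweighs "term" (body only, and present in both documents). The resulting term-document weights would then feed the singular value decomposition used to build the latent semantic space.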

Keywords: latent semantic indexing; Sigmoid function; location factor; weighting algorithm

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
