Affiliation: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031
Source: Journal of Computer Applications (《计算机应用》), 2008, No. 6, pp. 1460-1462, 1466 (4 pages)
Abstract: Latent Semantic Indexing (LSI) is a document retrieval model developed over the last ten years. It is easy to compute and requires little human intervention. This paper analyzes in depth term weighting, an important optimization step in LSI. TF-IDF, currently the most widely used weighting method, treats term frequency linearly, which is unreasonable, and has difficulty highlighting the features that are decisive for a text's content. To address these shortcomings, a new weighting scheme based on the Sigmoid function and a location factor is proposed. The scheme emphasizes the differing importance of the feature terms in a text and is more favorable to constructing the latent semantic space. Tests on the experimental platform "Chinese LSI Retrieval Analysis System" show that the new weighting method improves the performance of LSI-based retrieval.
Keywords: latent semantic indexing; Sigmoid function; location factor; term weighting algorithm
Classification No.: TP391.1 [Automation and Computer Technology: Computer Application Technology]
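Note: this record does not give the formula of the proposed weighting scheme. Purely as an illustration of the idea described in the abstract, the sketch below assumes that raw term frequency is squashed with a Sigmoid so that it saturates instead of growing linearly, and that a term appearing in the title receives a constant multiplicative location factor. The function name `term_weights`, the `title_boost` parameter, the rescaling `2 * sigmoid(tf) - 1`, and the smoothed IDF are all hypothetical choices, not the authors' method.

```python
import math
from collections import Counter


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def term_weights(docs, title_boost=2.0):
    """Compute illustrative term weights for a small corpus.

    docs: list of (title_tokens, body_tokens) pairs.
    title_boost: ASSUMED location factor for terms found in the title.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for title, body in docs:
        df.update(set(title) | set(body))

    weights = []
    for title, body in docs:
        tf = Counter(title) + Counter(body)
        title_terms = set(title)
        doc_weights = {}
        for term, count in tf.items():
            # ASSUMPTION: rescaled Sigmoid maps a count >= 0 into [0, 1],
            # so frequent terms saturate rather than grow linearly.
            squashed_tf = 2.0 * sigmoid(count) - 1.0
            # ASSUMPTION: a common smoothed IDF variant.
            idf = math.log(n_docs / df[term]) + 1.0
            # ASSUMPTION: constant boost for title terms, 1.0 otherwise.
            location = title_boost if term in title_terms else 1.0
            doc_weights[term] = squashed_tf * idf * location
        weights.append(doc_weights)
    return weights
```

The resulting term-by-document weights would then fill the matrix that LSI decomposes by singular value decomposition; under these assumptions a term that occurs in the title and in few documents outweighs a term that is merely frequent in the body.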