A Modified Weight Function in Latent Semantic Analysis  Cited by: 19

Authors: Liu Yunfeng (刘云峰)[1], Qi Huan (齐欢)[1], Xiang'en Hu, Zhiqiang Cai

Affiliations: [1] Institute of Systems Engineering, Huazhong University of Science and Technology [2] Institute of Intelligent Systems, University of Memphis, Memphis, TN 38152, USA

Source: Journal of Chinese Information Processing (《中文信息学报》), 2005, No. 6, pp. 64-69 (6 pages)

Abstract: Since it was first introduced, Latent Semantic Analysis (LSA) has been applied widely in fields such as information retrieval, text classification, and automatic question answering. An important step in LSA is the weighting transformation applied to the term-document matrix, and the choice of weighting function directly affects the quality of the LSA result. In this paper, we first summarize the traditional, well-studied weighting methods, covering both local weighting and global term weighting. We then point out the shortcomings of these methods, extend them, and introduce the concept of a global weight for documents. In the final part of the paper, we run an experiment comparing LSA results under different weighting schemes, for which we propose a new measure for evaluating LSA results, the document self-indexing matrix. The experimental results confirm that the modified weighting method improves retrieval efficiency.
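
For orientation, the traditional weighting step that the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes the widely used log-entropy scheme for the local weight and the global term weight, followed by a truncated SVD. The abstract does not give the formula for the document global weight, so the `doc_weights` argument is only a hypothetical placeholder for a per-document factor, and the function names are chosen for illustration.

```python
import numpy as np


def log_entropy_weight(counts, doc_weights=None):
    """Weight a term-document count matrix (rows = terms, columns = documents).

    Assumes the common log-entropy scheme and at least two documents.
    `doc_weights` is a hypothetical per-document factor; the source abstract
    does not specify how the document global weight is computed.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]

    # Local weight: dampen the raw term frequency within each document.
    local = np.log2(counts + 1.0)

    # Global term weight: 1 + sum_j p_ij * log2(p_ij) / log2(n_docs),
    # where p_ij is the share of term i's occurrences found in document j.
    row_sums = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    plogp = np.zeros_like(p)
    mask = p > 0
    plogp[mask] = p[mask] * np.log2(p[mask])
    term_global = 1.0 + plogp.sum(axis=1) / np.log2(n_docs)

    weighted = local * term_global[:, None]

    # Optional, hypothetical document-level factor (one value per column).
    if doc_weights is not None:
        weighted = weighted * np.asarray(doc_weights, dtype=float)[None, :]
    return weighted


def lsa_document_vectors(weighted, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    return (s[:k, None] * vt[:k]).T  # shape: (n_docs, k)


counts = [[1, 1, 1, 1],   # term spread evenly over all documents
          [4, 0, 0, 0],   # term concentrated in one document
          [0, 2, 2, 0]]   # term split between two documents
weighted = log_entropy_weight(counts)
doc_vectors = lsa_document_vectors(weighted, k=2)
```

Under this scheme a term that is spread evenly across all documents receives a global weight of zero, while a term concentrated in a single document keeps its full local weight. Any document-level weighting would be applied column-wise on top of this, which is the role the placeholder `doc_weights` plays here.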

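Keywords: computer application; Chinese information processing; latent semantic analysis; weight; document global weight; document self-indexing matrix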

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
