Authors: Liu Yunfeng (刘云峰)[1], Qi Huan (齐欢)[1], Xiang'en Hu, Zhiqiang Cai
Affiliations: [1] Institute of Systems Engineering, Huazhong University of Science and Technology; [2] Institute of Intelligent Systems, University of Memphis, Memphis, TN 38152, USA
Source: Journal of Chinese Information Processing (《中文信息学报》), 2005, No. 6, pp. 64-69 (6 pages)
Abstract: Since Latent Semantic Analysis (LSA) was first introduced, it has been widely applied in fields such as information retrieval, text classification, and automatic question answering. An important step in LSA is applying a weighting transformation to the term-document matrix, and the choice of weighting function directly affects the quality of the LSA results. This paper first summarizes the traditional, well-established weighting methods, covering local weights and global term weights. It then points out shortcomings of the existing methods, extends them, and introduces the concept of a document global weight. Finally, the experiments compare LSA results under different weighting schemes using a new evaluation measure, the document self-indexing matrix (文档自检索矩阵). The experimental results confirm that the modified weighting method improves retrieval efficiency.
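The local and global term weighting that the abstract refers to can be illustrated with a minimal sketch. It assumes the widely used log-entropy scheme followed by a truncated SVD. It is not the paper's own method: the document global weight and the self-indexing matrix are not defined in this record, so neither is reproduced here. The function names, the toy count matrix, and the choice of `k=2` are all hypothetical.

```python
import numpy as np

def log_entropy_weight(counts):
    """Apply log-entropy weighting to a term-document count matrix (terms x docs).

    Assumption: this is the standard log-entropy scheme, used here only as an
    example of a local weight combined with a global term weight.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    # Local weight: dampen raw term frequencies with log(1 + tf).
    local = np.log1p(counts)
    # Global term weight: 1 + sum_j p_ij * log(p_ij) / log(n_docs),
    # where p_ij = tf_ij / (total frequency of term i across documents).
    row_sums = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    return local * global_w[:, None]

def lsa_document_vectors(weighted, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # shape: docs x k

# Hypothetical toy corpus: 4 terms (rows) x 4 documents (columns).
counts = np.array([
    [2, 0, 1, 0],
    [1, 1, 1, 1],  # evenly spread term: global weight becomes 0
    [0, 3, 0, 0],  # term in a single document: global weight stays 1
    [0, 0, 2, 1],
])
weighted = log_entropy_weight(counts)
doc_vectors = lsa_document_vectors(weighted, k=2)
```

In this scheme a term spread evenly over all documents receives a global weight of zero, while a term concentrated in one document keeps its full local weight. That is the kind of effect a weighting function has on the matrix before the SVD step.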
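Keywords: computer application; Chinese information processing; latent semantic analysis; weighting; document global weight; document self-indexing matrix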
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]