Affiliation: [1] School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601
Source: Computer Technology and Development, 2014, No. 1, pp. 133-135 (3 pages)
Funding: Natural Science Foundation of Anhui Province (11040606M133)
Abstract: Traditional search engines return so much data that users often cannot quickly find the answer they need. To address this, the paper introduces an FAQ system, and its central research question is how to find the best-matching answer in the FAQ. The traditional vector space model (VSM) is improved so that it better reflects the weights of the terms in a question. The core of the work is an LDA model, trained on documentation from the computer-malfunction domain, which yields a topic-word probability distribution. The relevance between words is computed from their probabilities in this topic-word distribution, and an algorithm is proposed that computes the similarity between sentences from the relevance between their words. The two algorithms are then combined into the final similarity algorithm. Finally, the paper compiles an FAQ collection and implements a prototype FAQ question-answering system. Experimental analysis indicates that the similarity algorithm performs well.
Keywords: VSM; similarity computation; LDA (Latent Dirichlet Allocation); topic-word distribution
CLC number: TP31 [Automation and Computer Technology: Computer Software and Theory]
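The abstract does not give the paper's exact formulas, so the following Python sketch only illustrates the general pipeline it describes, under stated assumptions: a plain TF-IDF weighted cosine stands in for the paper's improved VSM, the relevance between two words is taken as the cosine between their probability vectors across topics, sentence similarity averages each word's best-matching word in the other sentence, and the two scores are mixed linearly with a weight `lam`. All function names, the toy topic-word table `TOY_PHI`, and the default `lam=0.5` are hypothetical, and questions are assumed to be already segmented into tokens.

```python
import math
from collections import Counter

# HYPOTHETICAL toy topic-word distribution: TOY_PHI[k][w] = P(w | topic k).
# In a real system this table would come from an LDA model trained on a domain corpus.
TOY_PHI = [
    {"network": 0.4, "wifi": 0.3, "disconnect": 0.3},
    {"screen": 0.5, "black": 0.3, "display": 0.2},
]


def _cosine(u, v):
    """Cosine of two equal-length numeric vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def vsm_cosine(q_terms, d_terms, idf):
    """Stand-in for the improved VSM: TF-IDF weighted cosine (missing idf -> 1.0)."""
    q, d = Counter(q_terms), Counter(d_terms)
    vocab = sorted(set(q) | set(d))
    qv = [q[t] * idf.get(t, 1.0) for t in vocab]
    dv = [d[t] * idf.get(t, 1.0) for t in vocab]
    return _cosine(qv, dv)


def word_relevance(w1, w2, phi):
    """Assumed word relevance: cosine between the words' per-topic probabilities."""
    if w1 == w2:
        return 1.0  # identical words are treated as fully relevant
    v1 = [topic.get(w1, 0.0) for topic in phi]
    v2 = [topic.get(w2, 0.0) for topic in phi]
    return _cosine(v1, v2)


def lda_sentence_similarity(s1, s2, phi):
    """Symmetric average of each word's best relevance to the other sentence."""
    if not s1 or not s2:
        return 0.0

    def directed(a, b):
        return sum(max(word_relevance(x, y, phi) for y in b) for x in a) / len(a)

    return (directed(s1, s2) + directed(s2, s1)) / 2


def combined_similarity(q, d, idf, phi, lam=0.5):
    """Linear mix of the VSM score and the LDA-based score (lam is hypothetical)."""
    return lam * vsm_cosine(q, d, idf) + (1 - lam) * lda_sentence_similarity(q, d, phi)


def best_match(query, faq, idf, phi, lam=0.5):
    """Return the (question_tokens, answer) pair that scores highest for the query."""
    return max(faq, key=lambda item: combined_similarity(query, item[0], idf, phi, lam))


if __name__ == "__main__":
    faq = [
        (["wifi", "disconnect"], "answer about the network"),
        (["screen", "black"], "answer about the display"),
    ]
    print(best_match(["network", "disconnect"], faq, {}, TOY_PHI)[1])
```

In this toy run, "network" never appears in either FAQ question, but because it shares a topic with "wifi" and "disconnect", the LDA-based term still pulls the query toward the network answer; the `lam` weight controls how much exact term overlap (VSM) counts against that topic-level relevance.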