Source: Computer Engineering and Applications (《计算机工程与应用》), 2010, No. 28, pp. 135-137, 159 (4 pages)
Funding: Natural Science Foundation of Jiangsu Province, No. BK20080544
Abstract: This paper studies a fast retrieval method for Chinese web pages based on a keyword inverted list. After a large web-page corpus has been built, keyword feature vectors for the pages are generated offline using a keyword dictionary and an optimized forward maximum matching segmentation algorithm. The dimensionality of these feature vectors is then reduced to produce a web-page feature table in compressed format. Finally, a keyword inverted file is generated from the feature table according to how frequently each keyword occurs across all pages. In the experiments, keyword retrieval of Chinese web pages is implemented against three data sources (the original web-page library, the feature table, and the inverted file), and the real-time performance of the three is compared. The results show that the keyword-based inverted-file retrieval algorithm is far superior to the other two methods in real-time performance.
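The pipeline described in the abstract (segmentation, feature table, inverted file, lookup) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy dictionary and pages, the `max_len` parameter, and the use of a sparse keyword-to-count mapping as a stand-in for the paper's "compressed" feature table are all assumptions made for illustration. The paper's specific optimizations to forward maximum matching are not given in the abstract and are not reproduced here.

```python
from collections import defaultdict


def forward_max_match(text, dictionary, max_len=4):
    """Plain forward maximum matching: at each position, take the longest
    dictionary word that starts there (hypothetical helper, unoptimized)."""
    tokens = []
    i = 0
    while i < len(text):
        matched = None
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                matched = candidate
                break
        if matched:
            tokens.append(matched)
            i += len(matched)
        else:
            i += 1  # character not covered by the dictionary: skip it
    return tokens


def build_feature_table(pages, dictionary):
    """Map page id -> {keyword: count}. Storing only keywords that occur
    is an assumed stand-in for the paper's compressed feature table."""
    table = {}
    for page_id, text in pages.items():
        counts = defaultdict(int)
        for token in forward_max_match(text, dictionary):
            counts[token] += 1
        table[page_id] = dict(counts)
    return table


def build_inverted_index(feature_table):
    """Map keyword -> [(page id, count), ...], most frequent page first."""
    index = defaultdict(list)
    for page_id, counts in feature_table.items():
        for keyword, count in counts.items():
            index[keyword].append((page_id, count))
    for postings in index.values():
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(index)


def search(index, keyword):
    """Return ids of pages containing the keyword; one dictionary lookup."""
    return [page_id for page_id, _ in index.get(keyword, [])]


# Toy data, invented for this example.
DICTIONARY = {"网页", "检索", "倒排", "关键词"}
PAGES = {1: "网页检索网页", 2: "倒排检索", 3: "关键词"}

FEATURE_TABLE = build_feature_table(PAGES, DICTIONARY)
INDEX = build_inverted_index(FEATURE_TABLE)
print(search(INDEX, "检索"))  # [1, 2]
```

In this sketch a query against the inverted index is a single lookup, whereas answering the same query from the feature table or from the raw pages would require scanning every page. That difference is a plausible reading of why the abstract reports the inverted file as the fastest of the three data sources.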
Classification: TP391.3 [Automation and Computer Technology: Computer Application Technology]