A patent retrieval method based on automatic query expansion

Times cited: 2


Authors: Yang Shuai (羊帅), Wang Feng (王锋)[1], Lin Lanfen (林兰芬)[1], Zhu Xiaowei (朱晓伟)[1], Xie Fei (谢非)[1]

Affiliation: [1] College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Source: China Sciencepaper (《中国科技论文》), 2013, No. 10, pp. 1057-1063 (7 pages)

Funding: Specialized Research Fund for the Doctoral Program of Higher Education (20110101110065); Zhejiang Provincial Innovative Research Team Program (2009R50015)

Abstract: Existing patent retrieval methods do not capture users' query intent well because their query expansion is insufficient. To address this problem, we propose a patent retrieval method based on automatic query expansion. Taking the characteristics of patent documents into account, we first extract patent domain terms with an improved TF-IDF formula and build patent domain vocabularies. At the retrieval stage, the query string is analyzed to extract its keywords, which are matched against the domain vocabularies to determine the domain of the query and how difficult the query is to expand. Next, automatic query expansion based on pseudo-relevance feedback (PRF) generates candidate expansion terms and ranks them by analyzing differences in the term distribution of the pseudo-relevant documents. Finally, the expansion terms are combined with the original query conditions to form the new query used for searching. Comparative experiments show that the method achieves higher recall and average precision.
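The abstract names the pipeline stages but gives neither the improved TF-IDF formula nor the exact term-distribution statistic, so the following is only a minimal sketch under stated assumptions, not the authors' method. The domain-term score below is a hypothetical TF-IDF variant; KL-divergence term scoring is swapped in for the paper's distribution-difference analysis; query-expansion difficulty is not modeled. The function names (`domain_vocabulary`, `detect_domain`, `retrieve`, `expand_query`), the toy corpus, and the parameters `k`, `n_terms`, and `beta` are all illustrative.

```python
import math
from collections import Counter


def idf(term, docs):
    # Smoothed inverse document frequency over tokenized documents (lists of terms).
    df = sum(1 for d in docs if term in d)
    return math.log((len(docs) + 1) / (df + 1)) + 1.0


def domain_vocabulary(domain_docs, background_docs, top_n=3):
    # Hypothetical TF-IDF variant (NOT the paper's formula): in-domain term frequency,
    # times the fraction of domain documents containing the term, times background IDF.
    tf = Counter(t for d in domain_docs for t in d)
    coverage = Counter(t for d in domain_docs for t in set(d))
    scores = {t: tf[t] * coverage[t] / len(domain_docs) * idf(t, background_docs) for t in tf}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def detect_domain(query, vocabularies):
    # Choose the domain whose vocabulary shares the most terms with the query.
    overlap = {name: len(set(query) & set(vocab)) for name, vocab in vocabularies.items()}
    best = max(overlap, key=overlap.get)
    return best if overlap[best] > 0 else None


def retrieve(query_weights, docs, k):
    # Rank documents by a weighted TF-IDF match; return the indices of the top k.
    def score(doc):
        counts = Counter(doc)
        return sum(w * counts[t] * idf(t, docs) for t, w in query_weights.items())
    return sorted(range(len(docs)), key=lambda i: -score(docs[i]))[:k]


def expand_query(query, docs, k=2, n_terms=3, beta=0.5):
    # Pseudo-relevance feedback: treat the top-k hits of the original query as relevant.
    feedback = [docs[i] for i in retrieve({t: 1.0 for t in query}, docs, k)]
    fb = Counter(t for d in feedback for t in d)
    coll = Counter(t for d in docs for t in d)
    fb_total, coll_total = sum(fb.values()), sum(coll.values())
    # KL-divergence term scoring (a stand-in for the paper's distribution analysis):
    # terms over-represented in the feedback set relative to the collection score high.
    kld = {}
    for t, c in fb.items():
        if t not in query:
            p_fb, p_coll = c / fb_total, coll[t] / coll_total
            kld[t] = p_fb * math.log(p_fb / p_coll)
    ranked = [t for t in sorted(kld, key=lambda t: (-kld[t], t)) if kld[t] > 0][:n_terms]
    # Combine: original terms keep weight 1.0; expansion terms get at most beta.
    weights = {t: 1.0 for t in query}
    for t in ranked:
        weights[t] = beta * kld[t] / kld[ranked[0]]
    return weights


# Hypothetical toy corpus of pre-tokenized "patents" in two domains.
docs = [
    ["battery", "lithium", "electrode", "charge"],
    ["battery", "lithium", "anode", "electrolyte"],
    ["engine", "piston", "fuel", "cylinder"],
    ["engine", "fuel", "injection", "nozzle"],
    ["lithium", "anode", "cathode", "electrolyte"],
]
vocabs = {
    "battery": domain_vocabulary([docs[0], docs[1], docs[4]], docs),
    "engine": domain_vocabulary([docs[2], docs[3]], docs),
}
query = ["battery", "lithium"]
print(detect_domain(query, vocabs))  # battery
expanded = expand_query(query, docs)
print(sorted(expanded))  # ['anode', 'battery', 'charge', 'electrode', 'lithium']
print(retrieve(expanded, docs, 3))
```

Capping expansion-term weights at `beta` keeps the original query terms dominant, which is one common way to limit query drift when the pseudo-relevant set contains noise; the paper's own weighting scheme is not stated in the abstract.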

Keywords: artificial intelligence; patent retrieval; domain vocabulary; query expansion; pseudo-relevance feedback

CLC number: TP391.3 [Automation and Computer Technology: Computer Application Technology]
