检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]河北大学计算机科学与技术学院,河北保定071002
出 处:《河北大学学报(自然科学版)》2017年第6期652-661,共10页Journal of Hebei University(Natural Science Edition)
基 金:国家自然科学基金资助项目(61375075);河北省教育厅河北省高等学校科学技术研究重点项目(ZD2017208)
摘 要:现有全文检索技术多是以文本信息为处理对象,对于以数学表达式为主要成分的科技文档检索还处在探索阶段.为了使用户可以方便地以数学公式作为查询语言对科技文档进行检索,提出了一种基于数学表达式特征的科技文档检索模型.首先通过将公式解析为二叉树得到数学表达式的子式信息,利用数学表达式及子式构造检索特征向量;在索引阶段,利用所提取的文档特征向量构建分层结构的索引表;在匹配阶段,对文档向量采用tf-idf进行加权操作,利用余弦相似度对检索向量和文档向量进行相似度计算,得到一个有序的文档检索结果.实验选取了来自不同领域的期刊、学术网站以及公共数据集的5 017篇科技文档,其中包含了96 362条数学公式,平均检索时间为0.428s,表明该模型达到了实现较高效率科技文档检索的目标.The existing full-text retrieval techniques are mostly targeting the text information.While the retrieval of the scientific documents with mathematical expressions as the main components is still in the exploration stage.In order to make the users can easily use the mathematical formula as the query language to retrieve the scientific and technical documents,a new scientific document retrieval model based on mathematical expression features was proposed.Firstly,the formulas were resolved into the subformulas forming the binary trees,which are used to generate the retrieval feature vectors.In the index phase,the index table with the hierarchical structure was constructed using the extracted document feature vectors.In the retrieval phase,the document vectors were weighted by tf-idf.The similarity between the retrieval vector and the document vector was calculated by using the cosine similarity,and an ordered document retrieval result was obtained.The experiment data was selected from different journals,academic website and public data set of 5 017 science and technology documents which contain 96 362 mathematical formulas.The average retrieval time was 0.428 s,which indicates that the proposed model achieved the goal of realizing mathematical expression retrieval with high efficiency.
关 键 词:科技文档 数学表达式 检索 索引 匹配 二叉树 特征
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.145