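Affiliations: [1] Institute of Computer Science and Technology, Peking University; [2] National Key Laboratory of Text Information Processing Technology, Peking University, Beijing 100871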
Source: Journal of Software (《软件学报》), 2006, No. 5, pp. 991-1000 (10 pages)
Abstract: As a new form of data, XML documents have become a hot research topic. Computing the similarity between XML documents is the basis of XML document analysis, management, and text mining. The Structured Link Vector Model (SLVM) is a method for measuring XML document similarity that takes both the structural and the content information of the documents into account. The kernel matrix, which captures the relations between the structure units of XML documents, plays an important role in SLVM. To capture these relations automatically, two learning algorithms for the kernel matrix are proposed: a regression learning algorithm based on the support vector machine (SVM) and a learning algorithm based on matrix iteration. Comparative similarity-search experiments show that the XML document similarity measure based on kernel matrix learning is markedly more accurate than the other methods. Further experiments show that, compared with the SVM-based regression learning algorithm, the matrix-iteration-based kernel matrix learning algorithm not only achieves higher accuracy but also needs fewer training documents and has a lower computational cost.
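To make the role of the kernel matrix concrete, here is a minimal sketch of an SLVM-style similarity. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not give the formula, so the function name `slvm_similarity`, the per-unit normalization, and the toy data are all assumptions. Each document is taken to be a list of term-weight vectors, one per structure unit, and the similarity is assumed to be the kernel-weighted sum of dot products between unit vectors.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (assumed preprocessing step)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def slvm_similarity(doc_x, doc_y, kernel):
    """Kernel-weighted similarity between two documents.

    doc_x, doc_y: lists of term-weight vectors, one per structure unit.
    kernel: square matrix; kernel[i][j] weights how strongly structure
            unit i of one document relates to unit j of the other.
    """
    xs = [normalize(u) for u in doc_x]
    ys = [normalize(u) for u in doc_y]
    total = 0.0
    for i, ux in enumerate(xs):
        for j, uy in enumerate(ys):
            weight = kernel[i][j]
            if weight:
                total += weight * sum(a * b for a, b in zip(ux, uy))
    return total


# Hypothetical toy data: two structure units (say, title and body)
# over a three-term vocabulary.
doc_a = [[1, 0, 0], [0, 1, 0]]
doc_b = [[0, 1, 0], [0, 1, 0]]

identity = [[1.0, 0.0], [0.0, 1.0]]   # units compared only with themselves
linked = [[1.0, 0.5], [0.5, 1.0]]     # title and body partially related

sim_identity = slvm_similarity(doc_a, doc_b, identity)
sim_linked = slvm_similarity(doc_a, doc_b, linked)
```

With the identity kernel only matching units contribute. A kernel with off-diagonal entries also credits term overlap across different units. Those off-diagonal entries are what the two learning algorithms in the abstract are meant to estimate from data.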
Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]