Research on XML Document Classification Based on Fuzzy Path Matching

RESEARCH ON XML CLASSIFICATION BASED ON FUZZY PATH MATCHING

Source: Computer Applications and Software (《计算机应用与软件》), 2015, Issue 10, pp. 113-115 and 126 (4 pages)

Funding: Yunnan Provincial Department of Education Fund Project (2011Y010)

Abstract: XML is an important standard for representing information and exchanging data on the Internet, and document classification is an important way to extract useful information from massive amounts of data. This paper proposes an XML document classification method based on fuzzy path matching. First, information that has no influence on classification is removed. Then a hybrid method for computing XML document similarity is applied, in which each XML document is represented as a set of paths. To improve efficiency, paths that occur repeatedly within a document are deleted before fuzzy matching is carried out, and the Hungarian algorithm is used to compute the similarity between documents. Finally, an improved k-nearest neighbour (KNN) algorithm is used to classify the documents. Experiments on an automatically generated document set and a real document set show that classification accuracy reaches 100% on both sets.
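The pipeline in the abstract (documents as path sets, removal of repeated paths, fuzzy path matching, Hungarian-algorithm document similarity, k-NN classification) can be illustrated with the minimal sketch below. The abstract does not give the actual formulas, so several parts are assumptions rather than the authors' method: discarding text and attributes as the "information with no influence on classification", scoring two paths by a normalized longest common subsequence of their tags, normalizing the matched total by the larger path set, and using a similarity-weighted vote as a stand-in for the unspecified "improved" k-NN. All function names (`extract_paths`, `path_similarity`, `hungarian_max`, `doc_similarity`, `knn_classify`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_paths(xml_text):
    # Assumption: text and attributes are the "information with no influence
    # on classification", so only element tags are kept. A set removes paths
    # that occur more than once in the same document.
    paths = set()

    def walk(elem, prefix):
        prefix = prefix + (elem.tag,)
        children = list(elem)
        if not children:
            paths.add(prefix)  # root-to-leaf tag path
        for child in children:
            walk(child, prefix)

    walk(ET.fromstring(xml_text), ())
    return paths


def path_similarity(p, q):
    # Hypothetical fuzzy path match: 2 * LCS(p, q) / (len(p) + len(q)).
    dp = [[0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a == b else max(dp[i][j + 1], dp[i + 1][j])
    return 2.0 * dp[len(p)][len(q)] / (len(p) + len(q))


def hungarian_max(weight):
    # Hungarian algorithm (potentials form) on an n x m matrix with n <= m.
    # Weights are negated so the minimum-cost version finds the maximum.
    n, m, inf = len(weight), len(weight[0]), float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    match, way = [0] * (m + 1), [0] * (m + 1)  # match[j]: 1-based row on column j
    for i in range(1, n + 1):
        match[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -weight[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sum(weight[match[j] - 1][j - 1] for j in range(1, m + 1) if match[j])


def doc_similarity(paths_a, paths_b):
    # Best one-to-one path matching, normalized by the larger path set (assumption).
    a, b = sorted(paths_a), sorted(paths_b)
    if len(a) > len(b):
        a, b = b, a  # hungarian_max expects rows <= columns
    return hungarian_max([[path_similarity(p, q) for q in b] for p in a]) / len(b)


def knn_classify(doc_paths, training, k=3):
    # Similarity-weighted vote among the k nearest neighbours; a stand-in
    # for the paper's unspecified "improved" k-NN.
    scored = sorted(((doc_similarity(doc_paths, p), label) for p, label in training), reverse=True)
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


train = [
    (extract_paths("<book><title/><author/><price/></book>"), "book"),
    (extract_paths("<book><title/><author/><isbn/></book>"), "book"),
    (extract_paths("<movie><title/><director/><year/></movie>"), "movie"),
    (extract_paths("<movie><title/><director/><cast><actor/></cast></movie>"), "movie"),
]
query = extract_paths("<book><title>XML</title><author>Li</author><author>Wang</author></book>")
print(knn_classify(query, train, k=3))  # → book
```

In the usage example the repeated `author` element collapses to a single path, so the query has two paths. Each `book` document scores 2/3 against it, while each `movie` document scores only 1/6, coming solely from the shared `title` tag. Matching paths one-to-one, instead of averaging over all pairs, keeps a single generic path from being credited against every path in the other document.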

Keywords: XML; classification; similarity; path; semantics

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
