A Large Margin Nearest Neighbor Algorithm for Large-Scale Text Classification  Cited by: 1

Authors: 朱茜[1], 覃华[1], 冯志新, 陈晨[1]

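Affiliations: [1] School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi 530004; [2] Guangxi Communications Planning and Design Consulting Co., Ltd., Nanning, Guangxi 530007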

Source: Computer and Modernization (《计算机与现代化》), 2016, No. 6, pp. 68-72 (5 pages)

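Funding: National Natural Science Foundation of China (61363027); Humanities and Social Sciences Research Planning Fund of the Ministry of Education (11YJAZH080)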

Abstract: The large margin nearest neighbor (LMNN) algorithm has strong learning and generalization ability and is widely used in classification. When it is applied to large-scale text classification, however, the semidefinite programming (SDP) problem inside LMNN grows rapidly with the size of the data and becomes difficult to solve. To address this, the Huber loss function is introduced to decompose the semidefinite optimization model of LMNN into two low-order continuous optimization sub-models, which reduces the computational complexity of the algorithm and improves its efficiency. Experiments on a public-opinion classification data set show that, compared with the traditional LMNN algorithm, the proposed algorithm improves accuracy by 4.5% and reduces classification time by 47.1%. Improving LMNN through decomposition and order reduction is therefore feasible, and the resulting algorithm is better suited to large-scale text classification.
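For orientation, the two ingredients named in the abstract can be sketched in a few lines of Python. This is not the authors' method: the abstract does not say how the Huber loss is used to split the SDP into two sub-models. The sketch only shows the standard LMNN objective (a pull term over target neighbors plus a hinge penalty on impostors) and, as a labeled assumption, one common way to smooth that hinge with a Huber loss. All function names, the `delta` threshold, and the toy data are hypothetical.

```python
def huber(r, delta=1.0):
    """Standard Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def hinge(z):
    """Hinge penalty [z]_+ used in the standard LMNN objective."""
    return max(z, 0.0)


def smoothed_hinge(z, delta=1.0):
    """ASSUMPTION (not from the paper): Huber loss on the positive part of z."""
    return huber(hinge(z), delta)


def sq_dist(L, a, b):
    """Squared distance ||L(a - b)||^2 under the linear map L (list of rows)."""
    diff = [ai - bi for ai, bi in zip(a, b)]
    return sum(sum(lij * dj for lij, dj in zip(row, diff)) ** 2 for row in L)


def lmnn_loss(L, X, y, k=1, mu=0.5, penalty=hinge):
    """Standard LMNN loss: (1 - mu) * pull + mu * push."""
    n, dim = len(X), len(X[0])
    identity = [[float(r == c) for c in range(dim)] for r in range(dim)]
    pull = push = 0.0
    for i in range(n):
        same = [j for j in range(n) if j != i and y[j] == y[i]]
        # Target neighbors: the k nearest same-class points in the input space.
        targets = sorted(same, key=lambda j: sq_dist(identity, X[i], X[j]))[:k]
        others = [l for l in range(n) if y[l] != y[i]]
        for j in targets:
            d_ij = sq_dist(L, X[i], X[j])
            pull += d_ij  # pull target neighbors closer
            # Penalize differently labeled points that violate the unit margin.
            push += sum(penalty(1.0 + d_ij - sq_dist(L, X[i], X[l]))
                        for l in others)
    return (1.0 - mu) * pull + mu * push


# Toy data (hypothetical): two classes on the corners of a unit square.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [0, 0, 1, 1]
I2 = [[1.0, 0.0], [0.0, 1.0]]

hinge_loss = lmnn_loss(I2, X, y)
huber_loss = lmnn_loss(I2, X, y, penalty=smoothed_hinge)
print(hinge_loss, huber_loss)
```

The smoothed penalty is differentiable at the margin boundary, which is one reason a Huber-type loss is attractive when an objective is handed to a continuous optimizer instead of an SDP solver. Whether the paper uses it in this way cannot be determined from the abstract.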

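Keywords: semidefinite programming; large margin nearest neighbor; Huber loss function; large-scale text classification; generalization ability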

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
