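Affiliations: [1] School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi 530004; [2] Guangxi Communication Planning and Design Consulting Co., Ltd., Nanning, Guangxi 530007
Source: 《计算机与现代化》 (Computer and Modernization), 2016, No. 6, pp. 68-72 (5 pages)
Funding: National Natural Science Foundation of China (61363027); Ministry of Education Humanities and Social Sciences Research Planning Fund (11YJAZH080)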
Abstract: The Large Margin Nearest Neighbor (LMNN) algorithm has strong learning and generalization ability and is widely used in classification. When it is applied to large-scale text classification, however, the semidefinite programming (SDP) problem inside LMNN grows sharply with the size of the data and becomes difficult to solve. To address this, the Huber loss function is introduced to decompose the semidefinite optimization model of LMNN into two lower-order continuous optimization sub-models, which reduces the computational complexity of the algorithm and improves its efficiency. Experimental results on a public-opinion classification data set show that, compared with the traditional LMNN algorithm, the proposed algorithm improves precision by 4.5% and reduces classification time by 47.1%. Using decomposition and order reduction to improve LMNN is therefore feasible, and the resulting algorithm is better suited to large-scale text classification.
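Keywords: semidefinite programming; large margin nearest neighbor; Huber loss function; large-scale text classification; generalization ability
Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]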
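Note: The abstract names the Huber loss function but does not give its form or say how it is substituted into the LMNN objective. As a reference point only, here is a minimal sketch of the standard textbook Huber loss. The function name `huber_loss` and the threshold parameter `delta` are illustrative choices, not taken from the paper, and the paper's actual formulation may differ.

```python
def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss (illustrative; not the paper's exact formulation).

    Quadratic for |residual| <= delta, linear beyond it, so the function
    is continuously differentiable everywhere, unlike a hinge loss.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    abs_r = abs(residual)
    if abs_r <= delta:
        # Quadratic region near zero.
        return 0.5 * residual * residual
    # Linear region; the offset makes the two pieces meet at |residual| == delta.
    return delta * (abs_r - 0.5 * delta)


if __name__ == "__main__":
    for r in (0.5, 1.0, 3.0, -3.0):
        print(r, huber_loss(r))
```

The two branches agree in value and slope at `|residual| == delta`. This smoothness is the usual reason for preferring a Huber-type loss when an objective is to be handled by continuous optimization methods.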