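Authors: ZHANG Jie (张洁) [1,2]; SHENG Xia (盛夏); ZHANG Peng (张朋); QIN Wei (秦威) [1]; ZHAO Xinming (赵新明) [1] (Institute of Intelligent Manufacturing and Information Engineering, Shanghai Jiao Tong University, Shanghai 200240; College of Mechanical Engineering, Donghua University, Shanghai 201620)
Affiliations: [1] School of Mechanical and Power Engineering, Shanghai Jiao Tong University, Shanghai 200240; [2] College of Mechanical Engineering, Donghua University, Shanghai 201620
Source: Journal of Mechanical Engineering (《机械工程学报》), 2019, No. 17, pp. 133-144 (12 pages)
Funding: National Natural Science Foundation of China (U1537110, 51435009)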
Abstract: Modern manufacturing workshops generate large volumes of data continuously, most of which are stored on the enterprise's industrial big data platform as unlabeled, structured raw data. These data hold great potential value, but their high noise and high redundancy make them difficult to analyze and use directly. To remove redundancy from manufacturing data and to uncover the local structure of the raw data, a two-stage unsupervised feature selection method is proposed. In the first stage, a low-dimensional subset of the original features, generated by a genetic algorithm (GA), serves as the input to a radial basis function neural network (RBFNN), which is used to reconstruct all dimensions of the original data. The dimensionality reduction ratio and the reconstruction accuracy together form the GA fitness function, and the GA iterates to learn a low-dimensional representation of the high-dimensional features, removing redundant and noisy features from the original data set. In the second stage, the Laplacian score (LS) evaluates each remaining feature by how well it reflects the local geometric structure of the data, identifying the features most useful for improving classification performance. Comparison with LS and other unsupervised feature selection algorithms shows that the proposed two-stage method effectively reduces the redundancy of manufacturing data and improves classification performance.
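The second-stage criterion can be illustrated with a minimal sketch of the standard Laplacian score. This is not the authors' implementation: the abstract does not say how the neighborhood graph is built, so the k-nearest-neighbor graph with heat-kernel weights, the function name `laplacian_score`, and the parameters `k` and `t` are all assumptions made for illustration.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Laplacian score of each column of X (rows are samples).

    Assumed setup (not taken from the paper): symmetric kNN graph,
    heat-kernel weights exp(-||xi - xj||^2 / t). Requires k < n_samples.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances, clamped against rounding error.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # k nearest neighbors of each sample, excluding the sample itself.
    d2_no_self = d2.copy()
    np.fill_diagonal(d2_no_self, np.inf)
    nn = np.argsort(d2_no_self, axis=1)[:, :k]

    # Symmetric adjacency: i and j are linked if either is a neighbor of the other.
    adj = np.zeros((n, n), dtype=bool)
    adj[np.arange(n)[:, None], nn] = True
    adj |= adj.T

    S = np.where(adj, np.exp(-d2 / t), 0.0)  # weight matrix
    deg = S.sum(axis=1)                      # diagonal of the degree matrix D
    L = np.diag(deg) - S                     # graph Laplacian L = D - S

    # Remove the degree-weighted mean from every feature.
    Xc = X - (deg @ X) / deg.sum()

    num = np.sum(Xc * (L @ Xc), axis=0)           # f~' L f~ per feature
    den = np.sum(deg[:, None] * Xc ** 2, axis=0)  # f~' D f~ per feature

    # Constant features have zero weighted variance; give them the worst
    # score (an illustrative choice, not specified by the source).
    return np.divide(num, den, out=np.full_like(num, np.inf), where=den > 0)
```

A lower score means the feature varies less between neighboring samples relative to its overall variance, so the remaining features would be ranked in ascending order of score.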
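Keywords: unsupervised feature selection; genetic algorithm; radial basis function neural network; Laplacian score; manufacturing process data
Classification No.: TG156 [Metallurgy and Metalworking: Heat Treatment]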