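Affiliations: [1] College of Biomedical Engineering, South-Central Minzu University, Wuhan 430074; [2] Key Laboratory of Cognitive Science, State Ethnic Affairs Commission, Wuhan 430074; [3] Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, Wuhan 430074; [4] School of Information Engineering, Wuhan University of Technology, Wuhan 430070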
Source: Chinese Journal of Analytical Chemistry (《分析化学》), 2022, No. 9, pp. 1415-1424, I0015-I0019 (15 pages)
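Funding: Supported by the National Natural Science Foundation of China (Nos. 61501526, 61178087) and the Fundamental Research Funds for the Central Universities, South-Central Minzu University (No. CZQ22006).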
Abstract: Hemoglobin concentration is an important physiological index of the human body, and abnormal levels can lead to a variety of diseases. Near-infrared spectroscopy can measure hemoglobin content in the human body quickly and without reagents. However, near-infrared spectra overlap severely, carry weak useful information, and are easily disturbed by external noise. The spectral data therefore usually have to be partitioned into datasets and pretreated before a quantitative model is built, so that interfering information does not degrade the prediction model. How to choose the best partitioning method, the best partitioning proportion, and the best pretreatment method remains an open question. To address it, this study took the near-infrared spectra of 190 blood samples and 150 phantom (imitation) solution samples, each covering a range of hemoglobin concentrations, as its research object. It examined the predictive ability of partial least squares (PLS) models under four dataset partitioning methods, namely equal-interval division, Kennard-Stone (K_S), sample set partitioning based on joint x-y distances (SPXY), and the Duplex algorithm, at 41 different partitioning proportions. Four individual pretreatment methods, wavelet transform (WT), standard normal variate (SNV), direct orthogonal signal correction (DOSC), and Savitzky-Golay (S_G) first-order derivative, were combined, with order taken into account, into 65 pretreatment combinations, and the effect of these 65 combinations on the prediction accuracy of the PLS quantitative model was studied. The experimental results showed that SPXY was the optimal partitioning method for the PLS models of both datasets, with an optimal partitioning proportion of 0.48 for the blood samples and 0.90 for the phantom solutions. Among the 65 pretreatment combinations, the best for the blood samples was S_G1+WT, which gave a correlation coefficient of the prediction set (R_p) of 0.9808 and a root mean square error of the prediction set (RMSEP) of 0.2701; the best for the phantom solution samples was SNV+WT, which gave an R_p of 0.9952 and an RMSEP of 3.8154. When pretreatments were combined, stacking two algorithms gave the best results. These results offer a new approach to processing spectral data of this kind.
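The count of 65 order-sensitive pretreatment combinations can be reproduced with a short sketch. It assumes, and the abstract does not state this explicitly, that a combination is any ordered sequence of zero to four distinct methods, with the empty sequence standing for "no pretreatment". Under that assumption the total is 1 + 4 + 12 + 24 + 24 = 65. The labels are taken from the abstract; the function name is illustrative only.

```python
from itertools import permutations

# Method labels as used in the abstract.
METHODS = ("WT", "SNV", "DOSC", "S_G1")


def ordered_combinations(methods):
    """Return every ordered sequence of 0..len(methods) distinct methods.

    Assumption: the empty tuple represents "no pretreatment".
    """
    combos = []
    for k in range(len(methods) + 1):
        # permutations(methods, k) yields all ordered k-length selections
        combos.extend(permutations(methods, k))
    return combos


combos = ordered_combinations(METHODS)
print(len(combos))
```

Each tuple can then be read left to right as the order in which the pretreatments are applied, for example `("S_G1", "WT")` or `("SNV", "WT")`.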
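To make the SPXY idea concrete, here is a minimal pure-Python sketch of the commonly described procedure: Kennard-Stone style max-min selection applied to a joint distance that sums the normalized spectral (x) distance and the normalized concentration (y) distance. It is not the authors' implementation. The function names are hypothetical, and treating the partitioning proportion as the fraction of samples assigned to the calibration set is an assumption.

```python
import math


def _pairwise(rows):
    """Euclidean distance matrix between rows (sequences of floats)."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(rows[i], rows[j])
    return d


def spxy_split(X, y, ratio):
    """Return (calibration_indices, prediction_indices); needs >= 2 samples.

    Assumption: `ratio` is the fraction of samples put in the calibration set.
    """
    n = len(X)
    n_cal = max(2, min(n, round(ratio * n)))
    dx = _pairwise(X)
    dy = _pairwise([[v] for v in y])
    # Normalize each distance by its maximum so x and y contribute equally.
    max_dx = max(map(max, dx)) or 1.0
    max_dy = max(map(max, dy)) or 1.0
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)]
         for i in range(n)]
    # Start from the two samples that are farthest apart.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, j0 = max(pairs, key=lambda p: d[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    # Repeatedly add the sample whose nearest selected neighbour is farthest.
    while len(selected) < n_cal:
        nxt = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Toy data: four one-channel "spectra" with matching concentrations.
X_demo = [[0.0], [1.0], [2.0], [10.0]]
y_demo = [0.0, 1.0, 2.0, 10.0]
cal_idx, pred_idx = spxy_split(X_demo, y_demo, ratio=0.75)
print(cal_idx, pred_idx)
```

Dropping the y term from the joint distance reduces this to plain Kennard-Stone selection on the spectra alone, which is the main difference between the K_S and SPXY methods compared in the abstract.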