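Affiliations: [1] Department of Respiratory and Critical Care Medicine, Sichuan Provincial People's Hospital, Affiliated Hospital of University of Electronic Science and Technology of China, Chengdu 610072, Sichuan, China [2] Department of Nursing, Sichuan Provincial People's Hospital, Affiliated Hospital of University of Electronic Science and Technology of China, Chengdu 610072, Sichuan, China [3] School of Medicine, University of Electronic Science and Technology of China, Chengdu 610072, Sichuan, China [4] Department of Pharmacy, Sichuan Provincial People's Hospital, Affiliated Hospital of University of Electronic Science and Technology of China, Chengdu 610072, Sichuan, China [5] Personalized Drug Therapy Key Laboratory of Sichuan Province, School of Medicine, University of Electronic Science and Technology of China, Chengdu 610072, Sichuan, China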
Source: Chinese General Practice (《中国全科医学》), 2022, Issue 2, pp. 217-226 (10 pages)
Funding: National Natural Science Foundation of China (72004020); Sichuan Cadre Health Care Research Project, 川干研 (2021-219).
Abstract: Background: The degree of airflow limitation is a key indicator of disease progression in patients with chronic obstructive pulmonary disease (COPD). However, because of contraindications to testing and poor compliance, some patients cannot undergo the relevant examinations, so the severity of their disease cannot be assessed. Objective: To develop and evaluate a machine learning-based early warning model for the risk of severe airflow limitation in COPD patients. Methods: A cross-sectional design was used to survey COPD inpatients at a tertiary grade-A hospital in Sichuan Province from January 2019 to June 2020; general clinical indicators and pulmonary function test data were collected. The data were randomly split into training and test sets at a ratio of 8:2. In the training set, 4 missing-value imputation methods, 3 feature selection methods, 17 machine learning algorithms and 1 ensemble learning algorithm were combined to build 216 risk warning models. Predictive performance was evaluated with the area under the ROC curve (AUC), accuracy, precision, recall and F1 score; ten-fold cross-validation and bootstrapping were used for internal and external validation, respectively. The test set was used for model testing and selection, and the posterior method was used to verify the sample size. Results: A total of 418 patients were included, of whom 212 (50.7%) were at risk of severe or more severe airflow limitation. Applying the 4 missing-value treatments and 3 feature selection methods yielded 12 processed datasets and 12 corresponding importance rankings of the factors affecting airflow limitation. Modified Medical Research Council dyspnea scale (mMRC) grade, age, body mass index (BMI), smoking history (yes/no), COPD Assessment Test (CAT) score and dyspnea (yes/no) ranked near the top of the feature rankings and were key indicators for model construction, contributing substantially to outcome prediction. With no imputation and Lasso selection, mMRC grade, smoking history (yes/no) and dyspnea (yes/no) were the top 3 predictors, with mMRC grade accounting for 54.15% of feature importance. With no imputation and Boruta selection, CAT score, age and mMRC grade were the top 3 predictors, with CAT score accounting for 26.64% of feature importance. Modeling each of the 12 datasets with the 17 machine learning algorithms and the 1 ensemble learning algorithm produced 216 prediction models. In ten-fold cross-validation of the 17 machine learning algorithms, predictive performance differed significantly between algorithms (P<0.05), and the stochastic gradient descent algorithm had the largest mean AUC, …
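The model counts and evaluation metrics described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract names only "no imputation", Lasso, Boruta and stochastic gradient descent, so every other label below is a hypothetical placeholder, and the metric function simply applies the standard definitions of accuracy, precision, recall and F1 to binary labels (1 assumed to mean severe airflow limitation).

```python
from itertools import product

# Placeholder labels. Only "no_filling", "lasso", "boruta" and "sgd" correspond
# to methods named in the abstract; all other names are hypothetical.
IMPUTERS = ["no_filling", "imputer_2", "imputer_3", "imputer_4"]
SELECTORS = ["lasso", "boruta", "selector_3"]
ALGORITHMS = ["sgd"] + [f"algorithm_{i}" for i in range(2, 18)] + ["ensemble"]

# One processed dataset per (imputation method, feature selection method) pair.
datasets = list(product(IMPUTERS, SELECTORS))

# One model per (dataset, algorithm) pair.
models = [(imp, sel, alg) for (imp, sel) in datasets for alg in ALGORITHMS]


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    print(len(datasets), len(models))
    # Toy labels for illustration only, not study data.
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Here `len(datasets)` is 12 (4 × 3) and `len(models)` is 216 (12 × 18), matching the counts reported in the abstract. AUC is omitted because it requires predicted scores rather than hard labels.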