Authors: YANG Rong (杨荣)[1], CHEN Yu (陈誉), GAO Hongmei (高红梅)[1], CHEN Xianlai (陈先来)[3,4]
Affiliations: [1] Xiangya Hospital, Central South University, Changsha 410078, Hunan, China; [2] Xiangya School of Medicine, Central South University, Changsha 410013, Hunan, China; [3] Information Security and Big Data Research Institute, Central South University, Changsha 410083, Hunan, China; [4] National Engineering Laboratory for Medical Big Data Application Technology, Central South University, Changsha 410083, Hunan, China
Source: Chinese Journal of Medical Physics (《中国医学物理学杂志》), 2019, No. 9, pp. 1095-1102 (8 pages)
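Funding: National Key Research and Development Program of China, "Precision Medicine Research" key special project (2016YFC0901705); National Social Science Fund of China (13BTQ052)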
Abstract:
Objective: To establish an auxiliary screening model from clinical data using machine learning, in order to improve the early diagnosis of gastric cancer.
Methods: A total of 5,585 patients with gastric cancer (ICD code C16*, group A) were selected as research subjects. As controls, 6,000 cases (group B) were randomly selected from 57,657 cases of non-gastric malignant tumors (ICD code C*, excluding C16*), and 6,000 cases without malignant tumors (group C) were randomly selected from 47,225 people undergoing health check-ups. Demographic information (gender, age) and laboratory test results (routine blood test, blood lipid/liver function, tumor-related markers, Hp, etc.) were extracted from the clinical data. Pearson correlation analysis was used to analyze the relationship between each indicator and the diagnosis, and the independent-samples t test was used to detect differences in each indicator between groups. A total of 53 indicators, including gender, age, carcinoembryonic antigen (CEA) and fecal occult blood (FOB), were selected as decision variables, and the C5.0 decision tree algorithm was used to build the auxiliary screening model for gastric cancer.
Results: Indicators such as age, CEA and CA153 were significantly correlated with gastric cancer (P<0.05). The indicators showing inter-group differences were not the same across the three comparisons (group A vs. B, group B vs. C, group A vs. C). Data mining yielded a gastric cancer screening model containing 51 rules. The 10 most important indicators in the model were, in order, CA199, CA153, CEA, etc. The accuracy of the model was 89.58% on the training set and 89.14% on the test set, and the area under the curve was 0.809.
Conclusion: Analysis of clinical data can identify important indicators for the early diagnosis of gastric cancer. Using data mining methods, an auxiliary screening model for gastric cancer can be built from clinical data, and the model has good auxiliary value for gastric cancer screening.
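The abstract reports three kinds of quantities: a Pearson correlation between each indicator and the diagnosis, an accuracy, and an area under the curve (AUC). The sketch below illustrates how these are commonly defined for a binary screening problem. It is not the authors' code and does not implement C5.0. The function names, the single-threshold rule, the cutoff of 5.0, and the six data points are all hypothetical and synthetic, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def accuracy(labels, predictions):
    """Fraction of cases whose predicted class equals the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic toy values, NOT taken from the study.
labels = [1, 1, 1, 0, 0, 0]              # 1 = case, 0 = control
marker = [8.0, 6.0, 2.0, 3.0, 1.0, 0.5]  # hypothetical marker level

# Hypothetical one-rule classifier: flag a case when the marker is >= 5.0.
predictions = [1 if m >= 5.0 else 0 for m in marker]

r_value = pearson_r(marker, labels)
acc_value = accuracy(labels, predictions)
auc_value = auc(labels, marker)

print(f"r = {r_value:.3f}, accuracy = {acc_value:.3f}, AUC = {auc_value:.3f}")
```

A decision tree such as C5.0 produces many rules of this threshold form over several indicators, and the same accuracy and AUC definitions apply to its output on the training and test sets.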