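Affiliation: [1] Department of Health Statistics, Harbin Medical University, 150001
Source: Chinese Journal of Health Statistics (《中国卫生统计》), 2007, No. 2, pp. 151-154 (4 pages)
Funding: National Natural Science Foundation of China (30371253); Heilongjiang Province Key Project (GB04C30202)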
Abstract: Objective: To propose a new random forest algorithm that automatically selects variables during model building and establishes an "optimal" discriminant model. Methods: Variables that contribute to discrimination are selected using variable importance scores and a stepwise iterative algorithm. The method was evaluated on real gene expression data, and its effectiveness was verified by simulation experiments programmed in R. Results: Discriminant models for gene expression data from three diseases achieved satisfactory classification with only a small number of genes. The simulations showed that when the classes are well separated (higher ROC area), stepwise discriminant analysis with random forests works well: it retains the informative variables in the model and improves discrimination. When the classes are not well separated, the improvement in classification is not appreciable. Conclusion: Stepwise discriminant analysis with random forests can be applied effectively to gene selection and classification in gene expression data, but particular attention should be paid to the influence of random fluctuation on the results.
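The abstract does not give the algorithm's details, so the following is only a minimal sketch of one way a stepwise, importance-driven selection loop could look. It is written in Python for illustration, although the authors worked in R. The function name `stepwise_select`, the `drop_fraction` parameter, and the toy `importance_fn` and `score_fn` are all assumptions introduced here. In a real analysis `importance_fn` would presumably return a random forest's variable importance scores and `score_fn` something like out-of-bag accuracy; the stand-ins below only make the loop runnable.

```python
def stepwise_select(features, importance_fn, score_fn,
                    drop_fraction=0.2, min_features=1):
    """Backward stepwise selection (hypothetical sketch).

    At each step, rank the current features by importance, drop the
    least important ones, and remember the best-scoring subset seen.
    """
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > min_features:
        importances = importance_fn(current)  # dict: feature -> importance
        ranked = sorted(current, key=lambda f: importances[f], reverse=True)
        n_drop = max(1, int(len(ranked) * drop_fraction))
        n_keep = max(min_features, len(ranked) - n_drop)
        current = ranked[:n_keep]
        score = score_fn(current)
        # Prefer the smaller subset when the score is at least as good.
        if score >= best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins (assumptions, not a random forest): fixed importances,
# and a score that rewards keeping the two informative "genes" while
# lightly penalising model size.
TOY_IMPORTANCE = {"g1": 0.9, "g2": 0.8, "g3": 0.05, "g4": 0.04,
                  "g5": 0.03, "g6": 0.02, "g7": 0.01, "g8": 0.0}
INFORMATIVE = {"g1", "g2"}


def importance_fn(subset):
    return {f: TOY_IMPORTANCE[f] for f in subset}


def score_fn(subset):
    hits = len(INFORMATIVE & set(subset))
    return hits / len(INFORMATIVE) - 0.01 * len(subset)


selected, best = stepwise_select(list(TOY_IMPORTANCE), importance_fn,
                                 score_fn, drop_fraction=0.25)
print(selected)  # → ['g1', 'g2']
```

Because the scoring function is re-evaluated after every elimination step, noisy importance estimates can change which subset wins, which is in line with the abstract's caution about random fluctuation.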
Classification code: R195 [Medicine and Health: Health Statistics]