The Stepwise Discriminant Analysis of Random Forests Used in Gene Expression Data

Cited by: 14

Source: Chinese Journal of Health Statistics (《中国卫生统计》), 2007, Issue 2, pp. 151-154 (4 pages)

Funding: National Natural Science Foundation of China (30371253); Heilongjiang Province Key Project (GB04C30202)

Abstract: Objective: To propose a new random forest algorithm that automatically selects variables during model building and establishes an "optimal" discriminant model. Methods: Variables that contribute to classification are selected using variable importance scores and a stepwise iterative algorithm. The method was evaluated on real gene expression data, and simulation experiments programmed in R were used to verify its effectiveness. Results: For gene expression data from three diseases, the discriminant models achieved satisfactory classification performance while including only a small number of genes. The simulations showed that when the classes were well separated (a larger area under the ROC curve), stepwise discriminant analysis with random forests was clearly effective: it retained the informative variables in the model and improved discrimination. When the classes were not well separated, the improvement in classification was not evident. Conclusion: Stepwise discriminant analysis with random forests can be applied effectively to gene selection and classification in gene expression data, but particular attention should be paid to the influence of random fluctuation on the results.
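The abstract describes the procedure only in outline (importance scores plus stepwise iteration), so the following is a minimal sketch of one way such a loop could look, not the authors' implementation. The paper's own code was written in R; this sketch uses Python with scikit-learn. The function name `stepwise_rf_selection`, the parameters `drop_fraction`, `min_features`, and `n_trees`, the use of impurity-based importance, and the use of out-of-bag (OOB) error as the selection criterion are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def stepwise_rf_selection(X, y, drop_fraction=0.2, min_features=2,
                          n_trees=200, seed=0):
    """Backward stepwise variable selection guided by random-forest importance.

    Hypothetical helper: returns (best_columns, best_oob_error, history),
    where history lists (number_of_variables, oob_error) per fitted model.
    """
    remaining = np.arange(X.shape[1])      # column indices still in the model
    best_cols, best_err = remaining.copy(), np.inf
    history = []
    while True:
        forest = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, bootstrap=True,
            random_state=seed,
        ).fit(X[:, remaining], y)
        oob_err = 1.0 - forest.oob_score_  # assumed selection criterion
        history.append((len(remaining), oob_err))
        # "<=" prefers the smaller model when two models tie on OOB error.
        if oob_err <= best_err:
            best_cols, best_err = remaining.copy(), oob_err
        if len(remaining) <= min_features:
            break
        # Drop the least important variables, keeping at least min_features.
        n_drop = max(1, int(drop_fraction * len(remaining)))
        n_keep = max(min_features, len(remaining) - n_drop)
        order = np.argsort(forest.feature_importances_)[::-1]  # most important first
        remaining = np.sort(remaining[order[:n_keep]])
    return best_cols, best_err, history


# Synthetic stand-in for expression data: 120 samples, 30 "genes",
# of which only the first three differ between the two classes.
rng = np.random.default_rng(0)
n, p = 120, 30
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[:, :3] += 2.0 * y[:, None]

cols, err, history = stepwise_rf_selection(X, y)
print("selected columns:", cols)
print("OOB error of selected model:", round(err, 3))
```

Because each forest is built from random bootstrap samples, the selected subset can change from run to run. Repeating the call with several values of `seed` and comparing the results is one simple way to gauge the random fluctuation that the conclusion warns about.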

Keywords: random forests; gene expression data; discriminant analysis; gene selection

Classification code: R195 [Medicine and Health: Health Statistics]
