Classification of cancer gene expression profile based on PCA and LDA  (Cited by: 2)

Original title (Chinese): 基于PCA和LDA方法的肿瘤基因表达谱数据分类


Authors: LI Zhiwen (李志文)[1], CAI Xianfa (蔡先发)[1,2], WEI Jia (韦佳)[2], ZHOU Yi (周怡)[1]

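Affiliations: [1] School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006; [2] School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006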

Source: Beijing Biomedical Engineering (《北京生物医学工程》), 2014, No. 1, pp. 47-51 (5 pages)

Funding: Supported by the Fundamental Research Funds for the Central Universities, South China University of Technology (2009ZM0189)

Abstract:

Objective: Gene chip technology has had a revolutionary impact on clinical diagnosis, treatment, and drug development and screening. Gene expression profile data have few samples, high dimensionality, and heavy noise, which makes high-dimensional medical data hard to reduce and makes dimensionality reduction essential. Methods based on principal component analysis (PCA) and linear discriminant analysis (LDA) are used to address the classification of gene expression profile data and to improve the recognition rate.

Methods: PCA and LDA were each used to extract features and reduce the dimensionality of the gene expression profile data, after which K-nearest neighbor (KNN) was used as the classifier. Experiments were run on breast cancer and ovarian cancer mass spectrometry datasets.

Results: On both cancer datasets, PCA and LDA effectively extracted the feature information needed for classification and greatly reduced the dimensionality of the medical data while maintaining high classification accuracy.

Conclusions: Applying dimensionality reduction methods to the classification of cancer gene expression profile data can help clinicians discover new disease characteristics and improve diagnostic accuracy.

Keywords: principal component analysis; linear discriminant analysis; gene expression data classification; dimensionality reduction

Classification code: R318.04 [Medicine and Health: Biomedical Engineering]
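
Illustrative sketch: the procedure named in the abstract (reduce dimensionality with PCA or with LDA, then classify with KNN) might look roughly like the following in NumPy. This is not the authors' code. The synthetic two-class data, the helper names `pca_fit`, `lda_fit`, and `knn_predict`, the choice of 5 principal components, k = 3, and the ridge term added to the within-class scatter matrix are all assumptions made for illustration. The breast cancer and ovarian cancer datasets from the paper are not reproduced here.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return (mean, components); components are rows, found via SVD."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def lda_fit(X, y, reg=1e-3):
    """Two-class Fisher discriminant: w proportional to Sw^{-1} (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Ridge term (an assumption): keeps Sw invertible when samples << features.
    Sw += reg * np.trace(Sw) / Sw.shape[0] * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def knn_predict(Z_train, y_train, Z_test, k=3):
    """Majority vote among the k nearest training points (Euclidean, 0/1 labels)."""
    d = np.linalg.norm(Z_test[:, None, :] - Z_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Synthetic stand-in for a small-sample, high-dimensional dataset (hypothetical).
rng = np.random.default_rng(0)
n_per_class, n_features = 30, 100
X = rng.normal(size=(2 * n_per_class, n_features))
y = np.repeat([0, 1], n_per_class)
X[y == 1, :20] += 2.0  # only the first 20 features carry class signal

idx = rng.permutation(len(y))
train, test = idx[:40], idx[40:]

# Route 1: PCA to 5 dimensions, then KNN.
mean, comps = pca_fit(X[train], n_components=5)
Z_train_pca = (X[train] - mean) @ comps.T
Z_test_pca = (X[test] - mean) @ comps.T
acc_pca = np.mean(knn_predict(Z_train_pca, y[train], Z_test_pca) == y[test])

# Route 2: LDA to 1 dimension (two classes), then KNN.
w = lda_fit(X[train], y[train])
Z_train_lda = (X[train] @ w)[:, None]
Z_test_lda = (X[test] @ w)[:, None]
acc_lda = np.mean(knn_predict(Z_train_lda, y[train], Z_test_lda) == y[test])

print(f"PCA+KNN accuracy: {acc_pca:.2f}, LDA+KNN accuracy: {acc_lda:.2f}")
```

Both projections are fitted on the training split only and then applied to the held-out samples, so the reported accuracy is not inflated by information from the test set. With two classes LDA yields a single discriminant direction, which is why the second route reduces the data to one dimension.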

 
