Authors: Wang Shulin (王树林)[1], Wang Ji (王戟)[1], Chen Huowang (陈火旺)[1], Li Shutao (李树涛)[2], Zhang Boyun (张波云)[1]
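Affiliations: [1] School of Computer Science, National University of Defense Technology (国防科技大学计算机学院); [2] College of Electrical and Engineering, Hunan University (湖南大学电气与工程学院), Changsha 410082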
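Source: Chinese Journal of Computers (《计算机学报》), 2008, No. 4, pp. 636-649 (14 pages)
Funding: Supported by the Hunan Provincial Natural Science Foundation for Distinguished Young Scholars (06JJ1010)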
Abstract: Tumor detection based on gene expression profiles is expected to become a fast and effective method of molecular tumor diagnosis in clinical medicine. Although DNA microarray experiments provide a huge amount of gene expression data, only a few genes in the profiles are related to tumors. Moreover, selecting tumor-related informative genes is challenging because the data are high-dimensional, the sample sets are small, and the noise is heavy. Based on these characteristics of tumor gene expression sample sets, a novel heuristic breadth-first search algorithm for finding informative genes is proposed, with the classification performance of support vector machines as its evaluation criterion. Its advantage is that it can simultaneously find multiple informative gene subsets, each with as few genes and as much classification ability as possible, although the search is time-consuming. Three tumor sample sets are used to verify the feasibility and effectiveness of the new algorithm: for the acute leukemia, the hard-to-classify colon tumor, and the multi-subtype SRBCT (small round blue cell tumor) sample sets, only 2, 4, and 4 informative genes, respectively, are needed to reach 100% 4-fold cross-validation accuracy. Compared with other leading tumor classification methods, these results are clearly superior in both the number of informative genes and their classification performance. To avoid the effect of different partitions of the sample set on classification performance, a full-fold cross-validation method is proposed that evaluates the classification performance of an informative gene subset more objectively.
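Keywords: gene expression profiles; tumor classification; informative gene selection; support vector machine; full-fold cross-validation method
Classification number: TP391 [Automation and Computer Technology - Computer Application Technology]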
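The search strategy named in the abstract can be illustrated with a small sketch. The abstract does not give the paper's heuristic, pruning rule, or SVM settings, so everything here is an assumption made for illustration: the function names `kfold_accuracy` and `breadth_first_gene_search`, the `width`, `max_genes`, and `target` parameters, and the toy data are all hypothetical. A nearest-centroid classifier stands in for the support vector machine so that the example needs only the standard library, and the full-fold cross-validation method is not reproduced because the abstract does not define it.

```python
def kfold_accuracy(X, y, genes, k=4):
    """k-fold cross-validation accuracy using only the columns in `genes`.

    Assumption: a nearest-centroid classifier replaces the SVM evaluator.
    Sample i is assigned to test fold i % k.
    """
    n = len(X)
    correct = 0
    for fold in range(k):
        sums, counts = {}, {}
        # Accumulate per-class sums over the training samples of this fold.
        for i in range(n):
            if i % k == fold:
                continue
            vec = [X[i][g] for g in genes]
            if y[i] not in sums:
                sums[y[i]] = [0.0] * len(genes)
                counts[y[i]] = 0
            sums[y[i]] = [s + v for s, v in zip(sums[y[i]], vec)]
            counts[y[i]] += 1
        centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        # Classify each held-out sample by its closest class centroid.
        for i in range(n):
            if i % k != fold:
                continue
            vec = [X[i][g] for g in genes]
            pred = min(
                centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, centroids[c])),
            )
            correct += pred == y[i]
    return correct / n


def breadth_first_gene_search(X, y, score, width=3, max_genes=4, target=1.0):
    """Level-wise search: evaluate all subsets of one size before growing them.

    Returns every subset at the smallest size whose score reaches `target`.
    Keeping only the `width` best subsets per level is an assumed heuristic.
    """
    n_genes = len(X[0])
    frontier = [(g,) for g in range(n_genes)]
    for _ in range(max_genes):
        scored = sorted(
            ((score(X, y, s), s) for s in frontier),
            key=lambda t: (-t[0], t[1]),
        )
        hits = [s for acc, s in scored if acc >= target]
        if hits:
            return hits
        kept = [s for _, s in scored[:width]]
        # Grow each kept subset by one gene; the set removes duplicates.
        frontier = sorted(
            {tuple(sorted(s + (g,))) for s in kept for g in range(n_genes) if g not in s}
        )
    return []


# Toy data (hypothetical): genes 0 and 1 separate the classes only jointly,
# and gene 2 is constant, so it carries no information.
X = [
    [0.0, 2.2, 1.0], [2.2, 0.0, 1.0], [0.0, 2.2, 1.0], [2.2, 0.0, 1.0],
    [1.8, 4.0, 1.0], [4.0, 1.8, 1.0], [1.8, 4.0, 1.0], [4.0, 1.8, 1.0],
]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

subsets = breadth_first_gene_search(X, y, kfold_accuracy)
print(subsets)  # [(0, 1)]
```

On this toy data no single gene exceeds 50% accuracy, so the search moves to pairs and returns the one pair that reaches the target. Returning all subsets that hit the target at the smallest size mirrors the abstract's goal of finding several small, accurate gene subsets at once.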