Data Visualization of Traditional Chinese Medicine Based on Random Forest (基于随机森林的中医数据可视化研究)  Cited by: 4


Authors: Wang Huazhen (王华珍)[1], Peng Shujuan (彭淑娟)[1], Gou Jin (缑锦)[1], Chen Duansheng (陈锻生)[1]

Affiliation: [1] College of Computer Science and Technology, Huaqiao University, Xiamen 361021

Source: Journal of System Simulation (《系统仿真学报》), 2014, No. 11, pp. 2751-2756 (6 pages)

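Funding: National Natural Science Foundation of China (61202298); Natural Science Foundation of Fujian Province (2012J01274); Huaqiao University High-Level Talent Research Project (09BS515); Xiamen Science and Technology Plan Project (3502Z20123032)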

Abstract: Applying machine learning methods to Traditional Chinese Medicine (TCM) diagnosis and treatment research suffers from two main shortcomings: poor interactivity and discrete feature values. This paper introduces a visualization technique based on Random Forest (RF). First, the raw data undergo an RF-based feature transformation, which enhances the class separability of the samples in the new feature space. Next, principal coordinate analysis reduces the dimensionality of the transformed data, mapping the relational information of the high-dimensional data into a low-dimensional space suited to human visual perception. Finally, scatter plots and parallel coordinate plots visualize the data in that low-dimensional space. Experimental results on a TCM chronic gastritis dataset show that, after RF processing, the samples of each class cluster in distinct regions and exhibit good separability. This visual information helps users accurately understand the distribution of the dataset and its implicit trends, and supports further study of the TCM significance of these patterns.
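The pipeline outlined in the abstract (RF-based transformation, then principal coordinate analysis, then plotting) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say how the RF feature transformation is defined, so the sketch assumes the commonly used RF proximity measure (the fraction of trees in which two samples fall into the same leaf). The helper names `rf_proximity` and `pcoa`, the forest size, the proximity-to-distance conversion, and the synthetic data standing in for the chronic gastritis dataset are all hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def rf_proximity(forest, X):
    """Fraction of trees in which each pair of samples lands in the same leaf.

    Assumption: this stands in for the paper's RF-based feature transformation.
    """
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        prox += col[:, None] == col[None, :]  # 1 where two samples share a leaf
    return prox / n_trees


def pcoa(dist, n_components=2):
    """Principal coordinate analysis (classical MDS) of a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


# Synthetic stand-in data (hypothetical; NOT the chronic gastritis dataset).
X, y = make_classification(n_samples=120, n_features=20, n_informative=6,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

proximity = rf_proximity(forest, X)
distance = np.sqrt(1.0 - proximity)  # one common proximity-to-distance conversion
coords = pcoa(distance, n_components=2)  # 2-D coordinates for plotting
print(coords.shape)
```

The two columns of `coords` can then be drawn as a scatter plot coloured by `y`. Keeping more components (for example `n_components=5`) would give the axes for a parallel coordinates plot.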

Keywords: visualization; random forest; Traditional Chinese Medicine; chronic gastritis

Classification No.: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
