One-Dependence Conditional Probability Encoding Method Based on Attribute Weighting


Authors: Liang Zupeng (梁祖鹏), Li Qiude (李秋德), Hu Sigui (胡思贵)

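Affiliations: [1] School of Mathematics and Statistics, Guizhou University, Guiyang, Guizhou; [2] School of Biology and Engineering, Guizhou Medical University, Guiyang, Guizhou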

Source: Operations Research and Fuzziology (《运筹与模糊学》), 2023, No. 1, pp. 74-87 (14 pages)

Abstract: Mixed data containing both categorical and numerical attributes are common in real-world and experimental data sets. Before such data can be mined or analyzed, they usually need to be processed (transformed, embedded, represented, or encoded) into high-quality numerical data. The conditional probability encoding method, which rests on the attribute conditional independence assumption, performs well in most cases but unsatisfactorily on data sets with strong attribute associations. Inspired by the one-dependence value difference metric, we apply the idea of relaxing attribute conditional independence to conditional probability encoding. In addition, we use attribute weighting to optimize the quality of the encoded data. Combining these methods, we propose an Attribute Weighted One Dependence Conditional Probability Encoding method for categorical encoding of mixed data. Experimental results show that the method significantly improves the quality of data transformation and thereby enhances the performance of subsequent data analysis algorithms.
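As a rough illustration of the idea the abstract builds on, the sketch below encodes each categorical value as a vector of class-conditional probabilities P(c | v), and then conditions on a pair of attributes as one plausible reading of "one dependence". It is not the authors' formulation: the function names, the Laplace smoothing parameter `alpha`, the toy data, and the omission of attribute weighting are all assumptions made for illustration.

```python
from collections import Counter


def conditional_probability_encode(values, labels, alpha=1.0):
    """Map each categorical value v to [P(c | v) for c in classes].

    Probabilities are estimated from counts with Laplace smoothing
    (alpha is an illustrative assumption, not taken from the paper).
    """
    classes = sorted(set(labels))
    joint = Counter(zip(values, labels))  # count(v, c)
    marginal = Counter(values)            # count(v)
    table = {
        v: [
            (joint[(v, c)] + alpha) / (marginal[v] + alpha * len(classes))
            for c in classes
        ]
        for v in marginal
    }
    return classes, table


def one_dependence_encode(values, parent_values, labels, alpha=1.0):
    """Hypothetical one-dependence variant: estimate P(c | v, parent).

    Each (value, parent value) pair is treated as a single compound
    category, which relaxes the independence assumption for that pair.
    """
    pairs = list(zip(values, parent_values))
    return conditional_probability_encode(pairs, labels, alpha)


# Toy data (invented for this example).
colors = ["red", "red", "blue", "blue", "blue"]
sizes = ["S", "L", "S", "S", "L"]
labels = ["yes", "no", "yes", "yes", "yes"]

classes, table = conditional_probability_encode(colors, labels)
encoded = [table[v] for v in colors]  # one numeric vector per row

_, pair_table = one_dependence_encode(colors, sizes, labels)
```

Each row of `encoded` is a numeric vector with one entry per class, so a categorical column becomes as many numerical columns as there are classes. The pairwise variant needs more data per cell, which is one reason smoothing is used in this sketch.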

Keywords: mixed data classification; conditional probability encoding; one-dependence value difference metric; attribute weighting

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
