检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]贵州大学数学与统计学院,贵州 贵阳 [2]贵州医科大学生物与工程学院,贵州 贵阳
出 处:《运筹与模糊学》2023年第1期74-87,共14页Operations Research and Fuzziology
摘 要:包含分类属性和数值属性的混合数据广泛存在于真实世界采集的数据或实验数据,在挖掘或分析这类数据前,通常需要将它们处理(转换/嵌入/表示/编码)为高质量的数值数据。条件概率编码方法(以属性条件独立假设为前提)在大多数情况下能取得不错的性能,但当它面对具有强属性关联的数据集时,性能并不理想。受独依赖值差度量的启发,将放宽属性条件独立的构想应用于条件概率编码方法。此外,还利用属性加权法来优化编码后的数据质量。融合上述这些方法,我们为混合数据的分类编码提出了一个属性加权的独依赖条件概率编码方法。实验结果表明,我们的编码方法可以显著性提高数据转换的质量,从而增强后续数据分析算法的性能。Mixed data containing categorical and numerical attributes are widely available in real-world or experimental data sets. Before mining or analyzing such data, it is typically necessary to process (transform/embed/represent) them into high-quality numerical data. Conditional probability transformation method (which is premised on the attribute conditional independence assumption) can provide acceptable performance in the majority of cases, but it is not satisfactory for data sets with strong attribute association. Inspired by the one dependence value difference metric method, the concept of relaxing the attributes conditional independence is applied to the conditional probability transformation method. In addition, an attribute weighting method is designed to optimize the quality of data encoding. Combining these methods, we propose an Attribute Weighted One Dependence Conditional Probability Encoding method for categorical encoding on mixed data. Extensive experimental results demonstrate that our method can significantly boost the quality of data encoding, hence enhancing the performance of subsequent data analysis algorithms.
关 键 词:混合数据分类 条件概率编码 独依赖值差度量 属性加权
分 类 号:TP3[自动化与计算机技术—计算机科学与技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.49