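Affiliations: [1] College of Information Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004; [2] College of Science, Guilin University of Technology, Guilin, Guangxi 541004
Source: Journal of Computer Applications (《计算机应用》), 2012, No. 8, pp. 2250-2252, 2257 (4 pages)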
Abstract: To address the category-deficiency problem and the bias toward low-frequency words in feature selection based on mutual information (MI), a method named LDA-σ is proposed. The method extracts latent topics with the latent Dirichlet allocation (LDA) model, then uses the standard deviation of the "word-topic" mutual information as the feature evaluation function. In feature selection and classification experiments on the Reuters-21578 corpus, the micro-averaged F1 of LDA-σ peaked at 0.9096; its macro-averaged F1 exceeded that of the other algorithms, peaking at 0.7823. The experimental results indicate that LDA-σ can be applied to text feature selection.
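The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact mutual-information formula, so the pointwise form log(p(w|z)/p(w)), the uniform topic prior, and the names `lda_sigma_scores` and `select_features` are all assumptions made here for illustration. The input is assumed to be a topic-word probability matrix such as one produced by any fitted LDA model.

```python
import numpy as np

def lda_sigma_scores(topic_word, topic_prior=None):
    """Score each word by the standard deviation of its word-topic MI.

    topic_word  : array of shape (K, V); row k holds p(w | z_k).
    topic_prior : array of shape (K,); p(z_k). Uniform if omitted (assumption).
    """
    topic_word = np.asarray(topic_word, dtype=float)
    n_topics = topic_word.shape[0]
    if topic_prior is None:
        topic_prior = np.full(n_topics, 1.0 / n_topics)
    topic_prior = np.asarray(topic_prior, dtype=float)

    # Marginal word probability: p(w) = sum_k p(z_k) * p(w | z_k)
    word_marginal = topic_prior @ topic_word

    # Assumed pointwise MI: MI(w, z_k) = log(p(w | z_k) / p(w)).
    # eps guards against log(0) for words absent from a topic.
    eps = 1e-12
    word_topic_mi = np.log((topic_word + eps) / (word_marginal + eps))

    # A word whose MI varies strongly across topics is topic-discriminative.
    return word_topic_mi.std(axis=0)

def select_features(topic_word, n_features, topic_prior=None):
    """Return indices of the n_features highest-scoring words."""
    scores = lda_sigma_scores(topic_word, topic_prior)
    return np.argsort(scores)[::-1][:n_features]
```

In this sketch a word that is equally likely under every topic receives a score of zero, while a word concentrated in a few topics scores high, which is the intuition behind using the spread of word-topic MI rather than its value for a single category.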
Keywords: latent Dirichlet allocation (LDA) model; mutual information; evaluation function