Author: MENG Xinmiao (孟鑫淼) (H3C Research Institute of Big Data, Zhengzhou 450001, China)
Source: Modern Electronics Technique (《现代电子技术》), 2020, No. 17, pp. 126-129 (4 pages)
Abstract: Text data are large in scale and high in feature dimension. Current text classification methods fail to capture how text characteristics vary, which leads to low accuracy, large error, and long classification time. To obtain better classification results, a text classification method based on big data mining technology is designed. First, current research progress in text classification is analyzed to identify the causes of the poor performance of existing methods. Then the original text classification features are extracted, and the kernel principal component analysis (KPCA) algorithm is applied to them to reduce the feature dimension and simplify the structure of the text classifier. Finally, the text classifier is built with big data mining technology and compared with other text classification methods. The comparative tests show that the proposed method describes the varying characteristics of text better and accurately recognizes and classifies various types of text. Its classification accuracy exceeds 95%, which is significantly higher than that of other current methods, and its classification time is significantly shorter.
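Keywords: large-scale text data; high-dimensional features; big data mining technology; text classifier; classification accuracy; classification time
Classification codes: TN911.1-34 [Electronics and Telecommunications: Communication and Information Systems]; TP391.9 [Electronics and Telecommunications: Information and Communication Engineering]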
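The pipeline outlined in the abstract (extract raw text features, reduce them with KPCA, then train a classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy documents, the term-frequency features, the RBF kernel with `gamma=0.1`, the power-iteration eigensolver, and the nearest-centroid classifier, which merely stands in for the unspecified "big data mining" classifier.

```python
import math

def tf_vectors(docs):
    """Map each document to a term-frequency vector over the shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.split():
            v[index[w]] += 1.0
        vectors.append(v)
    return vectors

def rbf(x, y, gamma):
    """RBF kernel (assumed choice; the abstract does not name the kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def centered_kernel(X, gamma):
    """Kernel matrix centered in feature space: K - 1K - K1 + 1K1."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in K]  # K is symmetric, so row means equal column means
    total = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]

def top_eigenpairs(M, k, iters=200):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(M)
    M = [r[:] for r in M]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed, deterministic start vector
        for _step in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                M[i][j] -= lam * v[i] * v[j]
    return pairs

def kpca_project(X, gamma=0.1, k=2):
    """Coordinates of the training points on the top-k kernel principal components."""
    pairs = top_eigenpairs(centered_kernel(X, gamma), k)
    # For a unit eigenvector v with eigenvalue lam, point i projects to sqrt(lam) * v[i].
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(len(X))]

def nearest_centroid_fit(Z, labels):
    """Stand-in classifier (assumption): one centroid per class in the reduced space."""
    groups = {}
    for z, y in zip(Z, labels):
        groups.setdefault(y, []).append(z)
    return {y: [sum(c) / len(pts) for c in zip(*pts)] for y, pts in groups.items()}

def nearest_centroid_predict(centroids, z):
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(z, centroids[y])))

# Hypothetical toy corpus with two topics.
docs = [
    "ball goal team match",
    "team goal ball win",
    "stock market bank price",
    "bank stock price fund",
]
labels = ["sports", "sports", "finance", "finance"]

X = tf_vectors(docs)                   # 10-dimensional raw features
Z = kpca_project(X, gamma=0.1, k=2)    # reduced to 2 dimensions
centroids = nearest_centroid_fit(Z, labels)
predicted = [nearest_centroid_predict(centroids, z) for z in Z]
print(predicted)
```

The sketch only projects and classifies the training documents. Handling unseen documents would additionally require centering their kernel vectors against the training kernel matrix, which is omitted here for brevity.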