Authors: 鲁博仁 (LU Bo-ren); 胡世哲 (HU Shi-zhe); 娄铮铮 (LOU Zheng-zheng) [1]; 叶阳东 (YE Yang-dong) [1] (School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China)
Source: 《计算机科学》 (Computer Science), 2021, No. 3, pp. 220-226 (7 pages)
Funding: National Key R&D Program of China (2018YFB1201403); National Natural Science Foundation of China, Young Scientists Fund (61502434).
Abstract: Railway text classification is of great practical significance to the development of China's railway industry. Existing Chinese text feature extraction methods depend on segmenting the text into words beforehand. Word segmentation accuracy on railway text data is low, however, so feature extraction from railway text suffers from inadequate semantic understanding and incomplete feature acquisition. To address these problems, a character-level feature extraction method, CLW2V (Character Level-Word2Vec), is proposed, which effectively handles the difficulties caused by the abundant and highly complex specialized vocabulary in railway texts. Compared with the TF-IDF and Word2Vec methods, which are based on word-level features, the character-based CLW2V method extracts finer-grained text features and avoids the poor feature extraction that results when traditional methods rely on prior segmentation. Experiments on a railway safety supervision and licensing data set show that, for railway text classification, CLW2V outperforms the traditional segmentation-dependent TF-IDF and Word2Vec methods.
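Classification: U229 [Transportation Engineering: Road and Railway Engineering]; TP391.1 [Automation and Computer Technology: Computer Application Technology]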
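Note: The abstract contrasts word-level features, which need a segmenter, with character-level features, which do not. The record does not describe how CLW2V is built, so the following is only a minimal sketch of the general idea, assuming that "character-level Word2Vec" means training on single-character tokens. The function names `char_tokens` and `skipgram_pairs` and the sample phrase are hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_tokens(text):
    """Split text into single-character tokens, skipping whitespace.

    No word segmenter is involved, so segmentation errors cannot occur.
    """
    return [ch for ch in text if not ch.isspace()]


def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs of the kind a skip-gram model trains on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical railway phrase, used only for illustration.
sentence = "接触网停电检修"
tokens = char_tokens(sentence)
pairs = skipgram_pairs(tokens, window=1)
vocab = Counter(tokens)  # character vocabulary with frequencies

print(tokens)
print(pairs[:4])
```

In practice the character token lists would be passed to a Word2Vec implementation in place of segmented word lists; the pair construction above only illustrates what such a model sees during training.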