Authors: CHEN Ping[1], KUANG Yao, HU Jingyi, WANG Xiangyang, CAI Jing[1]
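Affiliations: [1] Department of Construction and Management, Wuhan Electric Power Technical College, Wuhan, Hubei 430079, China; [2] Department of Audit, State Grid Hubei Electric Power Company Limited, Wuhan, Hubei 430072, China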
Source: Journal of Computer Applications (《计算机应用》), 2020, Issue S01, pp. 109-112 (4 pages)
Funding: Science and Technology Project of State Grid Hubei Electric Power Company Limited (SGHBJP00JGJS1900026).
Abstract: Texts in the power audit field have pronounced industry characteristics, high feature similarity between texts, and fuzzy classification boundaries. To address this, a power audit text classification method with enhanced domain features is proposed. First, a professional dictionary for power audit is built. Then the EF-Doc2VecC model is proposed and combined with the dictionary to enhance the text features. Finally, the enhanced features are fed into a BiLSTM classifier to classify the domain texts. Experimental results show that, for highly specialized power audit texts, the EF-Doc2VecC model exceeds the baseline Doc2VecC by 4, 2, 2, and 2 percentage points in recall, specificity, precision, and F1 score, respectively; for general-domain texts, it exceeds Doc2VecC by 3, 3, 4, and 4 percentage points on the same metrics. In addition, the EF-Doc2VecC model's classification performance on power audit texts is 4, 5, 3, and 3 percentage points higher than on general-domain texts, respectively. The proposed text vector representation and classification method therefore improves text classification in the general domain and markedly improves fine-grained text classification in the vertical domain.
Keywords: power audit; text classification; enhanced features; Doc2VecC; bidirectional long short-term memory (BiLSTM)
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]