Authors: Wang Xiao (王霄); Wan Yuqing (万玉晴) (Taiji Computer Co., Ltd., Beijing 100102, China)
Source: Computer Applications and Software (《计算机应用与软件》), 2024, No. 6, pp. 101-107, 133 (8 pages in total)
Funding: National Key Research and Development Program of China (2018YFC0807700).
Abstract: This paper proposes solutions to the main problems in text classification of court electronic case files. A multi-dimensional semantic representation of case file documents is proposed to obtain more accurate and comprehensive text features. A kernel extreme learning machine (KELM) with a Gaussian kernel is used to train the text classifier, which reaches the global optimum while greatly improving training efficiency. A sequential optimization model, KOS-ELM, based on recursive least squares (RLS) iteratively updates the model parameters from new samples, giving the classifier an online self-learning capability and reducing its dependence on the initial samples. Comparative experiments show that the Gaussian-kernel KELM classifier is 2.66 and 4.43 percentage points more accurate than the BP network model and LSSVM, respectively, while its training time is only 1/6 and 1/10 of theirs. Using the multi-dimensional semantic representation as model input improves accuracy by 8.84 and 2.33 percentage points over the text vector and word vector representations, respectively. When the RLS-based KOS-ELM model is used to iteratively optimize a weak classifier, classification accuracy improves significantly after 20 iterations at each of 4 different step sizes.
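Keywords: court electronic case files; text classification; semantic representation; kernel extreme learning machine; recursive least squares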
Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]
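
Illustrative sketch (not taken from the paper): the abstract names a Gaussian-kernel KELM as the classifier but gives no formulas or code. The block below shows the commonly used closed-form KELM training rule, beta = (K + I/C)^(-1) T, on toy two-dimensional data. The class name KELM, the parameters C and gamma, their values, and the toy data are all assumptions made for illustration; the paper's multi-dimensional semantic features and its RLS-based online update (KOS-ELM) are not reproduced here.

```python
import numpy as np


def gaussian_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KELM:
    """Minimal batch kernel extreme learning machine (hypothetical helper class)."""

    def __init__(self, C=100.0, gamma=0.5):
        self.C = C          # regularization strength (assumed value)
        self.gamma = gamma  # Gaussian kernel width parameter (assumed value)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix T, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        K = gaussian_kernel(X, X, self.gamma)
        self.X_ = X
        # Closed-form output weights: solve (K + I/C) beta = T.
        self.beta_ = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, X):
        K = gaussian_kernel(np.asarray(X, dtype=float), self.X_, self.gamma)
        # Pick the class whose output score is largest.
        return self.classes_[np.argmax(K @ self.beta_, axis=1)]


# Toy data: two well-separated clusters standing in for two document classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(4.0, 0.3, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

model = KELM().fit(X, y)
print("training accuracy:", (model.predict(X) == y).mean())
```

Because training reduces to a single linear solve rather than iterative gradient descent, a sketch like this is consistent with the abstract's remark about short training time, although it says nothing about the reported accuracy figures.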