Affiliation: [1] Jiangxi Blue Sky University (江西蓝天学院), Nanchang 330098
Source: Electronic Measurement Technology (《电子测量技术》), 2007, No. 10, pp. 111-114 (4 pages)
Abstract: When text mining is used to represent text features, the very high dimensionality of text makes processing computationally expensive, so the text should first undergo dimensionality reduction. Latent semantic analysis (LSA), used as a text clustering method, offers distinctive advantages in extracting text information effectively and has been applied in many fields. This paper builds a classification system for Chinese legal case texts. It introduces LSA to perform a second-stage dimensionality reduction of the text vector space and replaces the original matrix with the LSA-processed feature-document matrix, which further removes noise and speeds up the classification system. The implementation process and experimental data are given, and the experiments show that the method achieves good results.
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
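
The abstract describes replacing the original feature-document matrix with a lower-dimensional one produced by LSA. This record gives no implementation details, so the following is only a minimal sketch of the general idea, not the paper's system. It assumes a toy term-document count matrix (`TERM_DOC`, invented for illustration) and uses plain power iteration with deflation as a stand-in for a library SVD routine. The function names `lsa_document_vectors` and `cosine` are hypothetical.

```python
import math
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(x, y):
    return sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))


def lsa_document_vectors(term_doc, k, iters=300, seed=0):
    """Map each document (a column of term_doc) to k latent coordinates.

    Coordinate i of document j is sigma_i * v_i[j], where sigma_i and v_i are
    the i-th largest singular value and right singular vector of term_doc.
    """
    a = [[float(x) for x in row] for row in term_doc]  # copy, deflated below
    n_docs = len(a[0])
    rng = random.Random(seed)
    coords = [[] for _ in range(n_docs)]
    for _ in range(k):
        at = transpose(a)
        v = [rng.random() for _ in range(n_docs)]
        for _ in range(iters):
            # One power-iteration step on A^T A, then renormalise.
            v = matvec(at, matvec(a, v))
            length = norm(v)
            if length == 0.0:
                break
            v = [x / length for x in v]
        av = matvec(a, v)
        sigma = norm(av)
        if sigma < 1e-12:
            # Rank exhausted: pad the remaining coordinate with zeros.
            for c in coords:
                c.append(0.0)
            continue
        u = [x / sigma for x in av]
        for j in range(n_docs):
            coords[j].append(sigma * v[j])
        # Deflate: remove the component just found so the next pass
        # converges to the next-largest singular triplet.
        for i, row in enumerate(a):
            for j in range(n_docs):
                row[j] -= sigma * u[i] * v[j]
    return coords


# Toy data (an assumption for illustration): rows are terms, columns are documents.
# Documents 0-1 use theft vocabulary, documents 2-3 use contract vocabulary.
TERM_DOC = [
    [2, 1, 0, 0],  # theft
    [1, 2, 0, 0],  # stolen
    [1, 1, 0, 0],  # property
    [0, 0, 2, 1],  # contract
    [0, 0, 1, 2],  # breach
    [0, 0, 1, 1],  # payment
    [1, 1, 1, 1],  # court (shared by all documents)
]

if __name__ == "__main__":
    docs = lsa_document_vectors(TERM_DOC, k=2)
    print("same topic     :", round(cosine(docs[0], docs[1]), 3))
    print("different topic:", round(cosine(docs[0], docs[2]), 3))
```

In this sketch, each document shrinks from seven term counts to two latent coordinates, and a downstream classifier would work on the reduced vectors instead of the original columns. For a real corpus, a tested SVD routine from a numerical library would be the more sensible choice than hand-written power iteration.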