Application of an LSA-Based Quadratic Dimension Reduction Method to the Classification of Chinese Legal Case Texts  Times cited: 8

English title (as published): Application of quadratic dimension reduction method based on LSA in classification of the Chinese legal text


Authors: 熊小梅 (Xiong Xiaomei)[1], 刘永浪 (Liu Yonglang)[1]

Affiliation: [1] 江西蓝天学院 (Jiangxi Lantian College), Nanchang 330098

Source: Electronic Measurement Technology (《电子测量技术》), 2007, No. 10, pp. 111-114 (4 pages)

Abstract: When text mining is used to represent text features, the resulting feature space is extremely high-dimensional, which makes processing computationally expensive; the texts should therefore first undergo dimension reduction. Latent semantic analysis (LSA), a technique used in text clustering, offers distinctive advantages in extracting textual information effectively and has been applied in many fields. This paper builds a classification system for Chinese legal case texts that uses LSA to perform a second ("quadratic") dimension reduction of the text vector space, replacing the original feature-document (term-document) matrix with the matrix produced by LSA. This further removes noise and speeds up the classification system. The paper gives the concrete implementation process and the experimental data, and the experiments show that the method achieves good results.
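The abstract does not spell out the pipeline, so the following is only a minimal Python/NumPy sketch of what an LSA-based second reduction stage commonly looks like, not the authors' implementation. All of the following are assumptions for illustration: the first reduction stage is a simple document-frequency threshold (the hypothetical `df_filter` with parameter `min_df`); LSA is computed as a truncated SVD of the terms x documents matrix, A ≈ U_k Σ_k V_kᵀ; each document is represented by its column of Σ_k V_kᵀ (the rank-k matrix A_k is also returned as a "denoised" replacement for A); a new document q is folded in as U_kᵀq, which maps every training column of A exactly onto its Σ_k V_kᵀ column; and classification uses a cosine nearest-neighbour rule (the hypothetical `cosine_nearest`) chosen only as a stand-in classifier.

```python
import numpy as np


def df_filter(A, min_df):
    """Stage 1 (hypothetical): keep terms that occur in at least min_df documents.

    A is a terms x documents count matrix. Returns the filtered matrix and the
    boolean mask of kept terms.
    """
    df = np.count_nonzero(A, axis=1)  # document frequency of each term
    keep = df >= min_df
    return A[keep], keep


def lsa_reduce(A, k):
    """Stage 2: LSA via truncated SVD, A ~= U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted descending
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    A_k = U_k @ np.diag(s_k) @ Vt_k           # rank-k terms x docs matrix
    doc_vecs = (s_k[:, None] * Vt_k).T        # row j = document j in k dimensions
    return A_k, doc_vecs, U_k, s_k


def fold_in(q, U_k):
    """Project a term vector q (over the kept terms) into the LSA space as U_k^T q."""
    return q @ U_k


def cosine_nearest(doc_vecs, labels, qv):
    """Hypothetical classifier: label of the training document most cosine-similar to qv."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(qv)
    sims = (doc_vecs @ qv) / np.where(norms == 0, 1.0, norms)
    return labels[int(np.argmax(sims))]


if __name__ == "__main__":
    # Made-up terms x documents counts: 5 terms, 4 documents.
    A = np.array([
        [2, 1, 0, 0],
        [1, 2, 0, 0],
        [0, 0, 3, 1],
        [0, 0, 1, 2],
        [1, 0, 0, 0],  # appears in only one document, removed by stage 1
    ], dtype=float)
    labels = np.array(["contract", "contract", "criminal", "criminal"])

    A1, kept = df_filter(A, min_df=2)
    A_k, doc_vecs, U_k, s_k = lsa_reduce(A1, k=2)
    q = np.array([1, 1, 0, 0], dtype=float)  # new document over the kept terms
    print(kept, doc_vecs.shape, cosine_nearest(doc_vecs, labels, fold_in(q, U_k)))
```

On this toy matrix, stage 1 drops the last term (document frequency 1), stage 2 reduces each document to 2 dimensions, and the query is assigned to the "contract" class. The choice of k trades noise removal against information loss and would need tuning on real data.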

Keywords: text classification; quadratic dimension reduction; legal text

Chinese Library Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
