TGFCM: A Novel Approach of Chinese Text Mining Based on Fuzzy Clustering


Authors: GENG Xinqing (耿新青)[1], WANG Zheng'ou (王正欧)[1]

Affiliation: [1] Institute of Systems Engineering, Tianjin University, Tianjin 300072

Source: Computer Engineering (《计算机工程》), 2006, No. 5, pp. 7-9 (3 pages)

Funding: National Natural Science Foundation of China (Grant No. 60275020)

Abstract: This paper proposes a new dynamic fuzzy clustering method for Chinese text mining. Traditional fuzzy clustering requires the number of clusters to be fixed in advance; to remove this requirement, the method uses a dynamic self-organizing map (SOM) neural network to determine the number of clusters. Text feature vectors are built with the vector space model (VSM) and TF-IDF weighting. The cluster count obtained from the dynamic SOM is then passed to the fuzzy C-means (FCM) algorithm, which produces the final clustering. Compared with running the dynamic SOM algorithm alone, the proposed algorithm yields more accurate clusters, and fuzzy clustering is better suited to handling semantic diversity and the ambiguity of which cluster a text belongs to. Experiments confirm the effectiveness of the algorithm.
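The pipeline described in the abstract (TF-IDF feature vectors, then FCM with a given cluster count) can be illustrated roughly as follows. This is a minimal sketch in pure Python, not the authors' implementation: the function names `tfidf_vectors` and `fuzzy_c_means`, the toy documents, the fuzzifier `m=2.0`, and the random initialization are all assumptions. The dynamic SOM step is not implemented here; the cluster count `c` is passed in by hand where the paper would supply the SOM-derived value.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Map non-empty tokenized documents to L2-normalized TF-IDF vectors (VSM)."""
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency per term
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] / len(doc) * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard all-zero vectors
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard FCM. Returns (centers, u); u[i][j] is point i's membership in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dist = [math.dist(p, ctr) for ctr in centers]
            zeros = [j for j, d in enumerate(dist) if d == 0.0]
            if zeros:  # point coincides with one or more centers
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((dj / dk) ** power for dk in dist) for dj in dist]
            new_u.append(row)
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return centers, u


# Toy corpus (hypothetical); real Chinese text would first need word segmentation.
docs = [
    ["fuzzy", "clustering", "text"],
    ["fuzzy", "clustering", "som"],
    ["stock", "market", "price"],
    ["stock", "market", "trade"],
]
vocab, vectors = tfidf_vectors(docs)
# c=2 is a hand-picked stand-in for the cluster count the dynamic SOM would supply.
centers, u = fuzzy_c_means(vectors, c=2)
```

Each row of `u` sums to 1, so a document can belong partly to several clusters, which is the property the abstract cites as the reason fuzzy clustering suits ambiguous text membership.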

Keywords: self-organizing map network; text feature vector; fuzzy clustering; number of clusters

Classification code: TP312 [Automation and Computer Technology: Computer Software and Theory]

 
