Document Clustering Using Semantic Cliques Aggregation  


Authors: Ajit Kumar, I-Jen Chiang

Affiliations: [1] Goa Institute of Management, Ribandar, India; [2] School of Management, Taipei Medical University, Taiwan; [3] Institute of Biomedical Engineering, National Taiwan University, Taiwan

Source: Journal of Computer and Communications, 2015, Issue 12, pp. 28-40 (13 pages)

Abstract: Search engines are indispensable tools for finding information among massive numbers of web pages and documents. A good search engine needs to retrieve information that is not only returned quickly but also relevant to the user's query. Most search engines respond to queries quickly; however, they provide little guarantee of precision, even for highly detailed queries. In such cases, clustering documents by subject and content might improve search results.
This paper presents a novel method of document clustering that uses semantic cliques. First, we extracted features from the documents. Next, we defined the associations between frequently co-occurring terms, which we call semantic cliques. Each connected component in the semantic clique represented a theme. The documents were clustered by theme, for which we designed an aggregation algorithm. We evaluated the effectiveness of the aggregation algorithm on four kinds of datasets. The results showed that the semantic-clique-based document clustering algorithm performed significantly better than traditional clustering algorithms such as Principal Direction Divisive Partitioning (PDDP), k-means, Auto-Class, and Hierarchical Clustering (HAC). We found that Semantic Clique Aggregation is a potential model for representing association rules in text and could be immensely useful for automatic document clustering.
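
The pipeline outlined in the abstract (extract terms, link frequently co-occurring terms, treat each connected component as a theme, group documents by theme) can be illustrated with a minimal Python sketch. This is not the authors' aggregation algorithm, which the abstract does not specify. It assumes pairwise co-occurrence only, a hypothetical `min_support` threshold, and a simple "largest term overlap" rule for assigning documents. All function names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_pairs(docs, min_support=2):
    """Return term pairs that co-occur in at least min_support documents."""
    counts = Counter()
    for terms in docs:
        # Sorting gives each unordered pair one canonical tuple.
        counts.update(combinations(sorted(set(terms)), 2))
    return [pair for pair, n in counts.items() if n >= min_support]


def connected_components(edges):
    """Group terms into connected components of the co-occurrence graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


def cluster_by_theme(docs, min_support=2):
    """Assign each document to the theme it shares the most terms with.

    Assumption: overlap counting stands in for the paper's aggregation step.
    Documents that share no term with any theme get the label None.
    """
    themes = connected_components(frequent_pairs(docs, min_support))
    labels = []
    for terms in docs:
        overlaps = [len(theme & set(terms)) for theme in themes]
        best = max(range(len(themes)), key=overlaps.__getitem__, default=None)
        labels.append(best if best is not None and overlaps[best] > 0 else None)
    return themes, labels


# Toy corpus: each document is already reduced to a list of terms.
docs = [
    ["gene", "protein", "cell"],
    ["gene", "protein", "sequence"],
    ["market", "stock", "price"],
    ["market", "stock", "trade"],
    ["weather"],
]
themes, labels = cluster_by_theme(docs)
```

On this toy corpus the two frequent pairs, (gene, protein) and (market, stock), form two themes, and the last document stays unassigned because it shares no term with either. A real system would need proper feature extraction and a support threshold tuned to the collection.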

Keywords: Document Clustering; Semantic Clique; Association; Aggregation; Theme

Classification code: TP39 [Automation and Computer Technology: Computer Application Technology]

 
