Novel Machine Learning–Based Approach for Arabic Text Classification Using Stylistic and Semantic Features  (Cited by: 1)


Authors: Fethi Fkih, Mohammed Alsuhaibani, Delel Rhouma, Ali Mustafa Qamar

Affiliations: [1] Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia; [2] MARS Research Lab LR17ES05, University of Sousse, Sousse, Tunisia

Source: Computers, Materials & Continua (English-language journal), 2023, Issue 6, pp. 5871-5886 (16 pages)

Abstract: Text classification is an essential task for many applications in the Natural Language Processing domain. It can be applied in many fields, such as Information Retrieval, Knowledge Extraction, and Knowledge Modeling. Despite the importance of this task, Arabic Text Classification tools still suffer from many problems and remain incapable of responding to the increasing volume of Arabic content that circulates on the web or resides in large databases. This paper introduces a novel machine learning-based approach that exclusively uses hybrid (stylistic and semantic) features. First, we clean the Arabic documents and translate them into English using translation tools. Then, the semantic features are automatically extracted from the translated documents using an existing database of English topics. In addition, the model automatically extracts from the textual content a set of stylistic features, such as word and character frequencies and punctuation. We thus obtain three types of features: semantic, stylistic, and hybrid. Using a different type of feature each time, we performed an in-depth comparison of nine well-known Machine Learning models on a standard Arabic corpus to evaluate our approach. The results show that the Neural Network outperforms the other models and performs well with hybrid features (F1-score = 0.88).
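
As a rough illustration of the stylistic and hybrid features mentioned in the abstract, the sketch below computes a few simple counts for one document and concatenates them with a semantic vector. It is not the authors' implementation: the function names, the feature names, the punctuation set, and the form of the semantic vector (a plain list of topic scores) are all assumptions made for this example.

```python
import string

# Assumed punctuation set: ASCII punctuation plus a few common Arabic marks.
PUNCTUATION = set(string.punctuation) | {"،", "؛", "؟"}


def stylistic_features(text):
    """Return a few illustrative stylistic features for one document."""
    words = text.split()
    n_words = len(words)
    # Character count excluding whitespace.
    n_chars = sum(len(w) for w in words)
    n_punct = sum(1 for ch in text if ch in PUNCTUATION)
    return {
        "word_count": n_words,
        "char_count": n_chars,
        "avg_word_length": n_chars / n_words if n_words else 0.0,
        "punctuation_count": n_punct,
        "punctuation_ratio": n_punct / n_chars if n_chars else 0.0,
    }


def hybrid_features(stylistic, semantic):
    """Concatenate stylistic values (in sorted key order) with a semantic vector.

    `semantic` is a hypothetical list of topic scores; how the paper
    actually derives it from the English topic database is not shown here.
    """
    return [stylistic[key] for key in sorted(stylistic)] + list(semantic)


if __name__ == "__main__":
    features = stylistic_features("مرحبا، كيف حالك؟")
    print(features)
    print(hybrid_features(features, [0.7, 0.3]))
```

The resulting vector could then be passed to any standard classifier; which features and models the paper actually uses is described in the full text, not in this record.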

Keywords: Arabic text classification; machine learning; stylistic features; semantic features; topics

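Classification codes: TP391.1 [Automation and Computer Technology: Computer Application Technology]; TP181 [Automation and Computer Technology: Computer Science and Technology]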

 
