Authors: Mohammed Abdelmajeed, Jiangbin Zheng, Ahmed Murtadha, Youcef Nafa, Mohammed Abaker, Muhammad Pervez Akhter
Affiliations: [1] School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; [2] Department of Computer Science, Applied College, King Khalid University, Muhayil, 6311, Saudi Arabia; [3] Computer Science Department, National University of Modern Languages, Faisalabad, 38000, Pakistan
Source: Computers, Materials & Continua, 2025, No. 5, pp. 3471-3491 (21 pages)
Funding: Supported by the Deanship of Scientific Research at King Khalid University through Small Groups funding (Project Grant No. RGP1/243/45), awarded to Dr. Mohammed Abaker, and by the Natural Science Foundation of China under Grant 61901388.
Abstract: Arabic Dialect Identification (DID) is a Natural Language Processing (NLP) task that involves determining the dialect of a given piece of Arabic text. State-of-the-art solutions for DID are built on deep neural networks that learn sentence representations conditioned on a given dialect. Despite their effectiveness, their performance relies heavily on the amount of labeled examples, which are labor-intensive to attain and may not be readily available in real-world scenarios. To alleviate the burden of labeling data, this paper introduces a novel solution that leverages unlabeled corpora to boost performance on the DID task. Specifically, we design an architecture that learns the information shared between labeled and unlabeled texts through a gradient reversal layer. The key idea is to penalize the model for learning source-dataset-specific features, thereby enabling it to capture common knowledge regardless of the label. Finally, we evaluate the proposed solution on benchmark DID datasets. Our extensive experiments show that it performs significantly better, especially with sparse labeled data. Compared with existing Pre-trained Language Models (PLMs), our approach achieves new state-of-the-art performance in the DID field. The code will be available on GitHub upon the paper's acceptance.
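For intuition, the following is a minimal sketch of how a gradient reversal layer of the kind described in the abstract is commonly implemented in domain-adversarial training. It assumes PyTorch; the class and parameter names (GradReverse, DialectModelWithGRL, lambd, the head sizes) are illustrative assumptions, not taken from the paper's code.

import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; flips the sign of the gradient
    # (scaled by lambd) in the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversing the gradient penalizes the shared encoder for learning
        # features that identify which source dataset an example came from.
        return grad_output.neg() * ctx.lambd, None

class DialectModelWithGRL(nn.Module):
    # Hypothetical two-head architecture: a dialect classifier on encoded
    # text, plus a domain discriminator reached through the reversal layer.
    def __init__(self, hidden_size=768, num_dialects=5, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.dialect_head = nn.Linear(hidden_size, num_dialects)
        self.domain_head = nn.Linear(hidden_size, 2)  # labeled vs. unlabeled source

    def forward(self, features):
        # `features` would come from a shared encoder, e.g., a BERT-style PLM.
        dialect_logits = self.dialect_head(features)
        reversed_feats = GradReverse.apply(features, self.lambd)
        domain_logits = self.domain_head(reversed_feats)
        return dialect_logits, domain_logits

In a setup like this, the dialect head is trained with a classification loss on labeled examples only, while the domain head sees both labeled and unlabeled batches; because the domain head's gradient is reversed before reaching the shared encoder, the encoder is pushed toward features that do not distinguish the two sources, which is the shared knowledge the abstract refers to.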
Keywords: Arabic dialect identification; natural language processing; bidirectional encoder representations from transformers; pre-trained language models; gradient reversal layer
CLC Number: TP391 [Automation and Computer Technology - Computer Application Technology]