Authors: Rosel Oida-Onesa, Melvin A. Ballera
Affiliations: [1] Technological Institute of the Philippines Manila, Casal, Manila 1000, Philippines; [2] Camarines Sur Polytechnic Colleges, Nabua, Camarines Sur, Camarines Sur 4434, Philippines
Source: Data Intelligence (数据智能(英文)), 2024, Issue 4, pp. 946-967 (22 pages)
Abstract: Creating a parallel corpus for machine translation is a challenging and time-consuming task, especially in a linguistically diverse country like the Philippines, with 185 languages. Although a wealth of text is available, annotated data is scarce, particularly for languages like Bikol. Bikol is one of the major languages in the Philippines; however, its underrepresentation in the digital sphere is attributed to the absence of annotated data. This study outlines the development process of BFParCo, a proposed gold standard dataset for the Bikol and Filipino parallel corpus. The corpus underwent refinement through manual phrase alignment, translation, and evaluation. Subsequently, T5 and mT5 transformer models were fine-tuned with the parallel corpus and evaluated using the Bilingual Evaluation Understudy (BLEU) metric. The results showed a notable improvement in BLEU score after fine-tuning, with an increase of 60.68 in BIK→FIL and 58.93 in FIL→BIK translations. Additionally, human evaluators comprehensively assessed the fine-tuned models' results using Multidimensional Quality Metrics and Scalar Quality Metrics error taxonomies. The fine-tuned models were then made publicly accessible through Hugging Face. This study represents a significant stride in advancing machine translation tools for the Bikol and Filipino languages.
Keywords: Natural language processing; Language models; Transfer learning; Fine-tuning; Low resource language; Bikol; Filipino
Classification number: TP391.2 [Automation and Computer Technology: Computer Application Technology]