ViTAU: Facial paralysis recognition and analysis based on vision transformer and facial action units

Authors: GAO Jia; CAI Wenhao; ZHAO Junli; DUAN Fuqing [2]

Affiliations: [1] College of Computer Science & Technology, Qingdao University, Qingdao 266071, China; [2] School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China

Source: Chinese Journal of Engineering (《工程科学学报》), 2025, No. 2, pp. 351-363 (13 pages)

Funding: Natural Science Foundation of Shandong Province (ZR2024MF087); National Natural Science Foundation of China (62172247).

Abstract: Facial nerve paralysis (FNP), commonly known as Bell's palsy or facial paralysis, significantly affects patients' daily lives and mental health, and timely recognition and diagnosis are crucial for early treatment and rehabilitation. With the rapid development of deep learning and computer vision, automatic recognition of facial paralysis has become feasible, offering a more accurate and objective means of diagnosis. Existing studies focus mainly on overall changes of the face and overlook the importance of facial details. Different parts of the face do not contribute equally to the recognition result, yet these studies have not distinguished and analyzed the individual facial regions in detail. This study introduces a method that combines a Vision Transformer (ViT) model with an action unit (AU) region detection network for automatic facial paralysis recognition and region analysis. The ViT model uses its self-attention mechanism to determine whether facial paralysis is present, while the AU-based strategy applies a pyramid convolutional neural network to feature maps extracted from a StyleGAN2 model to analyze the affected regions. In experiments on the YouTube Facial Palsy (YFP) dataset and the Extended Cohn-Kanade (CK+) dataset, this combined approach achieved 99.4% accuracy in facial paralysis recognition and 81.36% accuracy in recognizing the affected regions, respectively. Comparison with state-of-the-art methods demonstrates the effectiveness of the proposed automatic facial paralysis recognition method.

Facial nerve paralysis (FNP), commonly known as Bell's palsy or facial paralysis, significantly affects patients' daily lives and mental well-being. Timely identification and diagnosis are crucial for early treatment and rehabilitation. With the rapid advancement of deep learning and computer vision technologies, automatic recognition of facial paralysis has become feasible, offering a more accurate and objective diagnostic approach. Current research primarily focuses on broad facial changes and often neglects finer facial details, which leads to insufficient analysis of how different areas affect recognition results. This study proposes a method that combines the vision transformer (ViT) model with an action unit (AU) facial region detection network to automatically recognize and analyze facial paralysis. First, the ViT model uses its self-attention mechanism to determine whether facial paralysis is present. Next, the AU data are analyzed to assess the activity of the facial muscles, allowing a deeper evaluation of the affected areas. The self-attention mechanism in the transformer architecture captures the global contextual information required to recognize facial paralysis. To determine the specific affected regions, we use the pixel2style2pixel (pSp) encoder and the StyleGAN2 generator to encode and decode images and extract feature maps that represent facial characteristics. These maps are then processed by a pyramid convolutional neural network interpreter to generate heatmaps. By minimizing the mean squared error between the predicted and actual heatmaps, we can effectively identify the areas affected by paralysis. Our proposed method integrates ViT with facial AUs, designing a ViT-based facial paralysis recognition network that enhances the extraction of local area features through its self-attention mechanism, thereby enabling precise recognition of facial paralysis. Additionally, by incorporating facial AU data, we conducted detailed regional analyses for patients identified with facia…
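The heatmap-regression step mentioned in the abstract (minimizing the mean squared error between predicted and actual heatmaps) can be illustrated with a minimal pure-Python sketch. It rests on assumptions that the abstract does not confirm: the target heatmap is modeled as a 2-D Gaussian centred on an AU location, a common choice in heatmap regression, and the function names `gaussian_heatmap` and `mse`, the 8x8 grid size, and the value of `sigma` are all hypothetical. It is not the authors' implementation.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Return a height x width grid that peaks at 1.0 on the point (cx, cy).

    Assumption: a Gaussian target is used here for illustration only.
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def mse(pred, target):
    """Mean squared error over all pixels of two equally sized heatmaps."""
    total = 0.0
    count = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count


# Hypothetical example: ground-truth AU centre at column 5, row 2.
target = gaussian_heatmap(8, 8, cx=5, cy=2)
# A prediction that is off by one pixel, and one that is far away.
good_pred = gaussian_heatmap(8, 8, cx=5, cy=3)
bad_pred = gaussian_heatmap(8, 8, cx=1, cy=6)

loss_good = mse(good_pred, target)
loss_bad = mse(bad_pred, target)
print("loss (near prediction):", loss_good)
print("loss (far prediction): ", loss_bad)
```

In this sketch the loss grows as the predicted peak moves away from the ground-truth location, which is the property that lets MSE minimization pull predicted heatmaps toward the annotated regions.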

Keywords: Transformer; facial action units; multi-resolution feature maps; generator; heatmap regression

Classification number: TP391 [Automation and computer technology: computer application technology]

 
