Efficient Facial Expression Recognition Transformer with Additively Comprised Class Attention Encoder

Jiasen Wang; Yuanjing Hu; Aibin Huang

doi:10.2352/J.ImagingSci.Technol.2025.69.1.010410

ACDM 2024 Digital Media Special Issue

Volume: 69 | Article ID: 010410

Keywords: vision transformer, class attention, facial expression recognition, squeeze and excitation

Published Online: January 2025

Abstract

Facial Expression Recognition (FER) models based on the Vision Transformer (ViT) have demonstrated promising performance on diverse datasets. However, the computational cost of the transformer encoder poses challenges in scenarios where computational resources are limited. Large feature maps carry richer expression information but significantly increase the number of tokens N, and the cost of self-attention grows quadratically with that number, as O(N²). Tasks involving large feature maps, such as high-resolution FER, therefore encounter computational bottlenecks. To alleviate these challenges, we propose the Additively Comprised Class Attention Encoder as a substitute for the original ViT encoder, which reduces the complexity of the attention computation from O(N²) to O(N). Additionally, we introduce a novel token-level Squeeze-and-Excitation method to help the model learn more efficient representations. Experimental evaluations on the RAF-DB and FERplus datasets show that our approach improves running speed by at least 27% (for 7 × 7 feature maps) while maintaining comparable accuracy, and that the gain grows with feature-map size (about 49% speedup for 14 × 14 feature maps, and triple the speed for 28 × 28 feature maps).
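
To make the O(N²) versus O(N) contrast concrete, the following is a minimal sketch of generic single-query class attention, in which only the class token attends to the N patch tokens. It is not the authors' Additively Comprised Class Attention Encoder, whose details are not given in the abstract. The names `class_attention` and `score_counts`, the embedding size of 8, and the choice to leave the class token out of the token count are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def class_attention(cls_query, keys, values):
    """Generic single-query attention (illustrative, not the paper's encoder).

    The class token is the only query, so N dot products are computed
    instead of the N * N needed by full self-attention.
    """
    scale = 1.0 / math.sqrt(len(cls_query))
    weights = softmax([dot(cls_query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def score_counts(side):
    """Return (single-query scores, full self-attention scores) for a
    side x side feature map; the class token itself is not counted."""
    n = side * side
    return n, n * n


# Token counts for the three feature-map sizes named in the abstract.
for side in (7, 14, 28):
    single, full = score_counts(side)
    print(f"{side}x{side}: N={single}, single-query scores={single}, "
          f"full self-attention scores={full}")

# Tiny demonstration with random 8-dimensional tokens (hypothetical sizes).
random.seed(0)
n_tokens, dim = 49, 8
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
cls_query = [random.gauss(0, 1) for _ in range(dim)]
out = class_attention(cls_query, keys, values)
```

Going from a 7 × 7 to a 28 × 28 feature map multiplies N by 16, so the number of full self-attention scores grows by a factor of 256 while the single-query count grows only by 16. This matches the abstract's observation that the benefit of a linear-cost encoder increases with feature-map size.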

Journal Title : Journal of Imaging Science and Technology

Publisher Name : Society for Imaging Science and Technology

Cite this article

Jiasen Wang, Yuanjing Hu, Aibin Huang, "Efficient Facial Expression Recognition Transformer with Additively Comprised Class Attention Encoder" in Journal of Imaging Science and Technology, 2025, pp 1 - 7, https://doi.org/10.2352/J.ImagingSci.Technol.2025.69.1.010410

Article timeline

received May 2024
accepted October 2024
published January 2025
