Original scientific article

VISION TRANSFORMER USING FRACTIONAL GRADIENT ATTENTION TOOLS FOR ROBUST IMAGE CLASSIFICATION

By
S. Shalini

Department of Computer Science, M.G.R College (Affiliated to Periyar University), Salem, Tamil Nadu, India

P.S. Eliahim Jeevaraj

Bishop Heber College (Affiliated to Bharathidasan University), Tiruchirappalli, Tamil Nadu, India

Abstract

Vision Transformers (ViTs) have achieved promising results in computer vision, but they still face critical issues such as gradient saturation and poor generalization on smaller datasets. Current attention mechanisms do not resolve these issues, which leads to ineffective feature extraction and inefficient optimization. To overcome these drawbacks, a new Fractional Gradient Attention (FGA) mechanism is proposed for ViTs; it redefines gradient propagation during training using fractional calculus. The proposed approach modifies backpropagation by employing a fractional-order derivative, which increases the model's sensitivity to long-range dependencies and improves training stability. This paper presents and discusses the architecture design, the development of the fractional gradient, and extensive ablation tests over different parameters, including the fractional order (α), patch size, and positional encoding. Based on the ablation study, the model parameters were fixed at α = 0.7, a patch size of 8×8, and six transformer encoder blocks with positional encoding enabled. The performance of the proposed model is evaluated using Accuracy, Sensitivity, Specificity, Matthews Correlation Coefficient (MCC), and Geometric Mean (GM). Tests on standard benchmark datasets such as CIFAR-10 and ImageNet-100 demonstrate a significant performance increase over current state-of-the-art models: the ViT-FGA model achieves 96.8% accuracy, 96.2% sensitivity, and 95.3% specificity, exceeding EfficientNet-B0 (92.3%), ConvNeXt-T (92.7%), and plain ViTs (91.2%). These findings indicate that fractional gradients are a useful addition to transformer-based vision models, with value for both theoretical and applied development in areas such as medical and hyperspectral imaging.
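
For readers unfamiliar with the five evaluation measures named in the abstract, the following is a minimal sketch of their standard binary-classification definitions, computed from confusion-matrix counts. It is not the authors' evaluation code: the function name `binary_metrics` and the example counts are hypothetical, and the abstract does not say how the measures were aggregated across the classes of CIFAR-10 or ImageNet-100 (a one-vs-rest average is one common choice, assumed here only for illustration).

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion counts (hypothetical helper).

    tp, fp, tn, fn: true positives, false positives,
    true negatives, false negatives for one class (one-vs-rest).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # Sensitivity (recall): share of actual positives that were found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0

    # Specificity: share of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Matthews Correlation Coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Geometric mean of sensitivity and specificity.
    gmean = math.sqrt(sensitivity * specificity)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "gmean": gmean,
    }


# Illustrative counts only; these are not results from the paper.
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Unlike accuracy, MCC and GM both fall sharply when a model performs well on one class and poorly on the other, which is why they are commonly reported alongside it.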

This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The statements, opinions and data contained in the journal are solely those of the individual authors and contributors and not of the publisher and the editor(s). We stay neutral with regard to jurisdictional claims in published maps and institutional affiliations.