VISION TRANSFORMER USING FRACTIONAL GRADIENT ATTENTION TOOLS FOR ROBUST IMAGE CLASSIFICATION

S. Shalini; P.S. Eliahim Jeevaraj

doi:10.70102/afts.2026.1835.205

Original scientific article

Published: May 2026

Abstract

Vision Transformers (ViTs) have achieved very promising results in computer vision, but they still face critical issues such as gradient saturation and poor generalization on smaller datasets. Current attention mechanisms do not resolve these issues effectively, which leads to ineffective feature extraction and inefficient optimization. To overcome these drawbacks, a new Fractional Gradient Attention (FGA) mechanism is proposed for ViTs, which redefines gradient propagation during training using fractional calculus. The proposed approach modifies backpropagation by employing a fractional-order derivative, which increases the sensitivity of the model to long-range dependencies and improves training stability. This paper presents and discusses the design of the architecture, the development of the fractional gradient, and extensive ablation tests over different parameters, including the fractional order (α), patch size and positional encoding. Based on the detailed ablation study, the model parameters are fixed at α = 0.7, a patch size of 8×8 and six transformer encoder blocks with positional encoding enabled. The performance of the proposed model is evaluated using Accuracy, Sensitivity, Specificity, Matthews Correlation Coefficient (MCC) and Geometric Mean (GM). Tests on the standard benchmark datasets CIFAR-10 and ImageNet-100 demonstrate a significant performance increase over current state-of-the-art models: the ViT-FGA model achieves 96.8% accuracy, 96.2% sensitivity and 95.3% specificity, outperforming EfficientNet-B0 (92.3%), ConvNeXt-T (92.7%) and plain ViTs (91.2%). These findings indicate that fractional gradients are a useful addition to transformer-based vision models, with value for both theoretical and applied work in areas such as medical and hyperspectral imaging.
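The five evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary (one-vs-rest) confusion matrix and takes GM to be the geometric mean of sensitivity and specificity, which is a common definition but is not stated in the abstract. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute five metrics from binary confusion-matrix counts.

    Assumption: GM is sqrt(sensitivity * specificity); the abstract
    does not define it explicitly.
    """
    total = tp + fp + tn + fn
    # Sensitivity (recall): share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews Correlation Coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    gm = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "gm": gm,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    for name, value in classification_metrics(tp=90, fp=20, tn=80, fn=10).items():
        print(f"{name}: {value:.4f}")
```

For a multi-class benchmark such as CIFAR-10, these quantities would typically be computed per class in a one-vs-rest fashion and then averaged. Whether the paper does so is not stated in the abstract.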

Keywords: vision transformers, fractional gradient based attention (FGA), fractional calculus, gradient propagation, feature extraction.

Copyright

This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Issue 35, 2026