Department of Computer Science, M.G.R College (Affiliated to Periyar University), Salem, Tamil Nadu, India
Bishop Heber College (Affiliated to Bharathidasan University), Tiruchirappalli, Tamil Nadu, India
Vision Transformers (ViTs) have achieved very promising results in computer vision, but these models still face several critical issues, such as gradient saturation and poor generalization on smaller datasets. Current attention mechanisms do not resolve these issues effectively, which leads to ineffective feature extraction and poor optimization. To overcome these drawbacks, a new Fractional Gradient Attention (FGA) mechanism is proposed for ViTs; it redefines gradient propagation during training by means of fractional calculus. The proposed approach modifies backpropagation by employing a fractional-order derivative, which increases the sensitivity of the model to long-range dependencies and improves the stability of training. This paper presents and discusses the design of the architecture, the derivation of the fractional gradient, and extensive ablation tests over several parameters, including the fractional order (α), the patch size, and positional encoding. Based on the detailed ablation study, the model parameters were fixed at α = 0.7, a patch size of 8×8, and six transformer encoder blocks with positional encoding enabled. The performance of the proposed model is evaluated using Accuracy, Sensitivity, Specificity, Matthews Correlation Coefficient (MCC), and Geometric Mean (GM). Tests conducted on standard benchmark datasets such as CIFAR-10 and ImageNet-100 demonstrate a significant performance increase over current state-of-the-art models: the ViT-FGA model achieves 96.8% accuracy, 96.2% sensitivity, and 95.3% specificity, outperforming EfficientNet-B0 (92.3%), ConvNeXt-T (92.7%), and plain ViTs (91.2%). These findings indicate that fractional gradients are a useful addition to transformer-based vision models, with value for both theoretical work and application development in areas such as medical and hyperspectral imaging.
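For readers unfamiliar with the five evaluation measures named above, the following minimal sketch computes them from the four cells of a binary confusion matrix using their standard textbook definitions. It is an illustration only: the function name `classification_metrics` and the example counts are hypothetical, and the binary (one-vs-rest) setting is an assumption, since the abstract does not state how the measures are aggregated across the classes of CIFAR-10 or ImageNet-100.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    Hypothetical helper for illustration; not taken from the paper.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): fraction of actual positives that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives that are correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews Correlation Coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Geometric Mean of sensitivity and specificity.
    gm = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "gm": gm,
    }


if __name__ == "__main__":
    # Made-up counts, chosen only to exercise the formulas.
    for name, value in classification_metrics(tp=90, fp=20, tn=80, fn=10).items():
        print(f"{name}: {value:.4f}")
```

MCC and GM are reported alongside accuracy because both remain informative when the classes are imbalanced, whereas accuracy alone can be dominated by the majority class.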
This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The statements, opinions, and data contained in the journal are solely those of the individual authors and contributors and not of the publisher or the editor(s). We remain neutral with regard to jurisdictional claims in published maps and institutional affiliations.