Abstract
Fish species classification plays an essential role in aquaculture management, marine biodiversity conservation, and fisheries monitoring. Traditional methods rely heavily on manual identification, which is time-consuming, prone to human error, and inefficient at scale. This paper proposes a new approach, DeepLIFT-ViT, which combines the Visual Geometry Group 16 (VGG16) and Vision Transformer (ViT) architectures to improve the accuracy and efficiency of image-based fish species classification. Unlike existing methods that rely solely on CNN-based or transformer-based models, our approach introduces a hybrid architecture that integrates interpretability-based saliency features with transformer-based attention mechanisms. The pipeline begins with a VGG16 model pre-trained on the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) 2012 dataset, which extracts deep visual features. The DeepLIFT interpretability technique is then used to generate heat maps highlighting the salient image regions that contribute to the model's predictions. These maps are divided into patches, which are concatenated with patches extracted from the original images to form combined vectors; the combined vectors are fed into a ViT model with a multi-layer perceptron (MLP) head for final classification. The model was trained and evaluated on public datasets containing various fish species from different aquatic environments. Experimental results show that DeepLIFT-ViT outperforms existing state-of-the-art models in classification accuracy, noise robustness, and computational efficiency. With classification accuracy of up to 99%, the approach enhances automatic fish species recognition systems and offers a scalable solution for fisheries management and aquatic research.
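To make the pipeline concrete, the sketch below shows one way the described architecture could be wired together in PyTorch, using Captum's DeepLift implementation against a frozen VGG16 backbone. This is a minimal illustration, not the authors' implementation: the patch size, embedding width, encoder depth, number of classes, and the per-patch concatenation of image and heat-map embeddings are all assumptions filled in from the abstract's description.

```python
# Minimal sketch of the DeepLIFT-ViT pipeline described in the abstract.
# Assumptions: patch size 16, embedding dim 256 per stream, 6 encoder
# layers, and a hypothetical 9-class fish dataset; none of these are
# confirmed by the paper.
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights
from captum.attr import DeepLift


class DeepLiftViT(nn.Module):
    def __init__(self, num_classes=9, img_size=224, patch=16, dim=256,
                 depth=6, heads=8):
        super().__init__()
        # Frozen VGG16 pre-trained on ILSVRC 2012, used as the
        # attribution target for DeepLIFT.
        self.backbone = vgg16(weights=VGG16_Weights.IMAGENET1K_V1)
        for m in self.backbone.modules():
            if isinstance(m, nn.ReLU):
                m.inplace = False  # Captum's DeepLift hooks need out-of-place ReLU
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        self.backbone.eval()
        self.deeplift = DeepLift(self.backbone)

        n_patches = (img_size // patch) ** 2
        patch_dim = 3 * patch * patch
        # Separate linear embeddings for image patches and saliency
        # patches; the two are concatenated per patch into a combined vector.
        self.unfold = nn.Unfold(kernel_size=patch, stride=patch)
        self.embed_img = nn.Linear(patch_dim, dim)
        self.embed_sal = nn.Linear(patch_dim, dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, 2 * dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, 2 * dim))
        layer = nn.TransformerEncoderLayer(d_model=2 * dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.mlp_head = nn.Sequential(nn.LayerNorm(2 * dim),
                                      nn.Linear(2 * dim, num_classes))

    def saliency(self, x):
        # DeepLIFT heat maps w.r.t. the backbone's top predicted class.
        with torch.no_grad():
            target = self.backbone(x).argmax(dim=1)
        return self.deeplift.attribute(x, target=target).detach()

    def forward(self, x):
        sal = self.saliency(x)
        # (B, C*P*P, N) -> (B, N, C*P*P) patch sequences.
        img_patches = self.unfold(x).transpose(1, 2)
        sal_patches = self.unfold(sal).transpose(1, 2)
        tokens = torch.cat([self.embed_img(img_patches),
                            self.embed_sal(sal_patches)], dim=-1)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        out = self.encoder(tokens)
        return self.mlp_head(out[:, 0])  # classify from the [CLS] token


model = DeepLiftViT(num_classes=9)
logits = model(torch.rand(2, 3, 224, 224))  # e.g., a batch of fish images
print(logits.shape)  # torch.Size([2, 9])
```

Concatenating the two patch embeddings at each position keeps the token count equal to a standard ViT while doubling the feature width, which is one natural reading of "combined vectors"; appending the saliency patches as extra tokens would be an equally plausible alternative.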
Publication Info
- Year: 2025
- Type: article
Identifiers
- DOI: 10.21203/rs.3.rs-8017135/v1