UNETR: Transformers for 3D Medical Image Segmentation

Abstract

Fully Convolutional Neural Networks (FCNNs) with contracting and expanding paths have been prominent in the majority of medical image segmentation applications over the past decade. In FCNNs, the encoder plays an integral role by learning both global and local features and contextual representations, which the decoder can use for semantic output prediction. Despite their success, the locality of convolutional layers in FCNNs limits their capability to learn long-range spatial dependencies. Inspired by the recent success of transformers for Natural Language Processing (NLP) in long-range sequence learning, we reformulate the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem. We introduce a novel architecture, dubbed UNEt TRansformers (UNETR), that uses a transformer as the encoder to learn sequence representations of the input volume and effectively capture global multi-scale information, while also following the successful "U-shaped" network design for the encoder and decoder. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output. We have validated the performance of our method on the Multi Atlas Labeling Beyond The Cranial Vault (BTCV) dataset for multi-organ segmentation and the Medical Segmentation Decathlon (MSD) dataset for brain tumor and spleen segmentation tasks. Our benchmarks demonstrate new state-of-the-art performance on the BTCV leaderboard.
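The sequence-to-sequence reformulation in the abstract can be illustrated with a minimal sketch of how a 3D volume might be turned into a token sequence. The abstract does not give the patch size, volume shape, or tokenization details, so everything here is an assumption for illustration: the helper name `patchify_3d` is hypothetical, the volume is a toy 4 x 4 x 4 nested list, and patches are non-overlapping cubes that are flattened in raster order. A real pipeline would typically follow this step with a learned linear projection and positional embeddings, which are omitted.

```python
from itertools import product


def patchify_3d(volume, patch):
    """Split a D x H x W nested-list volume into flattened, non-overlapping cubic patches.

    Hypothetical helper for illustration only; not taken from the UNETR code base.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if depth % patch or height % patch or width % patch:
        raise ValueError("each volume dimension must be divisible by the patch size")

    tokens = []
    # Walk the patch grid in (depth, height, width) order.
    for z0, y0, x0 in product(
        range(0, depth, patch), range(0, height, patch), range(0, width, patch)
    ):
        # Flatten one cubic patch into a single 1D token.
        token = [
            volume[z][y][x]
            for z in range(z0, z0 + patch)
            for y in range(y0, y0 + patch)
            for x in range(x0, x0 + patch)
        ]
        tokens.append(token)
    return tokens


# Toy 4 x 4 x 4 volume (assumed shape) whose voxel value encodes its own coordinates.
toy_volume = [
    [[100 * z + 10 * y + x for x in range(4)] for y in range(4)] for z in range(4)
]
sequence = patchify_3d(toy_volume, patch=2)

# Sequence length is (4/2)^3 = 8 tokens, each holding 2^3 = 8 voxels.
print(len(sequence), len(sequence[0]))
```

In this sketch the sequence length grows with the cube of the ratio between volume side and patch side, which is one reason patch size matters for transformer encoders on volumetric data.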

Keywords

Computer science, Segmentation, Encoder, Transformer, Artificial intelligence, Convolutional neural network, Deep learning, Image segmentation, Recurrent neural network, Pattern recognition (psychology), Computer vision, Artificial neural network, Engineering

Affiliated Institutions

Vanderbilt University US

Related Publications

Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation

Hu Cao, Yueyue Wang, Joy Chen +4 more

In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis. Especially, the deep neural networks based on U-shaped architectu...

2023 Lecture notes in computer science 2757 citations

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Sixiao Zheng, Jiachen Lu, Hengshuang Zhao +8 more

Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolutio...

2021 3257 citations

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Enze Xie, Wenhai Wang, Zhiding Yu +3 more

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFor...

2021 arXiv (Cornell University) 3103 citations

UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation

Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh +1 more

The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations...

2019 IEEE Transactions on Medical Imaging 3567 citations

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Wenhai Wang, Enze Xie, Xiang Li +6 more

Although convolutional neural networks (CNNs) have achieved great success in computer vision, this work investigates a simpler, convolution-free backbone network useful for man...

2021 2021 IEEE/CVF International Conferenc... 4221 citations

Publication Info

Year: 2022
Type: article
Pages: 1748-1758
Citations: 2272
Access: Closed

Citation Metrics

OpenAlex: 2272
Influential: 262
CrossRef: 2063

Cite This

APA Style

Ali Hatamizadeh, Yucheng Tang, Vishwesh Nath et al. (2022). UNETR: Transformers for 3D Medical Image Segmentation. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 1748-1758. https://doi.org/10.1109/wacv51458.2022.00181

Identifiers

DOI: 10.1109/wacv51458.2022.00181
arXiv: 2103.10504
