Abstract

In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block. Uformer has two core designs. First, we introduce a novel locally-enhanced window (LeWin) Transformer block, which performs non-overlapping window-based self-attention instead of global self-attention. This significantly reduces the computational complexity on high-resolution feature maps while still capturing local context. Second, we propose a learnable multi-scale restoration modulator, in the form of a multi-scale spatial bias, to adjust features in multiple layers of the Uformer decoder. The modulator demonstrates a superior capability for restoring details across various image restoration tasks while introducing only marginal extra parameters and computational cost. Powered by these two designs, Uformer captures both local and global dependencies for image restoration. To evaluate our approach, we conduct extensive experiments on several image restoration tasks, including image denoising, motion deblurring, defocus deblurring, and deraining. Without bells and whistles, Uformer achieves superior or comparable performance relative to state-of-the-art algorithms. The code and models are available at https://github.com/ZhendongWang6/Uformer.
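To make the two designs concrete, below is a minimal PyTorch sketch of window-partitioned self-attention with a learnable per-window spatial bias. Everything here is an illustrative assumption rather than the released implementation: the class name, the window size of 8, the head count, and the placement of the bias (the paper applies its modulator at multiple scales of the decoder only) are all hypothetical; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn


class WindowAttention(nn.Module):
    """Self-attention restricted to non-overlapping M x M windows, so the
    cost grows linearly with image size instead of quadratically.
    Illustrative sketch only, not the paper's LeWin block."""

    def __init__(self, dim: int, window_size: int = 8, num_heads: int = 4):
        super().__init__()
        self.m = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Hypothetical stand-in for the restoration modulator: a learnable
        # spatial bias shared by every window at this scale.
        self.modulator = nn.Parameter(torch.zeros(1, window_size ** 2, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C), with H and W divisible by the window size.
        B, H, W, C = x.shape
        m = self.m
        # Partition the feature map into (B * num_windows, m*m, C) sequences.
        w = x.view(B, H // m, m, W // m, m, C)
        w = w.permute(0, 1, 3, 2, 4, 5).reshape(-1, m * m, C)
        w = w + self.modulator       # adjust window features before attention
        w, _ = self.attn(w, w, w)    # attention stays inside each window
        # Undo the partition back to (B, H, W, C).
        w = w.view(B, H // m, W // m, m, m, C)
        return w.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


if __name__ == "__main__":
    feat = torch.randn(1, 32, 32, 64)   # a 32x32 feature map with 64 channels
    out = WindowAttention(dim=64)(feat)
    print(out.shape)                    # torch.Size([1, 32, 32, 64])
```

Note how the quadratic cost of attention applies only to the m*m tokens inside each window, which is what makes the block affordable on high-resolution feature maps early in the encoder and late in the decoder.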

Keywords

Deblurring, Image restoration, Computer science, Transformer, Encoder, Artificial intelligence, Computer vision, Pattern recognition (psychology), Image (mathematics), Image processing, Engineering

Publication Info

Year: 2022
Type: article
Pages: 17662-17672
Citations: 1731
Access: Closed

Citation Metrics

OpenAlex: 1731
Influential: 262
CrossRef: 1562

Cite This

Zhendong Wang, Xiaodong Cun, Jianmin Bao et al. (2022). Uformer: A General U-Shaped Transformer for Image Restoration. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 17662-17672. https://doi.org/10.1109/cvpr52688.2022.01716

Identifiers

DOI: 10.1109/cvpr52688.2022.01716
arXiv: 2106.03106
