Abstract

As the computing power of modern hardware increases rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) trained on large-scale datasets have shown their effectiveness over conventional methods. This progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution, and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To maximally excavate the capability of the transformer, we propose to utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs. The IPT model is trained on these images with multiple heads and multiple tails. In addition, contrastive learning is introduced to adapt well to different image processing tasks. The pre-trained model can therefore be efficiently employed on a desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks. Code is available at https://github.com/huawei-noah/Pretrained-IPT and https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/IPT
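The abstract describes generating corrupted image pairs from clean ImageNet images so that one model can be pre-trained for several restoration tasks. A minimal sketch of that idea is below; the degradation settings (noise sigma, 2x average-pool downsampling as a stand-in for bicubic) are illustrative assumptions, not the exact parameters used by IPT.

```python
import numpy as np

def make_corrupted_pair(clean, task, rng=None):
    """Return a (corrupted, clean) training pair from a clean HxWxC image.

    Illustrative degradations only; the paper's exact settings may differ.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    if task == "denoise":
        # Additive Gaussian noise; sigma=25 is a common benchmark choice.
        noisy = clean + rng.normal(0.0, 25.0, clean.shape)
        return np.clip(noisy, 0, 255), clean
    if task == "sr_x2":
        # 2x downsampling approximated by 2x2 average pooling.
        h, w, c = clean.shape
        crop = clean[:h - h % 2, :w - w % 2]
        lr = crop.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
        return lr, clean
    raise ValueError(f"unknown task: {task}")
```

During pre-training, each sampled pair would be routed to the task-specific head and tail while sharing the transformer body, which is how a single pre-trained model can later be fine-tuned for any one of the tasks.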

Keywords

Computer science, Transformer, Artificial intelligence, Benchmark (surveying), Deep learning, Image processing, Machine learning, Pattern recognition (psychology), Image (mathematics), Voltage, Engineering

Publication Info

Year
2021
Type
article
Pages
12294-12305
Citations
1817
Access
Closed

Citation Metrics

OpenAlex
1817
Influential
159
CrossRef
1577

Cite This

Hanting Chen, Yunhe Wang, Tianyu Guo et al. (2021). Pre-Trained Image Processing Transformer. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12294-12305. https://doi.org/10.1109/cvpr46437.2021.01212

Identifiers

DOI
10.1109/cvpr46437.2021.01212
arXiv
2012.00364

Data Quality

Data completeness: 88%