Abstract

We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the network, which must then carry the layout information through stacks of convolution, normalization, and nonlinearity layers, where normalization tends to wash away the semantic information. Instead, we propose using the input layout to modulate the activations in normalization layers through a spatially-adaptive, learned affine transformation. Experiments on several challenging datasets demonstrate the advantage of our method over existing approaches in both visual fidelity and alignment with the input layouts. Finally, our model lets users easily control the style and content of the synthesized images and generate multi-modal results. Code is available upon publication.
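
To make the idea concrete, the following is a minimal sketch of a spatially-adaptive normalization layer in PyTorch. It follows the description in the abstract (parameter-free normalization followed by a spatially-varying, layout-conditioned affine modulation); the class name, hidden width, and parameter names are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Sketch of a spatially-adaptive normalization layer.

    The feature map x is first normalized without learned affine
    parameters; it is then modulated element-wise by gamma(segmap)
    and beta(segmap), which are predicted per spatial location from
    the semantic layout.
    """
    def __init__(self, num_features, label_channels, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.gamma = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)

    def forward(self, x, segmap):
        # Resize the one-hot layout to the feature-map resolution.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        h = self.shared(segmap)
        # Spatially-varying affine modulation of the normalized activations.
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```

In a generator, a layer like this would stand in for ordinary (spatially-uniform) normalization, so the semantic layout re-enters the network at every resolution instead of only at the input.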

Keywords

Normalization, Computer science, Affine transformation, Fidelity, Image synthesis, Artificial intelligence, Modal, Image, Pattern recognition, Computer vision

Publication Info

Year: 2019
Type: article
Pages: 2332-2341
Citations: 2662
Access: Closed

Citation Metrics

OpenAlex: 2662
Influential: 505
CrossRef: 2034

Cite This

Taesung Park, Ming-Yu Liu, Ting-Chun Wang et al. (2019). Semantic Image Synthesis With Spatially-Adaptive Normalization. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2332-2341. https://doi.org/10.1109/cvpr.2019.00244

Identifiers

DOI: 10.1109/cvpr.2019.00244
arXiv: 1903.07291
