Abstract

Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing. We provide a hypothesis on the implication of stop-gradient, and further show proof-of-concept experiments verifying it. Our "SimSiam" method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the roles of Siamese architectures for unsupervised representation learning. Code is made available.
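The abstract describes the core mechanism only at a high level: two augmented views of one image pass through a shared encoder, a predictor head maps one view's output toward the other, and a stop-gradient on the target branch prevents collapse. Below is a minimal PyTorch-style sketch of such an objective, assuming illustrative names (SimSiamSketch, encoder, predictor) and dimensions; it is not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimSiamSketch(nn.Module):
    """Illustrative Siamese objective with stop-gradient (assumed structure)."""

    def __init__(self, encoder: nn.Module, dim: int = 2048, pred_dim: int = 512):
        super().__init__()
        self.encoder = encoder              # shared backbone + projection MLP (assumed)
        self.predictor = nn.Sequential(     # small prediction MLP on one branch
            nn.Linear(dim, pred_dim),
            nn.BatchNorm1d(pred_dim),
            nn.ReLU(inplace=True),
            nn.Linear(pred_dim, dim),
        )

    @staticmethod
    def _neg_cosine(p, z):
        # Negative cosine similarity; z.detach() is the stop-gradient operation
        # the abstract identifies as essential to avoid collapsing solutions.
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

    def forward(self, x1, x2):
        # Two augmentations of the same image go through the shared encoder.
        z1, z2 = self.encoder(x1), self.encoder(x2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Symmetrized loss: no negative pairs, no momentum encoder required.
        return 0.5 * (self._neg_cosine(p1, z2) + self._neg_cosine(p2, z1))
```

In use, one would compute loss = model(aug1(images), aug2(images)) and call loss.backward(); gradients then flow only through the predictor branch on each term, which is the asymmetry the paper studies.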

Keywords

Simple (philosophy), Representation (politics), Computer science, Encoder, Artificial intelligence, Similarity (geometry), Feature learning, Code (set theory), Machine learning, Subject (documents), Deep learning, Theoretical computer science, Natural language processing, Image (mathematics), Programming language, Epistemology

Publication Info

Year: 2021
Type: article
Citations: 3039
Access: Closed

Citation Metrics

3039 (OpenAlex)

Cite This

Xinlei Chen, Kaiming He (2021). Exploring Simple Siamese Representation Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr46437.2021.01549

Identifiers

DOI
10.1109/cvpr46437.2021.01549