Abstract
Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing. We provide a hypothesis on the implication of stop-gradient, and further show proof-of-concept experiments verifying it. Our "SimSiam" method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the roles of Siamese architectures for unsupervised representation learning. Code is made available.
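The abstract describes the core mechanism: two augmentations of the same image pass through a shared encoder, a predictor head maps one branch onto the other, and a stop-gradient on the target branch prevents collapse. Below is a minimal sketch of such a training step, assuming PyTorch; `encoder` and `predictor` are placeholder modules standing in for the backbone-plus-projection-MLP and the prediction MLP, not the authors' exact architecture.

```python
import torch
import torch.nn.functional as F

def negative_cosine(p, z):
    # z.detach() is the stop-gradient: no gradients flow into the target branch.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def simsiam_loss(encoder, predictor, x1, x2):
    # x1, x2: two random augmentations of the same image batch.
    z1, z2 = encoder(x1), encoder(x2)      # projections
    p1, p2 = predictor(z1), predictor(z2)  # predictions
    # Symmetrized loss; note there are no negative pairs, no large-batch
    # requirement, and no momentum encoder.
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)
```

In this sketch, removing the `.detach()` call corresponds to the collapsing setting the paper analyzes.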
Publication Info
- Year: 2021
- Type: article
- Citations: 3039
Identifiers
- DOI: 10.1109/cvpr46437.2021.01549