Abstract

We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
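
The abstract states the mechanism only at a high level. As a concrete illustration, the sketch below shows the core idea in PyTorch: a queue of encoded keys serves as the dictionary, the key encoder is a momentum-averaged copy of the query encoder, and a contrastive (InfoNCE-style) loss treats the matching key as the positive. This is a minimal toy sketch, not the authors' released implementation; the linear encoders and the values of dim, K, m, and T are illustrative placeholders.

# Minimal MoCo-style sketch (illustrative only, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, K, m, T = 128, 4096, 0.999, 0.07           # feature dim, queue size, momentum, temperature (placeholder values)

encoder_q = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))  # query encoder (toy stand-in for a ConvNet)
encoder_k = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))  # key encoder (momentum copy)
encoder_k.load_state_dict(encoder_q.state_dict())
for p in encoder_k.parameters():
    p.requires_grad = False                      # keys receive no gradient

queue = F.normalize(torch.randn(dim, K), dim=0)  # the dictionary: a queue of K encoded keys
optimizer = torch.optim.SGD(encoder_q.parameters(), lr=0.03)

def train_step(x_q, x_k):
    """One contrastive step on two augmented views of the same image batch."""
    global queue
    q = F.normalize(encoder_q(x_q), dim=1)       # queries: N x dim
    with torch.no_grad():
        # momentum update of the key encoder: theta_k <- m*theta_k + (1-m)*theta_q
        for pk, pq in zip(encoder_k.parameters(), encoder_q.parameters()):
            pk.mul_(m).add_(pq, alpha=1 - m)
        k = F.normalize(encoder_k(x_k), dim=1)   # keys: N x dim

    l_pos = (q * k).sum(dim=1, keepdim=True)     # positive logits: N x 1
    l_neg = q @ queue                            # negative logits against the queue: N x K
    logits = torch.cat([l_pos, l_neg], dim=1) / T
    labels = torch.zeros(logits.size(0), dtype=torch.long)  # the positive is index 0
    loss = F.cross_entropy(logits, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # enqueue the current keys, dequeue the oldest ones to keep the dictionary size fixed
    queue = torch.cat([queue[:, k.size(0):], k.t().detach()], dim=1)
    return loss.item()

# Usage: x_q and x_k would be two random augmentations of the same batch.
loss = train_step(torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32))

The design point the abstract emphasizes is visible here: the dictionary (queue) can be much larger than the minibatch, and the slowly moving key encoder keeps the queued keys consistent with each other.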

Keywords

Artificial intelligence, PASCAL VOC, Computer science, Contrastive learning, Unsupervised learning, Segmentation, Feature learning, Machine learning, Representation learning, Pattern recognition, Transfer learning, Encoder, Natural language processing

Publication Info

Year: 2020
Type: Article
Pages: 9726-9735
Citations: 11112
Access: Closed

Citation Metrics

OpenAlex: 11112
Influential: 2037
CrossRef: 8547

Cite This

Kaiming He, Haoqi Fan, Yuxin Wu et al. (2020). Momentum Contrast for Unsupervised Visual Representation Learning. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9726-9735. https://doi.org/10.1109/cvpr42600.2020.00975

Identifiers

DOI: 10.1109/cvpr42600.2020.00975
arXiv: 1911.05722
