Abstract

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete with or outperform current competition winners on both the Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code is available at https://github.com/facebookresearch/video-nonlocal-net.
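The central idea in the abstract, a response at each position computed as a weighted sum of the features at all positions, can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so the softmax over dot-product similarities used here is an assumption made for illustration, as are the function name `non_local` and the list-of-vectors input layout. It should not be read as the authors' implementation.

```python
import math


def non_local(features):
    """Return, for each position i, a weighted sum of the features at all positions j.

    `features` is a list of N equal-length feature vectors, one per position.
    The weights are a softmax over dot-product similarities (an assumed choice;
    the abstract does not specify the weighting function).
    """
    outputs = []
    for x_i in features:
        # Similarity of position i to every position j, including itself.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        # Softmax over j, shifted by the max score for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum over all positions, computed channel by channel.
        y_i = [
            sum(w * x_j[c] for w, x_j in zip(weights, features))
            for c in range(len(x_i))
        ]
        outputs.append(y_i)
    return outputs
```

Calling `non_local([[1.0, 0.0], [0.0, 1.0]])` returns one output vector per input position. Each output depends on every input position rather than on a local neighborhood, which is the contrast with convolutional and recurrent operations that the abstract draws.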

Keywords

Computer science, Artificial neural network, Artificial intelligence

Related Publications

Squeeze-and-Excitation Networks

Convolutional neural networks are built upon the convolution operation, which extracts informative features by fusing spatial and channel-wise information together within local ...

2018 2018 IEEE/CVF Conference on Computer ... 25361 citations

Publication Info

Year: 2018
Type: preprint
Pages: 7794-7803
Citations: 10740
Access: Closed

Citation Metrics

OpenAlex: 10740
Influential: 991
CrossRef: 8291

Cite This

Xiaolong Wang, Ross Girshick, Abhinav Gupta et al. (2018). Non-local Neural Networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7794-7803. https://doi.org/10.1109/cvpr.2018.00813

Identifiers

DOI: 10.1109/cvpr.2018.00813
arXiv: 1711.07971
