Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

Jierun Chen; Shiu-hong Kao; Hao He; Weipeng Zhuo; Wen Song; Chul‐Ho Lee; S.-H. Gary Chan

doi:10.1109/cvpr52729.2023.01157

Abstract

To design fast neural networks, many works have been focusing on reducing the number of floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does not necessarily lead to a similar level of reduction in latency. This mainly stems from inefficiently low floating-point operations per second (FLOPS). To achieve faster networks, we revisit popular operators and demonstrate that such low FLOPS is mainly due to frequent memory access of the operators, especially the depthwise convolution. We hence propose a novel partial convolution (PConv) that extracts spatial features more efficiently, by cutting down redundant computation and memory access simultaneously. Building upon our PConv, we further propose FasterNet, a new family of neural networks, which attains substantially higher running speed than others on a wide range of devices, without compromising on accuracy for various vision tasks. For example, on ImageNet-1k, our tiny FasterNet-T0 is 2.8×, 3.3×, and 2.4× faster than MobileViT-XXS on GPU, CPU, and ARM processors, respectively, while being 2.9% more accurate. Our large FasterNet-L achieves impressive 83.5% top-1 accuracy, on par with the emerging Swin-B, while having 36% higher inference throughput on GPU, as well as saving 37% compute time on CPU. Code is available at https://github.com/JierunChen/FasterNet. © 2023 IEEE.
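
The abstract's central point, that latency depends on both the amount of work (FLOPs) and the achieved throughput (FLOPS), can be illustrated with a small sketch. It is not the authors' code. The function names, layer shape, throughput figures, and the 1/4 channel ratio are all hypothetical, and the partial-convolution count assumes that PConv applies a regular convolution to only a subset of the channels and leaves the rest untouched. One multiply-accumulate is counted as one operation.

```python
def latency_seconds(flops: float, flops_per_second: float) -> float:
    """Latency as work (FLOPs) divided by achieved throughput (FLOPS)."""
    return flops / flops_per_second


def conv_flops(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """Operation count of a regular k x k convolution on an h x w feature map."""
    return h * w * k * k * c_in * c_out


def depthwise_conv_flops(h: int, w: int, k: int, c: int) -> int:
    """Operation count of a depthwise k x k convolution (one filter per channel)."""
    return h * w * k * k * c


def partial_conv_flops(h: int, w: int, k: int, c: int, ratio: float) -> int:
    """Assumed PConv cost: a regular conv on only int(c * ratio) channels."""
    c_p = int(c * ratio)
    return conv_flops(h, w, k, c_p, c_p)


# Hypothetical throughput numbers: cutting FLOPs 4x gives only a 2x speedup
# if the achieved FLOPS halves at the same time.
baseline = latency_seconds(4e9, 2e12)
reduced = latency_seconds(1e9, 1e12)
speedup = baseline / reduced

# Hypothetical layer: 56 x 56 feature map, 3 x 3 kernel, 64 channels.
full = conv_flops(56, 56, 3, 64, 64)
depthwise = depthwise_conv_flops(56, 56, 3, 64)
partial = partial_conv_flops(56, 56, 3, 64, ratio=0.25)

print(f"speedup from 4x fewer FLOPs at half the FLOPS: {speedup:.1f}x")
print(f"regular / partial conv operation count: {full // partial}x")
print(f"regular / depthwise conv operation count: {full // depthwise}x")
```

Under these assumptions the depthwise convolution has by far the lowest operation count, which is why operation counts alone do not settle which operator runs fastest: the achieved FLOPS on the target device has to be measured as well.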

Keywords

FLOPS; Computer science; Latency (audio); Parallel computing; Reduction (mathematics); Throughput; Computation; Code (set theory); Floating point; Convolution (computer science); Artificial neural network; Deep neural networks; Computer engineering; Algorithm; Artificial intelligence; Operating system


Publication Info

Year: 2023
Type: article
Pages: 12021-12031
Citations: 1668
Access: Closed


Citation Metrics

OpenAlex: 1668
CrossRef: 1583

Cite This

APA Style

Jierun Chen, Shiu-hong Kao, Hao He et al. (2023). Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12021-12031. https://doi.org/10.1109/cvpr52729.2023.01157

Identifiers

DOI: 10.1109/cvpr52729.2023.01157
arXiv: 2303.03667
