Abstract
Object detection in drone-captured scenes has recently become a popular task. Because drones fly at varying altitudes, object scale changes drastically, which makes network optimization harder. Moreover, high-speed, low-altitude flight introduces motion blur on densely packed objects, which makes it difficult to distinguish between objects. To address these two issues, we propose TPH-YOLOv5. Building on YOLOv5, we add one more prediction head to detect objects at different scales. We then replace the original prediction heads with Transformer Prediction Heads (TPH) to exploit the prediction potential of the self-attention mechanism. We also integrate the Convolutional Block Attention Module (CBAM) to find attention regions in scenes with dense objects. To further improve TPH-YOLOv5, we provide a bag of useful strategies, including data augmentation, multi-scale testing, multi-model ensembling, and an extra classifier. Extensive experiments on the VisDrone2021 dataset show that TPH-YOLOv5 performs well on drone-captured scenes, with impressive interpretability. On the DET-test-challenge dataset, TPH-YOLOv5 achieves an AP of 39.18%, which is 1.81% higher than the previous SOTA method (DPNetV3). In the VisDrone Challenge 2021, TPH-YOLOv5 takes 5th place, with results closely matching the 1st-place model (AP 39.43%). Compared with the baseline model (YOLOv5), TPH-YOLOv5 improves AP by about 7%, which is encouraging and competitive.
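The CBAM step mentioned above can be sketched as follows. This is a minimal pure-Python illustration of the standard CBAM formulation (channel attention first, then spatial attention), not TPH-YOLOv5's actual code, which runs on framework tensors inside the network. The function names (`cbam`, `channel_attention`, `spatial_attention`, `shared_mlp`), the bias-free shared MLP, the tiny nested-list tensors, and the hand-picked weights are all hypothetical choices made for illustration; in a real detector these weights are learned.

```python
import math
from typing import List

Matrix = List[List[float]]   # H x W feature plane
Tensor = List[Matrix]        # C x H x W feature map


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(v: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two-layer MLP: C -> C/r (ReLU) -> C. Biases omitted (assumption)."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w1]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w2]


def channel_attention(f: Tensor, w1: Matrix, w2: Matrix) -> List[float]:
    """One weight per channel: sigmoid(MLP(avg-pool) + MLP(max-pool))."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(map(max, ch)) for ch in f]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def spatial_attention(f: Tensor, kernel: List[Matrix]) -> Matrix:
    """One weight per pixel: sigmoid(conv([mean over C; max over C])).

    `kernel` holds two k x k planes (for the mean and max maps);
    zero padding keeps the output at H x W.
    """
    c, h, w = len(f), len(f[0]), len(f[0][0])
    avg = [[sum(f[n][i][j] for n in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(f[n][i][j] for n in range(c)) for j in range(w)] for i in range(h)]
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for plane, ker in zip((avg, mx), kernel):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            s += ker[di][dj] * plane[y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(f: Tensor, w1: Matrix, w2: Matrix, kernel: List[Matrix]) -> Tensor:
    """Rescale channels first, then rescale pixels of the refined map."""
    mc = channel_attention(f, w1, w2)
    f1 = [[[v * mc[n] for v in row] for row in ch] for n, ch in enumerate(f)]
    ms = spatial_attention(f1, kernel)
    return [[[v * ms[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in f1]


if __name__ == "__main__":
    # Toy input: C=2 channels, 2x2 spatial; reduction r=2 gives hidden size 1.
    feat = [[[1.0, 2.0], [3.0, 4.0]], [[-1.0, 0.5], [2.0, -3.0]]]
    w1 = [[0.5, -0.5]]            # (C/r) x C, hypothetical weights
    w2 = [[1.0], [-1.0]]          # C x (C/r), hypothetical weights
    kern = [[[0.1] * 3 for _ in range(3)] for _ in range(2)]  # 3x3 here; CBAM uses 7x7
    print(cbam(feat, w1, w2, kern))
```

Because both attention maps pass through a sigmoid, CBAM only rescales existing features by factors in (0, 1): it reweights which channels and which locations the network emphasizes rather than adding new features. The demo uses a 3x3 spatial kernel to stay small, whereas the standard CBAM formulation uses a 7x7 kernel.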
Publication Info
- Year: 2021
- Type: article
- Pages: 2778-2788
- Citations: 1875
- Access: Closed
Identifiers
- DOI: 10.1109/iccvw54120.2021.00312
- arXiv: 2108.11539