Abstract

Object detection on drone-captured scenarios is a recent popular task. As drones always navigate in different altitudes, the object scale varies violently, which burdens the optimization of networks. Moreover, high-speed and low-altitude flight bring in the motion blur on the densely packed objects, which leads to great challenge of object distinction. To solve the two issues mentioned above, we propose TPH-YOLOv5. Based on YOLOv5, we add one more prediction head to detect different-scale objects. Then we replace the original prediction heads with Transformer Prediction Heads (TPH) to explore the prediction potential with self-attention mechanism. We also integrate convolutional block attention model (CBAM) to find attention region on scenarios with dense objects. To achieve more improvement of our proposed TPH-YOLOv5, we provide bags of useful strategies such as data augmentation, multi-scale testing, multi-model integration and utilizing extra classifier. Extensive experiments on dataset VisDrone2021 show that TPH-YOLOv5 have good performance with impressive interpretability on drone-captured scenarios. On DET-test-challenge dataset, the AP result of TPH-YOLOv5 are 39.18%, which is better than previous SOTA method (DPNetV3) by 1.81%. On VisDrone Challenge 2021, TPH-YOLOv5 wins 5 <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">th</sup> place and achieves well-matched results with 1 <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">st</sup> place model (AP 39.43%). Compared to baseline model (YOLOv5), TPH-YOLOv5 improves about 7%, which is encouraging and competitive.

Keywords

Computer scienceInterpretabilityArtificial intelligenceDroneClassifier (UML)TransformerObject (grammar)Machine learningObject detectionPattern recognition (psychology)Engineering

Affiliated Institutions

Related Publications

Low-density parity-check codes

A low-density parity-check code is a code specified by a parity-check matrix with the following properties: each column contains a small fixed number <tex xmlns:mml="http://www....

1962 IEEE Transactions on Information Theory 10397 citations

Compressed sensing

Suppose x is an unknown vector in Ropf <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">m</sup> (a digital image or signal); we pla...

2006 IEEE Transactions on Information Theory 22524 citations

Publication Info

Year
2021
Type
article
Pages
2778-2788
Citations
1875
Access
Closed

Social Impact

Social media, news, blog, policy document mentions

Citation Metrics

1875
OpenAlex
97
Influential
1677
CrossRef

Cite This

Xingkui Zhu, Shuchang Lyu, Xu Wang et al. (2021). TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) , 2778-2788. https://doi.org/10.1109/iccvw54120.2021.00312

Identifiers

DOI
10.1109/iccvw54120.2021.00312
arXiv
2108.11539

Data Quality

Data completeness: 88%