CHALLENGE ON CITYSCAPE AERIAL IMAGE DATASET FOR OBJECT DETECTION

IEEE ICIP 2025 --- Grand Challenge

Congratulations to the winners!

First Place πŸ†: Team Double J

Jing Jie TAN and Yi Jie WONG

Universiti Tunku Abdul Rahman, Malaysia

Second Place πŸ₯ˆ: Team HUS_ChapHet

Minh Hieu VU, The Son PHAN and Trong Duc NGUYEN

Hanoi University of Science, Vietnam National University, Vietnam


Ranking

Welcome to the CADOT Challenge ranking page! Here, you can find the latest updates on the performance of various models on our dataset. We are excited to share the results of the challenge and showcase the advancements in object detection technology.

Stay tuned for more updates!

Top 10 Ranking

Ranking   Team                            mAP@50
1st πŸ†    Double J                        75.78
2nd πŸ₯ˆ    HUS_ChapHet                     75.19
3rd       Test                            75.19
4th       Superme                         69.12
5th       CERIS                           67.34
6th       G_ICAR                          67.25
7th       CREDIT PVAMU                    66.43
8th       Is Fine Tuning All You Need?    64.83
9th       Fine-Tuning Is All You Need     60.87
10th      YOLO is not all you need        55.51

Upload Your Results

To submit your results, please upload a JSON file by clicking the button below. The file should contain your model's detection results on the CADOT dataset.
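As a rough illustration of what such a results file can look like, the sketch below writes detections in the COCO-style results format that many detection challenges accept (a list of image id / category id / bounding box / score records). This is an assumption for illustration only; the exact schema required by the CADOT submission system may differ, and the field values here are made up.

```python
import json

# Hypothetical example of a COCO-style detection results file.
# The image_id and category_id values below are placeholders,
# not actual CADOT identifiers.
detections = [
    {
        "image_id": 1,                       # assumed image identifier
        "category_id": 3,                    # assumed class index
        "bbox": [100.0, 150.0, 40.0, 25.0],  # [x, y, width, height] in pixels
        "score": 0.92,                       # detection confidence in [0, 1]
    },
]

with open("results.json", "w") as f:
    json.dump(detections, f)
```

One record per detection, across all test images, is the usual convention for this format.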

Our Baseline Performance

To establish a clear benchmark for participants, we defined baseline performance using four state-of-the-art object detection models: YOLOv11, YOLOv12, Faster R-CNN, and DiffusionDet. For each model, we detail the experimental settings and configurations that were used to obtain the reported results, ensuring transparency and reproducibility.

YOLOv11/YOLOv12

For YOLOv11 and YOLOv12, we adopted the default training settings specified in their respective publications. Both models were trained with the SGD optimizer at a learning rate of 0.01, with a momentum of 0.937 and a weight decay of 0.0005 to mitigate overfitting. Training was conducted over 300 epochs with a batch size of 16.
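For readers who want to reproduce these settings, the hyperparameters above map naturally onto keyword arguments in the style of the Ultralytics `model.train(...)` API. The argument names (`lr0`, `batch`, etc.) are Ultralytics conventions, not taken from this page, and the dataset YAML name is a placeholder.

```python
# The YOLOv11/YOLOv12 baseline settings from the text, expressed as
# keyword arguments in the style of the Ultralytics training API.
train_kwargs = {
    "optimizer": "SGD",
    "lr0": 0.01,            # initial learning rate
    "momentum": 0.937,      # SGD momentum
    "weight_decay": 0.0005, # L2 regularization to mitigate overfitting
    "epochs": 300,
    "batch": 16,
}

# Usage sketch (requires the `ultralytics` package and the CADOT data;
# "cadot.yaml" is a hypothetical dataset config name):
#   from ultralytics import YOLO
#   YOLO("yolo11n.pt").train(data="cadot.yaml", **train_kwargs)
```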

Faster R-CNN

We used the official implementation of Faster R-CNN with a VGG-16 backbone. The model was trained with the SGD optimizer, a learning rate of 0.005, a momentum of 0.9, and a weight decay of 0.0001. To improve robustness, we incorporated multi-scale training and applied standard data augmentations such as horizontal flipping and random cropping. All remaining hyperparameters follow the default settings of the Faster R-CNN paper.
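The roles of the momentum and weight-decay hyperparameters can be illustrated with a single SGD update step in plain Python. This is a pedagogical sketch of the standard update rule, not the baseline's actual training code.

```python
def sgd_step(w, grad, velocity, lr=0.005, momentum=0.9, weight_decay=0.0001):
    """One SGD update with momentum and coupled L2 weight decay,
    using the Faster R-CNN baseline's hyperparameters. Sketch only."""
    grad = grad + weight_decay * w         # weight decay adds an L2 penalty gradient
    velocity = momentum * velocity + grad  # momentum accumulates past gradients
    w = w - lr * velocity                  # descend along the smoothed gradient
    return w, velocity

# Single scalar step for illustration:
w, v = 1.0, 0.0
w, v = sgd_step(w, grad=0.5, velocity=v)
# first step: velocity = 0.5 + 0.0001 * 1.0 = 0.5001, then w -= 0.005 * 0.5001
```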

DiffusionDet

For DiffusionDet, we used the Swin Transformer as the backbone architecture. The model was optimized using AdamW with a base learning rate of 2.5Γ—10⁻⁡ and a weight decay of 1Γ—10⁻⁴. Training was performed over 450,000 iterations. We applied the default data augmentation strategies, including RandomFlip, RandomResizedCrop, and RandomCrop.
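Unlike SGD, AdamW applies weight decay decoupled from the adaptive gradient step. A single scalar AdamW update with the baseline's learning rate and weight decay can be sketched as follows; this is a pedagogical illustration of the update rule (with standard default betas and epsilon, which this page does not specify), not DiffusionDet's training code.

```python
import math

def adamw_step(w, grad, m, v, t, lr=2.5e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-4):
    """One AdamW update with decoupled weight decay, using the
    DiffusionDet baseline's lr and weight decay. Sketch only."""
    b1, b2 = betas
    m = b1 * m + (1 - b1) * grad       # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2  # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)          # bias corrections for step t
    v_hat = v / (1 - b2 ** t)
    # Decay is applied directly to the weights, not mixed into the gradient.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# Single scalar step for illustration:
w, m, v = 1.0, 0.0, 0.0
w, m, v = adamw_step(w, grad=1.0, m=m, v=v, t=1)
```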

Model Performance Comparison (mAP@50)

Class                 YOLOv11       YOLOv12       Faster R-CNN    DiffusionDet
                      val    test   val    test   val     test    val     test
Basketball Field      52     2      38     32     0       0       24.55   61.81
Building              82     83     81     82     76.79   73.26   75.84   76.64
Crosswalk             92     94     91     90     70.50   72.02   86.06   86.17
Football Field        80     38     30     30     67.18   35.28   58.13   42.52
Graveyard             53     18     62     58     34.17   35.84   70.23   61.42
Large Vehicle         52     63     57     58     37.05   45.85   86.14   86.20
Medium Vehicle        73     75     75     70     28.64   37.26   52.98   40.59
Playground            15     19     12     34     3.32    0       53.26   58.19
Roundabout            43     33     29     37     23.03   27.27   0.40    17.02
Ship                  81     83     71     73     29.71   45.31   52.42   52.85
Small Vehicle         91     91     91     91     15.05   22.90   74.65   70.27
Swimming Pool         46     53     69     45     1.56    13.64   26.68   40.59
Tennis Court          73     78     52     68     58.18   44.69   34.39   46.94
Train                 31     50     17     59     29.21   29.14   32.52   65.91
mean Average Precision 62    56     56     59     33.88   34.46   52.76   58.43
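The bottom row is the unweighted mean of the 14 per-class AP@50 values. This can be checked directly, for example against the Faster R-CNN validation column:

```python
# Per-class AP@50 for the Faster R-CNN baseline on the validation split,
# copied from the table above (14 classes, Basketball Field through Train).
faster_rcnn_val = [
    0, 76.79, 70.50, 67.18, 34.17, 37.05, 28.64,
    3.32, 23.03, 29.71, 15.05, 1.56, 58.18, 29.21,
]

# mAP@50 is the unweighted mean over classes.
map50 = sum(faster_rcnn_val) / len(faster_rcnn_val)
print(f"{map50:.3f}")  # close to the reported 33.88, up to rounding
```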