CHALLENGE ON CITYSCAPE AERIAL IMAGE DATASET FOR OBJECT DETECTION

IEEE ICIP 2025 --- Grand Challenge

Congratulations to the winners!

First Place πŸ†: Team Double J

Jing Jie TAN and Yi Jie WONG

Universiti Tunku Abdul Rahman, Malaysia

Second Place πŸ₯ˆ: Team HUS_ChapHet

Minh Hieu VU, The Son PHAN and Trong Duc NGUYEN

Hanoi University of Science, Vietnam National University, Vietnam


Ranking

Welcome to the CADOT Challenge ranking page! Here, you can find the latest updates on the performance of various models on our dataset. We are excited to share the results of the challenge and showcase the advancements in object detection technology.

Stay tuned for more updates!

Top 10 Ranking

Ranking   Team                            mAP@50
1st πŸ†    Double J                        75.78
2nd πŸ₯ˆ    HUS_ChapHet                     75.19
3rd       Test                            75.19
4th       Superme                         69.12
5th       CERIS                           67.34
6th       G_ICAR                          67.25
7th       CREDIT PVAMU                    66.43
8th       Is Fine Tuning All You Need?    64.83
9th       Fine-Tuning Is All You Need     60.87
10th      YOLO is not all you need        55.51

Upload Your Results

To submit your results, please upload a JSON file by clicking the button below. The file should contain your model's detection results on the CADOT dataset.
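As a rough illustration of what such a results file can look like, the sketch below writes detections in the COCO-style results format that many detection challenges accept (a list of image id / category id / bounding box / score records). This is an assumption for illustration only; the exact schema required by the CADOT submission system may differ, and the field values here are made up.

```python
import json

# Hypothetical example of a COCO-style detection results file.
# The image_id and category_id values below are placeholders,
# not actual CADOT identifiers.
detections = [
    {
        "image_id": 1,                       # assumed image identifier
        "category_id": 3,                    # assumed class index
        "bbox": [100.0, 150.0, 40.0, 25.0],  # [x, y, width, height] in pixels
        "score": 0.92,                       # detection confidence in [0, 1]
    },
]

with open("results.json", "w") as f:
    json.dump(detections, f)
```

One record per detection, across all test images, is the usual convention for this format.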

Our Baseline Performance

To establish a clear benchmark for participants, we defined baseline performance using four state-of-the-art object detection models: YOLOv11, YOLOv12, Faster R-CNN, and DiffusionDet. For each model, we detail the experimental settings and configurations that were used to obtain the reported results, ensuring transparency and reproducibility.

YOLOv11/YOLOv12

For YOLOv11 and YOLOv12, we adopted the default training settings specified in their respective publications. Both models were trained with the SGD optimizer at a learning rate of 0.01, with a momentum of 0.937 and a weight decay of 0.0005 to mitigate overfitting. Training was conducted over 300 epochs with a batch size of 16.
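For readers who want to reproduce these settings, the hyperparameters above map naturally onto keyword arguments in the style of the Ultralytics `model.train(...)` API. The argument names (`lr0`, `batch`, etc.) are Ultralytics conventions, not taken from this page, and the dataset YAML name is a placeholder.

```python
# The YOLOv11/YOLOv12 baseline settings from the text, expressed as
# keyword arguments in the style of the Ultralytics training API.
train_kwargs = {
    "optimizer": "SGD",
    "lr0": 0.01,            # initial learning rate
    "momentum": 0.937,      # SGD momentum
    "weight_decay": 0.0005, # L2 regularization to mitigate overfitting
    "epochs": 300,
    "batch": 16,
}

# Usage sketch (requires the `ultralytics` package and the CADOT data;
# "cadot.yaml" is a hypothetical dataset config name):
#   from ultralytics import YOLO
#   YOLO("yolo11n.pt").train(data="cadot.yaml", **train_kwargs)
```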

Faster R-CNN

We used the official implementation of Faster R-CNN with a VGG-16 backbone. The model was trained with the SGD optimizer, a learning rate of 0.005, a momentum of 0.9, and a weight decay of 0.0001. To improve robustness, we incorporated multi-scale training and applied standard data augmentations such as horizontal flipping and random cropping. All remaining hyperparameters follow the default settings of the Faster R-CNN paper.
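The roles of the momentum and weight-decay hyperparameters can be illustrated with a single SGD update step in plain Python. This is a pedagogical sketch of the standard update rule, not the baseline's actual training code.

```python
def sgd_step(w, grad, velocity, lr=0.005, momentum=0.9, weight_decay=0.0001):
    """One SGD update with momentum and coupled L2 weight decay,
    using the Faster R-CNN baseline's hyperparameters. Sketch only."""
    grad = grad + weight_decay * w         # weight decay adds an L2 penalty gradient
    velocity = momentum * velocity + grad  # momentum accumulates past gradients
    w = w - lr * velocity                  # descend along the smoothed gradient
    return w, velocity

# Single scalar step for illustration:
w, v = 1.0, 0.0
w, v = sgd_step(w, grad=0.5, velocity=v)
# first step: velocity = 0.5 + 0.0001 * 1.0 = 0.5001, then w -= 0.005 * 0.5001
```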

DiffusionDet

For DiffusionDet, we used the Swin Transformer as the backbone architecture. The model was optimized using AdamW with a base learning rate of 2.5Γ—10⁻⁡ and a weight decay of 1Γ—10⁻⁴. Training was performed over 450,000 iterations. We applied the default data augmentation strategies, including RandomFlip, RandomResizedCrop, and RandomCrop.
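Unlike SGD, AdamW applies weight decay decoupled from the adaptive gradient step. A single scalar AdamW update with the baseline's learning rate and weight decay can be sketched as follows; this is a pedagogical illustration of the update rule (with standard default betas and epsilon, which this page does not specify), not DiffusionDet's training code.

```python
import math

def adamw_step(w, grad, m, v, t, lr=2.5e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-4):
    """One AdamW update with decoupled weight decay, using the
    DiffusionDet baseline's lr and weight decay. Sketch only."""
    b1, b2 = betas
    m = b1 * m + (1 - b1) * grad       # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2  # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)          # bias corrections for step t
    v_hat = v / (1 - b2 ** t)
    # Decay is applied directly to the weights, not mixed into the gradient.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# Single scalar step for illustration:
w, m, v = 1.0, 0.0, 0.0
w, m, v = adamw_step(w, grad=1.0, m=m, v=v, t=1)
```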

Model Performance Comparison (mAP@50)

Class                 YOLOv11       YOLOv12       Faster R-CNN    DiffusionDet
                      val    test   val    test   val     test    val     test
Basketball Field      52     2      38     32     0       0       24.55   61.81
Building              82     83     81     82     76.79   73.26   75.84   76.64
Crosswalk             92     94     91     90     70.50   72.02   86.06   86.17
Football Field        80     38     30     30     67.18   35.28   58.13   42.52
Graveyard             53     18     62     58     34.17   35.84   70.23   61.42
Large Vehicle         52     63     57     58     37.05   45.85   86.14   86.20
Medium Vehicle        73     75     75     70     28.64   37.26   52.98   40.59
Playground            15     19     12     34     3.32    0       53.26   58.19
Roundabout            43     33     29     37     23.03   27.27   0.40    17.02
Ship                  81     83     71     73     29.71   45.31   52.42   52.85
Small Vehicle         91     91     91     91     15.05   22.90   74.65   70.27
Swimming Pool         46     53     69     45     1.56    13.64   26.68   40.59
Tennis Court          73     78     52     68     58.18   44.69   34.39   46.94
Train                 31     50     17     59     29.21   29.14   32.52   65.91
mean Average Precision 62    56     56     59     33.88   34.46   52.76   58.43
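The bottom row is the unweighted mean of the 14 per-class AP@50 values. This can be checked directly, for example against the Faster R-CNN validation column:

```python
# Per-class AP@50 for the Faster R-CNN baseline on the validation split,
# copied from the table above (14 classes, Basketball Field through Train).
faster_rcnn_val = [
    0, 76.79, 70.50, 67.18, 34.17, 37.05, 28.64,
    3.32, 23.03, 29.71, 15.05, 1.56, 58.18, 29.21,
]

# mAP@50 is the unweighted mean over classes.
map50 = sum(faster_rcnn_val) / len(faster_rcnn_val)
print(f"{map50:.3f}")  # close to the reported 33.88, up to rounding
```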