Hyperparameter | CIFAR-10 / CIFAR-100 | ImageNet |
|---|---|---|
Optimizer | SGD | SGD |
Initial Learning Rate | 0.1 | 0.1 |
Learning Rate Schedule | Cosine Annealing | Cosine Annealing |
Batch Size | 64 | 128 |
Weight Decay | 5×10⁻⁴ | 1×10⁻⁴ |
Momentum | 0.9 | 0.9 |
Epochs | 200 | 100 |
r | 16 | 16 |
K | {3, 5, 7} | {3, 5, 7} |
τ | 1.0 | 1.0 |

Note: r is the dimensionality-reduction ratio of DMSCA, and K is the set of multi-scale convolution kernel sizes. On ImageNet, the learning rate and batch size are usually adjusted to the number of GPUs and the total batch size, for example with the linear scaling rule [31]. The optimal temperature coefficient τ may vary with the dataset and the model, so it is tuned in the experiments.

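
The linear scaling rule cited in the note and the cosine annealing schedule listed in the table can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' training code: the function names, the hypothetical 4-worker setup, and a minimum learning rate of zero are all assumptions introduced here.

```python
import math

def scaled_lr(base_lr, base_batch, total_batch):
    """Linear scaling rule: scale the learning rate with the total batch size."""
    return base_lr * total_batch / base_batch

def cosine_annealing_lr(initial_lr, epoch, total_epochs, min_lr=0.0):
    """Cosine annealing from initial_lr (epoch 0) down to min_lr (final epoch)."""
    cos_term = (1.0 + math.cos(math.pi * epoch / total_epochs)) / 2.0
    return min_lr + (initial_lr - min_lr) * cos_term

# ImageNet settings from the table: initial LR 0.1, batch size 128, 100 epochs.
# Hypothetical multi-GPU run: 4 workers x 128 images = total batch size 512.
lr0 = scaled_lr(base_lr=0.1, base_batch=128, total_batch=512)
schedule = [cosine_annealing_lr(lr0, e, total_epochs=100) for e in range(101)]
```

Under these assumptions the schedule starts at the scaled rate, passes through half of it at the midpoint, and decays to zero at the final epoch.
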
Model | CIFAR-10 | CIFAR-100 | ImageNet (Top-1) | ImageNet (Top-5) | Avg. Improvement |
|---|---|---|---|---|---|
ResNet-18 (Baseline) | 94.2 ± 0.1 | 75.3 ± 0.2 | 69.76 ± 0.12 | 89.08 ± 0.09 | - |
ResNet-18 + SE-Net | 94.8 ± 0.1 | 76.1 ± 0.2 | 70.13 ± 0.10 | 89.45 ± 0.08 | +0.6% / +0.8% / +0.37% |
ResNet-18 + CBAM | 95.1 ± 0.1 | 76.5 ± 0.2 | 70.72 ± 0.11 | 89.88 ± 0.07 | +0.9% / +1.2% / +0.96% |
ResNet-18 + ECA-Net | 95.0 ± 0.1 | 76.3 ± 0.2 | 70.58 ± 0.09 | 89.79 ± 0.08 | +0.8% / +1.0% / +0.82% |
ResNet-18 + CA | 95.2 ± 0.1 | 76.7 ± 0.2 | 70.89 ± 0.10 | 90.01 ± 0.07 | +1.0% / +1.4% / +1.13% |
ResNet-18 + DMSCA | 96.5 ± 0.1 | 77.1 ± 0.2 | 71.03 ± 0.08 | 90.15 ± 0.06 | +2.3% / +1.8% / +1.27% |
ResNet-34 (Baseline) | 94.8 ± 0.1 | 76.8 ± 0.2 | 73.31 ± 0.10 | 91.42 ± 0.07 | - |
ResNet-34 + SE-Net | 95.3 ± 0.1 | 77.4 ± 0.2 | 73.78 ± 0.09 | 91.75 ± 0.06 | +0.5% / +0.6% / +0.47% |
ResNet-34 + CBAM | 95.9 ± 0.1 | 78.2 ± 0.2 | 74.25 ± 0.08 | 92.03 ± 0.05 | +1.1% / +1.4% / +0.94% |
ResNet-34 + ECA-Net | 95.5 ± 0.1 | 77.6 ± 0.2 | 74.01 ± 0.09 | 91.89 ± 0.06 | +0.7% / +0.8% / +0.70% |
ResNet-34 + CA | 95.7 ± 0.1 | 78.0 ± 0.2 | 74.18 ± 0.08 | 91.98 ± 0.05 | +0.9% / +1.2% / +0.87% |
ResNet-34 + DMSCA | 97.1 ± 0.1 | 78.6 ± 0.2 | 74.52 ± 0.07 | 92.21 ± 0.04 | +2.3% / +1.8% / +1.21% |
ResNet-50 (Baseline) | 95.1 ± 0.1 | 77.8 ± 0.2 | 76.13 ± 0.08 | 92.87 ± 0.05 | - |
ResNet-50 + SE-Net | 95.7 ± 0.1 | 78.4 ± 0.2 | 76.75 ± 0.07 | 93.28 ± 0.04 | +0.6% / +0.6% / +0.62% |
ResNet-50 + CBAM | 96.0 ± 0.1 | 78.7 ± 0.2 | 77.12 ± 0.06 | 93.51 ± 0.04 | +0.9% / +0.9% / +0.99% |
ResNet-50 + ECA-Net | 95.9 ± 0.1 | 78.6 ± 0.2 | 77.03 ± 0.07 | 93.45 ± 0.05 | +0.8% / +0.8% / +0.90% |
ResNet-50 + CA | 96.1 ± 0.1 | 78.9 ± 0.2 | 77.28 ± 0.06 | 93.60 ± 0.04 | +1.0% / +1.1% / +1.15% |
ResNet-50 + DMSCA | 97.3 ± 0.1 | 79.5 ± 0.2 | 77.65 ± 0.05 | 93.82 ± 0.03 | +2.2% / +1.7% / +1.52% |

Note: Avg. Improvement lists the absolute Top-1 accuracy gain (in percentage points) over the baseline model with the identical network structure; the three values correspond to CIFAR-10, CIFAR-100, and ImageNet, respectively. ImageNet results use single center-crop validation. All DMSCA improvements are statistically significant (p < 0.05, two-sided t-test).

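
The entries in the Avg. Improvement column can be reproduced as plain differences between a model's mean accuracy and that of its baseline. The sketch below recomputes the ResNet-50 + DMSCA row from the means reported in the table; the dictionary layout and the function name are illustrative assumptions.

```python
# Mean accuracies (%) copied from the ResNet-50 rows of the table above.
baseline = {"CIFAR-10": 95.1, "CIFAR-100": 77.8, "ImageNet Top-1": 76.13}
dmsca = {"CIFAR-10": 97.3, "CIFAR-100": 79.5, "ImageNet Top-1": 77.65}

def improvement(model, reference):
    """Absolute accuracy gain per dataset, rounded to two decimals."""
    return {name: round(model[name] - reference[name], 2) for name in reference}

gains = improvement(dmsca, baseline)
for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
```
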

As demonstrated in Table II, DMSCA consistently outperforms all competing methods across all datasets and network architectures.

Method | Parameters (M) | FLOPs (G) | Memory (MB) | Inference (ms) |
|---|---|---|---|---|
ResNet-50 (Baseline) | 25.56 (-) | 4.11 | 335 | 8.3 ± 0.1 |
ResNet-50 + SE-Net | 28.08 (+9.86%) | 4.12 | 360 | 8.7 ± 0.1 (+4.82%) |
ResNet-50 + CBAM | 28.09 (+9.90%) | 4.12 | 368 | 9.1 ± 0.1 (+9.64%) |
ResNet-50 + ECA-Net | 25.57 (+0.04%) | 4.11 | 342 | 8.5 ± 0.1 (+2.41%) |
ResNet-50 + CA | 25.83 (+1.06%) | 4.13 | 378 | 9.3 ± 0.1 (+12.05%) |
ResNet-50 + DMSCA | 28.46 (+11.34%) | 4.21 | 395 | 9.2 ± 0.1 (+10.84%) |

The efficiency analysis reveals:

Method | Top-1 Acc (%) | Δ Acc | Parameters (M) | Δ Parameters (%) | FLOPs (G) |
|---|---|---|---|---|---|
ResNet-50 | 76.13 ± 0.08 | - | 25.56 | - | 4.11 |
ResNet-50 + SimAM | 76.89 ± 0.07 | + 0.76 | 25.56 | + 0.00 | 4.11 |
ResNet-50 + GAM | 77.35 ± 0.06 | + 1.22 | 26.78 | + 4.77 | 4.18 |
ResNet-50 + A²-Nets | 77.01 ± 0.07 | + 0.88 | 27.12 | + 6.10 | 4.25 |
ResNet-50 + BAM | 76.95 ± 0.08 | + 0.82 | 26.89 | + 5.20 | 4.19 |
ResNet-50 + DMSCA | 77.65 ± 0.05 | + 1.52 | 28.46 | + 11.34 | 4.21 |
Comparison Pair | Mean Diff | 95% CI | p-value | Cohen's d | Effect Size |
|---|---|---|---|---|---|
DMSCA vs Baseline | + 1.52% | [1.38%, 1.66%] | < 0.001 | 3.15 | Very Large |
DMSCA vs SE-Net | + 0.90% | [0.75%, 1.05%] | < 0.001 | 2.48 | Large |
DMSCA vs CBAM | + 0.53% | [0.39%, 0.67%] | < 0.001 | 1.97 | Large |
DMSCA vs ECA-Net | + 0.62% | [0.47%, 0.77%] | < 0.001 | 2.13 | Large |
DMSCA vs CA | + 0.37% | [0.22%, 0.52%] | < 0.001 | 1.56 | Large |

Note: A p-value below 0.05 indicates a statistically significant difference. Cohen's d measures the effect size: values of 0.2, 0.5, and 0.8 correspond to small, medium, and large effects, respectively.

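
As a rough illustration of how an effect size of this kind can be computed, the sketch below implements Cohen's d for two independent samples with a pooled standard deviation and maps it to the thresholds quoted in the note. The source does not state which variant of d was used, so the pooled-SD form is an assumption, and the per-run accuracies are hypothetical values invented for the example, not the paper's raw data.

```python
import math
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def effect_label(d):
    """Map |d| to the conventional thresholds quoted in the note (0.2, 0.5, 0.8)."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

# Hypothetical Top-1 accuracies (%) from five runs each; NOT the paper's data.
runs_a = [77.6, 77.7, 77.6, 77.7, 77.65]
runs_b = [77.2, 77.3, 77.3, 77.25, 77.35]
d = cohens_d(runs_a, runs_b)
print(f"d = {d:.2f} ({effect_label(d)})")
```

Because run-to-run standard deviations in these experiments are small relative to the mean differences, even sub-point accuracy gains can yield large values of d.
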
Hyperparameter | Value | Top-1 Acc (%) | ΔAcc | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|
Reduction ratio (r) | 4 | 76.85 ± 0.18 | -0.28 | 11.45 | 1.83 |
Reduction ratio (r) | 8 | 77.02 ± 0.17 | -0.11 | 11.35 | 1.82 |
Reduction ratio (r) | 16 | 77.13 ± 0.16 | 0.00 | 11.28 | 1.82 |
Reduction ratio (r) | 32 | 76.98 ± 0.19 | -0.15 | 11.25 | 1.82 |
Temperature coefficient (τ) | 0.5 | 76.89 ± 0.18 | -0.24 | 11.28 | 1.82 |
Temperature coefficient (τ) | 1.0 | 77.13 ± 0.16 | 0.00 | 11.28 | 1.82 |
Temperature coefficient (τ) | 1.5 | 77.08 ± 0.17 | -0.05 | 11.28 | 1.82 |
Temperature coefficient (τ) | 2.0 | 76.95 ± 0.19 | -0.18 | 11.28 | 1.82 |
Temperature coefficient (τ) | Dynamic | 77.21 ± 0.15 | +0.08 | 11.28 | 1.82 |
Convolution kernel combination (K) | {3} | 76.45 ± 0.20 | -0.68 | 11.21 | 1.81 |
Convolution kernel combination (K) | {3, 5} | 76.88 ± 0.18 | -0.25 | 11.25 | 1.82 |
Convolution kernel combination (K) | {3, 5, 7} | 77.13 ± 0.16 | 0.00 | 11.28 | 1.82 |
Convolution kernel combination (K) | {3, 5, 7, 9} | 77.09 ± 0.17 | -0.04 | 11.32 | 1.83 |
No. | Model Configuration | Top-1 Acc (%) | ΔAcc (vs Baseline) | ΔAcc (vs Prev) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|
1 | ResNet-18 | 75.32 ± 0.21 | - | - | 11.17 | 1.81 |
2 | ResNet-18 + GCE | 75.81 ± 0.19 | + 0.49 | + 0.49 | 11.20 | 1.81 |
3 | ResNet-18 + TCA(τ = 1) | 76.15 ± 0.20 | + 0.83 | + 0.34 | 11.21 | 1.81 |
4 | ResNet-18 + GCE + TCA (τ = dynamic, from DII) | 76.32 ± 0.18 | + 1.00 | + 0.17 | 11.21 | 1.81 |
5 | ResNet-18 + MS-SCE (K={3,5,7}) | 76.05 ± 0.22 | + 0.73 | - | 11.23 | 1.82 |
6 | ResNet-18 + GCE + TCA(τ = dyn) + MS-SCE(K={3,5,7}) | 76.68 ± 0.19 | + 1.36 | + 0.36 | 11.26 | 1.82 |
7 | ResNet-18 + GCE + TCA(τ = dyn) + MS-SCE + DII | 76.95 ± 0.17 | + 1.63 | + 0.27 | 11.27 | 1.82 |
8 | ResNet-18 + GCE + TCA (τ = dyn) + MS-SCE + DII + DFF | 77.08 ± 0.18 | + 1.76 | + 0.13 | 11.27 | 1.82 |
9 | ResNet-18 + DMSCA(Full) | 77.13 ± 0.16 | + 1.81 | + 0.05 | 11.28 | 1.82 |

Note: ΔAcc (vs Prev) denotes the accuracy change relative to the preceding configuration. Values are mean ± SD over five runs.

Method | Focus Ratio† | Semantic Consistency‡ | Noise Ratio§ |
|---|---|---|---|
ResNet-18 | 0.65 ± 0.04 | 0.62 ± 0.05 | 0.25 ± 0.03 |
SE-Net | 0.72 ± 0.03 | 0.68 ± 0.04 | 0.15 ± 0.02 |
CBAM | 0.78 ± 0.03 | 0.74 ± 0.03 | 0.12 ± 0.02 |
ECA-Net | 0.75 ± 0.04 | 0.71 ± 0.04 | 0.13 ± 0.03 |
CA | 0.81 ± 0.02 | 0.77 ± 0.03 | 0.10 ± 0.02 |
DMSCA | 0.87 ± 0.02 | 0.84 ± 0.02 | 0.08 ± 0.01 |

Note: † Focus ratio: proportion of attention energy inside the target regions (defined by bounding boxes). ‡ Semantic consistency: similarity (IoU/SSIM) to saliency maps. § Noise ratio: attention energy in the background. Higher is better for † and ‡; lower is better for §. Values are mean ± SD.

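
A focus ratio of the kind described in the note can be sketched as the share of total attention energy that falls inside a bounding box. This is only the simplest reading of the definition: the function name, the box convention, and the toy attention map are assumptions, and the paper's exact normalization (as well as its noise-ratio definition) may differ.

```python
def focus_ratio(attention, box):
    """Share of total attention energy that falls inside a bounding box.

    attention: 2-D list of non-negative attention values (rows x columns).
    box: (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = box
    total = sum(sum(row) for row in attention)
    if total == 0:
        return 0.0
    inside = sum(sum(row[left:right]) for row in attention[top:bottom])
    return inside / total

# Toy 4x4 attention map with most of its energy in the central 2x2 region.
attention_map = [
    [0.0, 0.1, 0.1, 0.0],
    [0.1, 1.0, 1.0, 0.1],
    [0.1, 1.0, 1.0, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
ratio = focus_ratio(attention_map, box=(1, 1, 3, 3))
print(f"focus ratio = {ratio:.3f}")
```
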
Degradation Type | Baseline | +SE-Net | +CBAM | +ECA-Net | +CA | +DMSCA |
|---|---|---|---|---|---|---|
Original | 75.32 | 76.11 | 76.58 | 79.05 | 76.39 | 77.13 |
Gaussian Noise, σ = 15 | 65.21 | 66.48 | 67.05 | 66.76 | 66.98 | 68.25 |
Gaussian Noise, σ = 25 | 58.73 | 60.15 | 60.88 | 60.32 | 60.71 | 62.13 |
Motion Blur, kernel size = 7 | 68.14 | 69.32 | 69.98 | 69.51 | 69.77 | 71.15 |
JPEG Compression, quality = 30 | 70.14 | 71.75 | 72.46 | 71.99 | 72.25 | 73.58 |
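
As an illustration of the Gaussian-noise degradation in the table above, the sketch below perturbs 8-bit pixel values with zero-mean noise and clips the result. It assumes that σ is expressed on the 0 to 255 intensity scale, which the source does not state explicitly; the function name and the toy image are likewise hypothetical.

```python
import random

def add_gaussian_noise(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise to 8-bit pixel values and clip to [0, 255]."""
    rng = random.Random(seed)  # fixed seed keeps the corruption reproducible
    noisy = []
    for value in pixels:
        perturbed = value + rng.gauss(0.0, sigma)
        noisy.append(min(255, max(0, round(perturbed))))
    return noisy

# Flattened toy "image"; sigma = 15 matches the milder noise level in the table.
image = [0, 64, 128, 192, 255] * 200
noisy_image = add_gaussian_noise(image, sigma=15)
```

Fixing the seed means every model is evaluated on the same corrupted inputs, which keeps the comparison across attention modules fair.
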
Source dataset → Target dataset | Baseline | +SE-Net | +CBAM | +CA | +DMSCA |
|---|---|---|---|---|---|
CIFAR-100→CIFAR-10 | 92.1 | 92.8 | 93.2 | 93.4 | 94.1 |
ImageNet→CIFAR-100 | 82.3 | 82.3 | 83.7 | 84.0 | 84.4 |
ImageNet→Oxford-IIIT Pet | 89.2 | 89.2 | 90.1 | 90.6 | 91.7 |
ImageNet→Food-101 | 76.8 | 77.5 | 78.1 | 78.4 | 79.2 |

To verify the generality of DMSCA, we conducted preliminary experiments on object detection and semantic segmentation:

- Object Detection (COCO 2017), using ResNet-50 + DMSCA as the backbone network under the Faster R-CNN framework:
  - Baseline (ResNet-50): mAP = 37.4
  - ResNet-50 + DMSCA: mAP = 38.9 (+1.5)
- Semantic Segmentation (Cityscapes), tested under the DeepLabV3+ framework:
  - Baseline (ResNet-50): mIoU = 78.2
  - ResNet-50 + DMSCA: mIoU = 79.6 (+1.4)