7 September
Bohua Peng
Background
Problem: Big data often comes with noisy or ambiguous content, and learning in a random order leads to overfitting (memorization).
Can we improve generalization without changing the base ML method?
Human learners perform better when learning from easy to hard [1][2].
Background
How do we define the difficulty of an image? A scoring function.
When should we add more difficult images? A pacing function (see the sketch below).
The distribution of fixed (precomputed) scores is binary.
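A minimal sketch of how the two pieces interact (the function names and the linear pacing rule are illustrative assumptions, not our final implementation): difficulty scores are precomputed once, and the pacing function controls how many of the easiest examples are visible at each epoch.

```python
import numpy as np

def pacing_linear(epoch, total_epochs, n_examples, start_frac=0.2):
    """Linear pacing: the fraction of visible data grows from start_frac to 1."""
    frac = start_frac + (1.0 - start_frac) * epoch / max(1, total_epochs - 1)
    return int(frac * n_examples)

def visible_indices(scores, epoch, total_epochs):
    """scores: per-example difficulty (lower = easier), precomputed once."""
    order = np.argsort(scores)                       # easy -> hard
    n_visible = pacing_linear(epoch, total_epochs, len(scores))
    return order[:n_visible]                         # train on these this epoch
```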
Outline:
How to define difficulty? Part 1: Instance-level difficulty metrics
When should we add more difficult samples? Part 2: Paced / self-paced learning methods
We focus on estimating instance-level difficulty scores for image classification.
CIFAR10-H provides human uncertainty labels for image classification [5].
Ambiguity: 79 examples are misclassified by annotators (see the sketch below).
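A sketch of how such human scores can be turned into a per-example difficulty signal. The file names are hypothetical; we assume the CIFAR-10H release provides an (N, 10) array of per-image human label probabilities.

```python
import numpy as np

# Hypothetical paths; we assume an (N, 10) array of per-image
# human label distributions and the matching ground-truth labels.
probs = np.load("cifar10h-probs.npy")                # shape (N, 10)
labels = np.load("cifar10_test_labels.npy")          # shape (N,)

# Per-example human uncertainty: entropy of the annotator distribution.
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

# Ambiguous examples: the majority human vote disagrees with the dataset label.
misclassified = probs.argmax(axis=1) != labels
print(misclassified.sum(), "examples misclassified by annotators")
```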
Output margin (uncalibrated): the gap between the ground-truth class logit and the second-largest logit.
(Figure: the distribution of output margins across samples.)
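A minimal sketch of the margin computation, assuming logits of shape (N, C) and integer targets:

```python
import torch

def output_margin(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Margin = ground-truth logit minus the largest competing logit.
    A negative margin means the model misclassifies the example."""
    gt = logits.gather(1, targets.unsqueeze(1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, targets.unsqueeze(1), float("-inf"))  # hide the GT column
    runner_up = masked.max(dim=1).values
    return gt - runner_up
```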
2D difficulty score: Prediction depth [3]
Prediction depth is the earliest layer from which the subsequent k-NN probe predictions converge.
Calibration: temperature scaling.
Generally, easy samples have a lower prediction depth, while difficult samples have a larger one.
We measure two prediction depth scores for each example (see the sketch below).
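A sketch of the prediction depth computation in the spirit of [3]: k-NN probes are attached to intermediate representations, and the depth is the earliest layer from which every subsequent probe already agrees with the final prediction. The probe setup here (k, refitting per call) is illustrative; in practice the probes are fit once per layer.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def prediction_depth(example_feats, support_feats, support_labels, k=30):
    """example_feats[l]: features of one probe example at layer l.
    support_feats[l]: (M, D_l) support-set features at layer l.
    Depth = earliest layer from which all k-NN predictions match
    the k-NN prediction at the final layer."""
    preds = []
    for l in range(len(example_feats)):
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(support_feats[l], support_labels)
        preds.append(knn.predict(example_feats[l][None, :])[0])
    final = preds[-1]
    for l in range(len(preds)):
        if all(p == final for p in preds[l:]):
            return l       # earliest layer where predictions have converged
```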
(Figure: examples grouped by the two prediction depth scores. Never learnt: acc. about 0.01, easily affected by seeds: acc. about 0.5; informational: acc. about 0.8; easy: acc. about 1.0. The goal is to cut out the right amount of uncertainty.)
A highly interpretable difficulty score: Angular Gap
Base method: normalized softmax classifier [4] (see the sketch below).
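A minimal sketch of the angular gap, under the assumption that it is measured as the angle between a feature and its ground-truth class weight minus the angle to the nearest competing class, in a CosFace-style normalized softmax classifier [4]:

```python
import torch
import torch.nn.functional as F

def angular_gap(features: torch.Tensor, weights: torch.Tensor,
                targets: torch.Tensor) -> torch.Tensor:
    """features: (N, D) embeddings; weights: (C, D) class weights of a
    normalized softmax classifier. Returns a per-example angular gap:
    angle(GT class) - angle(nearest other class); a larger gap is
    presumably harder."""
    cos = F.normalize(features) @ F.normalize(weights).t()    # (N, C) cosines
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))        # angles in radians
    gt = theta.gather(1, targets.unsqueeze(1)).squeeze(1)
    masked = theta.scatter(1, targets.unsqueeze(1), float("inf"))
    nearest_other = masked.min(dim=1).values
    return gt - nearest_other
```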
Future work: allow BN parameters (gamma, beta) to be learnt during stage two.
We compute the correlation between 9 difficulty scores (see the sketch after the footnotes).
Footnotes:
H stands for human scores.
ResNet18 #1 is calibrated; ResNet18 #2 is uncalibrated (control group).
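A sketch of the correlation analysis; the choice of rank (Spearman) correlation is an assumption here, made because it compares orderings rather than raw score values.

```python
import numpy as np
from scipy.stats import spearmanr

def correlation_matrix(scores: dict) -> np.ndarray:
    """scores: dict mapping a score name (e.g. "margin", "prediction_depth",
    "H" for human scores) to an (N,) array of per-example difficulties."""
    names = list(scores)
    mat = np.eye(len(names))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j:
                rho, _ = spearmanr(scores[a], scores[b])
                mat[i, j] = mat[j, i] = rho
    return mat
```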
Part 1 conclusion:
Our solution: adding local randomness to the precomputed order, with class balancing.
Standard paced learning (PL) uses a fixed precomputed order
Problem: unstable training.
10 linear pacing functions
Class-balance-aware (CLA) scheduler (see the sketch below)
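A sketch of the class-balance-aware schedule with local randomness; the window size and the per-class quota rule are illustrative assumptions, not the exact scheduler from our report.

```python
import numpy as np

def cla_schedule(order, labels, n_visible, n_classes, window=512, rng=None):
    """order: precomputed easy->hard indices. Add local randomness by
    shuffling within fixed-size windows, then take a class-balanced prefix."""
    rng = rng or np.random.default_rng()
    order = order.copy()
    for s in range(0, len(order), window):          # local shuffling
        rng.shuffle(order[s:s + window])
    quota = n_visible // n_classes                  # per-class budget
    counts = np.zeros(n_classes, dtype=int)
    chosen = []
    for idx in order:
        c = labels[idx]
        if counts[c] < quota:
            counts[c] += 1
            chosen.append(idx)
        if len(chosen) == quota * n_classes:
            break
    return np.array(chosen)
```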
Self-paced learning (SPL) uses dynamic weights
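A minimal sketch of SPL with the classic hard-weighting rule (binary weights; the growth rule for the pace parameter is an assumption):

```python
import torch

def spl_weights(losses: torch.Tensor, lam: float) -> torch.Tensor:
    """Hard self-paced weights: keep an example iff its current loss is
    below the pace threshold lambda; lambda grows over epochs so that
    harder examples are admitted later."""
    return (losses.detach() < lam).float()

# Inside the training loop (illustrative):
# per_example = F.cross_entropy(logits, y, reduction="none")
# v = spl_weights(per_example, lam)
# loss = (v * per_example).sum() / v.sum().clamp(min=1)
# lam *= growth_rate   # e.g. 1.1 per epoch (assumed schedule)
```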
(Figures: a linear classifier with SPL; an MLP with SPL.)
The difficulty scores measured by an unconverged DNN are very different from those of pretrained teachers.
Ablation study of PL methods on CIFAR10-H
Standard paced learning can easily collapse during training
(Figures: accuracy and loss curves.)
We train ResNet18 from scratch with paced learning using our scheduler.
Curriculum learning (CL) outperforms Adaptive Sharpness-Aware Minimization (ASAM) on test accuracy on CIFAR10-H and CIFAR100 [8].
Results of paced learning with our CLA scheduler: about a 2% gain on CIFAR10-H and about 2% on CIFAR100.
For more search grids, please read our final report.
(Figure: C-scores, forgetting events, and the SPL cross-entropy loss at the 2nd epoch.)
Unsolved questions:
Does class imbalance decrease test accuracy in SPL?
Can we improve SPL with model calibration?
Difficulty scores measured by unconverged DNNs differ from those measured by pretrained teachers.
Transfer an EfficientNet B0 (SOTA) pretrained on ImageNet to CIFAR100
Training from easy to hard gives the pretrained model a better start (see the sketch below).
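A sketch of the transfer setup, assuming torchvision's pretrained EfficientNet-B0 (the weight enum and head layout follow recent torchvision versions); the curriculum pieces reuse the illustrative helpers sketched earlier.

```python
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

# Load ImageNet-pretrained EfficientNet-B0 and swap the head for CIFAR-100.
model = efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 100)

# Fine-tune with the easy->hard schedule (illustrative):
# for epoch in range(total_epochs):
#     n = pacing_linear(epoch, total_epochs, len(train_set))
#     idx = cla_schedule(order, labels, n, n_classes=100)
#     train_one_epoch(model, subset(train_set, idx))
```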
Future work
Merge CL with knowledge distillation
Take-home messages:
Our main contributions:
Part 1: Instance-level difficulty metrics
Part 2: Paced / self-paced learning with our CLA scheduler
References:
[1] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In ICML, 2009.
[2] Jerome S. Bruner, Jacqueline J. Goodnow, and George A. Austin. A Study of Thinking. Routledge, 2017.
[3] Robert J. N. Baldock, Hartmut Maennel, and Behnam Neyshabur. Deep learning through the lens of example difficulty. arXiv preprint arXiv:2106.09647, 2021.
[4] Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. CosFace: Large margin cosine loss for deep face recognition, 2018.
[5] Ruairidh M. Battleday, Joshua C. Peterson, and Thomas L. Griffiths. Capturing human categorization of natural images by combining deep networks and cognitive models. Nature Communications, 11, 2020.
[6] Mobarakol Islam and Ben Glocker. Spatially varying label smoothing: Capturing uncertainty from expert annotations. In IPMI, 2021.
[7] Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In ICML, 2018.
[8] Jungmin Kwon, Jeongseop Kim, Hyunseo Park, and In Kwon Choi. ASAM: Adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks, 2021.
Q&A