This post is about the idea of Importance Sampling. I will review the paper Biased Importance Sampling for Deep Neural Network Training.
What is Importance Sampling?
When training a model, it is obvious that not all samples are equally important: many are handled correctly after only a few epochs of training, and most could be ignored from that point on without affecting the final model.
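As a quick refresher (this is my own notation, not the paper's): importance sampling means drawing sample $i$ with a non-uniform probability $p_i$ instead of uniformly, and then reweighting by $w_i = 1/(N p_i)$ so the stochastic gradient stays an unbiased estimate of the full-dataset gradient:

$$\mathbb{E}_{i \sim p}\!\left[\frac{1}{N p_i}\,\nabla_\theta \mathcal{L}_i(\theta)\right] = \frac{1}{N}\sum_{i=1}^{N} \nabla_\theta \mathcal{L}_i(\theta).$$

The question is then which $p_i$ to use; the paper argues for probabilities driven by the per-sample loss.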
Biased Importance Sampling for Deep Neural Network Training
In summary, the contributions of this work are:
- The use of the loss, instead of the gradient norm, to estimate the importance of a sample (see the sketch after this list)
- The creation of a model able to approximate the loss at low computational overhead
- The development of an online algorithm that minimizes a soft max-loss over the training set through importance sampling
- They also show that this method leads to better generalization.
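To make the loss-based sampling idea concrete, here is a minimal sketch (not the authors' implementation; `importance_sample_batch` and `predicted_losses` are names I made up). It samples a batch with probability proportional to a predicted per-sample loss and returns the $1/(N p_i)$ weights that keep the gradient estimate unbiased:

```python
import numpy as np

def importance_sample_batch(per_sample_loss, batch_size, rng=None):
    """Sample indices with probability proportional to a (predicted) per-sample loss.

    Returns the sampled indices and the weights 1 / (N * p_i) that compensate
    for the non-uniform sampling in the SGD update.
    """
    rng = rng or np.random.default_rng()
    n = len(per_sample_loss)
    # Probabilities proportional to the loss; a small floor keeps every sample
    # reachable and the importance weights bounded.
    probs = np.asarray(per_sample_loss, dtype=float) + 1e-8
    probs /= probs.sum()
    idx = rng.choice(n, size=batch_size, replace=False, p=probs)
    weights = 1.0 / (n * probs[idx])
    return idx, weights

# Usage: predicted_losses would come from the cheap auxiliary loss model;
# the weights multiply each sampled example's loss (or gradient) in the update.
predicted_losses = np.abs(np.random.randn(10_000))  # stand-in for the loss predictor
idx, w = importance_sample_batch(predicted_losses, batch_size=128)
```

In the paper the per-sample losses come from the separate, lightweight model mentioned above, so the expensive network only has to run on the samples that are actually selected.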
Questions
- What about catastrophic forgetting? Unimportant samples might become important later.
- How can this be used for self-supervision with large amounts of data, where we may not have access to the complete training set at the same time?
- Keeping a separate model and using its loss seems somewhat redundant. Why not get the signal from the model being trained itself?
They have also open-sourced their code on GitHub (related Reddit thread).