[cs231n] Understanding and Visualizing Convolutional Neural Networks
Lecture Note
References
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Rich feature hierarchies for accurate object detection and semantic segmentation
Visualizing and Understanding Convolutional Networks
Data Gradient.
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
DeconvNet.
Visualizing and Understanding Convolutional Networks
Guided Backpropagation.
Striving for Simplicity: The All Convolutional Net
Reconstructing original images based on CNN Codes
Understanding Deep Image Representations by Inverting Them
How much spatial information is preserved?
Do ConvNets Learn Correspondence? (tldr: yes)
Plotting performance as a function of image attributes
ImageNet Large Scale Visual Recognition Challenge
Fooling ConvNets
Explaining and Harnessing Adversarial Examples
Comparing ConvNets to Human labelers
What I learned from competing against a ConvNet on ImageNet
Visualizing the activations and first-layer weights
See the original lecture.
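As a rough sketch of what this looks like in code (my own illustration, not from the lecture, assuming a pretrained AlexNet from torchvision), the first-layer filters can be pulled out of the model and tiled into an image grid:

```python
import matplotlib.pyplot as plt
import torchvision

# Load a pretrained AlexNet (torchvision >= 0.13 weights API; older versions use pretrained=True).
model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
conv1 = model.features[0].weight.data.clone()     # first conv layer weights: (64, 3, 11, 11)

# Tile the 64 RGB filters into one image; normalize each filter to [0, 1] for display.
grid = torchvision.utils.make_grid(conv1, nrow=8, normalize=True, scale_each=True, padding=1)
plt.imshow(grid.permute(1, 2, 0))                 # CHW -> HWC for matplotlib
plt.axis("off")
plt.show()
```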
Retrieving images that maximally activate a neuron
Another visualization technique is to take a large dataset of images, feed them through the network, and keep track of which images maximally activate some neuron. We can then visualize those images to get an understanding of what the neuron is looking for in its receptive field. One such visualization (among others) is shown in Rich feature hierarchies for accurate object detection and semantic segmentation.
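One way to sketch this (again an illustration, not from the lecture): assuming a pretrained torchvision AlexNet, a forward hook can record each image's strongest response for one chosen conv5 channel, and the top-scoring images can then be inspected. The unit index and the random stand-in images below are arbitrary placeholders:

```python
import torch
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()

unit = 17          # an arbitrary channel of conv5, purely for illustration
captured = {}

def record_max_activation(module, inputs, output):
    # For each image in the batch, keep the spatial max of the chosen channel's response.
    captured["score"] = output[:, unit].amax(dim=(1, 2))

model.features[10].register_forward_hook(record_max_activation)   # features[10] = conv5

dataset = [torch.rand(3, 224, 224) for _ in range(256)]   # stand-in for a real image dataset
scores = []
with torch.no_grad():
    for img in dataset:
        model(img.unsqueeze(0))
        scores.append(captured["score"].item())

top9 = torch.tensor(scores).topk(9).indices   # indices of the 9 most activating images
```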
Embedding the codes with t-SNE
Embed high-dimensional points so that locally, pairwise distances are conserved.
To produce an embedding, we can take a set of images and use the ConvNet to extract the CNN codes (e.g. in AlexNet, the 4096-dimensional vector right before the classifier, and crucially, including the ReLU non-linearity). We can then plug these into t-SNE and get a 2-dimensional vector for each image. The corresponding images can then be visualized in a grid:
t-SNE visualization of CNN codes
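A minimal sketch of this pipeline (assuming a pretrained torchvision AlexNet and scikit-learn's TSNE; the random images are stand-ins for a real, preprocessed dataset):

```python
import torch
import torchvision
from sklearn.manifold import TSNE

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()
# Drop the final fc layer so the model outputs the 4096-d code after the last ReLU.
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-1])

images = torch.rand(200, 3, 224, 224)      # stand-in for a batch of preprocessed images
with torch.no_grad():
    codes = model(images).numpy()          # CNN codes, shape (200, 4096)

# Embed the codes into 2-D; each image can then be pasted into a grid at its (x, y) location.
xy = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(codes)
```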
Occluding parts of the image
One way of investigating which part of the image some classification prediction is coming from is by plotting the probability of the class of interest (e.g. dog class) as a function of the position of an occluder object. That is, we iterate over regions of the image, set a patch of the image to be all zero, and look at the probability of the class. We can visualize the probability as a 2-dimensional heat map.
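A minimal sketch of this occlusion experiment (again assuming a pretrained torchvision AlexNet; the input image, class index, patch size, and stride are illustrative placeholders):

```python
import torch
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()

image = torch.rand(3, 224, 224)       # stand-in for a preprocessed image of a dog
target_class = 207                    # an ImageNet dog class index, for illustration
patch, stride = 32, 16

positions = range(0, 224 - patch + 1, stride)
heatmap = torch.zeros(len(positions), len(positions))
with torch.no_grad():
    for i, y in enumerate(positions):
        for j, x in enumerate(positions):
            occluded = image.clone()
            occluded[:, y:y + patch, x:x + patch] = 0.0        # zero out the occluded patch
            probs = model(occluded.unsqueeze(0)).softmax(dim=1)
            heatmap[i, j] = probs[0, target_class]             # class probability with occluder at (y, x)
# Low values in `heatmap` mark the regions whose occlusion hurts the prediction the most.
```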