Table of Contents
  1. Visualizing the activations and first-layer weights
  2. Retrieving images that maximally activate a neuron
  3. Embedding the codes with t-SNE
  4. Occluding parts of the image

Lecture Note

References
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
- Rich feature hierarchies for accurate object detection and semantic segmentation
- Visualizing and Understanding Convolutional Networks
- Data Gradient: Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
- DeconvNet: Visualizing and Understanding Convolutional Networks
- Guided Backpropagation: Striving for Simplicity: The All Convolutional Net
- Reconstructing original images based on CNN codes: Understanding Deep Image Representations by Inverting Them
- How much spatial information is preserved? Do ConvNets Learn Correspondence? (tl;dr: yes)
- Plotting performance as a function of image attributes: ImageNet Large Scale Visual Recognition Challenge
- Fooling ConvNets: Explaining and Harnessing Adversarial Examples
- Comparing ConvNets to Human labelers: What I learned from competing against a ConvNet on ImageNet

Visualizing the activations and first-layer weights

See the original lecture.

Retrieving images that maximally activate a neuron

Another visualization technique is to take a large dataset of images, feed them through the network and keep track of which images maximally activate some neuron. We can then visualize the images to get an understanding of what the neuron is looking for in its receptive field.
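The bookkeeping above can be sketched with a small top-k tracker. This is a minimal illustration, not a specific framework's API: `activation_fn` is a hypothetical stand-in for a forward pass that returns the scalar activation of the neuron of interest, and the demo uses random arrays in place of real images.

```python
import heapq
import numpy as np

def top_activating_images(images, activation_fn, k=9):
    """Return the k (activation, index) pairs whose images maximally
    activate a neuron. activation_fn is a hypothetical hook that maps
    one image to the neuron's scalar activation."""
    heap = []  # min-heap of (activation, index), capped at size k
    for i, img in enumerate(images):
        act = float(activation_fn(img))
        if len(heap) < k:
            heapq.heappush(heap, (act, i))
        elif act > heap[0][0]:
            heapq.heapreplace(heap, (act, i))  # evict current minimum
    return sorted(heap, reverse=True)  # highest activation first

# Toy demo: random "images"; the "neuron" responds to mean brightness.
rng = np.random.default_rng(0)
images = [rng.random((8, 8)) for _ in range(100)]
best = top_activating_images(images, lambda im: im.mean(), k=5)
```

The min-heap keeps memory constant regardless of dataset size, which matters when streaming a large dataset through the network once.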

Embedding the codes with t-SNE

t-SNE embeds high-dimensional points so that, locally, pairwise distances are conserved. To produce an embedding, we can take a set of images and use the ConvNet to extract the CNN codes (e.g. in AlexNet, the 4096-dimensional vector right before the classifier, and crucially, including the ReLU non-linearity). We can then plug these into t-SNE and get a 2-dimensional vector for each image. The corresponding images can then be visualized in a grid. One such visualization (among others) is shown in Rich feature hierarchies for accurate object detection and semantic segmentation.

[Figure: t-SNE visualization of CNN codes]
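A minimal sketch of this pipeline, assuming scikit-learn's t-SNE implementation and using random non-negative vectors as stand-ins for real CNN codes (in practice these would come from the layer right before the classifier, after the ReLU):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in CNN codes: one 4096-d non-negative vector per image
# (random here; real codes come from the network's forward pass).
rng = np.random.default_rng(0)
codes = np.maximum(rng.standard_normal((200, 4096)), 0)

# Embed the codes into 2-D; t-SNE tries to preserve local
# pairwise distances among the high-dimensional points.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(codes)  # shape (200, 2)
```

Each row of `embedding` gives an (x, y) coordinate; snapping those coordinates to a regular grid and pasting each source image at its cell produces the visualization described above.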

Occluding parts of the image

One way of investigating which part of the image some classification prediction is coming from is by plotting the probability of the class of interest (e.g. dog class) as a function of the position of an occluder object. That is, we iterate over regions of the image, set a patch of the image to be all zero, and look at the probability of the class. We can visualize the probability as a 2-dimensional heat map.
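The occlusion sweep can be sketched as follows. This is an illustrative implementation under stated assumptions: `class_prob_fn` is a hypothetical wrapper around the network's softmax output for the class of interest, and the demo substitutes a trivial scoring function for a real classifier.

```python
import numpy as np

def occlusion_heatmap(image, class_prob_fn, patch=16, stride=8, fill=0.0):
    """Slide a zeroed-out patch over the image and record the class
    probability at each position, yielding a 2-D heat map.
    class_prob_fn is a hypothetical hook: image -> P(class of interest)."""
    H, W = image.shape[:2]
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            y, x = r * stride, c * stride
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill  # zero out the patch
            heat[r, c] = class_prob_fn(occluded)
    return heat

# Toy demo: the "classifier" just reports mean brightness, so every
# occluder position lowers the score by the same amount.
img = np.ones((64, 64))
heat = occlusion_heatmap(img, lambda im: im.mean())
```

Positions where the heat map drops sharply mark the regions the classifier actually relies on; with a real network, occluding the dog's face should depress the dog-class probability far more than occluding the background.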
