[cs231n]Assignment #3: ConvNets II, Transfer Learning, Visualization

Table of Contents

1. Dropout and Data Augmentation
2. Ensemble method
3. Transfer learning
4. Visualize Saliency Maps(very useful)
5. Fooling images for ConvNets (For detail code please view the ipython file)

[1] K. Simonyan, A. Vedaldi, A. Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014.
[2] Szegedy, Christian, et al. “Intriguing properties of neural networks.” arXiv preprint, 2013.
[3] Nguyen, Anh, Jason Yosinski, and Jeff Clune. “Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images.” arXiv preprint, 2014.
Dropout and Data Augmentation
(Networks with dropout usually take a bit longer to train, so we will use more training epochs this time.)

Dropout is usually applied to fully connected layers rather than to convolutional layers.

Code for dropout:

def dropout_forward(x, dropout_param):
  """
  Performs the forward pass for (inverted) dropout.

  Inputs:
  - x: Input data, of any shape
  - dropout_param: A dictionary with the following keys:
    - p: Dropout parameter. We keep each neuron output with probability p.
    - mode: 'test' or 'train'. If the mode is train, then perform dropout;
      if the mode is test, then just return the input.
    - seed: Seed for the random number generator. Passing seed makes this
      function deterministic, which is needed for gradient checking but not in
      real networks.

  Outputs:
  - out: Array of the same shape as x.
  - cache: A tuple (dropout_param, mask). In training mode, mask is the dropout
    mask that was used to multiply the input; in test mode, mask is None.
  """
  p, mode = dropout_param['p'], dropout_param['mode']
  if 'seed' in dropout_param:
    np.random.seed(dropout_param['seed'])

  mask = None
  out = None

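  if mode == 'train':
    # Inverted dropout: keep each unit with probability p and scale the
    # survivors by 1/p so the expected activation is unchanged.
    mask = (np.random.rand(*x.shape) < p) / p
    out = x * mask
  elif mode == 'test':
    # No dropout and no rescaling at test time.
    out = x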

  cache = (dropout_param, mask)
  out = out.astype(x.dtype, copy=False)

  return out, cache

def dropout_backward(dout, cache):
  """
  Perform the backward pass for (inverted) dropout.

  Inputs:
  - dout: Upstream derivatives, of any shape
  - cache: (dropout_param, mask) from dropout_forward.
  """
  dropout_param, mask = cache
  mode = dropout_param['mode']
  if mode == 'train':
    # Training mode: route the upstream gradient through the same mask
    # (already scaled by 1/p) that was applied in the forward pass.
    dx = mask * dout
  elif mode == 'test':
    dx = dout
  return dx
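
To see why the mask is divided by p, the following is a small sanity check rather than part of the assignment code. It assumes a standalone helper (`inverted_dropout`, a hypothetical name) and NumPy's `default_rng`, and it shows that the mean activation is roughly preserved at training time, which is what lets the test-time pass return the input unchanged.

```python
import numpy as np

def inverted_dropout(x, p, rng):
    """Keep each unit with probability p and rescale the survivors by 1/p."""
    mask = (rng.random(x.shape) < p) / p
    return x * mask, mask

rng = np.random.default_rng(0)
x = np.ones((1000, 500))
p = 0.3

out, mask = inverted_dropout(x, p, rng)

# The mean activation stays close to the input mean (1.0 here) ...
print("train-time mean:", out.mean())
# ... while roughly a fraction (1 - p) of the units are zeroed.
print("fraction dropped:", (out == 0).mean())
```

Without the 1/p factor the training-time mean would be about p times the input mean, and the test-time pass would have to compensate by scaling.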

Code for data augmentation:

def random_flips(X):
  """
  Take random x-y flips of images.

  Input:
  - X: (N, C, H, W) array of image data.

  Output:
  - An array of the same shape as X, containing a copy of the data in X,
    but with half the examples flipped along the horizontal direction.
  """
  out = None
  #############################################################################
  # TODO: Implement the random_flips function. Store the result in out.       #
  #############################################################################
  N, C, H, W = X.shape
  out = np.zeros_like(X)
  # A step of +1 keeps an image as-is; a step of -1 reverses its width axis.
  flag = np.where(np.random.rand(N) < 0.5, 1, -1)
  for i in range(N):
    out[i] = X[i, :, :, ::flag[i]]


  #############################################################################
  #                           END OF YOUR CODE                                #
  #############################################################################
  return out


def random_crops(X, crop_shape):
  """
  Take random crops of images. For each input image we will generate a random
  crop of that image of the specified size.

  Input:
  - X: (N, C, H, W) array of image data
  - crop_shape: Tuple (HH, WW) to which each image will be cropped.

  Output:
  - Array of shape (N, C, HH, WW)
  """
  N, C, H, W = X.shape
  HH, WW = crop_shape
  assert HH < H and WW < W

  out = np.zeros((N, C, HH, WW), dtype=X.dtype)
  #############################################################################
  # TODO: Implement the random_crops function. Store the result in out.       #
  #############################################################################
  # Slice indices must be integers, so sample the top-left corner of each
  # crop directly with randint (the upper bound is exclusive).
  startH = np.random.randint(0, H - HH + 1, size=N)
  startW = np.random.randint(0, W - WW + 1, size=N)
  for i in range(N):
    out[i] = X[i, :, startH[i]:startH[i] + HH, startW[i]:startW[i] + WW]

  #############################################################################
  #                           END OF YOUR CODE                                #
  #############################################################################

  return out


def random_contrast(X, scale=(0.8, 1.2)):
  """
  Randomly adjust the contrast of images. For each input image, choose a
  number uniformly at random from the range given by the scale parameter,
  and multiply each pixel of the image by that number.

  Inputs:
  - X: (N, C, H, W) array of image data
  - scale: Tuple (low, high). For each image we sample a scalar in the
    range (low, high) and multiply the image by that scalar.

  Output:
  - Rescaled array out of shape (N, C, H, W) where out[i] is a contrast
    adjusted version of X[i].
  """
  low, high = scale
  N = X.shape[0]
  out = np.zeros_like(X)

  #############################################################################
  # TODO: Implement the random_contrast function. Store the result in out.    #
  #############################################################################
  # One scale factor per image, drawn uniformly from [low, high).
  contrast = np.random.uniform(low, high, size=N)
  for i in range(N):
    out[i] = contrast[i] * X[i]
  #############################################################################
  #                           END OF YOUR CODE                                #
  #############################################################################
  
  return out


def random_tint(X, scale=(-10, 10)):
  """
  Randomly tint images. For each input image, choose a random color whose
  red, green, and blue components are each drawn uniformly at random from
  the range given by scale. Add that color to each pixel of the image.

  Inputs:
  - X: (N, C, H, W) array of image data
  - scale: A tuple (low, high) giving the bounds for the random color that
    will be generated for each image.

  Output:
  - Tinted array out of shape (N, C, H, W) where out[i] is a tinted version
    of X[i].
  """
  low, high = scale
  N, C = X.shape[:2]
  out = np.zeros_like(X)

  #############################################################################
  # TODO: Implement the random_tint function. Store the result in out.        #
  #############################################################################
  # One random offset per image and channel, drawn uniformly from [low, high).
  tint_color = np.random.uniform(low, high, size=(N, C))
  for i in range(N):
    out[i] = X[i] + tint_color[i].reshape((C, 1, 1))
  #############################################################################
  #                           END OF YOUR CODE                                #
  #############################################################################

  return out


def fixed_crops(X, crop_shape, crop_type):
  """
  Take center or corner crops of images.

  Inputs:
  - X: Input data, of shape (N, C, H, W)
  - crop_shape: Tuple of integers (HH, WW) giving the size to which each
    image will be cropped.
  - crop_type: One of the following strings, giving the type of crop to
    compute:
    'center': Center crop
    'ul': Upper left corner
    'ur': Upper right corner
    'bl': Bottom left corner
    'br': Bottom right corner

  Returns:
  Array of cropped data of shape (N, C, HH, WW) 
  """
  N, C, H, W = X.shape
  HH, WW = crop_shape

  # Integer division keeps the crop offsets usable as slice indices.
  x0 = (W - WW) // 2
  y0 = (H - HH) // 2
  x1 = x0 + WW
  y1 = y0 + HH

  if crop_type == 'center':
    return X[:, :, y0:y1, x0:x1]
  elif crop_type == 'ul':
    return X[:, :, :HH, :WW]
  elif crop_type == 'ur':
    return X[:, :, :HH, -WW:]
  elif crop_type == 'bl':
    return X[:, :, -HH:, :WW]
  elif crop_type == 'br':
    return X[:, :, -HH:, -WW:]
  else:
    raise ValueError('Unrecognized crop type %s' % crop_type)
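
The per-image loop in random_flips can also be written with a boolean mask. This is a minimal sketch, assuming a NumPy `Generator` is passed in; the name `random_flips_vectorized` and the `rng` argument are illustrative and not part of the assignment code.

```python
import numpy as np

def random_flips_vectorized(X, rng):
    """Flip roughly half of the images in X (N, C, H, W) left-to-right."""
    flip = rng.random(X.shape[0]) < 0.5   # one coin toss per image
    out = X.copy()
    out[flip] = X[flip, :, :, ::-1]       # reverse the width axis of the chosen images
    return out, flip

rng = np.random.default_rng(0)
X = np.arange(4 * 3 * 2 * 5, dtype=np.float32).reshape(4, 3, 2, 5)
out, flip = random_flips_vectorized(X, rng)

print("flipped images:", np.flatnonzero(flip))
print("shape and dtype preserved:", out.shape == X.shape and out.dtype == X.dtype)
```

Returning the mask alongside the output is only for inspection here; a training pipeline would normally discard it.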

Ensemble method

Ensembling usually improves prediction accuracy over any single model.

A simple way to implement an ensemble of models is to average the predicted probabilities of the models in the ensemble.

More concretely, suppose we have $k$ models $m_1,\ldots,m_k$ and we want to combine them into an ensemble. If $p(x=y_i \mid m_j)$ is the probability that the input $x$ is classified as $y_i$ under model $m_j$, then the ensemble predicts

$$p(x=y_i \mid \{m_1,\ldots,m_k\}) = \frac{1}{k}\sum_{j=1}^k p(x=y_i\mid m_j)$$

In this example, we have 10 pretrained models that all share the same architecture.
Each of these models was trained for 25 epochs over the TinyImageNet-100-A training data with a batch size of 50 and with dropout on the hidden affine layer. Each model was trained using slightly different values for the learning rate, regularization, and dropout probability.

We can ensemble all ten models or only a subset of them, and plot the validation accuracy against the number of models used.
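
The averaging rule can be sketched in a few lines of NumPy. The code below is an illustration under stated assumptions: the function names are hypothetical, and the per-model probabilities are synthetic stand-ins (random logits with a weak signal on the true class), not outputs of the pretrained TinyImageNet models.

```python
import numpy as np

def ensemble_probs(probs):
    """Average class probabilities over models; probs has shape (k, N, num_classes)."""
    return probs.mean(axis=0)

def accuracy_vs_ensemble_size(probs, y):
    """Accuracy of the ensemble built from the first 1, 2, ..., k models."""
    accs = []
    for n in range(1, probs.shape[0] + 1):
        pred = ensemble_probs(probs[:n]).argmax(axis=1)
        accs.append((pred == y).mean())
    return accs

# Synthetic stand-in for the outputs of k models on N validation images.
rng = np.random.default_rng(0)
k, N, num_classes = 10, 200, 5
y = rng.integers(0, num_classes, size=N)
logits = rng.normal(size=(k, N, num_classes))
logits[:, np.arange(N), y] += 1.0   # weak, independent signal in every model
probs = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)

accs = accuracy_vs_ensemble_size(probs, y)
for n, acc in enumerate(accs, start=1):
    print(n, "models:", acc)
```

The list returned by `accuracy_vs_ensemble_size` is what one would plot against the number of models.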

Transfer learning

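There are two common ways to reuse a trained network on a new dataset:
1. Feature extraction: run the new dataset through the trained network, take the output of the last fully connected layer (after ReLU) as the feature vector, and train a kNN, SVM, or softmax classifier on those features.
2. Fine-tuning: initialize with the trained network's parameters and continue training on the new dataset.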
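
As an illustration of the feature-extraction route, here is a self-contained toy sketch. Everything in it is an assumption made for the example: `W_frozen` stands in for the pretrained layers (a fixed random projection followed by ReLU), `extract_features` and `train_softmax` are hypothetical helpers, and the "new dataset" is two Gaussian blobs rather than images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained network up to its last hidden layer:
# a fixed random projection followed by ReLU. These weights are never updated.
W_frozen = rng.normal(size=(20, 64)) / np.sqrt(20)

def extract_features(X):
    return np.maximum(0, X @ W_frozen)

def train_softmax(F, y, num_classes, lr=0.02, steps=500):
    """Fit a linear softmax classifier on fixed features F with gradient descent."""
    N, D = F.shape
    W = np.zeros((D, num_classes))
    for _ in range(steps):
        scores = F @ W
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(N), y] -= 1                       # dL/dscores
        W -= lr * (F.T @ p) / N
    return W

# Toy "new dataset": two well-separated Gaussian blobs.
X = np.vstack([rng.normal(-1, 1, size=(100, 20)),
               rng.normal(1, 1, size=(100, 20))])
y = np.repeat([0, 1], 100)

F = extract_features(X)                 # features from the frozen layers
W = train_softmax(F, y, num_classes=2)  # only the new classifier is trained
train_acc = ((F @ W).argmax(axis=1) == y).mean()
print("training accuracy on extracted features:", train_acc)
```

Fine-tuning differs only in that the gradient would also be propagated into the pretrained weights instead of stopping at the features.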

Visualize Saliency Maps(very useful)

[1] K. Simonyan, A. Vedaldi, A. Zisserman , “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR Workshop 2014

A saliency map [1] shows which parts of an image matter for classification by visualizing the gradient of the correct class score with respect to the input image. Recall that if a region of the image has a large data gradient, the output of the ConvNet is sensitive to perturbations in that region of the input image.

We will do something similar, instead visualizing the gradient of the data loss with respect to the input image; this gives similar results and is cleaner to implement using our codebase.

The computation is relatively easy: backpropagate the loss all the way to the input image using the existing backward passes (such as conv_relu_pool_backward).
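
Once the gradient with respect to the image is available, the map in [1] is obtained by taking the maximum absolute value over the color channels. The sketch below shows that last step on a toy model. It assumes a linear softmax classifier in place of the ConvNet so that the input gradient has a closed form; `softmax_loss_grad_x` and `saliency_map` are hypothetical names.

```python
import numpy as np

def softmax_loss_grad_x(W, x, y):
    """Gradient of the softmax loss w.r.t. x for the linear model scores = W @ x.ravel()."""
    scores = W @ x.ravel()
    scores -= scores.max()                    # numerical stability
    p = np.exp(scores) / np.exp(scores).sum()
    p[y] -= 1.0                               # dL/dscores = p - onehot(y)
    return (W.T @ p).reshape(x.shape)

def saliency_map(dx):
    """Collapse a (C, H, W) input gradient to (H, W): max absolute value over channels."""
    return np.abs(dx).max(axis=0)

rng = np.random.default_rng(0)
C, H, Wd, num_classes = 3, 8, 8, 10
W = rng.normal(size=(num_classes, C * H * Wd))
x = rng.normal(size=(C, H, Wd))

dx = softmax_loss_grad_x(W, x, y=3)
sal = saliency_map(dx)
print("saliency map shape:", sal.shape)
```

With the assignment's ConvNet, `dx` would come from the chain of backward functions instead of the closed-form expression used here.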

Fooling images for ConvNets (For detail code please view the ipython file)

Two other papers [2, 3] showed that, given a trained ConvNet, an input image, and a desired label, we can add a small amount of noise to the input image to force the ConvNet to classify it as having the desired label.

Suppose that $L(x, y, m)$ is the data loss under model $m$, where we tell the network that the input $x$ should be classified as having label $y$. Given a starting image $x_0$, a desired label $y$, and a pretrained model $m$, we will create a fooling image $x_f$ by solving the following optimization problem:

$$x_f = \arg\min_x \left(L(x, y, m) + \frac{\lambda}{2} \|x - x_0\|_2^2\right)$$

The term $\|x - x_0\|_2^2$ is $L_2$ regularization in image space, which encourages the fooling image to look similar to the starting image, and the constant $\lambda$ is the strength of this regularization. We will use gradient descent to perform optimization under this model.

In the past, when using gradient descent we have stopped after a fixed number of iterations. Here we will use a different stopping criterion. Suppose that $p(x=y \mid m)$ is the probability that the input $x$ is assigned the label $y$ under the model $m$. We will specify a desired confidence threshold $t$ for the fooling image, and we will stop our optimization when we have $p(x_f=y\mid m) \geq t$.
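
The procedure amounts to gradient descent on the objective, whose gradient is $\nabla_x L(x, y, m) + \lambda (x - x_0)$, with a check on the target probability before each step. The following is a toy sketch under stated assumptions: a linear softmax model replaces the ConvNet so the gradient is available in closed form, and `make_fooling_image` and its default hyperparameters are illustrative rather than taken from the notebook.

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def make_fooling_image(W, x0, target, t=0.9, lam=1e-3, lr=0.01, max_iters=5000):
    """Gradient descent on L(x, target) + lam/2 * ||x - x0||^2 for scores = W @ x.

    Stops as soon as p(target | x) >= t, or after max_iters steps.
    """
    x = x0.copy()
    p = softmax(W @ x)
    iters = 0
    while p[target] < t and iters < max_iters:
        dscores = p.copy()
        dscores[target] -= 1.0                    # gradient of the softmax loss w.r.t. scores
        grad = W.T @ dscores + lam * (x - x0)     # data-loss gradient plus regularizer
        x -= lr * grad
        p = softmax(W @ x)
        iters += 1
    return x, p[target], iters

rng = np.random.default_rng(0)
num_classes, D = 5, 50
W = rng.normal(size=(num_classes, D))
x0 = rng.normal(size=D)

p0 = softmax(W @ x0)
start_pred = int(p0.argmax())
target = int(p0.argmin())          # aim for the class the model finds least likely

x_f, p_final, iters = make_fooling_image(W, x0, target)
print("original prediction:", start_pred, "target:", target)
print("p(target) after", iters, "steps:", p_final)
print("relative change in the input:", np.linalg.norm(x_f - x0) / np.linalg.norm(x0))
```

The `max_iters` cap is a safeguard added for the sketch, so the loop still terminates if the threshold is never reached.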

Fooling images from correctly classified images
Fooling image from random noise
