Table of Contents
  1. Train a Convnet
  2. Things you should try
  3. Tips for training
  4. Going above and beyond

Assignment webpage

Experience:

  1. Design the neural network layer by layer. For each layer, implement the forward pass (saving the parameters needed for the backward pass) and the backward pass.
  2. First implement the loss computation, then the gradient computation. Then use gradient checking to verify that your implementation is correct (a minimal checker is sketched after this list).
  3. For training, babysit the training process: print all the relevant quantities to guide the choice of hyperparameters.
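A minimal numerical gradient check is sketched below. It is close in spirit to the gradient-checking utilities that ship with the assignment, but the helper names here (`numerical_gradient`, `rel_error`) are just illustrative. `f` is any scalar-valued function of the array `x`.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered finite-difference estimate of the gradient of a scalar-valued f at x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        fxph = f(x)          # f(x + h) at this coordinate
        x[idx] = old - h
        fxmh = f(x)          # f(x - h) at this coordinate
        x[idx] = old         # restore the original value
        grad[idx] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

def rel_error(a, b):
    """Relative error; around 1e-7 or smaller usually means the analytic gradient is correct."""
    return np.max(np.abs(a - b) / (np.maximum(1e-8, np.abs(a) + np.abs(b))))
```

To check a layer's backward pass, wrap its forward pass in a scalar function, e.g. `f = lambda x: np.sum(affine_forward(x, w, b)[0] * dout)`, and compare `numerical_gradient(f, x)` against the `dx` returned by `affine_backward(dout, cache)`.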

Some examples:

  1. affine_forward: save the cache (x, w, b); a minimal sketch is given below
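For reference, a minimal forward-pass sketch that is consistent with the affine_backward code below; inputs with extra dimensions are flattened to (N, D):

```python
def affine_forward(x, w, b):
    """
    Forward pass for an affine (fully connected) layer.

    - x: input of shape (N, d_1, ..., d_k), flattened to (N, D)
    - w: weights of shape (D, M)
    - b: biases of shape (M,)
    Returns out of shape (N, M) and the cache (x, w, b) for the backward pass.
    """
    N = x.shape[0]
    out = x.reshape(N, -1).dot(w) + b
    cache = (x, w, b)
    return out, cache
```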
  2. affine_backward: use dimension matching to recover each gradient; db is always the sum of dout along the batch axis

```python
import numpy as np

def affine_backward(dout, cache):
    """
    Computes the backward pass for an affine layer.

    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ..., d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)

    Returns a tuple of:
    - dx: Gradient with respect to x, of shape (N, d_1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    N = x.shape[0]
    flat_x = x.reshape((N, -1))      # (N, D)
    dflat_x = dout.dot(w.T)          # (N, D)
    dx = dflat_x.reshape(x.shape)    # back to the original input shape
    dw = flat_x.T.dot(dout)          # (D, M)
    db = np.sum(dout, axis=0)        # (M,)
    return dx, dw, db
```
  3. ReLU_forward: compute max(0, x) and save x for the backward pass (sketched below)
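For reference, a one-line forward sketch consistent with the relu_backward shown next:

```python
def relu_forward(x):
    """Elementwise ReLU; the cached x tells the backward pass which units fired."""
    out = np.maximum(0, x)
    cache = x
    return out, cache
```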

  4. ReLU_backward: use the cached x to decide which units receive gradient (the same units that fired during the forward pass)

```python
def relu_backward(dout, cache):
    """
    Computes the backward pass for a layer of rectified linear units (ReLUs).

    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout

    Returns:
    - dx: Gradient with respect to x
    """
    x = cache
    dx = dout.copy()
    dx[x <= 0] = 0   # units that did not fire get no gradient
    return dx
```
  5. conv_forward

    1. Naive way: use nested loops to slide each filter across the (padded) image
    2. im2col: unroll local patches into columns, then compute the convolution as a single matrix multiplication (see the sketch below)
    3. FFT-based convolution
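A minimal im2col-style forward sketch, assuming the same (x, w, b, conv_param) convention as conv_forward_naive; the function name conv_forward_im2col is my own. The patch extraction still uses loops, but the convolution itself becomes one matrix multiplication:

```python
def conv_forward_im2col(x, w, b, conv_param):
    """
    x: (N, C, H, W), w: (F, C, HH, WW), b: (F,), conv_param: {'stride', 'pad'}
    Returns out of shape (N, F, H_out, W_out) and a cache for the backward pass.
    """
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    stride, pad = conv_param['stride'], conv_param['pad']
    H_out = 1 + (H + 2 * pad - HH) // stride
    W_out = 1 + (W + 2 * pad - WW) // stride

    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')

    # im2col: every receptive field becomes one row of length C*HH*WW
    cols = np.zeros((N, H_out * W_out, C * HH * WW))
    for i in range(H_out):
        for j in range(W_out):
            patch = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            cols[:, i * W_out + j, :] = patch.reshape(N, -1)

    # convolution as a matrix multiplication with the flattened filters
    w_rows = w.reshape(F, -1)            # (F, C*HH*WW)
    out = cols.dot(w_rows.T) + b         # (N, H_out*W_out, F)
    out = out.transpose(0, 2, 1).reshape(N, F, H_out, W_out)

    cache = (x, w, b, conv_param)
    return out, cache
```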
  6. conv_backward: similar to conv_forward (e.g., the naive implementation looks like the following)

    """
    A naive implementation of the backward pass for a convolutional layer.

    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """

    dx, dw, db = None, None, None
    #############################################################################
    # TODO: Implement the convolutional backward pass. #
    #############################################################################
    pass
    x, w, b, conv_param = cache
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    pad = conv_param['pad']
    stride = conv_param['stride']

    dw = np.zeros((F, C, HH, WW))
    dx = np.zeros((N, C, H, W))
    H_prime = 1 + (H + 2 * pad - HH) / stride
    W_prime = 1 + (W + 2 * pad - WW) / stride
    padded_x = np.pad(x,((0,0),(0,0),(pad,pad),(pad,pad)),'constant',constant_values = 0)
    dpadded_x = np.zeros(padded_x.shape)


    db = dout.sum(-1).sum(-1).sum(0)
    for num in xrange(N):
    for i in xrange(F):
    for h in xrange(H_prime):
    for k in xrange(W_prime):
    dw[i] += dout[num,i,h,k] * padded_x[num,:, h*stride:h*stride+HH, k*stride:k*stride+WW]
    dpadded_x[num,:, h*stride:h*stride+HH, k*stride:k*stride+WW] += dout[num,i,h,k] * w[i]
    dx = dpadded_x[:,:,pad:H+pad,pad:W+pad]


    #############################################################################
    # END OF YOUR CODE #
    #############################################################################
    return dx, dw, db
  7. max_pool_forward: it seems the max-pooling operation could also be cast as a fast filtering / im2col-style operation?
    The naive solution just uses nested loops to take the max in each patch, and saves x in the cache (a sketch is given below).
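A minimal naive forward sketch, assuming the usual pool_param keys ('pool_height', 'pool_width', 'stride'):

```python
def max_pool_forward_naive(x, pool_param):
    """
    x: (N, C, H, W); returns out of shape (N, C, H_prime, W_prime) and the cache (x, pool_param).
    """
    N, C, H, W = x.shape
    HH = pool_param['pool_height']
    WW = pool_param['pool_width']
    stride = pool_param['stride']
    H_prime = (H - HH) // stride + 1
    W_prime = (W - WW) // stride + 1

    out = np.zeros((N, C, H_prime, W_prime))
    for i in range(H_prime):
        for j in range(W_prime):
            # max over each pooling window, for all images and channels at once
            window = x[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            out[:, :, i, j] = window.max(axis=(2, 3))

    cache = (x, pool_param)
    return out, cache
```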

  8. max_pool_backward: use the cached x to find which node gets the gradient (the argmax of each pooling window).
    Naive implementation:

```python
def max_pool_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a max pooling layer.

    Inputs:
    - dout: Upstream derivatives
    - cache: A tuple of (x, pool_param) as in the forward pass.

    Returns:
    - dx: Gradient with respect to x
    """
    x, pool_param = cache
    N, C, H, W = x.shape
    HH = pool_param['pool_height']
    WW = pool_param['pool_width']
    stride = pool_param['stride']
    dx = np.zeros(x.shape)
    H_prime = (H - HH) // stride + 1
    W_prime = (W - WW) // stride + 1

    for num in range(N):
        for c in range(C):
            for i in range(H_prime):
                for j in range(W_prime):
                    # only the argmax of each pooling window receives the gradient
                    window = x[num, c, i*stride:i*stride+HH, j*stride:j*stride+WW]
                    index1, index2 = np.unravel_index(window.argmax(), window.shape)
                    dx[num, c, i*stride + index1, j*stride + index2] += dout[num, c, i, j]
    return dx
```

  9. Loss functions (SVM and softmax)

```python
def svm_loss(x, y):
    """
    Computes the loss and gradient for multiclass SVM classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
      for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx


def softmax_loss(x, y):
    """
    Computes the loss and gradient for softmax classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
      for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    # shift scores for numerical stability before exponentiating
    probs = np.exp(x - np.max(x, axis=1, keepdims=True))
    probs /= np.sum(probs, axis=1, keepdims=True)
    N = x.shape[0]
    loss = -np.sum(np.log(probs[np.arange(N), y])) / N
    dx = probs.copy()
    dx[np.arange(N), y] -= 1
    dx /= N
    return loss, dx
```

Train a Convnet

  1. Sanity-check the loss (with the regularization strength set to 0, the initial loss should correspond to random guessing; a quick check is sketched below)
  2. Gradient check
  3. Overfit a small subset of the data (training error should go to zero)
  4. Train the full net
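A quick way to do step 1, assuming a softmax loss over C classes: with small random weights and no regularization the scores are roughly uniform, so the expected initial loss is about -log(1/C). This is a minimal sketch, not tied to any particular model class:

```python
import numpy as np

num_classes = 10                        # e.g. CIFAR-10
expected = -np.log(1.0 / num_classes)   # ~2.3026 for 10 classes
print('expected initial softmax loss: %f' % expected)
# Compare this with the loss your model returns on a small batch with
# regularization set to 0; the two values should roughly agree.
```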

Things you should try:

  • Filter size: Above we used 7x7; this makes pretty pictures but smaller filters may be more efficient
  • Number of filters: Above we used 32 filters. Do more or fewer do better?
  • Network depth: The network above has two layers of trainable parameters. Can you do better with a deeper network? You can implement alternative architectures in the file cs231n/classifiers/convnet.py. Some good architectures to try include:
    • [conv-relu-pool]xN - conv - relu - [affine]xM - [softmax or SVM]
    • [conv-relu-pool]xN - [affine]xM - [softmax or SVM]
    • [conv-relu-conv-relu-pool]xN - [affine]xM - [softmax or SVM]

Tips for training

For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple important things to keep in mind:

  • If the parameters are working well, you should see improvement within a few hundred iterations
  • Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all (a minimal random-search sketch is given after this list).
  • Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
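A minimal coarse random-search sketch. `train_and_eval` is a hypothetical helper standing in for a short training run that returns validation accuracy; the sampling ranges are only examples.

```python
import numpy as np

def train_and_eval(learning_rate, reg, num_iters=200):
    """Hypothetical helper: train briefly with these hyperparameters and
    return validation accuracy. Replace this stub with real training code."""
    return np.random.rand()  # placeholder result

results = {}
for _ in range(20):
    # sample learning rate and regularization strength log-uniformly
    lr = 10 ** np.random.uniform(-6, -2)
    reg = 10 ** np.random.uniform(-5, 1)
    results[(lr, reg)] = train_and_eval(lr, reg)

# inspect the best settings, then search more finely (and for more epochs) around them
best_lr, best_reg = max(results, key=results.get)
print('best so far: lr=%e reg=%e val_acc=%.3f' % (best_lr, best_reg, results[(best_lr, best_reg)]))
```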

Going above and beyond

If you are feeling adventurous, there are many other features you can implement to try to improve your performance. You are not required to implement any of these; however, they would be good things to try for extra credit.

  • Alternative update steps: For the assignment we implemented SGD+momentum and RMSprop; you could try alternatives like AdaGrad or AdaDelta (an AdaGrad sketch is given after this list)
  • Other forms of regularization such as L1 or Dropout
  • Alternative activation functions such as leaky ReLU or maxout
  • Model ensembles
  • Data augmentation
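A minimal AdaGrad update sketch, written in the same (w, dw, config) style as the assignment's update rules; the config key names here are my own choice for illustration:

```python
import numpy as np

def adagrad(w, dw, config=None):
    """
    AdaGrad: scale each parameter's step by the inverse square root of the
    running sum of its squared gradients.
    """
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('epsilon', 1e-8)
    config.setdefault('cache', np.zeros_like(w))

    config['cache'] += dw ** 2
    next_w = w - config['learning_rate'] * dw / (np.sqrt(config['cache']) + config['epsilon'])
    return next_w, config
```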