Table of Contents
  1. Train a Convnet
  2. Things you should try
  3. Tips for training
  4. Going above and beyond

Assignment webpage

Experience:

  1. Design the neural network layer by layer. For each layer, implement the forward pass (saving the parameters needed for the backward pass) and the backward pass.
  2. First implement the loss computation, then the gradient computation. Then use gradient checking to verify that your implementation is correct (a minimal checker is sketched after this list).
  3. For training, babysit the training process: print all the relevant quantities to guide the choice of hyperparameters.
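A minimal numerical gradient check is sketched below. It is close in spirit to the gradient-checking utilities that ship with the assignment, but the helper names here (`numerical_gradient`, `rel_error`) are just illustrative. `f` is any scalar-valued function of the array `x`.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered finite-difference estimate of the gradient of a scalar-valued f at x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        fxph = f(x)          # f(x + h) at this coordinate
        x[idx] = old - h
        fxmh = f(x)          # f(x - h) at this coordinate
        x[idx] = old         # restore the original value
        grad[idx] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

def rel_error(a, b):
    """Relative error; around 1e-7 or smaller usually means the analytic gradient is correct."""
    return np.max(np.abs(a - b) / (np.maximum(1e-8, np.abs(a) + np.abs(b))))
```

To check a layer's backward pass, wrap its forward pass in a scalar function, e.g. `f = lambda x: np.sum(affine_forward(x, w, b)[0] * dout)`, and compare `numerical_gradient(f, x)` against the `dx` returned by `affine_backward(dout, cache)`.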

Some examples:

  1. affine_forward: save the cache (x, w, b); a minimal sketch is given below
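For reference, a minimal forward-pass sketch that is consistent with the affine_backward code below; inputs with extra dimensions are flattened to (N, D):

```python
def affine_forward(x, w, b):
    """
    Forward pass for an affine (fully connected) layer.

    - x: input of shape (N, d_1, ..., d_k), flattened to (N, D)
    - w: weights of shape (D, M)
    - b: biases of shape (M,)
    Returns out of shape (N, M) and the cache (x, w, b) for the backward pass.
    """
    N = x.shape[0]
    out = x.reshape(N, -1).dot(w) + b
    cache = (x, w, b)
    return out, cache
```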
  2. affine_backward: use dimension matching to recover each gradient; db is always the sum of dout along the batch axis

```python
import numpy as np

def affine_backward(dout, cache):
    """
    Computes the backward pass for an affine layer.

    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ..., d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)

    Returns a tuple of:
    - dx: Gradient with respect to x, of shape (N, d_1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    N = x.shape[0]
    flat_x = x.reshape((N, -1))      # (N, D)
    dflat_x = dout.dot(w.T)          # (N, D)
    dx = dflat_x.reshape(x.shape)    # back to the original input shape
    dw = flat_x.T.dot(dout)          # (D, M)
    db = np.sum(dout, axis=0)        # (M,)
    return dx, dw, db
```
  3. ReLU_forward: compute max(0, x) and save x for the backward pass (sketched below)
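For reference, a one-line forward sketch consistent with the relu_backward shown next:

```python
def relu_forward(x):
    """Elementwise ReLU; the cached x tells the backward pass which units fired."""
    out = np.maximum(0, x)
    cache = x
    return out, cache
```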

  4. ReLU_backward: use the cached x to decide which units receive gradient (the same units that fired during the forward pass)

```python
def relu_backward(dout, cache):
    """
    Computes the backward pass for a layer of rectified linear units (ReLUs).

    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout

    Returns:
    - dx: Gradient with respect to x
    """
    x = cache
    dx = dout.copy()
    dx[x <= 0] = 0   # units that did not fire get no gradient
    return dx
```
  5. conv_forward

    1. Naive way: use nested loops to slide each filter across the (padded) image
    2. im2col: unroll local patches into columns, then compute the convolution as a single matrix multiplication (see the sketch below)
    3. FFT-based convolution
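A minimal im2col-style forward sketch, assuming the same (x, w, b, conv_param) convention as conv_forward_naive; the function name conv_forward_im2col is my own. The patch extraction still uses loops, but the convolution itself becomes one matrix multiplication:

```python
def conv_forward_im2col(x, w, b, conv_param):
    """
    x: (N, C, H, W), w: (F, C, HH, WW), b: (F,), conv_param: {'stride', 'pad'}
    Returns out of shape (N, F, H_out, W_out) and a cache for the backward pass.
    """
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    stride, pad = conv_param['stride'], conv_param['pad']
    H_out = 1 + (H + 2 * pad - HH) // stride
    W_out = 1 + (W + 2 * pad - WW) // stride

    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')

    # im2col: every receptive field becomes one row of length C*HH*WW
    cols = np.zeros((N, H_out * W_out, C * HH * WW))
    for i in range(H_out):
        for j in range(W_out):
            patch = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            cols[:, i * W_out + j, :] = patch.reshape(N, -1)

    # convolution as a matrix multiplication with the flattened filters
    w_rows = w.reshape(F, -1)            # (F, C*HH*WW)
    out = cols.dot(w_rows.T) + b         # (N, H_out*W_out, F)
    out = out.transpose(0, 2, 1).reshape(N, F, H_out, W_out)

    cache = (x, w, b, conv_param)
    return out, cache
```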
  6. conv_backward: similar to conv_forward (e.g., the naive implementation looks like the following)

    """
    A naive implementation of the backward pass for a convolutional layer.

    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """

    dx, dw, db = None, None, None
    #############################################################################
    # TODO: Implement the convolutional backward pass. #
    #############################################################################
    pass
    x, w, b, conv_param = cache
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    pad = conv_param['pad']
    stride = conv_param['stride']

    dw = np.zeros((F, C, HH, WW))
    dx = np.zeros((N, C, H, W))
    H_prime = 1 + (H + 2 * pad - HH) / stride
    W_prime = 1 + (W + 2 * pad - WW) / stride
    padded_x = np.pad(x,((0,0),(0,0),(pad,pad),(pad,pad)),'constant',constant_values = 0)
    dpadded_x = np.zeros(padded_x.shape)


    db = dout.sum(-1).sum(-1).sum(0)
    for num in xrange(N):
    for i in xrange(F):
    for h in xrange(H_prime):
    for k in xrange(W_prime):
    dw[i] += dout[num,i,h,k] * padded_x[num,:, h*stride:h*stride+HH, k*stride:k*stride+WW]
    dpadded_x[num,:, h*stride:h*stride+HH, k*stride:k*stride+WW] += dout[num,i,h,k] * w[i]
    dx = dpadded_x[:,:,pad:H+pad,pad:W+pad]


    #############################################################################
    # END OF YOUR CODE #
    #############################################################################
    return dx, dw, db
  7. max_pool_forward: it seems the max-pooling operation could also be cast as a fast filtering / im2col-style operation?
    The naive solution just uses nested loops to take the max in each patch, and saves x in the cache (a sketch is given below).
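A minimal naive forward sketch, assuming the usual pool_param keys ('pool_height', 'pool_width', 'stride'):

```python
def max_pool_forward_naive(x, pool_param):
    """
    x: (N, C, H, W); returns out of shape (N, C, H_prime, W_prime) and the cache (x, pool_param).
    """
    N, C, H, W = x.shape
    HH = pool_param['pool_height']
    WW = pool_param['pool_width']
    stride = pool_param['stride']
    H_prime = (H - HH) // stride + 1
    W_prime = (W - WW) // stride + 1

    out = np.zeros((N, C, H_prime, W_prime))
    for i in range(H_prime):
        for j in range(W_prime):
            # max over each pooling window, for all images and channels at once
            window = x[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            out[:, :, i, j] = window.max(axis=(2, 3))

    cache = (x, pool_param)
    return out, cache
```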

  8. max_pool_backward: use the cached x to find which node gets the gradient (the argmax of each pooling window).
    Naive implementation:

```python
def max_pool_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a max pooling layer.

    Inputs:
    - dout: Upstream derivatives
    - cache: A tuple of (x, pool_param) as in the forward pass.

    Returns:
    - dx: Gradient with respect to x
    """
    x, pool_param = cache
    N, C, H, W = x.shape
    HH = pool_param['pool_height']
    WW = pool_param['pool_width']
    stride = pool_param['stride']
    dx = np.zeros(x.shape)
    H_prime = (H - HH) // stride + 1
    W_prime = (W - WW) // stride + 1

    for num in range(N):
        for c in range(C):
            for i in range(H_prime):
                for j in range(W_prime):
                    # only the argmax of each pooling window receives the gradient
                    window = x[num, c, i*stride:i*stride+HH, j*stride:j*stride+WW]
                    index1, index2 = np.unravel_index(window.argmax(), window.shape)
                    dx[num, c, i*stride + index1, j*stride + index2] += dout[num, c, i, j]
    return dx
```

  9. Loss functions (SVM and softmax)

```python
def svm_loss(x, y):
    """
    Computes the loss and gradient for multiclass SVM classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
      for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx


def softmax_loss(x, y):
    """
    Computes the loss and gradient for softmax classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
      for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    # shift scores for numerical stability before exponentiating
    probs = np.exp(x - np.max(x, axis=1, keepdims=True))
    probs /= np.sum(probs, axis=1, keepdims=True)
    N = x.shape[0]
    loss = -np.sum(np.log(probs[np.arange(N), y])) / N
    dx = probs.copy()
    dx[np.arange(N), y] -= 1
    dx /= N
    return loss, dx
```

Train a Convnet

  1. Sanity-check the loss (with the regularization strength set to 0, the initial loss should correspond to random guessing; a quick check is sketched below)
  2. Gradient check
  3. Overfit a small subset of the data (training error should go to zero)
  4. Train the full net
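A quick way to do step 1, assuming a softmax loss over C classes: with small random weights and no regularization the scores are roughly uniform, so the expected initial loss is about -log(1/C). This is a minimal sketch, not tied to any particular model class:

```python
import numpy as np

num_classes = 10                        # e.g. CIFAR-10
expected = -np.log(1.0 / num_classes)   # ~2.3026 for 10 classes
print('expected initial softmax loss: %f' % expected)
# Compare this with the loss your model returns on a small batch with
# regularization set to 0; the two values should roughly agree.
```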

Things you should try:

  • Filter size: Above we used 7x7; this makes pretty pictures but smaller filters may be more efficient
  • Number of filters: Above we used 32 filters. Do more or fewer do better?
  • Network depth: The network above has two layers of trainable parameters. Can you do better with a deeper network? You can implement alternative architectures in the file cs231n/classifiers/convnet.py. Some good architectures to try include:
    • [conv-relu-pool]xN - conv - relu - [affine]xM - [softmax or SVM]
    • [conv-relu-pool]xN - [affine]xM - [softmax or SVM]
    • [conv-relu-conv-relu-pool]xN - [affine]xM - [softmax or SVM]

Tips for training

For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple important things to keep in mind:

  • If the parameters are working well, you should see improvement within a few hundred iterations
  • Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all (a minimal random-search sketch is given after this list).
  • Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
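A minimal coarse random-search sketch. `train_and_eval` is a hypothetical helper standing in for a short training run that returns validation accuracy; the sampling ranges are only examples.

```python
import numpy as np

def train_and_eval(learning_rate, reg, num_iters=200):
    """Hypothetical helper: train briefly with these hyperparameters and
    return validation accuracy. Replace this stub with real training code."""
    return np.random.rand()  # placeholder result

results = {}
for _ in range(20):
    # sample learning rate and regularization strength log-uniformly
    lr = 10 ** np.random.uniform(-6, -2)
    reg = 10 ** np.random.uniform(-5, 1)
    results[(lr, reg)] = train_and_eval(lr, reg)

# inspect the best settings, then search more finely (and for more epochs) around them
best_lr, best_reg = max(results, key=results.get)
print('best so far: lr=%e reg=%e val_acc=%.3f' % (best_lr, best_reg, results[(best_lr, best_reg)]))
```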

Going above and beyond

If you are feeling adventurous, there are many other features you can implement to try to improve your performance. You are not required to implement any of these; however, they would be good things to try for extra credit.

  • Alternative update steps: For the assignment we implemented SGD+momentum and RMSprop; you could try alternatives like AdaGrad or AdaDelta (an AdaGrad sketch is given after this list)
  • Other forms of regularization such as L1 or Dropout
  • Alternative activation functions such as leaky ReLU or maxout
  • Model ensembles
  • Data augmentation
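A minimal AdaGrad update sketch, written in the same (w, dw, config) style as the assignment's update rules; the config key names here are my own choice for illustration:

```python
import numpy as np

def adagrad(w, dw, config=None):
    """
    AdaGrad: scale each parameter's step by the inverse square root of the
    running sum of its squared gradients.
    """
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('epsilon', 1e-8)
    config.setdefault('cache', np.zeros_like(w))

    config['cache'] += dw ** 2
    next_w = w - config['learning_rate'] * dw / (np.sqrt(config['cache']) + config['epsilon'])
    return next_w, config
```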