
Lecture note

This lecture just gives an example of training a two-layer NN. I only wrote down a few points worth remembering; please read the original notes for the details.

  1. The gradient of the softmax function:
    $$
    p_k = \frac{e^{f_k}}{ \sum_j e^{f_j} } \hspace{1in} L_i =-\log\left(p_{y_i}\right)
    $$

$$
\frac{\partial L_i }{ \partial f_k } = p_k - \mathbb{1}(y_i = k)
$$

Notice how elegant and simple this expression is. Suppose the probabilities we computed were p = [0.2, 0.3, 0.5], and that the correct class was the middle one (with probability 0.3). According to this derivation the gradient on the scores would be df = [0.2, -0.7, 0.5].
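To make the example concrete, here is a minimal sketch in plain NumPy, with the probabilities hard-coded from the example above, that reproduces the gradient df = [0.2, -0.7, 0.5]:

```python
import numpy as np

# probabilities from the example above; the correct class is the middle one (index 1)
p = np.array([0.2, 0.3, 0.5])
y_i = 1

# dL_i/df_k = p_k - 1(y_i = k)
df = p.copy()
df[y_i] -= 1
print(df)   # [ 0.2 -0.7  0.5]
```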

  2. When computing the loss, we need to divide by the number of training samples. We should write
    `loss/N` rather than `1/N * loss`, because under Python 2 integer division `1/N` just gives `0`; alternatively, use `1/float(N)`. When dealing with arrays, we can use NumPy's `astype` to change the data type, as in the sketch below.
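A small sketch of the pitfall (the integer-division behaviour is Python 2's; in Python 3 the `/` operator already returns a float, but an explicit `astype` cast is still useful for integer arrays):

```python
import numpy as np

N = 5
loss = 10.0

# Under Python 2 integer division, 1/N evaluates to 0, so 1/N * loss is 0.
# Dividing the float directly (or forcing a float with 1/float(N)) is safe.
avg_loss = loss / N

# For arrays, astype makes the floating-point dtype explicit before dividing.
counts = np.array([3, 1, 4])                  # integer dtype
freqs = counts.astype(np.float64) / N
print(avg_loss, freqs)
```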

  3. In a NN, the gradient of a bias term is just `np.sum(dout, axis=?)`; to pick the axis, apply the dimension-matching rule from the last lecture (the summed gradient must have the same shape as the bias). See the sketch below.
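
As a concrete illustration (the shapes here are assumed, not taken from the lecture code): if the upstream gradient `dout` has shape `(N, H)` and the bias `b` has shape `(H,)`, dimension matching says to sum over the batch axis:

```python
import numpy as np

N, H = 4, 3
dout = np.ones((N, H))        # assumed upstream gradient, shape (N, H)

# b has shape (H,), so db must too -> sum out the batch dimension (axis=0)
db = np.sum(dout, axis=0)
print(db.shape)               # (3,)
```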

  4. To calculate the gradient of the ReLU, we need to know which neurons fired (i.e., had a positive input):

    # backprop into the hidden layer through W2
    dhidden = np.dot(dscores, W2.T)
    # backprop the ReLU non-linearity: zero the gradient where the unit did not fire
    dhidden[hidden_layer <= 0] = 0
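
For context, here is a minimal sketch of the full backward pass of a two-layer net in which those two lines live. Everything here (the shapes and the names `X`, `y`, `W1`, `b1`, `W2`, `b2`) is assumed for illustration rather than copied from the lecture code; it ties together the softmax gradient, the bias gradient, and the ReLU mask discussed above.

```python
import numpy as np

# toy sizes: N samples, D inputs, H hidden units, C classes (assumed)
N, D, H, C = 5, 4, 10, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)
W1, b1 = 0.01 * rng.standard_normal((D, H)), np.zeros(H)
W2, b2 = 0.01 * rng.standard_normal((H, C)), np.zeros(C)

# forward pass
hidden_layer = np.maximum(0, X.dot(W1) + b1)        # ReLU
scores = hidden_layer.dot(W2) + b2
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# gradient on the scores: p_k - 1(y_i = k), averaged over the batch
dscores = probs.copy()
dscores[np.arange(N), y] -= 1
dscores /= N

# backward pass
dW2 = hidden_layer.T.dot(dscores)
db2 = np.sum(dscores, axis=0)        # bias gradient: sum over the batch axis
dhidden = dscores.dot(W2.T)
dhidden[hidden_layer <= 0] = 0       # ReLU gate: only neurons that fired pass gradient
dW1 = X.T.dot(dhidden)
db1 = np.sum(dhidden, axis=0)
```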