
Lecture note

This lecture just gives an example of training a two-layer NN. I only wrote down a few points worth remembering; please read the original notes for the details.

  1. The gradient of the softmax function:
    $$
    p_k = \frac{e^{f_k}}{ \sum_j e^{f_j} } \hspace{1in} L_i =-\log\left(p_{y_i}\right)
    $$

$$
\frac{\partial L_i }{ \partial f_k } = p_k - \mathbb{1}(y_i = k)
$$

Notice how elegant and simple this expression is. Suppose the probabilities we computed were p = [0.2, 0.3, 0.5], and that the correct class was the middle one (with probability 0.3). According to this derivation the gradient on the scores would be df = [0.2, -0.7, 0.5].
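To make the example concrete, here is a minimal sketch in plain NumPy, with the probabilities hard-coded from the example above, that reproduces the gradient df = [0.2, -0.7, 0.5]:

```python
import numpy as np

# probabilities from the example above; the correct class is the middle one (index 1)
p = np.array([0.2, 0.3, 0.5])
y_i = 1

# dL_i/df_k = p_k - 1(y_i = k)
df = p.copy()
df[y_i] -= 1
print(df)   # [ 0.2 -0.7  0.5]
```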

  2. When computing the loss, we need to divide by the number of training samples. We should write
    `loss/N` rather than `1/N * loss`, because under Python 2 integer division `1/N` just gives `0`; alternatively, use `1/float(N)`. When dealing with arrays, we can use NumPy's `astype` to change the data type, as in the sketch below.
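A small sketch of the pitfall (the integer-division behaviour is Python 2's; in Python 3 the `/` operator already returns a float, but an explicit `astype` cast is still useful for integer arrays):

```python
import numpy as np

N = 5
loss = 10.0

# Under Python 2 integer division, 1/N evaluates to 0, so 1/N * loss is 0.
# Dividing the float directly (or forcing a float with 1/float(N)) is safe.
avg_loss = loss / N

# For arrays, astype makes the floating-point dtype explicit before dividing.
counts = np.array([3, 1, 4])                  # integer dtype
freqs = counts.astype(np.float64) / N
print(avg_loss, freqs)
```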

  3. In a NN, the gradient of a bias term is just `np.sum(dout, axis=?)`; to pick the axis, apply the dimension-matching rule from the last lecture (the summed gradient must have the same shape as the bias). See the sketch below.
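
As a concrete illustration (the shapes here are assumed, not taken from the lecture code): if the upstream gradient `dout` has shape `(N, H)` and the bias `b` has shape `(H,)`, dimension matching says to sum over the batch axis:

```python
import numpy as np

N, H = 4, 3
dout = np.ones((N, H))        # assumed upstream gradient, shape (N, H)

# b has shape (H,), so db must too -> sum out the batch dimension (axis=0)
db = np.sum(dout, axis=0)
print(db.shape)               # (3,)
```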

  4. To calculate the gradient of the ReLU, we need to know which neurons fired (i.e., had a positive input):

    # backprop into the hidden layer through W2
    dhidden = np.dot(dscores, W2.T)
    # backprop the ReLU non-linearity: zero the gradient where the unit did not fire
    dhidden[hidden_layer <= 0] = 0
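
For context, here is a minimal sketch of the full backward pass of a two-layer net in which those two lines live. Everything here (the shapes and the names `X`, `y`, `W1`, `b1`, `W2`, `b2`) is assumed for illustration rather than copied from the lecture code; it ties together the softmax gradient, the bias gradient, and the ReLU mask discussed above.

```python
import numpy as np

# toy sizes: N samples, D inputs, H hidden units, C classes (assumed)
N, D, H, C = 5, 4, 10, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)
W1, b1 = 0.01 * rng.standard_normal((D, H)), np.zeros(H)
W2, b2 = 0.01 * rng.standard_normal((H, C)), np.zeros(C)

# forward pass
hidden_layer = np.maximum(0, X.dot(W1) + b1)        # ReLU
scores = hidden_layer.dot(W2) + b2
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# gradient on the scores: p_k - 1(y_i = k), averaged over the batch
dscores = probs.copy()
dscores[np.arange(N), y] -= 1
dscores /= N

# backward pass
dW2 = hidden_layer.T.dot(dscores)
db2 = np.sum(dscores, axis=0)        # bias gradient: sum over the batch axis
dhidden = dscores.dot(W2.T)
dhidden[hidden_layer <= 0] = 0       # ReLU gate: only neurons that fired pass gradient
dW1 = X.T.dot(dhidden)
db1 = np.sum(dhidden, axis=0)
```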