[cs231n] Putting it together: Minimal Neural Network Case Study
Lecture note
This lecture just gives an example of training a two-layer NN. I wrote down a few points worth remembering; please read the original notes for the details.
- The gradient of the softmax function:
$$
p_k = \frac{e^{f_k}}{ \sum_j e^{f_j} } \hspace{1in} L_i =-\log\left(p_{y_i}\right)
$$
$$
\frac{\partial L_i }{ \partial f_k } = p_k - \mathbb{1}(y_i = k)
$$
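To see where this comes from, write out the loss using the definitions above and differentiate (a one-line derivation, not spelled out in the note itself):
$$
L_i = -f_{y_i} + \log\sum_j e^{f_j}
\quad\Rightarrow\quad
\frac{\partial L_i}{\partial f_k} = -\mathbb{1}(y_i = k) + \frac{e^{f_k}}{\sum_j e^{f_j}} = p_k - \mathbb{1}(y_i = k)
$$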
Notice how elegant and simple this expression is. Suppose the probabilities we computed were p = [0.2, 0.3, 0.5], and that the correct class was the middle one (with probability 0.3). According to this derivation the gradient on the scores would be df = [0.2, -0.7, 0.5].
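To make this concrete, here is a small NumPy sketch that reproduces the example above (the variable names `probs`, `y`, and `dscores` are mine, not from the note):

```python
import numpy as np

probs = np.array([[0.2, 0.3, 0.5]])   # softmax probabilities for one example
y = np.array([1])                     # the correct class is the middle one
num_examples = probs.shape[0]

# gradient on the scores: p_k - 1(y_i = k)
dscores = probs.copy()
dscores[np.arange(num_examples), y] -= 1
print(dscores)                        # [[ 0.2 -0.7  0.5]]

# in a full backward pass we would also average over the batch:
dscores /= num_examples
```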
- When computing the loss, we need to divide by the number of training samples. Write `loss / N` rather than `1/N * loss`, since in Python 2 `1/N` is integer division and just gives 0; alternatively use `1.0/N` or `1/float(N)`. When dealing with arrays, we can use `astype` to change the data type.
- For the gradient of a bias, the formula is just `np.sum(dout, axis=?)`; pick the axis with the dimension-matching rule used in the last lecture (see the sketch after the ReLU code below).
- To calculate the gradient of ReLU, we need to know which neurons fired:
```python
# backprop into the hidden layer, then zero out gradients where the ReLU did not fire
dhidden = np.dot(dscores, W2.T)
dhidden[hidden_layer <= 0] = 0
```
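Putting the loss-normalization and bias-gradient points together, here is a minimal sketch of the relevant slice of the backward pass (shapes and variable names such as `W2`, `b2`, and `hidden_layer` are assumptions in the spirit of the notes, not the note's own code):

```python
import numpy as np

np.random.seed(0)
N, H, C = 5, 4, 3                                      # examples, hidden size, classes
hidden_layer = np.maximum(0, np.random.randn(N, H))    # fake ReLU activations
W2 = 0.01 * np.random.randn(H, C)
b2 = np.zeros(C)
y = np.random.randint(C, size=N)

# forward: class scores and averaged data loss (note the division by N on the loss)
scores = np.dot(hidden_layer, W2) + b2
exp_scores = np.exp(scores - np.max(scores, axis=1, keepdims=True))
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
data_loss = np.sum(-np.log(probs[np.arange(N), y])) / N   # loss/N, not 1/N * loss

# backward: gradient on the scores, then bias gradient via np.sum over the batch axis
dscores = probs.copy()
dscores[np.arange(N), y] -= 1
dscores /= N
db2 = np.sum(dscores, axis=0)        # axis=0 so db2 matches the shape of b2: (C,)
dW2 = np.dot(hidden_layer.T, dscores)

# backprop into the hidden layer and apply the ReLU mask, as above
dhidden = np.dot(dscores, W2.T)
dhidden[hidden_layer <= 0] = 0
```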