Lecture Note
References:
Hinton Note on the same topic

About Nesterov’s Accelerated Momentum (NAG):
Advances in Optimizing Recurrent Networks
Ilya Sutskever’s thesis

L-BFGS vs. SGD

Data Preprocessing, Weight Initialization, Regularization (L2/L1/Maxnorm/Dropout), Loss functions
Lecture Notes
Some references:
Should read:
Elastic net regularization
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
Dropout Training as Adaptive Regularization
DropConnect

Others:
Understanding the difficulty of training deep feedforward neural networks
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Hierarchical Softmax

import numpy as np
import scipy as scp

  • np.flatnonzero(a) # return indices that are non-zero in the flattened version of a.
    This is equivalent to a.ravel().nonzero()[0]. (Example after this list.)

    See also:
    np.nonzero: return the indices of the non-zero elements of the input array.
    np.ravel: return a 1-D array containing the elements of the input array.

  • np.random.choice(a, size=None, replace=True, p=None) # generates a random sample from a given 1-D array. (Example after this list.)

    a: 1-D array-like or int
    If an ndarray, a random sample is generated from its elements. If an int, the
    random sample is generated as if a were np.arange(a).

  • np.argsort(a, axis=-1, kind='quicksort', order=None) # returns the indices that would sort an array. (Example after this list.)

  • np.random.permutation # randomly permute a sequence, or return a permuted range.
  • np.array_split # split an array into multiple sub-arrays; the division does not have to be equal.
  • np.vstack # stack arrays in sequence vertically (row wise).
    (A combined example of these three follows below.)
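
A minimal sketch of np.flatnonzero and its documented equivalence, reusing the
numpy import above (the array values are made up for illustration):

a = np.array([[0, 2, 0],
              [3, 0, 4]])
idx = np.flatnonzero(a)         # indices into the flattened array: [1 3 5]
same = a.ravel().nonzero()[0]   # the documented equivalence: [1 3 5]
vals = a.ravel()[idx]           # recover the non-zero values: [2 3 4]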
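
A quick sketch of both argument forms of np.random.choice (the seed and
probabilities are arbitrary choices for the example):

np.random.seed(0)                           # arbitrary seed, for reproducibility
draws = np.random.choice(5, size=3)         # 3 draws, as if a were np.arange(5)
items = np.array(['a', 'b', 'c', 'd'])
picks = np.random.choice(items, size=2, replace=False,
                         p=[0.1, 0.1, 0.4, 0.4])  # weighted, without replacement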
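
np.argsort on a toy score vector, e.g. to rank classifier scores (numbers made up):

scores = np.array([0.1, 0.9, 0.4])
order = np.argsort(scores)      # indices in ascending order: [0 2 1]
ranked = scores[order]          # [0.1 0.4 0.9]
top_first = order[::-1]         # reversed for descending, e.g. top predictions first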
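
And the last three helpers in one sketch: shuffle toy data, split it into folds
(a common cross-validation pattern), and stack it back together; the shapes are
made up for illustration:

X = np.arange(10).reshape(5, 2)         # toy data: 5 samples, 2 features
shuffled = np.random.permutation(X)     # returns a copy with rows (axis 0) permuted
folds = np.array_split(shuffled, 3)     # 3 sub-arrays; unequal sizes allowed: 2, 2, 1 rows
restacked = np.vstack(folds)            # stack the folds back into a (5, 2) array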