Understanding AlexNet
References:
- Caffe AlexNet prototxt
- CS231n: Convolutional Neural Networks — Architectures, Convolution / Pooling Layers
- ImageNet Classification with Deep Convolutional Neural Networks
- Slides accompanying the paper
From CS231n, we know the following output-size formulas (K is the kernel size, P is the padding size, S is the stride).
For a convolutional layer:
$$W_{new} = \frac {W_{old} - K + 2P}{S} + 1$$
For the pooling layer (we don’t use padding):
$$W_{new} = \frac {W_{old} - K}{S} + 1$$
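These two formulas translate directly into a small helper (a sketch of my own, not code from the references; the helper names are made up):

```python
def conv_out(w, k, s=1, p=0):
    """Spatial output size of a conv layer: (W - K + 2P) / S + 1."""
    return (w - k + 2 * p) // s + 1

def pool_out(w, k, s):
    """Spatial output size of a pooling layer (no padding): (W - K) / S + 1."""
    return (w - k) // s + 1

# Conv1 of AlexNet: 227x227 input, K=11, S=4, P=0 -> 55x55
print(conv_out(227, 11, 4, 0))  # 55
# Pool1: 55x55 input, K=3, S=2 -> 27x27
print(pool_out(55, 3, 2))       # 27
```

Integer (floor) division is used, matching how frameworks round when the stride does not divide evenly.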
Preprocessing
Resize the image, subtract the mean, and take crops:
Image ----resize----> 256 x 256 ----crop----> 227 x 227 (the 224 stated in the original paper seems to be wrong)
The crops are the four corner patches and the center patch (the paper also mirrors each crop horizontally at test time, giving 10 views per image).
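The corner/center cropping step can be sketched with plain NumPy slicing (a minimal illustration, assuming an H x W x C array; mirroring is omitted):

```python
import numpy as np

def five_crops(img, crop=227):
    """Return the four corner crops and the center crop of an H x W x C image."""
    h, w = img.shape[:2]
    ch, cw = (h - crop) // 2, (w - crop) // 2
    return [
        img[:crop, :crop],                 # top-left corner
        img[:crop, w - crop:],             # top-right corner
        img[h - crop:, :crop],             # bottom-left corner
        img[h - crop:, w - crop:],         # bottom-right corner
        img[ch:ch + crop, cw:cw + crop],   # center
    ]

crops = five_crops(np.zeros((256, 256, 3)))
print(len(crops), crops[0].shape)  # 5 (227, 227, 3)
```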
Network Structure
| Layer | Input (size, channels) | Parameters | Output (size, channels) |
| --- | --- | --- | --- |
| Conv1 | 227 x 227, 3 | K=11, S=4, P=0, group=1 | 55 x 55, 96 |
| ReLU1 | | | |
| LRN1 | | | |
| Pool1 | 55 x 55, 96 | K=3, S=2 | 27 x 27, 96 |
| Conv2 | 27 x 27, 96 | K=5, S=1, P=2, group=2 | 27 x 27, 256 (group=2 splits the layer into two parts, 48->128 and 48->128; see the two-GPU streams in the paper's architecture figure) |
| ReLU2 | | | |
| LRN2 | | | |
| Pool2 | 27 x 27, 256 | K=3, S=2, P=0 | 13 x 13, 256 |
| Conv3 | 13 x 13, 256 | K=3, S=1, P=1, group=1 | 13 x 13, 384 (group=1, so the connections cross between the two streams in the figure) |
| ReLU3 | | | |
| Conv4 | 13 x 13, 384 | K=3, S=1, P=1, group=2 | 13 x 13, 384 (group=2, split again: 192->192 and 192->192) |
| ReLU4 | | | |
| Conv5 | 13 x 13, 384 | K=3, S=1, P=1, group=2 | 13 x 13, 256 (group=2, two parts: 192->128 and 192->128) |
| ReLU5 | | | |
| Pool5 | 13 x 13, 256 | K=3, S=2 | 6 x 6, 256 |
| FC6 | 6 x 6, 256 | fully connected | 4096 x 1 |
| ReLU6 | | | |
| Dropout6 | | p=0.5 (training phase only; see CS231n) | |
| FC7 | 4096 x 1 | fully connected | 4096 x 1 |
| ReLU7 | | | |
| Dropout7 | | p=0.5 (training phase only; see CS231n) | |
| FC8 | 4096 x 1 | fully connected | 1000 x 1 |
| Loss | 1000 x 1 | softmax loss | scalar |
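As a sanity check, the spatial sizes in the table can be reproduced layer by layer from the two formulas at the top of these notes (a self-contained sketch; helper names are my own):

```python
def conv_out(w, k, s, p):
    """(W - K + 2P) / S + 1"""
    return (w - k + 2 * p) // s + 1

def pool_out(w, k, s):
    """(W - K) / S + 1, no padding"""
    return (w - k) // s + 1

w = 227                                      # input after cropping
w = conv_out(w, 11, 4, 0); assert w == 55    # Conv1
w = pool_out(w, 3, 2);     assert w == 27    # Pool1
w = conv_out(w, 5, 1, 2);  assert w == 27    # Conv2
w = pool_out(w, 3, 2);     assert w == 13    # Pool2
w = conv_out(w, 3, 1, 1);  assert w == 13    # Conv3
w = conv_out(w, 3, 1, 1);  assert w == 13    # Conv4
w = conv_out(w, 3, 1, 1);  assert w == 13    # Conv5
w = pool_out(w, 3, 2);     assert w == 6     # Pool5
print(w * w * 256)  # 9216 values flattened into FC6
```

This also makes explicit why FC6 has 6 x 6 x 256 = 9216 inputs.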