AlexNet is a deep convolutional neural network (CNN) that gained significant attention and marked a breakthrough in the field of deep learning, especially in the context of image classification. It was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012.
ILSVRC (ImageNet Large Scale Visual Recognition Challenge) is a competition in which research teams evaluate their algorithms on a huge dataset of labeled images (ImageNet) and compete to achieve higher accuracy on several visual recognition tasks. AlexNet's 2012 win made a huge impact on how teams approached the competition afterward.
AlexNet contains eight layers with weights: five convolutional layers followed by three fully connected layers.
Here are the key components and features of AlexNet:
AlexNet consists of eight layers in total, with five convolutional layers followed by three fully connected layers.
It uses the rectified linear unit (ReLU) activation function, which helps in overcoming the vanishing gradient problem and speeds up the training process.
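As a minimal sketch, ReLU simply zeroes out negative inputs and passes positive ones through unchanged, so its gradient is 1 for any positive input and never saturates the way sigmoid or tanh do:

```python
def relu(x):
    # ReLU: max(0, x). Gradient is 1 for x > 0, which avoids the
    # saturation that causes vanishing gradients with sigmoid/tanh.
    return max(0.0, x)

print([relu(x) for x in [-2.0, -0.5, 0.0, 1.5, 3.0]])
# -> [0.0, 0.0, 0.0, 1.5, 3.0]
```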
The first convolutional layer has 96 kernels of size 11×11×3 (where 3 corresponds to the RGB channels). Subsequent convolutional layers use smaller filter sizes (e.g., 5×5) and are followed by max-pooling layers.
Max-pooling is applied to reduce the spatial dimensions and capture the most important features.
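A minimal sketch of 2-D max-pooling on a plain list-of-lists, using AlexNet's overlapping 3×3 window with stride 2 (window and stride are parameters here for illustration):

```python
def max_pool2d(x, k=3, s=2):
    # Slide a k x k window over x with stride s and keep the maximum
    # of each window; this shrinks the spatial dimensions while
    # retaining the strongest activation in each neighborhood.
    n = len(x)
    out_size = (n - k) // s + 1
    return [[max(x[i * s + di][j * s + dj]
                 for di in range(k) for dj in range(k))
             for j in range(out_size)]
            for i in range(out_size)]

grid = [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20],
        [21, 22, 23, 24, 25]]
print(max_pool2d(grid))  # -> [[13, 15], [23, 25]]
```

Note that with k=3 and s=2 adjacent windows overlap by one row/column; AlexNet's authors found this overlapping scheme slightly reduced overfitting compared with non-overlapping pooling.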
Local Response Normalization (LRN):
LRN is applied after the first and second convolutional layers to normalize the responses within a local neighborhood.
This helps enhance the contrast between the activated neurons and improves generalization.
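A sketch of across-channel LRN on the activations at a single spatial position, using the hyperparameter values reported for AlexNet (k=2, n=5, alpha=1e-4, beta=0.75):

```python
def lrn(channels, k=2.0, alpha=1e-4, beta=0.75, n=5):
    # Each activation a_i is divided by a term summing the squares of
    # activations in the n neighboring channels at the same position:
    #   b_i = a_i / (k + alpha * sum_j a_j^2) ** beta
    out = []
    for i, a in enumerate(channels):
        lo = max(0, i - n // 2)
        hi = min(len(channels) - 1, i + n // 2)
        denom = (k + alpha * sum(channels[j] ** 2
                                 for j in range(lo, hi + 1))) ** beta
        out.append(a / denom)
    return out

print(lrn([1.0, 2.0, 3.0]))
```

Because the denominator grows when neighboring channels fire strongly, strong responses suppress their neighbors, a rough analogue of lateral inhibition in biological neurons.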
Fully Connected Layers:
The final three layers are fully connected layers. The first two have 4096 neurons each, and the last one has 1000 neurons corresponding to the ImageNet classes.
Dropout is applied to these fully connected layers during training to prevent overfitting.
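A minimal sketch of dropout on a list of activations. This is the "inverted" variant, which scales survivors at training time; the original AlexNet paper instead halves the outputs at test time, but the two are equivalent in expectation:

```python
import random

def dropout(x, p=0.5, seed=0):
    # Zero each activation with probability p; scale survivors by
    # 1/(1-p) so the expected activation is unchanged. The fixed seed
    # here is only for reproducibility of this example.
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]

print(dropout([1.0] * 8, p=0.5))
```

With p=0.5 each forward pass effectively samples a different "thinned" network, which discourages neurons from co-adapting on specific features.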
The output layer uses the softmax activation function to convert the final layer's outputs into probabilities for different classes.
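The softmax step can be sketched in a few lines; subtracting the maximum logit before exponentiating is a standard trick to avoid overflow and does not change the result:

```python
import math

def softmax(z):
    # Convert raw scores (logits) into probabilities that sum to 1.
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.0, 2.0, 3.0]))
```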
Recap: 5 convolutional layers, 3 fully connected layers.
The first convolutional layer produces 96 feature maps; each map responds to a different low-level feature, e.g., one to edges, another to corners.
In the first convolutional layer:
96 feature maps, padding p = 0 (valid padding, i.e., no padding), stride s = 4, filter size k = 11×11, input size n = 227×227.
Output size = [(227 − 11 + 2·0)/4] + 1 = 55, i.e., 55×55.
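The calculation above is an instance of the standard output-size formula, sketched here as a small helper:

```python
def conv_out(n, k, p, s):
    # Output size of a conv/pool layer: floor((n - k + 2p) / s) + 1
    # n = input size, k = kernel size, p = padding, s = stride.
    return (n - k + 2 * p) // s + 1

print(conv_out(227, 11, 0, 4))  # -> 55 (AlexNet's first conv layer)
```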
In the first max-pooling layer (applied after the first convolution):
padding p = 0 (no padding), stride s = 2, pooling window k = 3×3, input size n = 55×55.
Output size = [(55 − 3 + 2·0)/2] + 1 = 27, i.e., 27×27.
Max-pooling decreases the spatial dimensions without losing much of the important information.
A basic design goal of the AlexNet architecture: decrease width and height while increasing depth (the number of feature maps).
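That shrinking-width, growing-depth pattern can be traced through the whole network with the same output-size formula. The kernel/padding/stride values below are taken from the standard AlexNet configuration (pad 2 on conv2, pad 1 on conv3–5, overlapping 3×3/stride-2 pooling):

```python
def out_size(n, k, p, s):
    # floor((n - k + 2p) / s) + 1
    return (n - k + 2 * p) // s + 1

# (name, kernel k, padding p, stride s, output depth) per spatial layer
layers = [
    ("conv1", 11, 0, 4, 96),
    ("pool1",  3, 0, 2, 96),
    ("conv2",  5, 2, 1, 256),
    ("pool2",  3, 0, 2, 256),
    ("conv3",  3, 1, 1, 384),
    ("conv4",  3, 1, 1, 384),
    ("conv5",  3, 1, 1, 256),
    ("pool3",  3, 0, 2, 256),
]

n, sizes = 227, []
for name, k, p, s, depth in layers:
    n = out_size(n, k, p, s)
    sizes.append(n)
    print(f"{name}: {n}x{n}x{depth}")
# Width/height shrink 227 -> 55 -> 27 -> 13 -> 6
# while depth grows 3 -> 96 -> 256 -> 384.
```

The final 6×6×256 volume (9216 values) is what gets flattened and fed into the first 4096-neuron fully connected layer.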