

To see the VGG-16 architecture for yourself, you can print a summary of the Keras model from the command line:

    python -c 'from keras.applications.vgg16 import VGG16; VGG16().summary()'

You will notice five blocks of (two to three) convolutional layers followed by a max pooling layer. The final max pooling layer is then flattened and followed by three densely connected layers. Notice that most of the parameters in the model belong to the fully connected layers! As you can probably imagine, an architecture like this risks overfitting to the training dataset. In practice, dropout layers are used to avoid overfitting.
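For a sense of what that classifier head looks like in code, here is a minimal sketch of a flatten-dense-dropout stack in Keras; the layer sizes are illustrative assumptions rather than the exact VGG-16 configuration.

    from keras.models import Sequential
    from keras.layers import Flatten, Dense, Dropout

    # Hypothetical classifier head sitting on top of a 7x7x512 convolutional output.
    head = Sequential([
        Flatten(input_shape=(7, 7, 512)),
        Dense(4096, activation='relu'),
        Dropout(0.5),                       # randomly drop half the units during training
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(1000, activation='softmax')   # one output per class
    ])
    head.summary()  # note how many parameters the dense layers contribute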

In the last few years, experts have turned to global average pooling (GAP) layers to minimize overfitting by reducing the total number of parameters in the model. Similar to max pooling layers, GAP layers are used to reduce the spatial dimensions of a three-dimensional tensor. However, GAP layers perform a more extreme type of dimensionality reduction: a tensor with dimensions h × w × d is reduced to dimensions 1 × 1 × d. GAP layers reduce each h × w feature map to a single number by simply taking the average of all h·w values.
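To make that reduction concrete, here is a small sketch (the 7 × 7 × 512 shape is just an example) comparing Keras's GlobalAveragePooling2D layer with a plain mean over the spatial axes; note that Keras returns the result with the 1 × 1 spatial dimensions already squeezed out.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import GlobalAveragePooling2D

    # A batch containing one feature tensor of height 7, width 7, and depth 512.
    features = np.random.rand(1, 7, 7, 512).astype('float32')

    gap = Sequential([GlobalAveragePooling2D(input_shape=(7, 7, 512))])
    pooled = gap.predict(features)

    print(pooled.shape)  # (1, 512): one average per feature map
    print(np.allclose(pooled, features.mean(axis=(1, 2))))  # True: it is just the spatial mean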

The first paper to propose GAP layers designed an architecture where the final max pooling layer contained one activation map for each image category in the dataset. The max pooling layer was then fed to a GAP layer, which yielded a vector with a single entry for each possible object in the classification task. The authors then applied a softmax activation function to yield the predicted probability of each class. If you peek at the original paper, I especially recommend checking out Section 3.2, titled “Global Average Pooling”.
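A minimal Keras sketch of that design might look something like the following; the tiny convolutional base, the input shape, and num_classes are placeholder assumptions, not the architecture from the paper.

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Activation

    num_classes = 10  # assumed number of image categories

    model = Sequential([
        # Stand-in convolutional base; the real network is much deeper.
        Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)),
        MaxPooling2D((2, 2)),
        # The last convolution produces one activation map per category,
        Conv2D(num_classes, (3, 3), padding='same'),
        # the final max pooling layer keeps one (smaller) map per category,
        MaxPooling2D((2, 2)),
        # GAP collapses each map to a single entry,
        GlobalAveragePooling2D(),
        # and softmax converts the resulting vector into class probabilities.
        Activation('softmax')
    ])
    model.summary()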
