What is Max-Pooling and why do we use it with Convolutional Neural Networks?
When do we need to use Max-Pooling?
When you learn a subject, you should understand its place in the bigger picture. You should know what you don't know.
To give you the whole picture, I always mention the things that exist around a subject, even when we don't cover them in the article itself.
What will we talk about?
- When should we use Max-Pooling?
- How it works
What will we NOT talk about?
- How to code it: the syntax changes frequently, and you can figure it out from the documentation.
- Average Pooling
- Global Max Pooling: check it out if you have images of different sizes.
- Stride parameter in Max Pooling
Max pooling Formal Definition (from computersciencewiki.org):
Max pooling is a sample-based discretization process. The objective is to down-sample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality and allowing for assumptions to be made about features contained in the sub-regions binned.
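The definition above can be sketched in a few lines of NumPy. This is an illustrative implementation of my own (the function name `max_pool_2x2` and the sample values are not from any particular library), and it assumes a non-overlapping 2×2 window on an input whose height and width are even:

```python
import numpy as np

def max_pool_2x2(x):
    """Down-sample a 2-D array by keeping the max of each 2x2 block.
    Assumes the input's height and width are both even."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A 4x4 input becomes 2x2: each output pixel is the strongest
# activation inside its 2x2 sub-region.
x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 9, 8],
              [1, 0, 3, 4]])
print(max_pool_2x2(x))  # [[6 5]
                        #  [7 9]]
```

Notice that the output keeps the *strength* of each sub-region's feature response while throwing away its exact location inside that sub-region.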
When should we use Max pooling?
The problem: When we train a CNN (Convolutional Neural Network), we sometimes want to detect an object in an image. For example, we train a model to look at images and say whether a specific image contains a dog or not. We feed the model many images during training, but the model gets confused because the dog is in a different position in each image. What if we could help the model "learn" better by making it ignore the position of the object?
The Solution: Max pooling.
We use Max pooling when we want to detect an object inside an image and say whether it exists in the image or not, without reference to the position of the object. Let's see how it works.
How does it work in reality?
Let's say the feature we want to detect is in the bottom-left corner. The values in that area are high because our algorithm found the feature it was supposed to find. What happens when we start to do max-pooling?
If we do max-pooling one more time we get only 1 pixel with the value 112. And this is how we know our feature is inside the image.
Let’s do it again in our mind — but now, let’s imagine the value 112 is in another position. We still get 1 pixel with the value 112 — so it doesn’t matter where the feature is — we’ll detect it.
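The thought experiment above can be run in code. Here is a small sketch, assuming the same illustrative `max_pool_2x2` helper from before (not a library function) and the article's example value of 112 as the strong feature response. Pooling a 4×4 map twice (4×4 → 2×2 → 1×1) yields the same single pixel no matter where the 112 sits:

```python
import numpy as np

def max_pool_2x2(x):
    """Down-sample a 2-D array by keeping the max of each 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Case A: the strong response (112) is in the bottom-left corner.
a = np.zeros((4, 4))
a[3, 0] = 112

# Case B: the same response sits somewhere else entirely.
b = np.zeros((4, 4))
b[0, 2] = 112

# Two rounds of pooling collapse 4x4 -> 2x2 -> 1x1.
# Both cases end with the same single pixel of value 112.
print(max_pool_2x2(max_pool_2x2(a)))  # [[112.]]
print(max_pool_2x2(max_pool_2x2(b)))  # [[112.]]
```

This is exactly the position invariance the article describes: the final pixel tells us the feature exists, but not where it was.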
Of course, when you choose the max-pooling size (here we used a 2×2 window), you need to consider a few things:
- How many pixels are in your image?
- In most of your images, is the feature just a small piece of the image, or is it the entire subject of the image?
It's a hyperparameter; most of the time you'll do trial and error on the max-pooling size and the number of pooling layers.
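When weighing those trade-offs, it helps to see how quickly pooling shrinks the feature map. A tiny helper (my own illustrative function, assuming non-overlapping windows with stride equal to the pool size and no padding) shows the arithmetic:

```python
def pooled_size(size, pool, layers):
    """Spatial size after `layers` rounds of non-overlapping pooling
    with a `pool`-wide window (stride = pool, no padding)."""
    for _ in range(layers):
        size = size // pool  # each layer divides the size by the window
    return size

# A 224-pixel-wide feature map after three 2x2 pooling layers:
print(pooled_size(224, 2, 3))  # 28

# A larger 4x4 window shrinks it much faster:
print(pooled_size(224, 4, 3))  # 3
```

If your feature is only a small patch of the image, an aggressive window can pool it away entirely, which is one reason the size is worth tuning.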
Note: we don't know the original position after we do max-pooling, and that's why it fits this kind of task: binary feature detection. It's not a good option when we need to find the position of a feature in the image. (You may learn in the future how max-pooling is used when we do want to find the position, but that is out of scope right now.)
If you want to understand it visually, I recommend watching this video: https://www.youtube.com/watch?v=ZjM_XQa5s6s