MobileNet V2 Architecture in Computer Vision

Last Updated : 12 May, 2026

MobileNet V2 is an efficient convolutional neural network architecture designed for mobile and embedded vision applications. Developed by Google, it improves upon MobileNet V1 by enhancing performance while maintaining a lightweight design suitable for resource-constrained environments.

Key Features

1. Inverted Residuals

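MobileNet V2 introduces inverted residual blocks, which are its core building units. A traditional residual block reduces the channel dimension first and expands it afterwards; an inverted residual block does the opposite, expanding the input first and then compressing it back. An inverted residual block consists of three steps:

  1. **Expansion:** A 1x1 convolution increases the number of channels by the expansion factor.
  2. **Depthwise Convolution:** A 3x3 depthwise convolution filters each expanded channel separately.
  3. **Projection:** A 1x1 convolution with no activation compresses the result back to a small number of channels.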

This design helps reduce computation while maintaining important features.
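To make the expand, filter, project pattern concrete, the sketch below counts the multiply-adds of one inverted residual block. The helper `inverted_residual_macs` and the example sizes are illustrative assumptions, not part of any library, and the count ignores biases, batch normalization, and the residual addition.

```python
def inverted_residual_macs(h, w, c_in, c_out, expansion=6, kernel=3, stride=1):
    """Approximate multiply-adds for one inverted residual block (hypothetical helper)."""
    c_mid = c_in * expansion                      # channels after the 1x1 expansion
    # Ceil division models 'same' padding: stride 2 halves the spatial size.
    h_out, w_out = -(-h // stride), -(-w // stride)

    expand = h * w * c_in * c_mid                 # 1x1 expansion conv at input resolution
    depthwise = h_out * w_out * c_mid * kernel * kernel   # one k x k filter per channel
    project = h_out * w_out * c_mid * c_out       # 1x1 projection back to c_out channels
    return {
        "expand": expand,
        "depthwise": depthwise,
        "project": project,
        "total": expand + depthwise + project,
    }


# Example sizes chosen for illustration: a 56x56x24 input with expansion factor 6.
costs = inverted_residual_macs(h=56, w=56, c_in=24, c_out=24, expansion=6, stride=1)
for step, macs in costs.items():
    print(f"{step:>9}: {macs:,}")
```

In this example the 3x3 depthwise step works on 144 channels but costs less than either 1x1 convolution. That is why the block can afford to widen the representation in the middle.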

2. Depthwise Separable Convolutions

Like MobileNet V1, MobileNet V2 uses depthwise separable convolutions to make the model efficient and reduce the number of parameters and computations significantly.

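A depthwise separable convolution splits a standard convolution into two steps:

  1. **Depthwise Convolution:** A single filter is applied to each input channel independently.
  2. **Pointwise Convolution:** A 1x1 convolution combines the depthwise outputs across channels.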
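The saving can be checked with a quick weight count that compares a standard convolution with a depthwise step followed by a pointwise (1x1) step. The function names and layer sizes below are made up for illustration, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a pointwise 1x1 conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1x1 conv that mixes information across channels
    return depthwise + pointwise


k, c_in, c_out = 3, 32, 64        # example sizes chosen for illustration
standard = standard_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
print("standard :", standard)
print("separable:", separable)
print("reduction:", round(standard / separable, 2), "x")
```

For these sizes the separable version needs roughly 8 times fewer weights, and the gap widens as the number of output channels grows.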

3. Linear Bottlenecks

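MobileNet V2 uses linear bottlenecks: the final 1x1 projection layer of each block has no non-linear activation. Applying ReLU to a low-dimensional output would zero out negative values and discard information, so the projection is kept linear.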
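A toy example shows why the projection is left without a non-linearity. It is a minimal sketch with made-up two-dimensional feature vectors, not the network's actual projection: ReLU maps two different inputs to the same output, while a linear (identity) activation keeps them distinct.

```python
def relu(values):
    """Element-wise ReLU: negative entries become 0."""
    return [max(0.0, v) for v in values]


def linear(values):
    """Linear (identity) activation: values pass through unchanged."""
    return list(values)


# Two different low-dimensional feature vectors (made-up values).
a = [-1.0, 2.0]
b = [-3.0, 2.0]

print("ReLU collapses them:  ", relu(a) == relu(b))
print("Linear collapses them:", linear(a) == linear(b))
```

In a narrow bottleneck there are few other channels that could carry the lost information, which is the motivation for dropping the activation there.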

4. ReLU6 Activation Function

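MobileNet V2 uses ReLU6, a variation of ReLU that caps the output at 6: ReLU6(x) = min(max(0, x), 6). The bounded range makes activations better suited to low-precision arithmetic on mobile hardware.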
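As a minimal sketch, ReLU6 can be written in one line of plain Python. The helper name `relu6` is chosen here for illustration; deep learning frameworks ship their own implementations.

```python
def relu6(x):
    """ReLU6: behaves like ReLU, but the output is capped at 6."""
    return min(max(0.0, x), 6.0)


# Negative inputs go to 0, values in [0, 6] pass through, larger values are clipped.
for value in (-2.0, 0.5, 6.0, 9.0):
    print(value, "->", relu6(value))
```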

Architecture

MobileNet V2 follows a streamlined architecture built around inverted residual blocks, which serve as the core building units of the network.

  1. **Initial Convolution Layer:** A standard convolution layer with 32 filters and a stride of 2.
  2. **Series of Inverted Residual Blocks:** The network contains several stages, each with a specific number of inverted residual blocks. The expansion factors, output channels, and strides vary across stages to manage the computational complexity and receptive field.
  3. **Final Convolution Layer:** A 1x1 convolution layer with 1280 filters, followed by a global average pooling layer.
  4. **Fully Connected Layer:** A fully connected layer with softmax activation for classification tasks.

Detailed Layer Configuration

The following table shows the layer-wise configuration of MobileNet V2:

| Layer Type | Input Size | Output Size | Kernel Size | Stride | Expansion Factor |
|---|---|---|---|---|---|
| Initial Conv | 224x224x3 | 112x112x32 | 3x3 | 2 | - |
| Inverted Residual Block x1 | 112x112x32 | 112x112x16 | 3x3 | 1 | 1 |
| Inverted Residual Block x2 | 112x112x16 | 56x56x24 | 3x3 | 2 | 6 |
| Inverted Residual Block x3 | 56x56x24 | 28x28x32 | 3x3 | 2 | 6 |
| Inverted Residual Block x4 | 28x28x32 | 14x14x64 | 3x3 | 2 | 6 |
| Inverted Residual Block x3 | 14x14x64 | 14x14x96 | 3x3 | 1 | 6 |
| Inverted Residual Block x3 | 14x14x96 | 7x7x160 | 3x3 | 2 | 6 |
| Inverted Residual Block x1 | 7x7x160 | 7x7x320 | 3x3 | 1 | 6 |
| Final Conv | 7x7x320 | 7x7x1280 | 1x1 | 1 | - |
| Global Avg Pooling | 7x7x1280 | 1x1x1280 | - | - | - |
| Fully Connected | 1x1x1280 | 1x1x1000 | - | - | - |
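The configuration table can be cross-checked with a few lines of Python that walk the stages and track the feature-map size. `STAGES` and `trace_shapes` are names made up for this sketch. It assumes 'same' padding, so a stride of 2 halves the spatial size, and that only the first block of each stage applies the stride.

```python
# (expansion factor t, output channels c, repeats n, stride s), copied from the table above
STAGES = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def trace_shapes(input_size=224):
    """Return (label, spatial size, channels) after each stage (hypothetical helper)."""
    size, channels = input_size // 2, 32          # initial 3x3 conv: stride 2, 32 filters
    trace = [("initial conv", size, channels)]
    for t, c, n, s in STAGES:
        size //= s                                # assumed: only the first block strides
        channels = c
        trace.append((f"inverted residual x{n} (t={t})", size, channels))
    trace.append(("final 1x1 conv", size, 1280))  # 1x1 conv keeps the spatial size
    return trace


for label, size, channels in trace_shapes():
    print(f"{label:<30} {size}x{size}x{channels}")

print("inverted residual blocks in total:", sum(n for _, _, n, _ in STAGES))
```

The printed sizes should match the Output Size column row by row, ending at 7x7x1280 before global average pooling.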

Implementing MobileNet V2 using TensorFlow

Consider an example of using a pre-trained MobileNet V2 model to classify an image of a cat.

```python
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions
import numpy as np

# Load the MobileNetV2 model
model = MobileNetV2(weights='imagenet')

# Load an image for testing
img_path = '/content/simba-8618301_1280.jpg'  # Path to your test image
img = image.load_img(img_path, target_size=(224, 224))

# Preprocess the image
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Make predictions
preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])
```

**Output:**

Predicted: [('n02123045', 'tabby', 0.5783735), ('n02123159', 'tiger_cat', 0.11342117), ('n02124075', 'Egyptian_cat', 0.05013833)]

The model returns a list of predictions, where each entry contains a class ID, class name, and its probability score.
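Each entry is a plain tuple, so the result can be processed with ordinary Python. The sketch below hard-codes the three predictions from the output shown above, so it runs without TensorFlow. The variable names are illustrative.

```python
# Output of decode_predictions(preds, top=3)[0], copied from the run above
predictions = [
    ("n02123045", "tabby", 0.5783735),
    ("n02123159", "tiger_cat", 0.11342117),
    ("n02124075", "Egyptian_cat", 0.05013833),
]

# Print each prediction as "name (class ID): probability".
for class_id, class_name, score in predictions:
    print(f"{class_name:<14} ({class_id}): {score:.1%}")

# Pick the entry with the highest probability score.
best_id, best_name, best_score = max(predictions, key=lambda p: p[2])
print("Top prediction:", best_name)
```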

Advantages

Applications