A Code Implementation of Monocular Depth Estimation Using Intel MiDaS Open Source Model on Google Colab with PyTorch and OpenCV

Monocular depth estimation involves predicting scene depth from a single RGB image—a fundamental task in computer vision with wide-ranging applications, including augmented reality, robotics, and 3D scene understanding. In this tutorial, we implement Intel's MiDaS, a state-of-the-art model for high-quality relative depth prediction from a single image, using its DPT_Large variant, which is built on a Dense Prediction Transformer (DPT) backbone. Leveraging Google Colab as the compute platform, along with PyTorch, OpenCV, and Matplotlib, this tutorial enables you to upload your own image and easily visualize the corresponding depth map.

!pip install -q timm opencv-python matplotlib

First, we install the necessary Python libraries: timm, which provides the vision-transformer backbones the MiDaS DPT models depend on; opencv-python for image processing; and matplotlib for visualizing the depth maps.

!git clone https://github.com/isl-org/MiDaS.git
%cd MiDaS

Then, we clone the official Intel MiDaS repository from GitHub and navigate into its directory to access the model code and transformation utilities.

import torch
import cv2
import matplotlib.pyplot as plt
import numpy as np
from torchvision.transforms import Compose
from google.colab import files

# MiDaS preprocessing utilities from the cloned repository
from midas.transforms import Resize, NormalizeImage, PrepareForNet

# Use the GPU when Colab provides one; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

We import the required libraries along with MiDaS's preprocessing components for transforming images, handling uploads, and visualizing depth predictions. Then we set the computation device to GPU (CUDA) if available; otherwise, it defaults to CPU, ensuring compatibility on any Colab runtime.
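As an optional sanity check, you can print which device was selected before loading the model:

# Optional: confirm which device will run inference
print(f"Running on: {device}")
if device.type == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")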

# Download the pretrained DPT_Large weights via torch.hub and prepare for inference
model = torch.hub.load("intel-isl/MiDaS", "DPT_Large", pretrained=True, force_reload=True)
model.to(device)
model.eval()

Here, we download the pretrained MiDaS DPT_Large model via torch.hub, move it to the selected device (CPU or GPU), and set it to evaluation mode for inference.
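On a CPU-only runtime, DPT_Large can be slow per image. The same hub repository also publishes a lightweight MiDaS_small variant; as a minimal sketch, assuming you accept the accuracy trade-off, you could load it instead:

# Optional swap: a lightweight variant for CPU-only sessions
# (MiDaS_small is typically paired with a smaller 256-pixel transform, but it is
# fully convolutional and should also accept the 384-pixel pipeline defined below)
model = torch.hub.load("intel-isl/MiDaS", "MiDaS_small", pretrained=True)
model.to(device)
model.eval()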

transform = Compose([
    # Resize to fit within 384x384, keeping aspect ratio, with sides constrained to multiples of 32
    Resize(384, 384, resize_target=None, keep_aspect_ratio=True, ensure_multiple_of=32, resize_method="upper_bound"),
    # Standard ImageNet normalization (expects pixel values already scaled to [0, 1])
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    # Transpose HWC -> CHW and cast to float32 for PyTorch
    PrepareForNet()
])

We define MiDaS’s image preprocessing pipeline, which resizes the input image to fit within 384×384 (with sides that are multiples of 32), applies ImageNet normalization to its pixel values, and reorders the array into the channel-first float32 format the network expects. Note that NormalizeImage assumes inputs scaled to [0, 1], which is why we divide by 255 when loading the image below.
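As a quick illustrative check, you can run the pipeline on a hypothetical dummy image and inspect the output shape:

# Illustrative check on a random 480x640 "image" with values in [0, 1]
dummy = np.random.rand(480, 640, 3).astype(np.float32)
out = transform({"image": dummy})["image"]
print(out.shape, out.dtype)  # expected: channel-first, e.g. (3, 288, 384) float32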

uploaded = files.upload()
for filename in uploaded:
    img = cv2.imread(filename)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Scale to [0, 1] floats, as NormalizeImage in the transform expects
    img = img.astype(np.float32) / 255.0
    break

We allow the user to upload an image in Colab, read it using OpenCV, convert it from BGR to RGB for accurate color representation, and scale its pixel values to the [0, 1] range that the MiDaS normalization step expects.
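If you are running outside Colab, where google.colab.files is unavailable, a minimal equivalent, assuming a local image at a placeholder path sample.jpg, would be:

# Non-Colab fallback: read a local file instead of files.upload()
# "sample.jpg" is a placeholder; substitute the path to your own image
img = cv2.imread("sample.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0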

# Preprocess and add a batch dimension
img_input = transform({"image": img})["image"]
input_tensor = torch.from_numpy(img_input).unsqueeze(0).to(device)

with torch.no_grad():
    prediction = model(input_tensor)
    # Upsample the prediction back to the original image resolution
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

depth_map = prediction.cpu().numpy()

Now, we apply the preprocessing transform to the uploaded image, convert it to a tensor, perform depth prediction with the MiDaS model, resize the output to match the original image dimensions, and extract the final depth map as a NumPy array. Note that MiDaS predicts relative inverse depth: larger values indicate nearer surfaces, not metric distances.
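Because the raw output is relative rather than metric, it is common to min-max normalize it before saving; a minimal sketch of that step might look like this:

# Min-max normalize to [0, 255] and save as an 8-bit grayscale PNG
d_min, d_max = depth_map.min(), depth_map.max()
depth_norm = (255 * (depth_map - d_min) / (d_max - d_min + 1e-8)).astype(np.uint8)
cv2.imwrite("depth_map.png", depth_norm)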

plt.figure(figsize=(10, 5))


plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")


plt.subplot(1, 2, 2)
plt.imshow(depth_map, cmap='inferno')
plt.title("Depth Map")
plt.axis("off")


plt.tight_layout()
plt.show()

Finally, we create a side-by-side visualization of the original image and its corresponding depth map using Matplotlib. The depth map is displayed using the ‘inferno’ colormap for better contrast.
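If you prefer a standalone colorized file rather than a Matplotlib figure, one option is to apply an OpenCV colormap to the normalized map from the earlier snippet:

# Colorize the normalized depth (depth_norm from the snippet above) and save it
# COLORMAP_INFERNO requires OpenCV >= 4.1; COLORMAP_MAGMA or COLORMAP_JET also work
depth_color = cv2.applyColorMap(depth_norm, cv2.COLORMAP_INFERNO)
cv2.imwrite("depth_color.png", depth_color)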

In conclusion, by completing this tutorial, we’ve successfully deployed Intel’s MiDaS model on Google Colab to perform monocular depth estimation using just an RGB image. Using PyTorch for model inference, OpenCV for image processing, and Matplotlib for visualization, we’ve built a robust pipeline to generate high-quality depth maps with minimal setup. This implementation is a strong foundation for further exploration, including video depth estimation, real-time applications, and integration with AR/VR systems.
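As a starting point for the video direction mentioned above, here is a minimal per-frame sketch, assuming a local file at a placeholder path input.mp4 and reusing the model and transform defined earlier:

# Per-frame depth estimation over a video file ("input.mp4" is a placeholder)
cap = cv2.VideoCapture("input.mp4")
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    frame_input = transform({"image": rgb})["image"]
    tensor = torch.from_numpy(frame_input).unsqueeze(0).to(device)
    with torch.no_grad():
        pred = model(tensor)
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=rgb.shape[:2], mode="bicubic", align_corners=False
        ).squeeze()
    frame_depth = pred.cpu().numpy()  # process or save each frame's depth here

cap.release()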


Here is the Colab Notebook.
