Object Counting Project using ESP32-CAM and OpenCV

Imagine a real-time object counting system that is inexpensive and easy to build. An ESP32-CAM makes this possible. In this project, we build an ESP32-CAM object counting system that combines an embedded camera with computer vision running on a PC.

The heart of the system is the ESP32-CAM, a low-cost ESP32 development board with an integrated camera. It captures frames and serves them as JPEG images over Wi-Fi. On the other side, a Python application fetches these frames, detects objects with OpenCV image-processing techniques, and displays the count in real time.

Whether you are tracking inventory in a warehouse, monitoring traffic flow, or automating a production line, the system can be adapted to the task. It needs only a handful of components and is easy to put together.

Join us as we build this smart counting system step by step. You'll learn to configure the ESP32-CAM, process images in Python, and put together a real-time object detection system. Let’s bring this project to life!

System Architecture of the ESP32-CAM Object Counting System

The ESP32-CAM Object Counting System is built on a modular and efficient architecture, combining hardware and software components to achieve real-time object detection and counting. Below is a detailed breakdown of the system architecture:

1. Hardware Layer

  1. ESP32-CAM Module

    • Acts as the primary hardware for image capture and Wi-Fi communication.

    • Equipped with an onboard camera that captures still JPEG frames at different resolutions.

    • Connects to a local Wi-Fi network to transmit data.

  2. Power Supply

    • Provides stable power to the ESP32-CAM module, typically via a USB connection or external battery pack.

2. Communication Layer

  1. Wi-Fi Connection

    • The ESP32-CAM connects to a local Wi-Fi network to enable seamless data transmission.

    • Serves individual JPEG frames, at different resolutions, in response to HTTP GET requests.

  2. HTTP Server on ESP32-CAM

    • Runs a lightweight web server on port 80.

    • Responds to specific endpoints (/cam-lo.jpg, /cam-mid.jpg, /cam-hi.jpg) to provide real-time image frames at requested resolutions.


3. Processing Layer

  1. ESP32-CAM Side

    • Captures frames with the onboard camera and encodes them as JPEG.

    • Serves each frame as a JPEG image through the HTTP server.

  2. Python Application on Host Machine

    • Fetches frames from the ESP32-CAM with HTTP GET requests.

    • Processes the images using OpenCV for: 

      • Grayscale conversion.

      • Noise reduction with Gaussian blur.

      • Edge detection with the Canny algorithm.

      • Dilation to close small gaps in the detected edges.

      • Contour detection to identify objects in the frame.

    • Counts the detected objects and updates the display dynamically.


4. User Interaction Layer

  1. Live Video Feed

    • Displays the real-time video stream with contours drawn around detected objects.

  2. Object Count Display

    • Provides a dynamic count of detected objects in the video feed.

    • The count is drawn onto the live video frame. It could also be printed to the console or shown in a graphical interface.

  3. User Commands

    • Enables interaction through keyboard input (for example, pressing 'q' closes the application).

5. System Workflow

  1. The ESP32-CAM captures frames and serves them as JPEG images over the Wi-Fi network.

  2. The Python application on the host machine fetches the image frames via HTTP requests.

  3. The fetched images undergo processing in OpenCV to detect and count objects.

  4. The processed frames are displayed with the object count, which is recalculated for every frame until the user presses 'q'.


This architecture ensures a clear separation of tasks, with the ESP32-CAM handling image capture and streaming, and the Python application focusing on image processing and visualization. The modular design makes it easy to expand or adapt the system for various applications.

List of components

| Component | Quantity |
|---|---|
| ESP32-CAM WiFi + Bluetooth Camera Module | 1 |
| FTDI USB to Serial Converter 3V3-5V | 1 |
| Male-to-female jumper wires | 4 |
| Female-to-female jumper wire | 1 |
| Micro-USB data cable | 1 |

Circuit diagram

The following is the circuit diagram for this project.


Fig: Circuit diagram

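Connect the ESP32-CAM to the FTDI module as follows (set the FTDI module's voltage selector to 5V):

| ESP32-CAM WiFi + Bluetooth Camera Module | FTDI USB to Serial Converter 3V3-5V |
|---|---|
| 5V | VCC |
| GND | GND |
| U0T | RX |
| U0R | TX |
| IO0 | GND (FTDI or ESP32-CAM) |

Connecting IO0 to GND puts the ESP32-CAM into flashing (bootloader) mode, which is required for uploading code.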

Programming

Board installation

If this is your first project with an ESP32-series board, install the ESP32 board package first. If ESP32 boards are already installed in your Arduino IDE, you can skip this section. Depending on your USB-to-serial adapter, you may also need a USB driver such as the CP210x driver.

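  • Go to File > Preferences, paste https://dl.espressif.com/dl/package_esp32_index.json into the Additional Boards Manager URLs field, and click OK. If that URL no longer works, use the one listed in Espressif's current Arduino-ESP32 documentation: https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json.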


Fig: Board Installation

  • Go to Tools > Board > Boards Manager, search for "esp32", and install the esp32 package by Espressif Systems.


Fig: Board Installation

Install the ESP32-CAM library

  • Download the esp32cam library (yoursunny/esp32cam on GitHub) as a .zip file. Then install it via Sketch > Include Library > Add .ZIP Library.

Browse to the downloaded .zip file, select it, and click Open.

Board selection and code uploading

Connect the camera board to your computer. Some camera boards have their own micro-USB connector and can be connected directly with a micro-USB data cable. If your board has no USB connector, wire it to the FTDI module as shown in the circuit diagram and connect the FTDI module to the computer with the data cable. If you have never used an FTDI board on your computer, install the FTDI driver first.

  • After connecting the camera, go to Tools > Board > esp32 > AI Thinker ESP32-CAM.

Fig: Camera board selection

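After selecting the board, choose the correct COM port under Tools > Port. With IO0 connected to GND, press the board's RST button to enter flashing mode. Before uploading, replace the WIFI_SSID and WIFI_PASS values with your own Wi-Fi network name and password. Then upload the following code: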

#include <WebServer.h>
#include <WiFi.h>
#include <esp32cam.h>

// Replace with your Wi-Fi network name and password
const char* WIFI_SSID = "SSID";
const char* WIFI_PASS = "password";

WebServer server(80);  // HTTP server on the default port 80

// Requested frame sizes; Resolution::find() maps each request to a size the sensor supports
static auto loRes = esp32cam::Resolution::find(320, 240);
static auto midRes = esp32cam::Resolution::find(350, 530);
static auto hiRes = esp32cam::Resolution::find(800, 600);

// Capture one frame and send it to the client as a JPEG image
void serveJpg()
{
  auto frame = esp32cam::capture();
  if (frame == nullptr) {
    Serial.println("CAPTURE FAIL");
    server.send(503, "", "");
    return;
  }
  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),
                static_cast<int>(frame->size()));

  server.setContentLength(frame->size());
  server.send(200, "image/jpeg");
  WiFiClient client = server.client();
  frame->writeTo(client);
}

void handleJpgLo()
{
  if (!esp32cam::Camera.changeResolution(loRes)) {
    Serial.println("SET-LO-RES FAIL");
  }
  serveJpg();
}

void handleJpgHi()
{
  if (!esp32cam::Camera.changeResolution(hiRes)) {
    Serial.println("SET-HI-RES FAIL");
  }
  serveJpg();
}

void handleJpgMid()
{
  if (!esp32cam::Camera.changeResolution(midRes)) {
    Serial.println("SET-MID-RES FAIL");
  }
  serveJpg();
}

void setup()
{
  Serial.begin(115200);
  Serial.println();
  {
    // Camera configuration for the AI Thinker board
    using namespace esp32cam;
    Config cfg;
    cfg.setPins(pins::AiThinker);
    cfg.setResolution(hiRes);
    cfg.setBufferCount(2);
    cfg.setJpeg(80);
    bool ok = Camera.begin(cfg);
    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");
  }

  // Connect to Wi-Fi in station mode and wait until connected
  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  // Print the base URL and the available endpoints
  Serial.print("http://");
  Serial.println(WiFi.localIP());
  Serial.println("  /cam-lo.jpg");
  Serial.println("  /cam-hi.jpg");
  Serial.println("  /cam-mid.jpg");

  server.on("/cam-lo.jpg", handleJpgLo);
  server.on("/cam-hi.jpg", handleJpgHi);
  server.on("/cam-mid.jpg", handleJpgMid);
  server.begin();
}

void loop()
{
  server.handleClient();  // process pending HTTP requests
}

After uploading the code, disconnect IO0 from GND and press the RST button. Open the Serial Monitor at 115200 baud. You should see messages like the ones below, including the board's IP address.

Fig: Code successfully uploaded to ESP32-CAM

Copy the IP address shown on the Serial Monitor and paste it into the url line near the top of the Python code (url = 'http://192.168.1.101/'). Keep the trailing slash, because the endpoint name is appended to it.

Python code

Copy the following Python code and save it as a .py file.

import cv2
import urllib.request
import numpy as np

url = 'http://192.168.1.101/'  # Replace with the IP address printed on the Serial Monitor

cv2.namedWindow("live transmission", cv2.WINDOW_AUTOSIZE)

while True:
    # Fetch one JPEG frame from the ESP32-CAM and decode it into a BGR image
    img_resp = urllib.request.urlopen(url + 'cam-lo.jpg')
    imgnp = np.array(bytearray(img_resp.read()), dtype=np.uint8)
    img = cv2.imdecode(imgnp, -1)
    if img is None:
        continue  # skip frames that could not be decoded

    # Grayscale -> Gaussian blur -> Canny edges (Sobel aperture 3)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    canny = cv2.Canny(cv2.GaussianBlur(gray, (11, 11), 0), 30, 150, apertureSize=3)

    # Thicken the edges with a 3x3 kernel (two passes) to close small gaps
    dilated = cv2.dilate(canny, np.ones((3, 3), np.uint8), iterations=2)
    (Cnt, _) = cv2.findContours(dilated.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

    # Draw contours
    cv2.drawContours(img, Cnt, -1, (0, 255, 0), 2)

    # Display the number of counted objects on the video feed
    count_text = f"Objects Counted: {len(Cnt)}"
    cv2.putText(img, count_text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    cv2.imshow("live transmission", img)
    cv2.imshow("mit contour", canny)

    key = cv2.waitKey(5)
    if key == ord('q'):
        break

cv2.destroyAllWindows()

Setting Up the Python Environment

Install the dependencies:

1) Create and activate a virtual environment:

python -m venv venv

On Linux/macOS:

source venv/bin/activate

On Windows:

venv\Scripts\activate

2) Install the required libraries:

pip install opencv-python numpy

After setting up the Python environment, run the Python code.
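Before running the script, it can help to confirm that the libraries actually import inside the activated environment. The sketch below is a small stdlib-only check. The function name module_versions is ours, not part of any library. It imports each module by name and reports its __version__ (or "unknown" if the module has none), and it returns None for anything that fails to import.

```python
import importlib
from typing import Dict, Iterable, Optional


def module_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each module name to its version string, 'unknown', or None if it won't import."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            result[name] = None  # not installed in this environment
            continue
        # cv2 and numpy both expose __version__; fall back if a module does not.
        result[name] = getattr(module, "__version__", "unknown")
    return result


if __name__ == "__main__":
    for name, version in module_versions(["cv2", "numpy"]).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If cv2 shows NOT INSTALLED, the virtual environment is probably not activated in the terminal you are using.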


ESP32-CAM code breakdown

#include <WebServer.h>
#include <WiFi.h>
#include <esp32cam.h>

  • #include <WebServer.h>: Adds support for creating a lightweight HTTP server.

  • #include <WiFi.h>: Allows the ESP32 to connect to Wi-Fi networks.

  • #include <esp32cam.h>: Provides functions to control the ESP32-CAM module, including camera initialization and image capture.

 

const char* WIFI_SSID = "SSID";
const char* WIFI_PASS = "password";

  • WIFI_SSID and WIFI_PASS: The name (SSID) and password of the Wi-Fi network the ESP32 connects to. Replace them with your own network's credentials.

WebServer server(80);

  • WebServer server(80): Creates an HTTP server instance that listens on port 80 (the default HTTP port).

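static auto loRes = esp32cam::Resolution::find(320, 240);
static auto midRes = esp32cam::Resolution::find(350, 530);
static auto hiRes = esp32cam::Resolution::find(800, 600);

esp32cam::Resolution::find: Defines three requested camera resolutions. The library maps each request to a frame size the camera sensor supports, so the delivered image may not match these numbers exactly. The CAPTURE OK line on the Serial Monitor reports the actual size.

  • loRes: Low resolution (320x240).

  • midRes: Medium resolution (requested as 350x530).

  • hiRes: High resolution (800x600).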
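The exact mapping depends on the esp32cam library version, but a plausible rule can be sketched in Python. The sketch below assumes that Resolution::find returns the smallest supported frame size that is at least as wide and as tall as the request. It also uses an assumed list of common OV2640 frame sizes. Check both assumptions against the library you installed. The function find_resolution is a hypothetical stand-in, not part of esp32cam.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: common OV2640 frame sizes, smallest first, as (width, height).
FRAME_SIZES: List[Tuple[int, int]] = [
    (96, 96), (160, 120), (176, 144), (240, 176), (240, 240),
    (320, 240), (400, 296), (480, 320), (640, 480), (800, 600),
    (1024, 768), (1280, 720), (1280, 1024), (1600, 1200),
]


def find_resolution(min_width: int, min_height: int) -> Optional[Tuple[int, int]]:
    """Hypothetical emulation of the assumed find() rule: return the first
    (smallest) size at least min_width wide AND min_height tall, else None."""
    for width, height in FRAME_SIZES:
        if width >= min_width and height >= min_height:
            return (width, height)
    return None


for label, request in [("loRes", (320, 240)), ("midRes", (350, 530)), ("hiRes", (800, 600))]:
    print(label, request, "->", find_resolution(*request))
```

Under these assumptions, the 350x530 request resolves to 800x600, the same frame size as hiRes. Confirm this with the CAPTURE OK line on the Serial Monitor before relying on cam-mid.jpg as a separate resolution.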

void serveJpg()
{
  auto frame = esp32cam::capture();
  if (frame == nullptr) {
    Serial.println("CAPTURE FAIL");
    server.send(503, "", "");
    return;
  }
  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),
                static_cast<int>(frame->size()));

  server.setContentLength(frame->size());
  server.send(200, "image/jpeg");
  WiFiClient client = server.client();
  frame->writeTo(client);
}


  • esp32cam::capture: Captures a frame from the camera.

  • Failure Handling: If no frame is captured, it logs CAPTURE FAIL and responds with HTTP 503 (Service Unavailable).

  • Logging Success: Prints the width, height, and size in bytes of the captured JPEG.

  • Serving the Image:

    • Sets the content length and MIME type as image/jpeg.

    • Writes the image data directly to the client.
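The CAPTURE OK line written by serveJpg() is handy for checking which frame size the camera actually delivered. Below is a minimal Python sketch for parsing it. It assumes the line format produced by the Serial.printf call above, for example lines copied from the Serial Monitor or read with pyserial's readline(). parse_capture_line is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Matches the serveJpg() log format "CAPTURE OK <width>x<height> <bytes>b"
CAPTURE_RE = re.compile(r"^CAPTURE OK (\d+)x(\d+) (\d+)b$")


def parse_capture_line(line: str) -> Optional[Tuple[int, int, int]]:
    """Return (width, height, jpeg_bytes) for a CAPTURE OK line, else None."""
    match = CAPTURE_RE.match(line.strip())  # strip() drops the trailing \r\n
    if match is None:
        return None
    width, height, size = (int(group) for group in match.groups())
    return width, height, size


print(parse_capture_line("CAPTURE OK 320x240 9876b\r\n"))
print(parse_capture_line("CAPTURE FAIL"))
```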

void handleJpgLo()
{
  if (!esp32cam::Camera.changeResolution(loRes)) {
    Serial.println("SET-LO-RES FAIL");
  }
  serveJpg();
}

void handleJpgHi()
{
  if (!esp32cam::Camera.changeResolution(hiRes)) {
    Serial.println("SET-HI-RES FAIL");
  }
  serveJpg();
}

void handleJpgMid()
{
  if (!esp32cam::Camera.changeResolution(midRes)) {
    Serial.println("SET-MID-RES FAIL");
  }
  serveJpg();
}


  • handleJpgLo: Switches the camera to low resolution using esp32cam::Camera.changeResolution(loRes) and calls serveJpg.

  • handleJpgHi: Switches to high resolution and serves the image.

  • handleJpgMid: Switches to medium resolution and serves the image.

  • Error Logging: If the resolution change fails, it logs a failure message to the Serial Monitor.

void setup()
{
  Serial.begin(115200);
  Serial.println();
  {
    using namespace esp32cam;
    Config cfg;
    cfg.setPins(pins::AiThinker);
    cfg.setResolution(hiRes);
    cfg.setBufferCount(2);
    cfg.setJpeg(80);

    bool ok = Camera.begin(cfg);
    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");
  }
  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }
  Serial.print("http://");
  Serial.println(WiFi.localIP());
  Serial.println("  /cam-lo.jpg");
  Serial.println("  /cam-hi.jpg");
  Serial.println("  /cam-mid.jpg");

  server.on("/cam-lo.jpg", handleJpgLo);
  server.on("/cam-hi.jpg", handleJpgHi);
  server.on("/cam-mid.jpg", handleJpgMid);

  server.begin();
}


  Serial Initialization:

  • Initializes the serial port for debugging.

  • Sets baud rate to 115200.

  Camera Configuration:

  • Sets pins for the AI Thinker ESP32-CAM module.

  • Sets the starting resolution (hiRes), two frame buffers, and a JPEG quality setting of 80.

  • Attempts to initialize the camera and logs the status.

  Wi-Fi Setup:

  • Connects to the specified Wi-Fi network in station mode.

  • Polls the connection status every 500 ms until connected, then prints the device's IP address and the available endpoints.

  Web Server Routes:

  • Maps URL endpoints (/cam-lo.jpg, /cam-hi.jpg, /cam-mid.jpg) to their respective handlers.

  Server Start:

  • Starts the web server.

void loop()
{
  server.handleClient();
}

  • server.handleClient(): Called repeatedly from loop(). Each call processes any pending HTTP request and dispatches it to the matching handler.

Summary of Workflow

  1. The ESP32-CAM connects to Wi-Fi and starts a web server.

  2. URL endpoints (/cam-lo.jpg, /cam-mid.jpg, /cam-hi.jpg) let the user request images at different resolutions.

  3. The camera captures an image and serves it to the client as a JPEG.

  4. The system continuously handles new client requests.
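To develop the Python side without the board at hand, you can imitate the request and response pattern described above on a PC. This is a minimal sketch that uses only Python's standard http.server module. MockCamHandler, start_mock_camera, and FAKE_JPEG are hypothetical names. The sketch answers the three endpoints the way serveJpg() does (status 200, Content-Length, Content-Type image/jpeg) and returns 404 for anything else. The placeholder bytes are not a decodable picture, so cv2.imdecode returns None for them. To exercise the full OpenCV pipeline, replace FAKE_JPEG with the contents of a real .jpg file.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder body: JPEG start (FF D8) and end (FF D9) markers around filler bytes.
# NOT a decodable image; substitute the bytes of a real .jpg for OpenCV testing.
FAKE_JPEG = b"\xff\xd8" + b"\x00" * 16 + b"\xff\xd9"
ENDPOINTS = {"/cam-lo.jpg", "/cam-mid.jpg", "/cam-hi.jpg"}


class MockCamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in ENDPOINTS:
            self.send_error(404)
            return
        # Mirror serveJpg(): status 200, Content-Length, then the JPEG bytes.
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(FAKE_JPEG)))
        self.end_headers()
        self.wfile.write(FAKE_JPEG)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_mock_camera(port: int = 0) -> HTTPServer:
    """Serve the endpoints on 127.0.0.1 in a background thread.

    port=0 lets the OS pick a free port; read it back from server.server_port.
    """
    server = HTTPServer(("127.0.0.1", port), MockCamHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_mock_camera()
    base = f"http://127.0.0.1:{server.server_port}/"
    with urllib.request.urlopen(base + "cam-lo.jpg", timeout=5) as resp:
        print(resp.status, resp.headers["Content-Type"], len(resp.read()), "bytes")
    server.shutdown()
    server.server_close()
```

To point the counting script at the mock, set url to the mock's base address (http://127.0.0.1:<port>/) instead of the board's IP.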


Python code breakdown

Importing Libraries


import cv2

import urllib.request

import numpy as np

  • cv2: OpenCV library for image processing.

  • urllib.request: Used to fetch images from the live camera feed via an HTTP request.

  • numpy: Helps in manipulating and decoding image data into arrays.


Camera Setup


url = 'http://192.168.1.101/'  # Update the URL if needed

cv2.namedWindow("live transmission", cv2.WINDOW_AUTOSIZE)

  • url: The base URL of the ESP32-CAM. Replace 192.168.1.101 with the IP address printed on the Serial Monitor. The endpoint (cam-lo.jpg) is appended inside the main loop.
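The board may get a new address from your router. Instead of editing the url line each time, the address and endpoint could come from the command line. Below is a minimal sketch with the standard argparse module. The --ip and --res options and the parse_camera_args function are hypothetical, and the default IP is the example address used above.

```python
import argparse


def parse_camera_args(argv=None) -> str:
    """Build the frame URL from hypothetical --ip/--res command-line options."""
    parser = argparse.ArgumentParser(description="ESP32-CAM object counter")
    parser.add_argument("--ip", default="192.168.1.101",
                        help="IP address printed on the Serial Monitor")
    parser.add_argument("--res", choices=["lo", "mid", "hi"], default="lo",
                        help="which /cam-*.jpg endpoint to request")
    args = parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"
    # e.g. http://192.168.1.101/cam-lo.jpg
    return f"http://{args.ip}/cam-{args.res}.jpg"
```

In the main script, you would call frame_url = parse_camera_args() once before the loop and pass frame_url to urllib.request.urlopen. You could then start it with, for example, python counter.py --ip 192.168.1.50 --res mid (the script name is hypothetical).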

  • cv2.namedWindow: Creates a window to display the live video feed.


Main Loop


while True:

  • A loop continuously fetches and processes frames from the camera feed until the user quits by pressing 'q'.


Fetching the Image


img_resp = urllib.request.urlopen(url + 'cam-lo.jpg')

imgnp = np.array(bytearray(img_resp.read()), dtype=np.uint8)

img = cv2.imdecode(imgnp, -1)

  • urllib.request.urlopen: Sends an HTTP GET request to the camera and retrieves one JPEG frame. You can request 'cam-mid.jpg' or 'cam-hi.jpg' instead. Try all three resolutions and keep the one that gives the best result.

  • bytearray: Wraps the raw JPEG bytes returned by read() in a byte buffer.

  • np.array: Converts the bytes into a NumPy array of unsigned 8-bit integers.

  • cv2.imdecode: Decodes the JPEG data into an OpenCV image (a BGR NumPy array). The flag -1 (cv2.IMREAD_UNCHANGED) loads the image as stored. The function returns None if the data cannot be decoded.
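A Wi-Fi hiccup can leave a frame truncated. cv2.imdecode then returns None, which makes the following cv2.cvtColor call fail. One cheap safeguard, sketched below as an assumption rather than part of the original script, is to check the downloaded bytes before decoding. They should start with the JPEG Start Of Image marker (FF D8) and end with the End Of Image marker (FF D9). looks_like_complete_jpeg is a hypothetical helper, and it does not validate the whole file.

```python
JPEG_SOI = b"\xff\xd8"  # Start Of Image marker
JPEG_EOI = b"\xff\xd9"  # End Of Image marker


def looks_like_complete_jpeg(data: bytes) -> bool:
    """Cheap pre-check before cv2.imdecode: SOI at the start, EOI at the end.

    Catches empty or truncated downloads only; it does not parse the JPEG.
    """
    return len(data) >= 4 and data.startswith(JPEG_SOI) and data.endswith(JPEG_EOI)
```

In the loop, you would read the bytes once (data = img_resp.read()) and skip the frame with continue when looks_like_complete_jpeg(data) is False. If your camera's frames carry padding after the FF D9 marker, this strict check would reject them and should be relaxed.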


Image Preprocessing


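gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
canny = cv2.Canny(cv2.GaussianBlur(gray, (11, 11), 0), 30, 150, apertureSize=3)
dilated = cv2.dilate(canny, np.ones((3, 3), np.uint8), iterations=2)

  • cv2.cvtColor: Converts the image to grayscale for easier edge detection.

  • cv2.GaussianBlur: Applies a Gaussian blur to reduce noise and fine detail in the image.

    • (11, 11): Kernel size (the area used for the blur).

    • 0: Sigma (blur strength). When it is 0, OpenCV computes sigma from the kernel size.

  • cv2.Canny: Performs edge detection.

    • 30, 150: Lower and upper hysteresis thresholds. Pixels with a gradient above 150 are strong edges. Pixels between 30 and 150 are kept only if they connect to a strong edge.

    • apertureSize=3: Size of the Sobel kernel used to compute the gradients. Pass it as a keyword, because the fourth positional argument of cv2.Canny is the optional output array, not the aperture size.

  • cv2.dilate: Expands the edges detected by the Canny algorithm to close gaps and make object outlines more solid.

    • np.ones((3, 3), np.uint8): A 3x3 structuring element. Each pass grows the edges by one pixel in every direction.

    • iterations=2: Number of times the dilation is applied.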
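The 0 passed to cv2.GaussianBlur means OpenCV derives sigma from the kernel size. According to the OpenCV getGaussianKernel documentation, a non-positive sigma is computed as 0.3*((ksize-1)*0.5 - 1) + 0.8, which gives 2.0 for the 11x11 kernel used here. The sketch below reproduces that formula and builds the corresponding normalized 1-D weights in plain Python. A 2-D Gaussian is separable, so the 11x11 blur is equivalent to applying these weights along rows and then along columns. This illustrates the math only; it is not OpenCV's implementation, and the function names are hypothetical.

```python
import math
from typing import List


def default_sigma(ksize: int) -> float:
    """Sigma OpenCV documents for a non-positive sigma (getGaussianKernel)."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


def gaussian_kernel_1d(ksize: int, sigma: float) -> List[float]:
    """Normalized 1-D Gaussian weights centred on the middle tap."""
    center = (ksize - 1) / 2
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(ksize)]
    total = sum(weights)
    return [w / total for w in weights]  # weights sum to 1, so brightness is preserved


sigma = default_sigma(11)
kernel = gaussian_kernel_1d(11, sigma)
print(f"sigma for an 11x11 kernel: {sigma:.2f}")
print("weights:", [round(w, 3) for w in kernel])
```

A larger kernel size therefore also means a larger sigma, and stronger smoothing, when sigma is left at 0.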


Finding Contours


(Cnt, _) = cv2.findContours(dilated.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

  • cv2.findContours: Finds the outlines of objects in the binary (edge-detected) image.

    • dilated.copy(): A copy of the dilated image is used to find contours.

    • cv2.RETR_EXTERNAL: Retrieves only the outermost contours.

    • cv2.CHAIN_APPROX_NONE: Retains all contour points without compression.

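  • Cnt: List of all detected contours. In OpenCV 4.x, findContours returns the contours and their hierarchy, and the hierarchy is discarded here (OpenCV 3.x returns three values instead).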
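A tiny binary image shows why the dilation step matters for the count. The sketch below is plain Python, not OpenCV. dilate() performs binary dilation with a 3x3 square, and count_blobs() counts 8-connected groups of foreground pixels. For simple, non-nested shapes, that count roughly matches the number of external contours cv2.findContours would return with RETR_EXTERNAL. The example grid is made up. The two short segments in the top row stand for one object whose outline has a two-pixel gap, and the block at the lower right is a second object.

```python
from typing import List, Set, Tuple

Grid = List[List[int]]


def dilate(grid: Grid, iterations: int = 1) -> Grid:
    """Binary dilation with a 3x3 square structuring element."""
    h, w = len(grid), len(grid[0])
    for _ in range(iterations):
        # A pixel turns on if any pixel in its 3x3 neighbourhood is on.
        grid = [[int(any(grid[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2))))
                 for x in range(w)]
                for y in range(h)]
    return grid


def count_blobs(grid: Grid) -> int:
    """Count 8-connected groups of 'on' pixels with an iterative flood fill."""
    h, w = len(grid), len(grid[0])
    seen: Set[Tuple[int, int]] = set()
    blobs = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] and (y, x) not in seen:
                blobs += 1
                stack = [(y, x)]
                seen.add((y, x))
                while stack:
                    cy, cx = stack.pop()
                    for ny in range(max(0, cy - 1), min(h, cy + 2)):
                        for nx in range(max(0, cx - 1), min(w, cx + 2)):
                            if grid[ny][nx] and (ny, nx) not in seen:
                                seen.add((ny, nx))
                                stack.append((ny, nx))
    return blobs


# Made-up edge map: broken outline (top row) plus a separate object (lower right).
edges = [
    [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
]
for n in (0, 1, 2):
    print(f"dilation iterations={n}: {count_blobs(dilate(edges, n))} blob(s)")
```

With no dilation, the broken outline is counted twice (3 blobs). One pass closes the gap (2 blobs). A second pass grows the two objects into each other (1 blob). The same trade-off applies to the kernel size and iterations given to cv2.dilate: enough dilation joins broken outlines, but too much merges neighbouring objects.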


Drawing Contours


cv2.drawContours(img, Cnt, -1, (0, 255, 0), 2)

  • cv2.drawContours: Draws the detected contours onto the original image.

    • img: The image to draw on.

    • Cnt: The list of contours.

    • -1: Indicates that all contours should be drawn.

    • (0, 255, 0): The color of the contours (green in BGR format).

    • 2: Thickness of the contour lines.


Displaying the Object Count


count_text = f"Objects Counted: {len(Cnt)}"

cv2.putText(img, count_text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

  • f"Objects Counted: {len(Cnt)}": A formatted string showing the number of detected objects.

  • cv2.putText: Adds the text onto the image.

    • img: The image to draw on.

    • (10, 30): Coordinates of the bottom-left corner of the text.

    • cv2.FONT_HERSHEY_SIMPLEX: The font style.

    • 1: Font scale (size).

    • (0, 0, 255): Text color (red in BGR format).

    • 2: Thickness of the text.
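Because len(Cnt) counts every external contour, small specks of noise or texture also add to the total. A common refinement, sketched here as an option rather than part of the original script, is to count only contours above a minimum area. OpenCV provides cv2.contourArea for this, which computes the area with Green's formula. The plain-Python shoelace formula below performs the same calculation for a polygon given as (x, y) points. polygon_area, count_large, and the example shapes are hypothetical. The threshold must be tuned to your object size and camera distance.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def polygon_area(points: Sequence[Point]) -> float:
    """Area of a closed polygon via the shoelace formula (vertices in order)."""
    n = len(points)
    twice_area = 0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def count_large(contours: List[Sequence[Point]], min_area: float) -> int:
    """Count only contours whose enclosed area is at least min_area pixels."""
    return sum(1 for c in contours if polygon_area(c) >= min_area)


coin = [(10, 10), (40, 10), (40, 40), (10, 40)]           # 30x30 square, area 900
speck = [(100, 100), (102, 100), (102, 102), (100, 102)]  # 2x2 square, area 4
print(count_large([coin, speck], min_area=50))  # prints 1
```

In the OpenCV script, the equivalent would be something like big = [c for c in Cnt if cv2.contourArea(c) >= MIN_AREA], with MIN_AREA a value you choose, and then len(big) in place of len(Cnt).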


Displaying the Video Feed


cv2.imshow("live transmission", img)

cv2.imshow("mit contour", canny)

  • cv2.imshow: Displays images in separate windows.

    • "live transmission": Shows the original image with contours and text.

    • "mit contour": Shows the Canny edge map (before dilation).


Keyboard Interaction

    key = cv2.waitKey(5)

    if key == ord('q'):

        break

  • cv2.waitKey: Waits up to 5 milliseconds for a key press. It also lets OpenCV refresh the display windows, so it must be called on every iteration.

  • ord('q'): Checks if the 'q' key is pressed, and if so, breaks the loop to exit the program.

Cleanup

cv2.destroyAllWindows()

cv2.destroyAllWindows: Closes all OpenCV windows when the loop ends.


Summary of Workflow

  1. Fetches the image from the live camera feed.

  2. Processes the image to detect edges and contours.

  3. Counts and draws contours on the image.

  4. Displays the image with the object count overlaid.

  5. Exits when 'q' is pressed.

Testing


  1. Power up the ESP32-CAM and connect it to Wi-Fi.

  2. Run the Python script, ensuring the ESP32-CAM URL is correctly set.

  3. See the result of counting the objects in the display.

Note: The objects must contrast clearly with the background. Edge detection works on differences in brightness, so black objects on a black background (or any other low-contrast combination) will produce wrong counts.

Fig: coin counting

Troubleshooting:

  • Guru Meditation Error: Ensure stable power to the ESP32-CAM.

  • No Image Display: Check the IP address and ensure the ESP32-CAM is accessible from your computer.

  • Library Conflicts: Use a virtual environment to isolate Python dependencies.
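For intermittent "No Image Display" problems, where the ESP32-CAM occasionally fails to answer, the script could retry a failed request instead of crashing. Below is a minimal stdlib sketch, assuming the fetch is wrapped in a zero-argument function; retry and its parameters are hypothetical names. It catches OSError, which covers urllib's URLError and socket timeouts. It also covers the HTTP 503 that serveJpg() returns on CAPTURE FAIL, which urllib raises as an HTTPError. The sketch waits longer after each failed attempt.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(fetch: Callable[[], T], attempts: int = 3, delay_s: float = 0.5,
          backoff: float = 2.0) -> T:
    """Call fetch() until it succeeds, sleeping longer after each failure.

    Re-raises the last error once all attempts are used up.
    """
    last_error: Exception = RuntimeError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as err:  # URLError, HTTPError and timeouts are OSError subclasses
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay_s)
                delay_s *= backoff
    raise last_error
```

For example: data = retry(lambda: urllib.request.urlopen(url + 'cam-lo.jpg', timeout=5).read()). Setting a timeout matters, because without one urlopen may wait a long time if the board drops off the network.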

To wrap up

This project demonstrates a seamless integration of an ESP32-CAM module and Python to build a real-time object-counting system. By using the ESP32-CAM's ability to capture and serve images over Wi-Fi, coupled with Python's powerful OpenCV library, we achieved an efficient and cost-effective solution for object counting and detection.

Throughout the tutorial, we explored each component in detail, from setting up the ESP32-CAM to processing live image streams with Python. Along the way, we learned to customize image resolutions, handle server routes, and enhance detection accuracy using OpenCV functions like edge detection and contour analysis.

This project not only provides a practical application but also serves as a solid foundation for more advanced computer vision systems. Whether you aim to integrate machine learning for object classification or scale this system for industrial monitoring, the possibilities are vast.

We hope this tutorial has inspired you to dive deeper into the world of IoT and computer vision. Happy building!