ESP32-CAM-Based Real-Time Face Detection and Counting System

Hello friends, we hope you are doing well. Today we are back with another interesting project, this time based on image processing. Efficient, cost-effective solutions for real-time applications are becoming increasingly important in embedded systems and computer vision. This project is built around the ESP32-CAM, a compact microcontroller board with an onboard camera and built-in Wi-Fi. We will use it to create a real-time face detection and counting system.

The ESP32-CAM serves as the core of the system. It captures images at 800x600 and hosts an HTTP server that serves each one as an individual JPEG over the local network. Because the frames leave the board already JPEG-compressed, they are small enough to transfer quickly over Wi-Fi, which lets the client process them in near real time.

On the client side, a Python application powered by OpenCV fetches image frames from the ESP32-CAM. It runs two Haar cascade classifiers on each frame, one trained on frontal faces and one on profile faces, and labels every detection according to the cascade that found it.

This project is focused on face detection and counting. It marks detected faces with bounding boxes and displays how many frontal and profile faces were detected in the current frame.

Applications of this face detection and counting system include smart attendance systems, people flow monitoring in public spaces, and automation solutions in retail or event management. This project demonstrates how IoT-enabled devices like the ESP32-CAM can work with computer vision algorithms to provide cost-effective solutions for real-world challenges. By focusing solely on face detection and counting, the system stays simple and computationally light.

System Architecture of Face Counting with ESP32-CAM and Python

1. Hardware Layer:

  • ESP32-CAM:

    • Captures images at a resolution of 800x600 (or specified resolution).

    • Serves captured images over an HTTP server at a specific endpoint (e.g., /cam-hi.jpg).

    • Operates in Wi-Fi station mode, joining an existing Wi-Fi network (the firmware in this article uses WIFI_STA).

  • Network Connection:

    • Wi-Fi provides communication between the ESP32-CAM and the Python application running on a computer.

  • Computer:

    • Runs the Python application to process the images and display results.

2. Software Layer:

  • ESP32-CAM Firmware:

    • Configures the camera for capturing images.

    • Sets up a lightweight HTTP server to serve JPEG images to connected clients.

  • Python Application:

    • Fetches images from the ESP32-CAM.

    • Processes images to count and annotate detected faces.

3. Communication Layer:

  • HTTP Protocol:

    • The ESP32-CAM serves images using HTTP.

    • The Python application uses HTTP GET requests to fetch the images from the camera.

4. Face Detection and Processing Layer:

  • Image Acquisition:

    • Python fetches images from the ESP32-CAM endpoint.

  • Preprocessing:

    • Converts the fetched image to a format suitable for OpenCV operations (e.g., cv2.imdecode to convert byte data into an image).

  • Face Detection:

    • Uses OpenCV's Haar Cascade classifiers to detect:

      • Frontal Faces: Uses haarcascade_frontalface_default.xml.

      • Profile Faces: Uses haarcascade_profileface.xml.

    • Counts the number of faces detected in the current frame.

  • Annotation:

    • Draws bounding boxes (rectangles) and labels around detected faces on the image frame.

    • Adds text overlays to display the count of detected frontal and profile faces.

5. User Interface Layer:

  • Visual Output:

    • Displays the annotated frames with bounding boxes and face counts in a real-time OpenCV window titled "Face Detector."

  • User Interaction:

    • Allows the user to terminate the application by pressing the 'q' key.

6. Workflow Summary:

  1. Image Capture:

    • ESP32-CAM captures and serves the image.

  2. Image Fetching:

    • Python retrieves the image via an HTTP GET request.

  3. Processing and Detection:

    • Haar Cascade classifiers detect faces, count them, and annotate the frame.

  4. Display and Output:

    • Python displays the processed image in a GUI window with visual feedback for face counts.

  5. Loop and Termination:

    • The loop continues until the user exits.

List of components

Components | Quantity

ESP32-CAM WiFi + Bluetooth Camera Module | 1

FTDI USB to Serial Converter 3V3-5V | 1

Male-to-female jumper wires | 4

Female-to-female jumper wire | 1

MicroUSB data cable | 1

Circuit diagram

The following is the circuit diagram for this project.

Fig: Circuit diagram

ESP32-CAM WiFi + Bluetooth Camera Module | FTDI USB to Serial Converter 3V3-5V (Voltage selection button should be in 5V position)

5V | VCC

GND | GND

U0T | RX

U0R | TX

IO0 | GND (FTDI or ESP32-CAM), only while uploading the code

Programming

Board installation

If it is your first project with any board of the ESP32 series, you need to do the board installation first. If ESP32 boards are already installed in your Arduino IDE, you can skip this installation section. You may also need to install the CP210x USB driver.

  • Go to File > Preferences, paste https://dl.espressif.com/dl/package_esp32_index.json into the "Additional Boards Manager URLs" field and click OK. 

Fig: Board Installation

  • Go to Tools > Board > Boards Manager, search for "esp32" and install the ESP32 boards package. 

Fig: Board Installation

Install the ESP32-CAM library.

  • Download the ESP32-CAM library as a ZIP file from GitHub (the link is given in the reference section). Then install it via Sketch > Include Library > Add .ZIP Library. 

In the file dialog, browse to the downloaded library, select it and press Open. 

Board selection and code uploading.

Connect the camera board to your computer. Some camera boards come with a micro USB connector of their own, in which case you can connect the camera directly with a micro USB data cable. If the board has no connector, wire it to the FTDI module as shown in the circuit diagram and connect the FTDI module to the computer with the data cable. If you have never used an FTDI board on your computer, you will need to install the FTDI driver first.

  • After connecting the camera, go to Tools > Board > ESP32 > AI Thinker ESP32-CAM.

Fig: Camera board selection

After selecting the board, select the appropriate COM port and upload the following code:

#include <WebServer.h>

#include <WiFi.h>

#include <esp32cam.h>

 

const char* WIFI_SSID = "Hamad";

const char* WIFI_PASS = "barsha123";

 

WebServer server(80);

 


static auto hiRes = esp32cam::Resolution::find(800, 600);

void serveJpg()

{

  auto frame = esp32cam::capture();

  if (frame == nullptr) {

    Serial.println("CAPTURE FAIL");

    server.send(503, "", "");

    return;

  }

  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),

                static_cast<int>(frame->size()));

 

  server.setContentLength(frame->size());

  server.send(200, "image/jpeg");

  WiFiClient client = server.client();

  frame->writeTo(client);

}

 


 

void handleJpgHi()

{

  if (!esp32cam::Camera.changeResolution(hiRes)) {

    Serial.println("SET-HI-RES FAIL");

  }

  serveJpg();

}

 


 

 

void  setup(){

  Serial.begin(115200);

  Serial.println();

  {

    using namespace esp32cam;

    Config cfg;

    cfg.setPins(pins::AiThinker);

    cfg.setResolution(hiRes);

    cfg.setBufferCount(2);

    cfg.setJpeg(80);

 

    bool ok = Camera.begin(cfg);

    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");

  }

  WiFi.persistent(false);

  WiFi.mode(WIFI_STA);

  WiFi.begin(WIFI_SSID, WIFI_PASS);

  while (WiFi.status() != WL_CONNECTED) {

    delay(500);

  }

  Serial.print("http://");

  Serial.println(WiFi.localIP());


  Serial.println("  /cam-hi.jpg");


 

 

  server.on("/cam-hi.jpg", handleJpgHi);


 

  server.begin();

}

 

void loop()

{

  server.handleClient();

}



After uploading the code, disconnect the IO0 pin of the camera from GND. Then press the RST button and open the Serial Monitor at 115200 baud. The following messages will appear.

Fig: Code successfully uploaded to ESP32-CAM

Copy the IP address shown there and paste it into the ESP32_CAM_URL line of the Python code given later in this article. 
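Before putting the address into the detector, it can help to confirm that the endpoint really returns a JPEG. The following is a minimal sketch of such a check using only the Python standard library. The helper names looks_like_jpeg and check_camera and the 5-second timeout are illustrative assumptions, not part of the original project.

```python
import urllib.request

# Every JPEG file begins with the SOI marker FF D8, followed by another FF marker byte.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the byte string starts like a JPEG image."""
    return data.startswith(JPEG_MAGIC)


def check_camera(url: str, timeout: float = 5.0) -> bool:
    """Fetch one frame from the camera URL and report whether it is a JPEG.

    Raises urllib.error.URLError if the camera cannot be reached.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200 and looks_like_jpeg(resp.read())
```

For example, check_camera("http://192.168.1.104/cam-hi.jpg") uses the placeholder address from this article; replace it with the address printed in your Serial Monitor.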

Python code

Haar Cascade Models

Face detection in this project relies on pre-trained Haar cascade models provided by OpenCV. These models are essential for detecting features like frontal and profile faces in images. Haar cascades are XML files containing trained data for specific object detection tasks. For this project, the following models are used:

  1. Frontal Face Detection Model: haarcascade_frontalface_default.xml

  2. Profile Face Detection Model: haarcascade_profileface.xml

These files are mandatory for the Python code to perform face detection. Below is a guide on how to download and set up these files.


Step 1: Downloading the Models

The Haar cascade models can be downloaded directly from OpenCV’s GitHub repository.

  1. Open your web browser and go to the OpenCV GitHub repository for Haar cascades:
    https://github.com/opencv/opencv/tree/master/data/haarcascades

  2. Locate the following files in the repository:

    • haarcascade_frontalface_default.xml

    • haarcascade_profileface.xml

  3. Click on each file to open its content.

  4. On the file's page, click the Raw button to view the raw XML content.

  5. Right-click and select Save As to download the file. Save it with its original filename (.xml extension) to the directory where your Python script (main.py) is saved.


Step 2: Placing the Files

Keep the downloaded XML files in the same directory as your Python script so the code can load them by file name alone, with no separate folder path, as shown below:

project_folder/

├── main.py

├── haarcascade_frontalface_default.xml

└── haarcascade_profileface.xml



Step 3: Updating the Python Script

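Update your script to load the models from the current directory by referencing the XML files directly, without a folder path. Note that the main script further down instead loads the copies bundled with the opencv-python package through cv2.data.haarcascades. Either approach works, so use the one that matches where your files are: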

frontal_face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

profile_face_cascade = cv2.CascadeClassifier("haarcascade_profileface.xml")



Verifying the Setup

  1. Ensure the XML files are saved in the same directory as the Python script.

  2. Run the Python script. If the models load successfully, there will be no errors related to file loading, and face detection should function as expected.

By downloading the files and placing them in the same directory as your script, you simplify the setup and enable seamless face detection functionality.
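One caveat for the verification step: cv2.CascadeClassifier typically does not raise an error when given a wrong path. It returns an empty classifier, and the failure only shows up later when detection runs. The sketch below checks for the files first and then tests each classifier with empty(). The helper names missing_cascades and load_cascades are hypothetical, introduced only for this illustration.

```python
import os

CASCADE_FILES = (
    "haarcascade_frontalface_default.xml",
    "haarcascade_profileface.xml",
)


def missing_cascades(directory="."):
    """Return the cascade file names that are not present in `directory`."""
    return [name for name in CASCADE_FILES
            if not os.path.isfile(os.path.join(directory, name))]


def load_cascades(directory="."):
    """Load both cascades, failing early if a file is missing or unreadable."""
    import cv2  # imported here so missing_cascades() works without OpenCV

    missing = missing_cascades(directory)
    if missing:
        raise FileNotFoundError(f"Missing cascade files: {missing}")
    cascades = []
    for name in CASCADE_FILES:
        classifier = cv2.CascadeClassifier(os.path.join(directory, name))
        if classifier.empty():  # empty() is True when the model did not load
            raise ValueError(f"OpenCV could not load {name}")
        cascades.append(classifier)
    return cascades
```

Calling load_cascades() at the top of the script would then return the frontal and profile classifiers in that order, or stop with a clear message.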




Main python script 

Copy the following Python code into a text editor or Python IDE and save it as main.py. 

import cv2

import requests

import numpy as np


# Replace with your ESP32-CAM's IP address

ESP32_CAM_URL = "http://192.168.1.104/cam-hi.jpg"


# Load Haar Cascades for different types of face detection

frontal_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

profile_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_profileface.xml")


def process_frame(frame):

    # Convert to grayscale for detection

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)


    # Perform frontal face detection

    frontal_faces = frontal_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))


    # Perform profile face detection

    profile_faces = profile_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))


    # Draw rectangles for detected frontal faces

    for (x, y, w, h) in frontal_faces:

        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 0, 255), 2)  # Red for frontal faces

        cv2.putText(frame, "Frontal Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)


    # Draw rectangles for detected profile faces

    for (x, y, w, h) in profile_faces:

        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)  # Blue for profile faces

        cv2.putText(frame, "Profile Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)


    # Add detection counts to the frame

    cv2.putText(frame, f"Frontal Faces: {len(frontal_faces)}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    cv2.putText(frame, f"Profile Faces: {len(profile_faces)}", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), 2)


    return frame


while True:

    # Fetch an image from the ESP32-CAM

    response = requests.get(ESP32_CAM_URL)

    if response.status_code == 200:

        img_arr = np.asarray(bytearray(response.content), dtype=np.uint8)

        frame = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)


        # Process and display the frame

        processed_frame = process_frame(frame)

        cv2.imshow("Face Detector", processed_frame)


        # Quit when 'q' is pressed

        if cv2.waitKey(1) & 0xFF == ord('q'):

            break

    else:

        print("Failed to fetch image from ESP32-CAM")


cv2.destroyAllWindows()



Setting Up Python Environment

Install Dependencies:

1) Create a virtual environment:

python -m venv venv

source venv/bin/activate  # Linux/Mac

venv\Scripts\activate   # Windows

2) Install the required libraries (the script also imports requests):

pip install opencv-python numpy requests

After setting up the Python environment, run the Python code. 

ESP32-CAM code breakdown

#include <WebServer.h>

#include <WiFi.h>

#include <esp32cam.h>


  • #include <WebServer.h>: Adds support for creating a lightweight HTTP server.

  • #include <WiFi.h>: Allows the ESP32 to connect to Wi-Fi networks.

  • #include <esp32cam.h>: Provides functions to control the ESP32-CAM module, including camera initialization and capturing images.

 

const char* WIFI_SSID = "SSID";

const char* WIFI_PASS = "password";

 


  • WIFI_SSID and WIFI_PASS: Define the SSID and password of the Wi-Fi network that the ESP32 will connect to.

 WebServer server(80);


  • WebServer server(80): Creates an HTTP server instance that listens on port 80 (default HTTP port).

 


static auto hiRes = esp32cam::Resolution::find(800, 600);


esp32cam::Resolution::find: Looks up a camera resolution for the requested width and height and stores it for later use:

  • hiRes: High resolution (800x600).

void serveJpg()

{

  auto frame = esp32cam::capture();

  if (frame == nullptr) {

    Serial.println("CAPTURE FAIL");

    server.send(503, "", "");

    return;

  }

  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),

                static_cast<int>(frame->size()));

 

  server.setContentLength(frame->size());

  server.send(200, "image/jpeg");

  WiFiClient client = server.client();

  frame->writeTo(client);

}

 

 


  • esp32cam::capture: Captures a frame from the camera.

  • Failure Handling: If no frame is captured, it logs a failure and sends a 503 error response.

  • Logging Success: Prints the resolution and size of the captured image.

  • Serving the Image:

    • Sets the content length and MIME type as image/jpeg.

    • Writes the image data directly to the client.

void handleJpgHi()

{

  if (!esp32cam::Camera.changeResolution(hiRes)) {

    Serial.println("SET-HI-RES FAIL");

  }

  serveJpg();

}

 


  • handleJpgHi: Switches the camera to high resolution using esp32cam::Camera.changeResolution(hiRes) and calls serveJpg.

  • Error Logging: If the resolution change fails, it logs a failure message to the Serial Monitor.

void  setup(){

  Serial.begin(115200);

  Serial.println();

  {

    using namespace esp32cam;

    Config cfg;

    cfg.setPins(pins::AiThinker);

    cfg.setResolution(hiRes);

    cfg.setBufferCount(2);

    cfg.setJpeg(80);

 

    bool ok = Camera.begin(cfg);

    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");

  }

  WiFi.persistent(false);

  WiFi.mode(WIFI_STA);

  WiFi.begin(WIFI_SSID, WIFI_PASS);

  while (WiFi.status() != WL_CONNECTED) {

    delay(500);

  }

  Serial.print("http://");

  Serial.println(WiFi.localIP());

  Serial.println("  /cam-hi.jpg");


 

  server.on("/cam-hi.jpg", handleJpgHi);

 

 

  server.begin();

}


  Serial Initialization:

  • Initializes the serial port for debugging.

  • Sets baud rate to 115200.

  Camera Configuration:

  • Sets pins for the AI Thinker ESP32-CAM module.

  • Configures the default resolution, buffer count, and JPEG quality (80%).

  • Attempts to initialize the camera and logs the status.

  Wi-Fi Setup:

  • Connects to the specified Wi-Fi network in station mode.

  • Waits for the connection and logs the device's IP address.

  Web Server Routes:

  • Maps the URL endpoint /cam-hi.jpg to the handleJpgHi handler.

  Server Start:

  • Starts the web server.

void loop()

{

  server.handleClient();

}


  • server.handleClient(): Continuously listens for incoming HTTP requests and serves responses based on the defined endpoints.

Summary of Workflow

  1. The ESP32-CAM connects to Wi-Fi and starts a web server.

  2. The URL endpoint /cam-hi.jpg lets the client request an image at high resolution.

  3. The camera captures an image and serves it to the client as a JPEG.

  4. The system continuously handles new client requests.
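To try the Python side without the hardware, the request handling summarized above can be imitated on a computer. The sketch below is only a stand-in written with Python's http.server module, not the firmware itself. The function names respond, make_handler, and serve, the port 8080, and the plain-text 404 body are assumptions made for this illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

ROUTES = {"/cam-hi.jpg"}  # the only endpoint registered by the sketch


def respond(path, capture):
    """Return (status, content_type, body) for a GET request.

    `capture` is a callable returning JPEG bytes, or None on failure,
    playing the role of esp32cam::capture() in the firmware.
    """
    if path not in ROUTES:
        return 404, "text/plain", b"not found"
    frame = capture()
    if frame is None:
        return 503, "text/plain", b""  # mirrors the CAPTURE FAIL branch
    return 200, "image/jpeg", frame


def make_handler(capture):
    """Build a request handler class bound to the given capture function."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, ctype, body = respond(self.path, capture)
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return Handler


def serve(capture, port=8080):
    """Serve frames until interrupted (blocks the calling thread)."""
    HTTPServer(("0.0.0.0", port), make_handler(capture)).serve_forever()
```

For instance, serve(lambda: open("test.jpg", "rb").read()) would hand out a saved test image (a hypothetical file name) at http://localhost:8080/cam-hi.jpg.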


Python code breakdown

Importing Libraries


import cv2

import requests

import numpy as np



  • cv2: OpenCV library for image processing.

  • requests: To fetch the image frames from the ESP32-CAM over HTTP.

  • numpy (np): For array operations, used here to handle the byte stream received from the ESP32-CAM.



ESP32-CAM URL


ESP32_CAM_URL = "http://192.168.1.104/cam-hi.jpg"


  • Replace this URL with the actual IP address of your ESP32-CAM on your local network. The endpoint "/cam-hi.jpg" returns the latest frame captured by the ESP32-CAM.


Loading Haar Cascades


frontal_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

profile_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_profileface.xml")



  • Haar cascades are pre-trained classifiers provided by OpenCV to detect objects like faces.

  • haarcascade_frontalface_default.xml: Detects frontal faces.

  • haarcascade_profileface.xml: Detects side/profile faces.


Frame Processing Function


def process_frame(frame):

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)



  • cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY): Converts the image to grayscale, which is required by Haar cascades for face detection.
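For intuition, the grayscale value of a pixel is a weighted sum of its colour channels. OpenCV documents the weights 0.299 for red, 0.587 for green, and 0.114 for blue. The function below is a plain-Python sketch of that formula for a single pixel. The name bgr_to_gray is hypothetical, and OpenCV's own fixed-point arithmetic may differ by a rounding step.

```python
def bgr_to_gray(pixel):
    """Weighted luma of one (blue, green, red) pixel, rounded to 0..255."""
    blue, green, red = pixel  # OpenCV stores channels in B, G, R order
    return int(round(0.114 * blue + 0.587 * green + 0.299 * red))
```

Green contributes the most to the result, so a pure green pixel appears much brighter in the grayscale image than a pure blue one.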

Frontal Face Detection


 frontal_faces = frontal_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))


detectMultiScale: Detects objects in the image.

  • scaleFactor=1.1: Specifies how much the image size is reduced at each scale.

  • minNeighbors=5: Minimum number of neighbouring rectangles required for positive detection.

  • minSize=(20, 20): Minimum size of detected objects.
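To see why scaleFactor affects speed, it helps to count how many search-window sizes the detector has to try. The sketch below uses a simplified model: the window starts at an assumed base size of 24 pixels and grows by scaleFactor until it no longer fits in the image. The function name count_scales and the 24-pixel base window are assumptions for illustration, and OpenCV's real loop has additional stopping rules.

```python
def count_scales(image_size, scale_factor=1.1, base_window=24):
    """Count detector window sizes that fit inside the image (simplified model)."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    limit = min(image_size)  # the window must fit the shorter side
    count = 0
    window = float(base_window)
    while window <= limit:
        count += 1
        window *= scale_factor  # each pass searches with a larger window
    return count
```

Under these assumptions an 800x600 frame yields 34 window sizes at scaleFactor=1.1 but only 13 at scaleFactor=1.3, which is why a larger factor runs faster at the cost of possibly skipping faces whose size falls between two steps.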

Profile Face Detection


 profile_faces = profile_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))

  • Same as frontal detection but uses the profile cascade for side faces.

Drawing Rectangles for Faces


    for (x, y, w, h) in frontal_faces:

        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 0, 255), 2)

        cv2.putText(frame, "Frontal Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)


  • Draws a red rectangle around each detected frontal face.

  • Adds the label "Frontal Face" above the rectangle.


    for (x, y, w, h) in profile_faces:

        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)

        cv2.putText(frame, "Profile Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)


  • Draws a blue rectangle for each detected profile face.

  • Labels it as "Profile Face."


Adding Face Counts

    cv2.putText(frame, f"Frontal Faces: {len(frontal_faces)}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    cv2.putText(frame, f"Profile Faces: {len(profile_faces)}", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), 2)


  • Displays the count of detected frontal and profile faces on the top-left of the frame.
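Two details of the annotation step are easy to miss. First, detectMultiScale typically returns an empty tuple when nothing is found and an array of (x, y, w, h) rows otherwise, and len() works on both. Second, the label position (x, y-10) falls outside the frame for a face near the top edge. The sketch below illustrates both points. The helper names summarize_detections and label_origin and the 15-pixel minimum are hypothetical choices.

```python
def summarize_detections(frontal_faces, profile_faces):
    """Count detections from both cascades; works for tuples, lists, and arrays."""
    frontal = len(frontal_faces)
    profile = len(profile_faces)
    # The total is a plain sum: a face found by both cascades is counted twice.
    return {"frontal": frontal, "profile": profile, "total": frontal + profile}


def label_origin(x, y, min_y=15):
    """Place the label 10 px above the box, but never above the top of the frame."""
    return (x, max(y - 10, min_y))
```

In the drawing loops, cv2.putText(frame, "Frontal Face", label_origin(x, y), ...) would keep the text visible for faces close to the top of the image.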





Main Loop


while True:

    response = requests.get(ESP32_CAM_URL)



  • Continuously fetches images from the ESP32-CAM.

Handle the Image Response

    if response.status_code == 200:

        img_arr = np.asarray(bytearray(response.content), dtype=np.uint8)

        frame = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)


  • Converts the HTTP response to a NumPy array.

  • Decodes the byte array into an OpenCV image using cv2.imdecode.


Process and Display the Frame

        processed_frame = process_frame(frame)

        cv2.imshow("Face Detector", processed_frame)

  • Processes the frame using the process_frame function.

  • Displays the processed frame in a window titled "Face Detector."

Quit on Key Press

        if cv2.waitKey(1) & 0xFF == ord('q'):

            break


  • Checks if the 'q' key is pressed to exit the loop.

Error Handling

    else:

        print("Failed to fetch image from ESP32-CAM")


  • Prints an error message if the ESP32-CAM fails to provide an image.
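The else branch only covers responses with a status other than 200. If the camera is unreachable, the request raises an exception instead, and a request without a timeout can wait indefinitely. The sketch below shows one way to turn all of these cases into a simple None result. It swaps requests for the standard library's urllib so that it has no dependencies, and the name fetch_frame_bytes, the 5-second timeout, and the opener parameter are assumptions made for this example.

```python
import http.client
import urllib.error
import urllib.request


def fetch_frame_bytes(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the raw image bytes from `url`, or None if the request fails.

    `opener` exists so the function can be exercised without a camera.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            # Objects without a status attribute are treated as successful.
            if getattr(resp, "status", 200) != 200:
                return None
            return resp.read()
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return None
```

In the main loop, a None result can be handled by printing the error message and continuing with the next iteration. It is also worth checking the result of cv2.imdecode, which returns None when the bytes are not a decodable image.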


Clean Up

cv2.destroyAllWindows()


  • Closes all OpenCV windows when the program exits.












Summary of the Workflow

  1. Setup:

    • The code connects to the ESP32-CAM via its IP address to fetch image frames in real time.

    • It loads pre-trained Haar Cascade classifiers for detecting frontal and profile faces.

  2. Continuous Image Fetching:

    • The program enters a loop where it fetches a new image frame from the ESP32-CAM using an HTTP GET request.

  3. Image Processing:

    • The image is converted into a format usable by OpenCV.

    • The frame is processed to:

      • Convert it to grayscale (required for Haar Cascade detection).

      • Detect frontal faces and profile faces using the respective classifiers.

  4. Face Detection and Visualization:

    • For each detected face:

      • A rectangle is drawn around it:

        • Red for frontal faces.

        • Blue for profile faces.

      • A label ("Frontal Face" or "Profile Face") is added above the rectangle.

    • The count of detected frontal and profile faces is displayed on the frame.

  5. Display:

    • The processed frame, with visual indicators and counts, is displayed in a window titled "Face Detector."

  6. User Interaction:

    • The program continues fetching, processing, and displaying frames until the user presses the 'q' key to quit.

  7. Error Handling:

    • If the ESP32-CAM fails to provide an image, an error message is printed, and the loop continues.

  8. Cleanup:

    • Upon exiting the loop, all OpenCV windows are closed to release resources.


Key Workflow Steps:

  1. Fetch Image → 2. Convert Image → 3. Detect Faces → 4. Annotate Frame → 5. Display Frame → 6. Repeat Until Exit.


Testing


  1. Power up the ESP32-CAM and connect it to Wi-Fi.

  2. Run the Python script. Make sure that the ESP32-CAM URL is correctly set.

  3. See the result of counting the faces in the display.

  4. You can test with real-life people and photos. 

Fig: Face counting

Troubleshooting:

  • Guru Meditation Error: Ensure stable power to the ESP32-CAM.

  • No Image Display: Check the IP address and ensure the ESP32-CAM is accessible from your computer.

  • Library Conflicts: Use a virtual environment to isolate Python dependencies.

  • Dots at the time of uploading the code: Immediately press the RST button.

  • Multiple failed upload attempts despite pressing the RST button: Restart your computer and try again. 

To wrap up

This project demonstrates an effective implementation of a face-counting system using ESP32-CAM and Python. The system uses the ESP32-CAM’s capability to capture and serve high-resolution images over HTTP. The Python client uses OpenCV's Haar cascade classifiers to effectively detect and count frontal and profile faces in each frame. It provides real-time feedback.

This project can be adapted for various applications, such as crowd monitoring, security, and smart building management. It provides an affordable and flexible solution. 

Future improvements can be made using advanced face detection algorithms like DNN-based models. This project highlights how simple hardware and software integration can address complex problems in computer vision.

Object Counting Project using ESP32-CAM and OpenCV

Imagine a real-time object counting system that is budget-friendly and easy to implement. You can achieve this goal with an ESP32-CAM. Today we will build an ESP32-CAM Object Counting System. This project is a combination of the power of embedded systems and computer vision.

The main processor of the system is ESP32-CAM, a budget-friendly microcontroller with an integrated camera. This tiny powerhouse captures live video streams and transmits them over Wi-Fi. On the other side, a Python-based application processes these streams, detects objects using image processing techniques, and displays the count dynamically.

Whether it’s tracking inventory in a warehouse, monitoring traffic flow, or automating production lines, this system is versatile and adaptable. It is also easy to build, since it needs only a handful of components.

Join us as we explore how to build this smart counting system step-by-step. You'll learn to configure the ESP32-CAM, process images in Python, and create a seamless, real-time object detection system. Let’s see how to bring this project to life!

System Architecture of the ESP32-CAM Object Counting System

The ESP32-CAM Object Counting System is built on a modular and efficient architecture, combining hardware and software components to achieve real-time object detection and counting. Below is a detailed breakdown of the system architecture:

1. Hardware Layer

  1. ESP32-CAM Module

    • Acts as the primary hardware for image capture and Wi-Fi communication.

    • Equipped with an onboard camera to stream live video at different resolutions.

    • Connects to a local Wi-Fi network to transmit data.

  2. Power Supply

    • Provides stable power to the ESP32-CAM module, typically via a USB connection or external battery pack.

2. Communication Layer

  1. Wi-Fi Connection

    • The ESP32-CAM connects to a local Wi-Fi network to enable seamless data transmission.

    • Uses HTTP requests to serve video streams at different resolutions.

  2. HTTP Server on ESP32-CAM

    • Runs a lightweight web server on port 80.

    • Responds to specific endpoints (/cam-lo.jpg, /cam-mid.jpg, /cam-hi.jpg) to provide real-time image frames at requested resolutions.


3. Processing Layer

  1. ESP32-CAM Side

    • Captures and processes raw image data using the onboard camera.

    • Serves the images as JPEG streams through the HTTP server.

  2. Python Application on Host Machine

    • Receives image streams from the ESP32-CAM using HTTP requests.

    • Processes the images using OpenCV for: 

      • Grayscale conversion.

      • Noise reduction with Gaussian blur.

      • Edge detection by using the Canny algorithm.

      • Contour detection to identify objects in the frame.

    • Counts the detected objects and updates the display dynamically.


4. User Interaction Layer

  1. Live Video Feed

    • Displays the real-time video stream with contours drawn around detected objects.

  2. Object Count Display

    • Provides a dynamic count of detected objects in the video feed.

    • The count is overlaid as text on the displayed video frame.

  3. User Commands

    • Enables interaction through keyboard input (pressing 'q' quits the application).

5. System Workflow

  1. The ESP32-CAM captures live video and streams it as JPEG images over a Wi-Fi network.

  2. The Python application on the host machine fetches the image frames via HTTP requests.

  3. The fetched images undergo processing in OpenCV to detect and count objects.

  4. The processed video is displayed, and the object count is updated with every new frame.


This architecture ensures a clear separation of tasks, with the ESP32-CAM handling image capture and streaming, and the Python application focusing on image processing and visualization. The modular design makes it easy to expand or adapt the system for various applications.
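The contour-counting step of the processing layer can be pictured as counting separate blobs in a black-and-white edge image. The toy sketch below does this on a small grid of 0s and 1s using a flood fill, with no OpenCV involved. It is an illustration of the idea rather than how cv2.findContours is implemented, and the function name count_blobs is hypothetical.

```python
def count_blobs(mask):
    """Count groups of non-zero cells that touch each other (including diagonally)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            blobs += 1  # found a cell belonging to a new blob
            seen[r][c] = True
            stack = [(r, c)]
            while stack:  # flood-fill the rest of this blob
                cr, cc = stack.pop()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            stack.append((nr, nc))
    return blobs


edges = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
print(count_blobs(edges))  # prints 3
```

The same intuition explains a common source of miscounts: if the edges of one object are broken into several pieces, or two objects touch, the number of blobs no longer matches the number of objects.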

List of components

Components | Quantity

ESP32-CAM WiFi + Bluetooth Camera Module | 1

FTDI USB to Serial Converter 3V3-5V | 1

Male-to-female jumper wires | 4

Female-to-female jumper wire | 1

MicroUSB data cable | 1

Circuit diagram

The following is the circuit diagram for this project.

Fig: Circuit diagram

ESP32-CAM WiFi + Bluetooth Camera Module | FTDI USB to Serial Converter 3V3-5V (Voltage selection button should be in 5V position)

5V | VCC

GND | GND

U0T | RX

U0R | TX

IO0 | GND (FTDI or ESP32-CAM), only while uploading the code

Programming

Board installation

If it is your first project with any board of the ESP32 series, you need to do the board installation first. If ESP32 boards are already installed in your Arduino IDE, you can skip this installation section. You may also need to install the CP210x USB driver.

  • Go to File > Preferences, paste https://dl.espressif.com/dl/package_esp32_index.json into the "Additional Boards Manager URLs" field and click OK. 

Fig: Board Installation

  • Go to Tools > Board > Boards Manager, search for "esp32" and install the ESP32 boards package. 

Fig: Board Installation

Install the ESP32-CAM library

  • Download the ESP32-CAM library as a ZIP file from GitHub (the link is given in the reference section). Then install it via Sketch > Include Library > Add .ZIP Library. 

In the file dialog, browse to the downloaded library, select it and press Open.

Board selection and code uploading

Connect the camera board to your computer. Some camera boards come with a micro USB connector of their own, in which case you can connect the camera directly with a micro USB data cable. If the board has no connector, wire it to the FTDI module as shown in the circuit diagram and connect the FTDI module to the computer with the data cable. If you have never used an FTDI board on your computer, you will need to install the FTDI driver first.

  • After connecting the camera, go to Tools > Board > ESP32 > AI Thinker ESP32-CAM.

Fig: Camera board selection

After selecting the board, select the appropriate COM port and upload the following code:

#include <WebServer.h>

#include <WiFi.h>

#include <esp32cam.h>

const char* WIFI_SSID = "Hamad";

const char* WIFI_PASS = "barsha123";

WebServer server(80);

static auto loRes = esp32cam::Resolution::find(320, 240);

static auto midRes = esp32cam::Resolution::find(350, 530);

static auto hiRes = esp32cam::Resolution::find(800, 600);

void serveJpg()

{

  auto frame = esp32cam::capture();

  if (frame == nullptr) {

    Serial.println("CAPTURE FAIL");

    server.send(503, "", "");

    return;

  }

  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),

                static_cast<int>(frame->size()));

 

  server.setContentLength(frame->size());

  server.send(200, "image/jpeg");

  WiFiClient client = server.client();

  frame->writeTo(client);

}

void handleJpgLo()

{

  if (!esp32cam::Camera.changeResolution(loRes)) {

    Serial.println("SET-LO-RES FAIL");

  }

  serveJpg();

}

void handleJpgHi()

{

  if (!esp32cam::Camera.changeResolution(hiRes)) {

    Serial.println("SET-HI-RES FAIL");

  }

  serveJpg();

}

void handleJpgMid()

{

  if (!esp32cam::Camera.changeResolution(midRes)) {

    Serial.println("SET-MID-RES FAIL");

  }

  serveJpg();

}

void  setup(){

  Serial.begin(115200);

  Serial.println();

  {

    using namespace esp32cam;

    Config cfg;

    cfg.setPins(pins::AiThinker);

    cfg.setResolution(hiRes);

    cfg.setBufferCount(2);

    cfg.setJpeg(80);

    bool ok = Camera.begin(cfg);

    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");

  }

  WiFi.persistent(false);

  WiFi.mode(WIFI_STA);

  WiFi.begin(WIFI_SSID, WIFI_PASS);

  while (WiFi.status() != WL_CONNECTED) {

    delay(500);

  }

  Serial.print("http://");

  Serial.println(WiFi.localIP());

  Serial.println("  /cam-lo.jpg");

  Serial.println("  /cam-hi.jpg");

  Serial.println("  /cam-mid.jpg");

  server.on("/cam-lo.jpg", handleJpgLo);

  server.on("/cam-hi.jpg", handleJpgHi);

  server.on("/cam-mid.jpg", handleJpgMid);

  server.begin();

}

void loop()

{

  server.handleClient();

}

After uploading the code, disconnect the IO0 pin of the camera from GND. Then press the RST button. The following messages will appear in the Serial Monitor.

Fig: Code successfully uploaded to ESP32-CAM

Copy the IP address printed in the Serial Monitor and paste it into the url variable of the Python code below.

Python code

Copy the following Python code into a new file and save it with a .py extension.

import cv2

import urllib.request

import numpy as np


url = 'http://192.168.1.101/'  # Update the URL if needed

cv2.namedWindow("live transmission", cv2.WINDOW_AUTOSIZE)

while True:

    img_resp = urllib.request.urlopen(url + 'cam-lo.jpg')

    imgnp = np.array(bytearray(img_resp.read()), dtype=np.uint8)

    img = cv2.imdecode(imgnp, -1)

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    canny = cv2.Canny(cv2.GaussianBlur(gray, (11, 11), 0), 30, 150, apertureSize=3)

    dilated = cv2.dilate(canny, (1, 1), iterations=2)

    (Cnt, _) = cv2.findContours(dilated.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

    # Draw contours

    cv2.drawContours(img, Cnt, -1, (0, 255, 0), 2)

    # Display the number of counted objects on the video feed

    count_text = f"Objects Counted: {len(Cnt)}"

    cv2.putText(img, count_text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)


    cv2.imshow("live transmission", img)

    cv2.imshow("mit contour", canny)


    key = cv2.waitKey(5)

    if key == ord('q'):

        break


cv2.destroyAllWindows()

Setting Up Python Environment

Install Dependencies:

1) Create and activate a virtual environment:

python -m venv venv

source venv/bin/activate  # Linux/Mac

venv\Scripts\activate   # Windows

2) Install the required libraries:

pip install opencv-python numpy

After setting up the Python environment, run the Python code.
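Before running the script, it can help to confirm that both libraries are importable from the active environment. The following is a minimal sketch that uses only the standard library; the helper name missing_modules is our own and is not part of OpenCV or NumPy.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# cv2 is the import name of the opencv-python package.
missing = missing_modules(["cv2", "numpy"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

If a module is reported as missing, check that the virtual environment is activated and run the pip command again.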


ESP32-CAM code breakdown

#include <WebServer.h>

#include <WiFi.h>

#include <esp32cam.h>


  • #include <WebServer.h>: Adds support for creating a lightweight HTTP server.

  • #include <WiFi.h>: Allows the ESP32 to connect to Wi-Fi networks.

  • #include <esp32cam.h>: Provides functions to control the ESP32-CAM module, including camera initialization and capturing images.

 

const char* WIFI_SSID = "SSID";

const char* WIFI_PASS = "password";

 


  • WIFI_SSID and WIFI_PASS: Define the SSID and password of the Wi-Fi network that the ESP32 will connect to.

 WebServer server(80);


  • WebServer server(80): Creates an HTTP server instance that listens on port 80 (default HTTP port).

 

static auto loRes = esp32cam::Resolution::find(320, 240);

static auto midRes = esp32cam::Resolution::find(350, 530);

static auto hiRes = esp32cam::Resolution::find(800, 600);


esp32cam::Resolution::find: Defines three camera resolutions:

  • loRes: Low resolution (320x240).

  • midRes: Medium resolution (350x530).

  • hiRes: High resolution (800x600).

void serveJpg()

{

  auto frame = esp32cam::capture();

  if (frame == nullptr) {

    Serial.println("CAPTURE FAIL");

    server.send(503, "", "");

    return;

  }

  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),

                static_cast<int>(frame->size()));

 

  server.setContentLength(frame->size());

  server.send(200, "image/jpeg");

  WiFiClient client = server.client();

  frame->writeTo(client);

}

 

 


  • esp32cam::capture: Captures a frame from the camera.

  • Failure Handling: If no frame is captured, it logs a failure and sends a 503 error response.

  • Logging Success: Prints the resolution and size of the captured image.

  • Serving the Image:

    • Sets the content length and MIME type as image/jpeg.

    • Writes the image data directly to the client.

void handleJpgLo()

{

  if (!esp32cam::Camera.changeResolution(loRes)) {

    Serial.println("SET-LO-RES FAIL");

  }

  serveJpg();

}

 

void handleJpgHi()

{

  if (!esp32cam::Camera.changeResolution(hiRes)) {

    Serial.println("SET-HI-RES FAIL");

  }

  serveJpg();

}

 

void handleJpgMid()

{

  if (!esp32cam::Camera.changeResolution(midRes)) {

    Serial.println("SET-MID-RES FAIL");

  }

  serveJpg();

}

 


  • handleJpgLo: Switches the camera to low resolution using esp32cam::Camera.changeResolution(loRes) and calls serveJpg.

  • handleJpgHi: Switches to high resolution and serves the image.

  • handleJpgMid: Switches to medium resolution and serves the image.

  • Error Logging: If the resolution change fails, it logs a failure message to the Serial Monitor.

void  setup(){

  Serial.begin(115200);

  Serial.println();

  {

    using namespace esp32cam;

    Config cfg;

    cfg.setPins(pins::AiThinker);

    cfg.setResolution(hiRes);

    cfg.setBufferCount(2);

    cfg.setJpeg(80);

 

    bool ok = Camera.begin(cfg);

    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");

  }

  WiFi.persistent(false);

  WiFi.mode(WIFI_STA);

  WiFi.begin(WIFI_SSID, WIFI_PASS);

  while (WiFi.status() != WL_CONNECTED) {

    delay(500);

  }

  Serial.print("http://");

  Serial.println(WiFi.localIP());

  Serial.println("  /cam-lo.jpg");

  Serial.println("  /cam-hi.jpg");

  Serial.println("  /cam-mid.jpg");

 

  server.on("/cam-lo.jpg", handleJpgLo);

  server.on("/cam-hi.jpg", handleJpgHi);

  server.on("/cam-mid.jpg", handleJpgMid);

 

  server.begin();

}


  Serial Initialization:

  • Initializes the serial port for debugging.

  • Sets baud rate to 115200.

  Camera Configuration:

  • Sets pins for the AI Thinker ESP32-CAM module.

  • Configures the default resolution, buffer count, and JPEG quality (80%).

  • Attempts to initialize the camera and logs the status.

  Wi-Fi Setup:

  • Connects to the specified Wi-Fi network in station mode.

  • Waits for the connection and logs the device's IP address.

  Web Server Routes:

  • Maps URL endpoints (/cam-lo.jpg, /cam-hi.jpg, /cam-mid.jpg) to their respective handlers.

  Server Start:

  • Starts the web server.

void loop()

{

  server.handleClient();

}


  • server.handleClient(): Continuously listens for incoming HTTP requests and serves responses based on the defined endpoints.

Summary of Workflow

  1. The ESP32-CAM connects to Wi-Fi and starts a web server.

  2. URL endpoints (/cam-lo.jpg, /cam-mid.jpg, /cam-hi.jpg) let the user request images at different resolutions.

  3. The camera captures an image and serves it to the client as a JPEG.

  4. The system continuously handles new client requests.


Python code breakdown

Importing Libraries


import cv2

import urllib.request

import numpy as np

  • cv2: OpenCV library for image processing.

  • urllib.request: Used to fetch images from the live camera feed via an HTTP request.

  • numpy: Helps in manipulating and decoding image data into arrays.


Camera Setup


url = 'http://192.168.1.101/'  # Update the URL if needed

cv2.namedWindow("live transmission", cv2.WINDOW_AUTOSIZE)

  • url: The base address of the ESP32-CAM (its IP address). The endpoint name, such as cam-lo.jpg, is appended to it each time a frame is requested.

  • cv2.namedWindow: Creates a window to display the live video feed.
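Since the sketch on the ESP32-CAM registers three endpoints, the base address and the endpoint name can be combined in one small helper. The following is only an illustrative sketch; the names ENDPOINTS and frame_url are hypothetical and do not appear in the original script.

```python
from urllib.parse import urljoin

# Endpoint paths registered by the ESP32-CAM sketch.
ENDPOINTS = {"lo": "cam-lo.jpg", "mid": "cam-mid.jpg", "hi": "cam-hi.jpg"}


def frame_url(base_url, quality="lo"):
    """Build the full URL for one of the three JPEG endpoints."""
    if quality not in ENDPOINTS:
        raise ValueError(f"unknown quality {quality!r}")
    # Make sure the base ends with a slash so the path is appended to it.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, ENDPOINTS[quality])


print(frame_url("http://192.168.1.101", "hi"))
```

With a helper like this, switching resolution means changing one argument instead of editing the string inside the loop.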


Main Loop


while True:

  • A loop continuously fetches and processes frames from the camera feed until the user quits by pressing 'q'.


Fetching the Image


img_resp = urllib.request.urlopen(url + 'cam-lo.jpg')

imgnp = np.array(bytearray(img_resp.read()), dtype=np.uint8)

img = cv2.imdecode(imgnp, -1)

  • urllib.request.urlopen: Sends an HTTP GET request to the camera URL and retrieves an image. Here you can use ‘cam-hi.jpg’ or ‘cam-mid.jpg’ instead. You can use any of the three resolutions of images and see which one gives you the best result.  

  • bytearray: Wraps the raw bytes of the HTTP response in a mutable byte sequence.

  • np.array: Converts the binary data into a NumPy array.

  • cv2.imdecode: Decodes the NumPy array into an image (OpenCV-readable format).
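The fetch step in the script has no timeout or error handling, so a single dropped request stops the program. The following is a minimal sketch of a more tolerant fetch, assuming a few retries are acceptable. The function name fetch_jpeg, its default values, and the opener parameter (included so the function can be tried without a camera) are our own assumptions.

```python
import time
import urllib.error
import urllib.request

JPEG_SOI = b"\xff\xd8"  # every JPEG file starts with these two bytes


def fetch_jpeg(url, timeout=5.0, retries=3, delay=0.5,
               opener=urllib.request.urlopen):
    """Return the JPEG bytes served at `url`, or None if every attempt fails."""
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as resp:
                data = resp.read()
            if data.startswith(JPEG_SOI):
                return data
        except (urllib.error.URLError, OSError):
            pass  # camera busy or unreachable; try again
        if attempt < retries - 1:
            time.sleep(delay)
    return None
```

In the main loop, a None result can simply skip the current iteration; otherwise the returned bytes can be converted with np.frombuffer(data, dtype=np.uint8) and passed to cv2.imdecode as before.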


Image Preprocessing


gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

canny = cv2.Canny(cv2.GaussianBlur(gray, (11, 11), 0), 30, 150, apertureSize=3)

dilated = cv2.dilate(canny, (1, 1), iterations=2)

  • cv2.cvtColor: Converts the image to grayscale for easier edge detection.

  • cv2.GaussianBlur: Applies a Gaussian blur to reduce noise and detail in the image.

    • Parameters (11, 11) specify the kernel size (area used for the blur).

  • cv2.Canny: Performs edge detection.

    • 30, 150: Lower and upper thresholds for edge detection.

    • 3: Size of the Sobel kernel.

  • cv2.dilate: Expands the edges detected by the Canny algorithm to close gaps and make objects more defined.

    • (1, 1): The kernel (structuring element) passed to the dilation.

    • iterations=2: Number of times the dilation is applied.
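The two Canny thresholds (30 and 150) work together: strong responses are always kept, while weaker ones are kept only when they connect to a strong one. The following is a simplified one-dimensional sketch of that double-threshold idea. It is not how cv2.Canny is implemented, since the real function works on 2D gradients and also thins the edges first; the function name hysteresis_1d is hypothetical.

```python
def hysteresis_1d(magnitudes, low=30, high=150):
    """Mark which samples count as edges under double thresholding.

    Samples >= high are strong edges. Samples between low and high are kept
    only if they connect, through other kept samples, to a strong one.
    """
    n = len(magnitudes)
    keep = [m >= high for m in magnitudes]
    # Grow outward from every strong sample through adjacent weak samples.
    stack = [i for i, kept in enumerate(keep) if kept]
    while stack:
        i = stack.pop()
        for j in (i - 1, i + 1):
            if 0 <= j < n and not keep[j] and magnitudes[j] >= low:
                keep[j] = True
                stack.append(j)
    return keep


# The isolated weak value 50 is dropped; 40 and 90 survive next to 160.
print(hysteresis_1d([10, 40, 160, 90, 20, 50, 10]))
# → [False, True, True, True, False, False, False]
```

Lowering the first threshold keeps more faint edges attached to strong ones, while raising the second makes the detector ignore low-contrast objects entirely.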


Finding Contours


(Cnt, _) = cv2.findContours(dilated.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

  • cv2.findContours: Finds the outlines of objects in the binary (edge-detected) image.

    • dilated.copy(): A copy of the dilated image is used to find contours.

    • cv2.RETR_EXTERNAL: Retrieves only the outermost contours.

    • cv2.CHAIN_APPROX_NONE: Retains all contour points without compression.

  • Cnt: List of all detected contours.
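Because every closed outline is counted, small specks of noise can inflate the result. A common remedy is to discard contours below a minimum area. The following dependency-free sketch illustrates the idea with the shoelace formula on plain (x, y) lists; in the actual script you would more likely call cv2.contourArea on each contour. The names polygon_area and filter_by_area and the threshold of 100 square pixels are assumptions chosen for illustration.

```python
def polygon_area(points):
    """Area of a closed polygon given as [(x, y), ...] (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def filter_by_area(contours, min_area=100.0):
    """Keep only contours whose enclosed area is at least `min_area`."""
    return [c for c in contours if polygon_area(c) >= min_area]


coin = [(0, 0), (20, 0), (20, 20), (0, 20)]        # 400 square pixels
speck = [(50, 50), (52, 50), (52, 52), (50, 52)]   # 4 square pixels
print(len(filter_by_area([coin, speck])))  # → 1
```

A suitable threshold depends on the resolution and on how far the camera is from the objects, so it needs to be tuned by experiment.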


Drawing Contours


cv2.drawContours(img, Cnt, -1, (0, 255, 0), 2)

  • cv2.drawContours: Draws the detected contours onto the original image.

    • img: The image to draw on.

    • Cnt: The list of contours.

    • -1: Indicates that all contours should be drawn.

    • (0, 255, 0): The color of the contours (green in BGR format).

    • 2: Thickness of the contour lines.


Displaying the Object Count


count_text = f"Objects Counted: {len(Cnt)}"

cv2.putText(img, count_text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

  • f"Objects Counted: {len(Cnt)}": A formatted string showing the number of detected objects.

  • cv2.putText: Adds the text onto the image.

    • img: The image to draw on.

    • (10, 30): Coordinates of the bottom-left corner of the text.

    • cv2.FONT_HERSHEY_SIMPLEX: The font style.

    • 1: Font scale (size).

    • (0, 0, 255): Text color (red in BGR format).

    • 2: Thickness of the text.


Displaying the Video Feed


cv2.imshow("live transmission", img)

cv2.imshow("mit contour", canny)

  • cv2.imshow: Displays images in separate windows.

    • "live transmission": Shows the original image with contours and text.

    • "mit contour": Shows the edge-detected binary image.


Keyboard Interaction

    key = cv2.waitKey(5)

    if key == ord('q'):

        break

  • cv2.waitKey: Waits for 5 milliseconds for a key press.

  • ord('q'): Checks if the 'q' key is pressed, and if so, breaks the loop to exit the program.
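cv2.waitKey returns -1 when no key is pressed within the wait time, and on some platforms the returned code can reportedly carry extra high bits. A common defensive pattern is to mask the value to its lowest byte before comparing. The following is a small sketch of that check; the helper name is_quit_key is hypothetical.

```python
def is_quit_key(key_code, quit_char="q"):
    """Return True if a cv2.waitKey result corresponds to `quit_char`."""
    if key_code < 0:  # -1 means no key was pressed in time
        return False
    # Compare only the lowest byte so extra modifier bits are ignored.
    return (key_code & 0xFF) == ord(quit_char)


print(is_quit_key(-1), is_quit_key(ord("q")))
```

In the loop this would replace the direct comparison, for example: if is_quit_key(cv2.waitKey(5)): break.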

Cleanup

cv2.destroyAllWindows()

  • cv2.destroyAllWindows: Closes all OpenCV windows when the loop ends.


Summary of Workflow

  1. Fetches the image from the live camera feed.

  2. Processes the image to detect edges and contours.

  3. Counts and draws contours on the image.

  4. Displays the image with the object count overlaid.

  5. Exits when 'q' is pressed.

Testing


  1. Power up the ESP32-CAM and connect it to Wi-Fi.

  2. Run the Python script, ensuring the ESP32-CAM URL is correctly set.

  3. See the result of counting the objects in the display.

Note: The objects must contrast with the background. If you place black objects on a black background, the edge detector cannot separate them and you will get the wrong count.

Fig: coin counting

Troubleshooting:

  • Guru Meditation Error: Ensure stable power to the ESP32-CAM.

  • No Image Display: Check the IP address and ensure the ESP32-CAM is accessible from your computer.

  • Library Conflicts: Use a virtual environment to isolate Python dependencies.
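To check whether the ESP32-CAM is accessible from your computer, you can test if its web server port accepts a connection before starting the main script. The following is a minimal sketch using the standard socket module; the function name is_reachable and the 2-second timeout are our own choices.

```python
import socket


def is_reachable(host, port=80, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_reachable("192.168.1.101") should return True when the board is powered, connected to the same network, and running the web server sketch. A False result points to a wrong IP address, a network problem, or a board that has not finished booting.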

To wrap up

This project demonstrates a seamless integration of an ESP32-CAM module and Python to build a real-time object-counting system. By using the ESP32-CAM's ability to capture and serve images over Wi-Fi, coupled with Python's powerful OpenCV library, we achieved an efficient and cost-effective solution for object counting and detection.

Throughout the tutorial, we explored each component in detail, from setting up the ESP32-CAM to processing live image streams with Python. Along the way, we learned to customize image resolutions, handle server routes, and enhance detection accuracy using OpenCV functions like edge detection and contour analysis.

This project not only provides a practical application but also serves as a solid foundation for more advanced computer vision systems. Whether you aim to integrate machine learning for object classification or scale this system for industrial monitoring, the possibilities are vast.

We hope this tutorial has inspired you to dive deeper into the world of IoT and computer vision. Happy building!

Mobile App Development Could Well be 2025's Hot Skill

It's not news that the world is one of portability now. Ever since laptops became more easily transportable and smartphones became the one object you could find in every person's pocket, desktop and fixed-location tech has been something of a relic of a bygone era. You're more likely to see a pig fly than someone under the age of 30 using a landline phone or desktop computer.

However, as businesses have pivoted to online-first digital storefronts and presences, there has been a small section that has yet to catch up with the mobile-only generation with fully-optimized digital presences. Because of that, there's a clear niche that software engineers and developers can exploit and it could be 2025's hottest skill set for this group.

Mobile Priority for Every Business in the 21st Century

Every industry is now aware of the fact that they can't rely on the old-school method of promotion and attracting customers. First, it was the need to have a simple website. Then it became the need for a mobile-optimised website. And now, it's imperative that businesses of many different kinds have a dedicated mobile app that can be used on a number of devices.

The casino industry is a great microcosm of this. It's gone from being a solely brick-and-mortar deal to one that is almost primarily seen as digital. For example, Puntit India is a solely-online casino with a fully optimized mobile website that replicates the aesthetic and functionality of a dedicated app that you would install from the Apple App Store or Google Play Store.

A Potentially Lucrative Career Path for Young Professionals

With it clear that businesses are looking to ensure they have a mobile presence as well as a standard website, it should be no surprise that mobile app developers have a chance to enter a profession with an impressive salary, especially when compared with other salaries in the software engineering and development sector.

In fact, according to data from Indeed US, mobile app development jobs average around a $117,000 annual salary. That's in comparison with $105,000 for more standard software development roles. It might not seem like the biggest salary gap, but it shows that mobile developers are in higher demand and there's opportunity, especially in a freelance environment.

Small Businesses Great Adopters of Mobile

What is interesting is the makeup of the businesses that are looking to modernise with mobile. In 2022, the information suggested that around 50% of small-to-medium enterprises had their own app. Even more encouraging was that around 80% of the remaining businesses that were yet to jump on the bandwagon were looking to do so soon.

The fact that mobile isn't simply the domain of the big businesses that can afford their own in-house dev team is equally promising. It suggests that, while businesses may also be outsourcing their app development, they are still seeing it as an important part of their strategy. And this means that anyone who has mobile development in their arsenal is going to be seen as an asset, whether they are already working for an SME or they are functioning as an outside contractor.

Artificial Intelligence and App Development

One of the biggest changes in software development as a whole in recent years has been the growth of artificial intelligence within the digital sphere. No longer is AI seen as something specific to the world of gaming or more obscure corners of computer sciences. Instead, it has now become a part of a lot of people’s day-to-day lives, and nowhere is this clearer than in the technology and digital media industry. More and more, generative AI and AI enhancement are seen as standard in software and app development.

You only need to look at search engines like Google. Despite always relying on complex algorithms comparable to AI in order to deliver reliable results, the adoption of high-level AI tech to deliver editorialised results has become commonplace. More than anything, it is evident that mobile apps will continue to use this new technology to deliver efficient results in text-to-speech and natural language processing (NLP).

Entertainment Apps, AI, and Future Development

It’s not just search engines and text-to-speech apps and models that are adopting AI in the new mobile app revolution, though. Entertainment apps have seen a huge overhaul in how they operate. Spotify, one of the biggest music streaming platforms, has seen one of the biggest shifts. After cementing its place as the de facto online music collection of millions, it has pivoted back to a format more reminiscent of a pre-digital age.

With their recent addition of an AI-powered DJ, complete with an artificially generated script and voice, Spotify is harking back to the golden days of radio. The most important point to take from this is that it is further evidence that AI and mobile app development are fast becoming inseparable and that anyone who can jump on to the trend at an early stage will find themselves at the forefront when it comes to career development.

A Bright Outlook in a Difficult Trading Environment

Anyone who has been in the business world in recent years will know that it's not the easiest time for any business or freelancer to operate. With increasing costs and downsizing from a number of the biggest names in tech, such as Meta, professionals must look to insulate themselves from the turbulence the industry is experiencing. The gig economy has become a staple, especially in the US, and that doesn't look like changing any time soon.

With that said, these sorts of skills are not only sought after by big businesses but are also great for ensuring that they have some insurance outside of the corporate world. Because of that, it won't be surprising to see that mobile development courses experience a massive uptick in enrollment as we navigate 2025 and beyond. If that is the case, it's not out of the realm of possibility that we see a real boom in the next five years.

Getting Started with ESP32-CAM | Pinout, Features, Programming, Code Uploading

In today’s tutorial, we’ll show you how to program the ESP32-CAM module. The ESP32-CAM module is suitable for building basic surveillance or monitoring systems, and its price is quite reasonable. You can use it for lots of AI-based projects such as object detection and face recognition.

However, many users run into trouble when setting up and uploading code to ESP32-CAM development boards. This tutorial provides a guideline for successfully programming the ESP32-CAM.

Overview of the ESP32-CAM Development Board

The ESP32-CAM is a standalone development board that integrates an ESP32-S chip, a camera module, onboard flash memory, and a microSD card slot. It features built-in Wi-Fi and Bluetooth connectivity and supports OV2640 or OV7670 cameras with a resolution of up to 2 megapixels.

Key Features:

  • Ultra-small 802.11b/g/n Wi-Fi + Bluetooth/BLE SoC module

  • Low-power, dual-core 32-bit processor with a clock speed of up to 240MHz and computing power of 600 DMIPS

  • 520 KB built-in SRAM and 4 MB external PSRAM

  • Supports multiple interfaces: UART, SPI, I2C, PWM, ADC, and DAC

  • Compatible with OV2640 and OV7670 cameras and includes built-in flash storage

  • Enables Wi-Fi-based image uploads and supports TF cards

  • Multiple sleep modes for power efficiency

  • Operates in STA, AP, and STA+AP modes

Specifications:

  • Dimensions: 27 × 40.5 × 4.5 mm

  • SPI Flash: Default 32Mbit

  • RAM: 520 KB internal + 4 MB external PSRAM

  • Bluetooth: BT 4.2 BR/EDR and BLE

  • Wi-Fi Standards: 802.11 b/g/n/e/i

  • Interfaces: UART, SPI, I2C, PWM

  • TF Card Support: Up to 16 GB (4 GB recommended)

  • GPIO Pins: 9 available

  • Image Output Formats: JPEG (only with OV2640), BMP, Grayscale

  • Antenna: Onboard with 2dBi gain

  • Security: WPA/WPA2/WPA2-Enterprise/WPS

  • Power Supply: 5V

  • Operating Temperature: -20°C to 85°C

Power Consumption:

  • Without flash: 180mA @ 5V

  • With max brightness flash: 310mA @ 5V

  • Deep sleep mode: 6mA @ 5V

  • Modem sleep mode: 20mA @ 5V

  • Light sleep mode: 6.7mA @ 5V
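These figures can be used for a rough runtime estimate when the board spends most of its time asleep. The following sketch computes a time-weighted average current and an idealized runtime. It assumes a 5V supply rated in mAh and ignores regulator losses and Wi-Fi transmission peaks; the 10% active share and the 2000 mAh capacity are hypothetical example values, not measurements.

```python
def average_current_ma(active_ma, sleep_ma, active_fraction):
    """Time-weighted average current for a duty-cycled device."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_ma * active_fraction + sleep_ma * (1.0 - active_fraction)


def runtime_hours(capacity_mah, current_ma):
    """Ideal runtime of a supply rated `capacity_mah` at a constant draw."""
    return capacity_mah / current_ma


# Figures from the list above: 180 mA active (no flash), 6 mA in deep sleep.
avg = average_current_ma(180, 6, active_fraction=0.10)
print(round(avg, 1), "mA average")                   # → 23.4 mA average
print(round(runtime_hours(2000, avg), 1), "hours")   # → 85.5 hours
```

Real-world runtime will be shorter, but the calculation shows why deep sleep matters: at a constant 180 mA the same supply would last only about 11 hours.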

ESP32-CAM Pinout

The ESP32-CAM module has fewer accessible GPIO pins compared to a standard ESP32 board since many are allocated for the camera and SD card module. Certain pins should be avoided during programming:

  • GPIO1, GPIO3, and GPIO0 are essential for uploading code and should not be used for other functions.

  • GPIO0 is linked to the camera XCLK pin and should remain unconnected during normal operation. It must be pulled to GND only when uploading firmware.

  • P_OUT Pin: Labeled as VCC on some boards, this pin provides 3.3V or 5V output depending on the solder pad configuration. It cannot be used to power the board—use the dedicated 5V pin instead.

  • GPIO 2, 4, 12, 13, 14, and 15 are assigned to the SD card reader. If the SD card module is unused, these pins can be repurposed as general I/O.

Notably, the onboard flash LED is connected to GPIO 4, meaning it may turn on when using the SD card reader. To prevent this behaviour, initialize the SD card in 1-bit mode, which leaves GPIO 4 free:

SD_MMC.begin("/sdcard", true);
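When planning a project, it can be handy to keep the pin restrictions described above in a small lookup. The following Python sketch only models those notes as data; it does not talk to the board, and the names UPLOAD_PINS, SD_CARD_PINS, FLASH_LED_PIN, and pin_conflicts are hypothetical.

```python
# Pin groups taken from the notes above (AI-Thinker ESP32-CAM).
UPLOAD_PINS = {0, 1, 3}
SD_CARD_PINS = {2, 4, 12, 13, 14, 15}
FLASH_LED_PIN = 4


def pin_conflicts(pin, sd_card_in_use):
    """List the reasons a GPIO should be avoided, or [] if none apply."""
    reasons = []
    if pin in UPLOAD_PINS:
        reasons.append("needed for code upload")
    if sd_card_in_use and pin in SD_CARD_PINS:
        reasons.append("used by the SD card reader")
    if pin == FLASH_LED_PIN:
        reasons.append("drives the onboard flash LED")
    return reasons


for pin in (0, 4, 13):
    print(pin, pin_conflicts(pin, sd_card_in_use=True))
```

An empty list means none of the restrictions listed here applies to that pin for the chosen configuration.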


For an in-depth explanation of the ESP32-CAM pinout and GPIO usage, refer to the Random Nerd Tutorials guide: ESP32-CAM AI-Thinker Pinout Guide: GPIOs Usage Explained.

Schematic 

Following is a full schematic of the ESP32-CAM. 

Driver installation

You need to install the CP210X driver on your computer to get the ESP32-CAM working. You can download the driver from the Silicon Labs website.

Board installation

No matter which method you choose to program your ESP32-CAM, you need to install the ESP32 board package. If ESP32 boards are already installed in your Arduino IDE, feel free to skip this section. Go to File > Preferences, paste https://dl.espressif.com/dl/package_esp32_index.json into the Additional Boards Manager URLs field, and click OK.

  • Go to Tools > Board > Boards Manager, search for esp32, and click Install.

Install the ESP32-CAM library.

  • Download the ESP32-CAM library from GitHub (the link is given in the reference section). Then go to Sketch > Include Library > Add .ZIP Library.

Now browse to the downloaded library, select it, and click Open.

Board selection and code uploading

Connect the camera board to your computer. Some camera boards come with a micro USB connector of their own, so you can connect them to the computer directly with a micro USB data cable. If the board has no connector, connect it through an FTDI module and plug the module into the computer with the data cable. In that case, you will need to install the FTDI driver first.

  • When you’re done with the connection, go to Tools > Board > esp32 > AI Thinker ESP32-CAM.

After selecting the board, select the appropriate COM port. You can then upload code using either of the two methods described below.

Method 1: Using the ESP32-CAM Programmer Shield

The ESP32-CAM programmer shield (ESP32-CAM-MB) is made exclusively for programming the ESP32-CAM. Its built-in USB-to-serial converter simplifies connecting the board to a computer for programming and debugging. It also includes a microSD card slot for expanded storage, enabling easy data storage and retrieval. Additionally, the shield features a power switch and an LED indicator, allowing for straightforward power control and status monitoring. With its compact design and user-friendly functionality, the ESP32-CAM-MB Programmer Shield is a valuable tool for developers working with the ESP32-CAM board.

Connecting ESP32-CAM with the Programmer Shield

Just connect the ESP32-CAM module on top of the Programming Shield as shown below, and connect a USB cable from the Programming Shield to your computer. Now you can program your ESP32-CAM.

Connecting the Programming Shield to Your Computer

First, take a working USB cable and connect it securely to a USB port on your computer. When it is plugged in, you should hear a notification sound from your computer, and a red LED on the Programming Shield should light up. Next, confirm that you have selected the AI Thinker ESP32-CAM board and the appropriate serial port. Refer to the image below for guidance.

Press the upload button to upload your code.

Press and hold the IO0 button of the programming shield.

The text ‘Connecting’ should appear in the output panel.

While holding down the IO0 button, press and release the RST button. The following picture shows the location of the RST button that you need to press.

When the dots in the text “Connecting …..” stop appearing you can release the IO0 button as well. If the following text appears, it indicates that the code is being uploaded:

Running Mode

When the code is uploaded, you will see the message “Hard resetting via RTS pin…” in the Output Panel. You must press the RST button on the ESP32-CAM module to run the uploaded program. Avoid using the RST button on the Programming Shield. Also, do not press the IO0 button.

Test Code for ESP32-CAM with Programming Shield

This simple Blink program turns on the Flash LED on the ESP32-CAM for 10 milliseconds, then waits for 2 seconds before repeating the cycle.

int flashPin = 4;

void setup() {

  pinMode(flashPin, OUTPUT);

}


void loop() {

  digitalWrite(flashPin, HIGH);

  delay(10);

  digitalWrite(flashPin, LOW);

  delay(2000);

}

You will see the LED flashing if the code is uploaded without any problem. 

Method 2: Programming ESP32-CAM with an FTDI Programmer

List of components



Component                                    Quantity
ESP32-CAM WiFi + Bluetooth Camera Module     1
FTDI USB to Serial Converter 3V3-5V          1
Male-to-female jumper wires                  4
Female-to-female jumper wire                 1
MicroUSB data cable                          1

Circuit diagram

Following is the circuit diagram of this project.


Connect the ESP32-CAM WiFi + Bluetooth Camera Module to the FTDI USB to Serial Converter 3V3-5V as follows (the voltage selector on the FTDI board should be in the 5V position):

ESP32-CAM pin    FTDI pin
5V               VCC
GND              GND
U0T              RX
U0R              TX
IO0              GND (FTDI or ESP32-CAM)

Programming

Testing Code for ESP32-CAM with FTDI Programmer

This code functions similarly to a standard Blink program but gradually increases the brightness of the flash LED over 255 steps before turning it off for a second and repeating the cycle.

int flashPin = 4;


void setup() {

  pinMode(flashPin, OUTPUT);

}


void loop() {

  for (int brightness = 0; brightness < 255; brightness++) {

    analogWrite(flashPin, brightness);

    delay(1);

  }

  analogWrite(flashPin, 0);

  delay(1000);

}


Since programming the ESP32-CAM (even with the FTDI Programmer) can be cumbersome, it’s advisable to first verify the functionality of the SD card and camera before attempting more complex projects. The following sections outline how to do this.


Testing the SD Card

The ESP32-CAM officially supports up to 4GB microSD cards, but 8GB and 16GB cards generally work fine. Larger cards require reformatting to FAT32, which can be done using guiformat.exe from Ridgecrop.

The test program below creates a file, writes a test message to it, and reads back the content. If the output matches expectations, the SD card is functioning correctly.

#include "FS.h"

#include "SD_MMC.h"


int flashPin = 4;


void setup() {

  Serial.begin(115200);


  // Mount the SD card and stop if no card is found

  if (!SD_MMC.begin()) {

    Serial.println("SD card mount failed");

    return;

  }


  // Create and write a test file on the SD card

  File file = SD_MMC.open("/test.txt", FILE_WRITE);

  file.print("*** Test successful ***");

  file.close();


  // Read the file back and print its contents

  file = SD_MMC.open("/test.txt");

  while (file.available()) {

    Serial.write(file.read());

  }

  file.close();


  // Set the flash LED as output and turn it off

  pinMode(flashPin, OUTPUT);

  analogWrite(flashPin, 0);

}


void loop() {

}


Code Breakdown

1. Libraries & Initialization:


#include "FS.h"

#include "SD_MMC.h"


FS.h provides the generic file interface, and SD_MMC.h provides access to the microSD card slot. Additionally, we define flashPin to control the flash LED:

int flashPin = 4;


2. Setup Function:
The setup() function initializes serial communication at 115200 baud:

Serial.begin(115200);


Then we mount the SD card. If SD_MMC.begin() returns false, no usable card was found, so the test prints an error and stops:

if (!SD_MMC.begin()) {

  Serial.println("SD card mount failed");

  return;

}


A file named test.txt is created on the SD card, a test message is written, and the file is closed:

File file = SD_MMC.open("/test.txt", FILE_WRITE);

file.print("*** Test successful ***");

file.close();


The file is reopened in read mode, its contents are printed to the serial monitor, and then it is closed:

file = SD_MMC.open("/test.txt");

while (file.available()) {

  Serial.write(file.read());

}

file.close();


Finally, we turn off the flash LED, which shares GPIO 4 with the SD card interface:

pinMode(flashPin, OUTPUT);

analogWrite(flashPin, 0);

3. Loop Function:
The loop() function remains empty since the entire process occurs within setup(). 


Serial Monitor Output

If the SD card test is successful, you will see *** Test successful *** in the Serial Monitor.

This confirms that data can be written to and read from the SD card.

Additional SD Card Testing

For more extensive diagnostics, you can use the SDMMC_Test.ino example provided in the ESP32 library. This program includes additional debugging information and can be accessed via:

File > Examples > Examples for AI-Thinker ESP32-CAM > SDMMC > SDMMC_Test

Testing the Camera

After verifying the SD card, the next step is to test the camera module. The following is a simplified program that captures an image each time the ESP32-CAM is reset. The image is saved to the SD card.

#include "esp_camera.h"

#include "soc/rtc_cntl_reg.h"

#include "SD_MMC.h"

#include "EEPROM.h"

// Pin configuration for AI-Thinker ESP32-CAM module

#define PWDN_GPIO_NUM     32

#define RESET_GPIO_NUM    -1

#define XCLK_GPIO_NUM      0

#define SIOD_GPIO_NUM     26

#define SIOC_GPIO_NUM     27

#define Y9_GPIO_NUM       35

#define Y8_GPIO_NUM       34

#define Y7_GPIO_NUM       39

#define Y6_GPIO_NUM       36

#define Y5_GPIO_NUM       21

#define Y4_GPIO_NUM       19

#define Y3_GPIO_NUM       18

#define Y2_GPIO_NUM        5

#define VSYNC_GPIO_NUM    25

#define HREF_GPIO_NUM     23

#define PCLK_GPIO_NUM     22

void configCamera() {

  camera_config_t config = {};  // zero-initialize so fields not set below have defined values

  config.ledc_channel = LEDC_CHANNEL_0;

  config.ledc_timer = LEDC_TIMER_0;

  config.pin_d0 = Y2_GPIO_NUM;

  config.pin_d1 = Y3_GPIO_NUM;

  config.pin_d2 = Y4_GPIO_NUM;

  config.pin_d3 = Y5_GPIO_NUM;

  config.pin_d4 = Y6_GPIO_NUM;

  config.pin_d5 = Y7_GPIO_NUM;

  config.pin_d6 = Y8_GPIO_NUM;

  config.pin_d7 = Y9_GPIO_NUM;

  config.pin_xclk = XCLK_GPIO_NUM;

  config.pin_pclk = PCLK_GPIO_NUM;

  config.pin_vsync = VSYNC_GPIO_NUM;

  config.pin_href = HREF_GPIO_NUM;

  config.pin_sscb_sda = SIOD_GPIO_NUM;

  config.pin_sscb_scl = SIOC_GPIO_NUM;

  config.pin_pwdn = PWDN_GPIO_NUM;

  config.pin_reset = RESET_GPIO_NUM;

  config.xclk_freq_hz = 20000000;

  config.pixel_format = PIXFORMAT_JPEG;

  config.frame_size = FRAMESIZE_UXGA;

  config.jpeg_quality = 10;

  config.fb_count = 2;

  esp_camera_init(&config);

}

unsigned int incrementCounter() {

  unsigned int counter = 0;

  EEPROM.get(0, counter);

  EEPROM.put(0, counter + 1);

  EEPROM.commit();

  return counter;

}


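void captureImage() {

  // Grab a frame; esp_camera_fb_get() returns NULL on failure

  camera_fb_t* fb = esp_camera_fb_get();

  if (!fb) {

    Serial.println("Camera capture failed");

    return;

  }

  unsigned int counter = incrementCounter();

  String filename = "/pic" + String(counter) + ".jpg";

  Serial.println(filename);

  File file = SD_MMC.open(filename.c_str(), FILE_WRITE);

  if (file) {

    file.write(fb->buf, fb->len);

    file.close();

  } else {

    Serial.println("Could not open file on SD card");

  }

  // Hand the frame buffer back to the camera driver

  esp_camera_fb_return(fb);

}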

void setup() {

  WRITE_PERI_REG(RTC_CNTL_BROWN_OUT_REG, 0);

  Serial.begin(115200);

  SD_MMC.begin();

  EEPROM.begin(16);

  configCamera();

  captureImage();

  esp_deep_sleep_start();

}

void loop() {

}

Code Overview

  • The configCamera() function sets up the camera with the appropriate pin configurations.

  • The incrementCounter() function tracks the number of captured images using EEPROM.

  • The captureImage() function takes a picture and saves it to the SD card.

  • The setup() function initializes the camera and SD card, captures an image, and puts the ESP32-CAM into deep sleep mode to conserve power.

This basic framework can be expanded for use cases such as motion-triggered or interval-based image capture.
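As a rough illustration of interval-based capture, the sketch below arms the ESP32 deep-sleep timer before going to sleep, so the board wakes up and runs setup() again after a fixed delay. The names intervalToMicros(), sleepUntilNextCapture(), and CAPTURE_INTERVAL_S are hypothetical and not part of the original program; the 60-second interval is only an assumed value. The Arduino-specific part is guarded with #ifdef ARDUINO so that the conversion helper can also be compiled on a PC.

```cpp
#include <stdint.h>

// Hypothetical helper (not in the original sketch): converts a capture interval
// in seconds to the microseconds expected by esp_sleep_enable_timer_wakeup().
// The 64-bit cast avoids overflow for intervals longer than about 71 minutes.
constexpr uint64_t intervalToMicros(uint32_t seconds) {
  return static_cast<uint64_t>(seconds) * 1000000ULL;
}

#ifdef ARDUINO
#include "esp_sleep.h"

// Assumed interval between two captures; adjust it to your use case.
const uint32_t CAPTURE_INTERVAL_S = 60;

// Arms the timer wake-up source and enters deep sleep.
// After the interval elapses, the board restarts and setup() runs again.
void sleepUntilNextCapture() {
  esp_sleep_enable_timer_wakeup(intervalToMicros(CAPTURE_INTERVAL_S));
  esp_deep_sleep_start();
}
#endif
```

To try it, call sleepUntilNextCapture() at the end of setup() in place of the direct call to esp_deep_sleep_start(). A motion-triggered variant would use an external wake-up source instead of the timer.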

Frequently Asked Questions

Here are some common problems that you may face while working with the ESP32-CAM, along with their solutions.

Wi-Fi Connectivity

Q: Why isn’t my ESP32-CAM connecting to Wi-Fi?
A: Check that you have entered the correct SSID and password. Also keep in mind that the ESP32 supports only 2.4 GHz Wi-Fi networks.
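When the credentials look right but the board still does not connect, it can help to print the Wi-Fi status code instead of waiting forever. The following is a minimal sketch of that idea, assuming the standard WiFi library of the ESP32 Arduino core. The names hasTimedOut(), connectWiFi(), WIFI_SSID, and WIFI_PASSWORD are hypothetical, and the credentials are placeholders.

```cpp
#include <stdint.h>

// Hypothetical helper: true once timeoutMs has elapsed since startMs.
// Unsigned subtraction keeps the result correct when millis() wraps around.
constexpr bool hasTimedOut(uint32_t startMs, uint32_t nowMs, uint32_t timeoutMs) {
  return static_cast<uint32_t>(nowMs - startMs) >= timeoutMs;
}

#ifdef ARDUINO
#include <WiFi.h>

// Placeholder credentials; replace them with your own.
const char* WIFI_SSID = "your-ssid";
const char* WIFI_PASSWORD = "your-password";

// Tries to connect for up to timeoutMs and prints the final status code.
bool connectWiFi(uint32_t timeoutMs) {
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASSWORD);

  const uint32_t startMs = millis();
  while (WiFi.status() != WL_CONNECTED &&
         !hasTimedOut(startMs, millis(), timeoutMs)) {
    delay(500);
    Serial.print(".");
  }

  if (WiFi.status() == WL_CONNECTED) {
    Serial.println(WiFi.localIP());
    return true;
  }
  // For example, WL_NO_SSID_AVAIL means the network name was not found.
  Serial.printf("Wi-Fi connection failed, status = %d\n", WiFi.status());
  return false;
}
#endif
```

Calling connectWiFi(15000) from setup() would give the board about 15 seconds to join the network before reporting the status code on the serial monitor.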

Camera & Image Quality

Q: Is there any way to improve the image quality of the camera?
A: Adjust the camera settings in your code, experimenting with different resolutions and frame rates for the best results.

Q: Why are my images blurry or unclear?
A: Poor lighting conditions can degrade image quality. Ensure proper lighting, fine-tune camera settings, and remove the protective lens foil.
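Adjusting the camera settings, as suggested in the answers above, can also be done at runtime through the sensor handle of the esp32-camera driver. The sketch below is only an illustration: applyImageSettings() and clampJpegQuality() are hypothetical names, and the camera is assumed to be initialized already with esp_camera_init().

```cpp
// Hypothetical helper: keeps the JPEG quality inside the driver's 0-63 range.
// Lower numbers mean higher quality and larger files.
constexpr int clampJpegQuality(int quality) {
  return quality < 0 ? 0 : (quality > 63 ? 63 : quality);
}

#ifdef ARDUINO
#include <Arduino.h>
#include "esp_camera.h"

// Changes resolution and JPEG quality on an already initialized camera.
void applyImageSettings(framesize_t frameSize, int quality) {
  sensor_t* sensor = esp_camera_sensor_get();
  if (sensor == nullptr) {
    Serial.println("Camera sensor not available");
    return;
  }
  sensor->set_framesize(sensor, frameSize);
  sensor->set_quality(sensor, clampJpegQuality(quality));
}
#endif
```

For example, applyImageSettings(FRAMESIZE_SVGA, 12) selects 800x600 frames with fairly high JPEG quality. Trying a few combinations under your actual lighting is usually the quickest way to find a good balance between sharpness and file size.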

Serial Communication & Code Upload

Q: Why is the camera not responding to serial monitor commands?
A: Check the connections between the board and the computer. Also, confirm that the baud rate (115200) in your code matches the serial monitor settings.

Q: Why do I see “Timed out waiting for packet header” during code upload?
A: An unstable USB connection may cause this problem. Try a different USB cable or port.

Q: What should I do if the ESP32-CAM freezes during code upload?
A: Disconnect and reconnect the USB cable, reset the board, and attempt the upload again. Check that your code isn't causing crashes.

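Q: How do I resolve the error “A fatal error occurred: Failed to connect to ESP32: Timed out waiting for packet header”?
A: This may be caused by an incorrect baud rate, a faulty USB cable, or the board not being in bootloader mode. Check the upload speed and the cable, and make sure GPIO0 is connected to GND while uploading.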

SD Card Issues

Q: Why isn’t my SD card being detected?
A: Make sure the SD card is properly inserted and formatted as FAT32. Cards between 4GB and 16GB work best, while higher-capacity cards may cause issues.

Power & Performance

Q: My ESP32-CAM gets hot—should I be concerned?
A: It’s normal for the board to warm up during operation, but excessive heat could indicate a short circuit or power issue.

Q: How can I reduce power consumption?
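A: Use the ESP32's sleep modes. For example, the capture program shown earlier calls esp_deep_sleep_start() after saving each image, so the board stays in deep sleep until the next reset.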

Other Camera Issues

Q: Why isn’t my ESP32-CAM capturing images?
A: Check that the camera module is securely connected, and ensure the correct camera module type is defined in your code.

Q: Can I use ESP32-CAM for video streaming?
A: Yes. Use a web server library such as ESP32-CAM-Webserver to stream video over Wi-Fi. Ensure your network can handle the required bandwidth.

Bootloader & OTA Programming

Q: Why won’t my ESP32-CAM enter bootloader mode?
A: Ensure GPIO0 is connected to GND, then press the reset button while GPIO0 is still held low so that the board starts in bootloader mode.

Q: Can I upload code wirelessly?
A: Yes, you can use Over-The-Air (OTA) programming, for example with the ArduinoOTA library included in the ESP32 Arduino core.

If you continue to experience issues, double-check the wiring, connections, and settings in your code.

Syed Zain Nasir

I am Syed Zain Nasir, the founder of The Engineering Projects (TEP) (https://www.TheEngineeringProjects.com/). I have been a programmer since 2009. Before that, I just searched for things and made small projects, and now I am sharing my knowledge through this platform. I also work as a freelancer and have done many projects related to programming and electrical circuitry. My Google Profile+: https://plus.google.com/+SyedZainNasir/

Published by
Syed Zain Nasir