The Engineering Projects

ESP32-CAM-Based Real-Time Face Detection and Counting System

Hello friends. We hope you are doing fine. Today we are back with another interesting project, this time based on image processing. Efficient and cost-effective solutions for real-time applications are becoming increasingly important in embedded systems and computer vision. This project is built around the ESP32-CAM, a compact microcontroller module with an onboard camera and built-in Wi-Fi. We will use it to create a real-time face detection and counting system.

The ESP32-CAM serves as the core of the system. It captures images at 800x600 resolution and hosts an HTTP server that serves individual JPEG frames over the local network. Because each frame is compressed to JPEG on the device before it is sent, the payload stays small, which keeps transfer latency low enough for the client to process frames as they arrive.

On the client side, a Python application powered by OpenCV collects image frames from the ESP32-CAM. Using Haar cascade classifiers, the application detects faces in each frame. It can also figure out whether they are frontal or in profile orientation.

This project is focused on face detection and counting. It marks detected faces with bounding boxes. It also counts both frontal and profile faces seen in the video stream.

Applications of this face detection and counting system include smart attendance systems, people flow monitoring in public spaces, and automation solutions in retail or event management. This project demonstrates how IoT-enabled devices like the ESP32-CAM can work seamlessly with computer vision algorithms to provide cost-effective and reliable solutions for real-world challenges. By focusing solely on face detection and counting, the system achieves an optimal balance between simplicity, scalability, and computational efficiency.

System Architecture of Face Counting with ESP32-CAM and Python

1. Hardware Layer:

ESP32-CAM:

Captures images at a resolution of 800x600 (or specified resolution).
Serves captured images over an HTTP server at a specific endpoint (e.g., /cam-hi.jpg).
Can operate as an access point or in station mode; the firmware in this project uses station mode and joins an existing Wi-Fi network.

Network Connection:

Wi-Fi provides communication between the ESP32-CAM and the Python application running on a computer.

Computer:

Runs the Python application to process the images and display results.

2. Software Layer:

ESP32-CAM Firmware:

Configures the camera for capturing images.
Sets up a lightweight HTTP server to serve JPEG images to connected clients.

Python Application:

Fetches images from the ESP32-CAM.
Processes images to count and annotate detected faces.

3. Communication Layer:

HTTP Protocol:

The ESP32-CAM serves images using HTTP.
The Python application uses HTTP GET requests to fetch the images from the camera.

4. Face Detection and Processing Layer:

Image Acquisition:

Python fetches images from the ESP32-CAM endpoint.

Preprocessing:

Converts the fetched image to a format suitable for OpenCV operations (e.g., cv2.imdecode to convert byte data into an image).
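
Before handing the bytes to cv2.imdecode, it can help to check that a complete JPEG actually arrived, because cv2.imdecode returns None for data it cannot decode. The following is a minimal sketch using only the standard library; the helper name looks_like_jpeg is our own and is not part of the project code. It relies on the fact that a JPEG file starts with the bytes FF D8 and ends with FF D9.

```python
JPEG_SOI = b"\xff\xd8"  # Start Of Image marker
JPEG_EOI = b"\xff\xd9"  # End Of Image marker


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if data starts and ends with the JPEG markers.

    Hypothetical helper: a cheap sanity check, not a full decoder.
    """
    return (
        len(data) >= 4
        and data.startswith(JPEG_SOI)
        and data.endswith(JPEG_EOI)
    )


complete = JPEG_SOI + b"\x00" * 16 + JPEG_EOI
truncated = JPEG_SOI + b"\x00" * 16  # transfer cut off before the end marker

print(looks_like_jpeg(complete))
print(looks_like_jpeg(truncated))
```

A frame that fails this check can simply be skipped, and the next one requested.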

Face Detection:

Uses OpenCV's Haar Cascade classifiers to detect:

Frontal Faces: Uses haarcascade_frontalface_default.xml.
Profile Faces: Uses haarcascade_profileface.xml.

Counts the number of faces detected in the current frame.

Annotation:

Draws bounding boxes (rectangles) and labels around detected faces on the image frame.
Adds text overlays to display the count of detected frontal and profile faces.

5. User Interface Layer:

Visual Output:

Displays the annotated frames with bounding boxes and face counts in a real-time OpenCV window titled "Face Detector."

User Interaction:

Allows the user to terminate the application by pressing the 'q' key.

6. Workflow Summary:

Image Capture:

ESP32-CAM captures and serves the image.

Image Fetching:

Python retrieves the image via an HTTP GET request.

Processing and Detection:

Haar Cascade classifiers detect faces, count them, and annotate the frame.

Display and Output:

Python displays the processed image in a GUI window with visual feedback for face counts.

Loop and Termination:

The loop continues until the user exits.
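
The workflow above can be sketched as a small loop with the four steps passed in as functions. This is only an illustration of the control flow, assuming hypothetical names (run_loop, fetch, process, show, should_quit); in the real script these roles are played by requests.get, process_frame, cv2.imshow and cv2.waitKey.

```python
from typing import Callable, Optional


def run_loop(
    fetch: Callable[[], Optional[bytes]],
    process: Callable[[bytes], object],
    show: Callable[[object], None],
    should_quit: Callable[[], bool],
) -> int:
    """Fetch, process and display frames until should_quit() is True.

    Returns the number of frames that were displayed.
    """
    shown = 0
    while True:
        data = fetch()
        if data is not None:  # a failed fetch is skipped, the loop goes on
            show(process(data))
            shown += 1
        if should_quit():
            break
    return shown


# Stand-ins for the camera and the GUI so the sketch runs anywhere
frames = [b"frame-1", None, b"frame-2"]
displayed = []
count = run_loop(
    fetch=lambda: frames.pop(0),
    process=lambda data: data.upper(),
    show=displayed.append,
    should_quit=lambda: not frames,
)
print(count, displayed)
```

Keeping the quit check outside the "frame received" branch means the user can still exit while the camera is unreachable.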

List of components

Components	Quantity
ESP32-CAM WiFi + Bluetooth Camera Module	1
FTDI USB to Serial Converter 3V3-5V	1
Male-to-female jumper wires	4
Female-to-female jumper wire	1
MicroUSB data cable	1

Circuit diagram

The following is the circuit diagram for this project.

Fig: Circuit diagram

ESP32-CAM WiFi + Bluetooth Camera Module	FTDI USB to Serial Converter 3V3-5V (Voltage selection button should be in 5V position)
5V	VCC
GND	GND
U0T	RX
U0R	TX
IO0	GND (FTDI or ESP32-CAM)

Programming

Board installation

If it is your first project with any board of the ESP32 series, you need to do the board installation first. If ESP32 boards are already installed in your Arduino IDE, you can skip this installation section. You may also need to install the CP210x USB driver.

Go to File > Preferences, paste https://dl.espressif.com/dl/package_esp32_index.json into the "Additional Boards Manager URLs" field, and click OK.

Fig: Board Installation

Go to Tools > Board > Boards Manager and install the ESP32 boards.

Fig: Board Installation

Install the ESP32-CAM library.

Download the ESP32-CAM library from GitHub as a ZIP file (the link is given in the reference section). Then install it from Sketch > Include Library > Add .ZIP Library.

In the file dialog, browse to the downloaded ZIP file, select it, and press Open.

Board selection and code uploading.

Connect the camera board to your computer. Some camera boards come with a micro USB connector of their own; in that case, connect the camera to the computer directly with a micro USB data cable. If the board has no connector, wire it to the FTDI module as shown in the circuit diagram and connect the FTDI module to the computer with the data cable. If you have never used an FTDI board on your computer, you will need to install the FTDI driver first.

After connecting the camera, go to Tools > Board > esp32 > AI Thinker ESP32-CAM.

Fig: Camera board selection

After selecting the board, select the appropriate COM port and upload the following code:

#include <WebServer.h>
#include <WiFi.h>
#include <esp32cam.h>

// Replace these with the name and password of your own Wi-Fi network
const char* WIFI_SSID = "Hamad";
const char* WIFI_PASS = "barsha123";

WebServer server(80);

static auto hiRes = esp32cam::Resolution::find(800, 600);

// Capture one frame and send it to the client as a JPEG
void serveJpg()
{
  auto frame = esp32cam::capture();
  if (frame == nullptr) {
    Serial.println("CAPTURE FAIL");
    server.send(503, "", "");
    return;
  }
  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),
                static_cast<int>(frame->size()));

  server.setContentLength(frame->size());
  server.send(200, "image/jpeg");
  WiFiClient client = server.client();
  frame->writeTo(client);
}

// Handler for /cam-hi.jpg: switch to high resolution, then serve a frame
void handleJpgHi()
{
  if (!esp32cam::Camera.changeResolution(hiRes)) {
    Serial.println("SET-HI-RES FAIL");
  }
  serveJpg();
}

void setup(){
  Serial.begin(115200);
  Serial.println();

  // Camera configuration
  {
    using namespace esp32cam;
    Config cfg;
    cfg.setPins(pins::AiThinker);
    cfg.setResolution(hiRes);
    cfg.setBufferCount(2);
    cfg.setJpeg(80);

    bool ok = Camera.begin(cfg);
    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");
  }

  // Join the Wi-Fi network in station mode and wait for the connection
  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  // Print the address of the image endpoint
  Serial.print("http://");
  Serial.println(WiFi.localIP());
  Serial.println(" /cam-hi.jpg");

  server.on("/cam-hi.jpg", handleJpgHi);
  server.begin();
}

void loop()
{
  server.handleClient();
}

After uploading the code, disconnect the IO0 pin of the camera from GND. Then press the RST button on the board. The following messages will appear in the Serial Monitor (set to 115200 baud).

Fig: Code successfully uploaded to ESP32-CAM

Copy the IP address shown in the Serial Monitor and paste it into the ESP32_CAM_URL line of the Python code given below.
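
If you prefer not to edit the URL by hand, the address can be assembled from the IP. The snippet below is a small sketch; the function name build_cam_url is hypothetical, and it assumes the camera reports an IPv4 address and serves images at /cam-hi.jpg as in the firmware above.

```python
import ipaddress


def build_cam_url(ip: str, path: str = "/cam-hi.jpg") -> str:
    """Build the image URL from the IP printed in the Serial Monitor.

    Raises ValueError if the text is not a valid IPv4 address.
    """
    ip = ip.strip()
    ipaddress.IPv4Address(ip)  # validation only; raises on malformed input
    return f"http://{ip}{path}"


# 192.168.1.104 is the example address used in this tutorial
ESP32_CAM_URL = build_cam_url("192.168.1.104")
print(ESP32_CAM_URL)
```

A typo such as a missing octet then fails immediately with a clear error instead of a connection timeout later.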

Python code

Haar Cascade Models

Face detection in this project relies on pre-trained Haar cascade models provided by OpenCV. These models are essential for detecting features like frontal and profile faces in images. Haar cascades are XML files containing trained data for specific object detection tasks. For this project, the following models are used:

Frontal Face Detection Model: haarcascade_frontalface_default.xml
Profile Face Detection Model: haarcascade_profileface.xml

These files are mandatory for the Python code to perform face detection. Below is a guide on how to download and set up these files.

Step 1: Downloading the Models

The Haar cascade models can be downloaded directly from OpenCV’s GitHub repository.

Open your web browser and go to the OpenCV GitHub repository for Haar cascades:
https://github.com/opencv/opencv/tree/master/data/haarcascades
Locate the following files in the repository:

haarcascade_frontalface_default.xml
haarcascade_profileface.xml

Click on each file to open its content.
On the file's page, click the Raw button to view the raw XML content.
Right-click and select Save As to download the file. Save it with its original filename (.xml extension) to the directory where your Python script (main.py) is saved.

Step 2: Placing the Files

Since the XML files are placed in the same directory as your Python script, there is no need to specify a separate folder in your code. Ensure the downloaded files are saved in the same directory as your script, as shown below:

project_folder/

├── main.py

├── haarcascade_frontalface_default.xml

└── haarcascade_profileface.xml

Step 3: Updating the Python Script

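Update your script to load the models from the current directory by referencing the XML files directly, without a folder path. Note that the main script given below instead loads the copies bundled with the opencv-python package through cv2.data.haarcascades; if you keep that form, the downloaded files are not needed. To use the downloaded files, load them like this: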

frontal_face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

profile_face_cascade = cv2.CascadeClassifier("haarcascade_profileface.xml")

Verifying the Setup

Ensure the XML files are saved in the same directory as the Python script.
Run the Python script. If the models load successfully, there will be no errors related to file loading, and face detection should function as expected.

By downloading the files and placing them in the same directory as your script, you simplify the setup and enable seamless face detection functionality.
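
A quick way to verify the setup before running the detector is to check that both files are present. This is a minimal sketch using only the standard library; missing_cascades is a hypothetical helper name. After loading, OpenCV's CascadeClassifier also offers an empty() method that returns True when a model failed to load, which is a useful second check.

```python
from pathlib import Path
from typing import List

REQUIRED_CASCADES = (
    "haarcascade_frontalface_default.xml",
    "haarcascade_profileface.xml",
)


def missing_cascades(folder: Path) -> List[str]:
    """Return the names of required cascade files not found in folder."""
    return [name for name in REQUIRED_CASCADES if not (folder / name).is_file()]


# Checks the current working directory; run it from your project folder
missing = missing_cascades(Path.cwd())
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files found")
```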

Main python script

Copy the following Python code into a text editor or IDE and save it as main.py.

import cv2
import requests
import numpy as np

# Replace with your ESP32-CAM's IP address
ESP32_CAM_URL = "http://192.168.1.104/cam-hi.jpg"

# Load Haar Cascades for different types of face detection
frontal_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_profileface.xml")

def process_frame(frame):
    # Convert to grayscale for detection
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Perform frontal face detection
    frontal_faces = frontal_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))

    # Perform profile face detection
    profile_faces = profile_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))

    # Draw rectangles for detected frontal faces
    for (x, y, w, h) in frontal_faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 0, 255), 2) # Red for frontal faces
        cv2.putText(frame, "Frontal Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)

    # Draw rectangles for detected profile faces
    for (x, y, w, h) in profile_faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2) # Blue for profile faces
        cv2.putText(frame, "Profile Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

    # Add detection counts to the frame
    cv2.putText(frame, f"Frontal Faces: {len(frontal_faces)}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
    cv2.putText(frame, f"Profile Faces: {len(profile_faces)}", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), 2)

    return frame

while True:
    # Fetch an image from the ESP32-CAM
    response = requests.get(ESP32_CAM_URL)

    if response.status_code == 200:
        img_arr = np.asarray(bytearray(response.content), dtype=np.uint8)
        frame = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)

        # Process and display the frame
        processed_frame = process_frame(frame)
        cv2.imshow("Face Detector", processed_frame)

        # Quit when 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        print("Failed to fetch image from ESP32-CAM")

cv2.destroyAllWindows()

Setting Up Python Environment

Install Dependencies:

1) Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows

2) Install the required libraries. The script also imports requests, so install it together with OpenCV and NumPy:

pip install opencv-python numpy requests

After setting up the Python environment, run the Python code.

ESP32-CAM code breakdown

#include <WebServer.h>
#include <WiFi.h>
#include <esp32cam.h>

#include <WebServer.h>: Adds support for creating a lightweight HTTP server.
#include <WiFi.h>: Allows the ESP32 to connect to Wi-Fi networks.
#include <esp32cam.h>: Provides functions to control the ESP32-CAM module, including camera initialization and capturing images.

const char* WIFI_SSID = "SSID";

const char* WIFI_PASS = "password";

WIFI_SSID and WIFI_PASS: Define the SSID and password of the Wi-Fi network that the ESP32 will connect to.

WebServer server(80);

WebServer server(80): Creates an HTTP server instance that listens on port 80 (default HTTP port).

static auto hiRes = esp32cam::Resolution::find(800, 600);

esp32cam::Resolution::find: Looks up a camera resolution for the requested width and height and stores it for later use:

hiRes: High resolution (800x600).

void serveJpg()
{
  auto frame = esp32cam::capture();
  if (frame == nullptr) {
    Serial.println("CAPTURE FAIL");
    server.send(503, "", "");
    return;
  }
  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),
                static_cast<int>(frame->size()));

  server.setContentLength(frame->size());
  server.send(200, "image/jpeg");
  WiFiClient client = server.client();
  frame->writeTo(client);
}

esp32cam::capture: Captures a frame from the camera.
Failure Handling: If no frame is captured, it logs a failure and sends a 503 error response.
Logging Success: Prints the resolution and size of the captured image.
Serving the Image:

Sets the content length and MIME type as image/jpeg.
Writes the image data directly to the client.

void handleJpgHi()
{
  if (!esp32cam::Camera.changeResolution(hiRes)) {
    Serial.println("SET-HI-RES FAIL");
  }
  serveJpg();
}

handleJpgHi: Switches the camera to high resolution using esp32cam::Camera.changeResolution(hiRes) and calls serveJpg.
Error Logging: If the resolution change fails, it logs a failure message to the Serial Monitor.

void setup(){
  Serial.begin(115200);
  Serial.println();

  {
    using namespace esp32cam;
    Config cfg;
    cfg.setPins(pins::AiThinker);
    cfg.setResolution(hiRes);
    cfg.setBufferCount(2);
    cfg.setJpeg(80);

    bool ok = Camera.begin(cfg);
    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");
  }

  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  Serial.print("http://");
  Serial.println(WiFi.localIP());
  Serial.println(" /cam-hi.jpg");

  server.on("/cam-hi.jpg", handleJpgHi);
  server.begin();
}

Serial Initialization:

Initializes the serial port for debugging.
Sets baud rate to 115200.

Camera Configuration:

Sets pins for the AI Thinker ESP32-CAM module.
Configures the default resolution, buffer count, and JPEG quality (80%).
Attempts to initialize the camera and logs the status.

Wi-Fi Setup:

Connects to the specified Wi-Fi network in station mode.
Waits for the connection and logs the device's IP address.

Web Server Routes:

Maps the URL endpoint /cam-hi.jpg to the handleJpgHi function.

Server Start:

Starts the web server.

void loop()
{
  server.handleClient();
}

server.handleClient(): Continuously listens for incoming HTTP requests and serves responses based on the defined endpoints.

Summary of Workflow

The ESP32-CAM connects to Wi-Fi and starts a web server.
The URL endpoint (/cam-hi.jpg) lets the user request images at high resolution.
The camera captures an image and serves it to the client as a JPEG.
The system continuously handles new client requests.
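
To try the client side without the hardware connected, the same request and response pattern can be imitated on your computer. The sketch below is a stand-in written with Python's standard library, not the ESP32 firmware: FakeCamHandler and start_fake_cam are hypothetical names, and the "image" is a few placeholder bytes rather than a real photo. It answers GET /cam-hi.jpg with status 200, the image/jpeg content type and a content length, as serveJpg does.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder payload framed by the JPEG start and end markers
FAKE_JPEG = b"\xff\xd8" + b"\x00" * 32 + b"\xff\xd9"


class FakeCamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/cam-hi.jpg":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(FAKE_JPEG)))
        self.end_headers()
        self.wfile.write(FAKE_JPEG)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_fake_cam() -> HTTPServer:
    """Start the stand-in server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), FakeCamHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_fake_cam()
conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
conn.request("GET", "/cam-hi.jpg")
response = conn.getresponse()
status = response.status
content_type = response.getheader("Content-Type")
body = response.read()
conn.close()
server.shutdown()
server.server_close()

print(status, content_type, len(body))
```

Pointing ESP32_CAM_URL at such a local address lets you exercise the fetch logic; face detection itself still needs real images.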

Python code breakdown

Importing Libraries

import cv2

import requests

import numpy as np

cv2: OpenCV library for image processing.
requests: To fetch the image frames from the ESP32-CAM over HTTP.
numpy (np): For array operations, used here to handle the byte stream received from the ESP32-CAM.

ESP32-CAM URL

ESP32_CAM_URL = "http://192.168.1.104/cam-hi.jpg"

Replace this URL with the actual IP address of your ESP32-CAM on your local network. The endpoint "/cam-hi.jpg" returns the latest frame captured by the ESP32-CAM.

Loading Haar Cascades

frontal_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

profile_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_profileface.xml")

Haar cascades are pre-trained classifiers provided by OpenCV to detect objects like faces.
haarcascade_frontalface_default.xml: Detects frontal faces.
haarcascade_profileface.xml: Detects side/profile faces.

Frame Processing Function

def process_frame(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY): Converts the image to grayscale, which is required by Haar cascades for face detection.

Frontal Face Detection

frontal_faces = frontal_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))

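detectMultiScale: Searches the image at several scales and returns one (x, y, w, h) rectangle for each detected object.

scaleFactor=1.1: Specifies how much the image size is reduced at each scale. With 1.1 the scale changes by 10% per step; smaller values search more scales and are slower.
minNeighbors=5: Minimum number of neighbouring rectangles required for a positive detection. Higher values give fewer false positives but can miss faces.
minSize=(20, 20): Minimum size of detected objects, in pixels. Smaller candidates are ignored.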
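
To get a feeling for what scaleFactor costs, we can estimate how many scales the detector has to search. This is a rough back-of-the-envelope sketch, not OpenCV's exact internal logic: it assumes a 24-pixel base detection window (an assumption about the frontal face model) and the 600-pixel short side of our 800x600 frame, and the function name pyramid_scales is hypothetical.

```python
from typing import List


def pyramid_scales(image_side: int, window_side: int = 24,
                   scale_factor: float = 1.1) -> List[float]:
    """List the scale factors at which a window still fits in the image."""
    scales = []
    factor = 1.0
    while window_side * factor <= image_side:
        scales.append(factor)
        factor *= scale_factor  # each step enlarges the search scale
    return scales


fine = pyramid_scales(600, scale_factor=1.1)
coarse = pyramid_scales(600, scale_factor=1.3)
print(len(fine), len(coarse))
```

The finer setting searches far more scales, which is why raising scaleFactor is a common way to trade accuracy for speed.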

Profile Face Detection

profile_faces = profile_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))

Same as frontal detection but uses the profile cascade for side faces.

Drawing Rectangles for Faces

for (x, y, w, h) in frontal_faces:
    cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 0, 255), 2)
    cv2.putText(frame, "Frontal Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)

Draws a red rectangle around each detected frontal face.
Adds the label "Frontal Face" above the rectangle.

for (x, y, w, h) in profile_faces:
    cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
    cv2.putText(frame, "Profile Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

Draws a blue rectangle for each detected profile face.
Labels it as "Profile Face."

Adding Face Counts

cv2.putText(frame, f"Frontal Faces: {len(frontal_faces)}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

cv2.putText(frame, f"Profile Faces: {len(profile_faces)}", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), 2)

Displays the count of detected frontal and profile faces on the top-left of the frame.
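
Because the two cascades run independently, a slightly turned face may be reported by both and would then appear in both counts. If you want a single head count, one possible approach (an extension, not part of the original script) is to drop profile detections that overlap a frontal detection. The sketch below uses intersection over union (IoU) with an assumed threshold of 0.5; the function names are hypothetical.

```python
def iou(a, b) -> float:
    """Intersection over union of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def count_unique_faces(frontal, profile, threshold: float = 0.5) -> int:
    """Count frontal faces plus profile faces that overlap no frontal face."""
    extra = [p for p in profile
             if all(iou(p, f) < threshold for f in frontal)]
    return len(frontal) + len(extra)


frontal = [(100, 100, 50, 50)]
profile = [(105, 105, 50, 50), (300, 100, 40, 40)]  # first one overlaps
print(count_unique_faces(frontal, profile))
```

The rectangles returned by detectMultiScale can be passed in directly, since each one unpacks as (x, y, w, h).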

Main Loop

while True:
    response = requests.get(ESP32_CAM_URL)

Continuously fetches images from the ESP32-CAM.

Handle the Image Response

if response.status_code == 200:
    img_arr = np.asarray(bytearray(response.content), dtype=np.uint8)
    frame = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)

Converts the HTTP response to a NumPy array.
Decodes the byte array into an OpenCV image using cv2.imdecode.

Process and Display the Frame

processed_frame = process_frame(frame)

cv2.imshow("Face Detector", processed_frame)

Processes the frame using the process_frame function.
Displays the processed frame in a window titled "Face Detector."

Quit on Key Press

if cv2.waitKey(1) & 0xFF == ord('q'):
    break

Checks if the 'q' key is pressed to exit the loop.
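
The & 0xFF part keeps only the lowest 8 bits of the value returned by cv2.waitKey, so the comparison works even if extra high bits are set in the key code. When no key is pressed, cv2.waitKey returns -1. The small sketch below reproduces the check on plain integers; is_quit_key is a hypothetical helper and the high-bit value is made up for illustration.

```python
def is_quit_key(key_code: int, quit_char: str = "q") -> bool:
    """Apply the same mask-and-compare test as the main loop."""
    return (key_code & 0xFF) == ord(quit_char)


print(is_quit_key(-1))                   # no key pressed
print(is_quit_key(ord("q")))             # plain 'q'
print(is_quit_key(0x100000 | ord("q")))  # 'q' with extra high bits set
print(is_quit_key(ord("x")))             # some other key
```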

Error Handling

else:
    print("Failed to fetch image from ESP32-CAM")

Prints an error message if the ESP32-CAM fails to provide an image.
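
Note that this branch only covers responses with a status other than 200. If the camera is unreachable, requests.get raises an exception instead, and requests exceptions are subclasses of OSError. One way to handle both cases is a small retry wrapper. This is a sketch under our own assumptions (three attempts, half a second apart) with hypothetical names; the fetch argument stands for any function that returns the image bytes or None.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(fetch: Callable[[], Optional[bytes]],
                     attempts: int = 3,
                     delay_s: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[bytes]:
    """Call fetch() until it returns data or the attempts run out."""
    for attempt in range(attempts):
        try:
            data = fetch()
        except OSError:  # network errors, including those raised by requests
            data = None
        if data is not None:
            return data
        if attempt < attempts - 1:
            sleep(delay_s)  # short pause before trying again
    return None


# Simulated camera: fails twice, then delivers a frame
outcomes = [OSError("unreachable"), None, b"jpeg-bytes"]


def fake_fetch():
    result = outcomes.pop(0)
    if isinstance(result, Exception):
        raise result
    return result


print(fetch_with_retry(fake_fetch, sleep=lambda s: None))
```

In the real script, fetch could wrap requests.get(ESP32_CAM_URL, timeout=5) and return response.content when the status code is 200.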

Clean Up

cv2.destroyAllWindows()

Closes all OpenCV windows when the program exits.

Summary of the Workflow

Setup:

The code connects to the ESP32-CAM via its IP address to fetch image frames in real time.
It loads pre-trained Haar Cascade classifiers for detecting frontal and profile faces.

Continuous Image Fetching:

The program enters a loop where it fetches a new image frame from the ESP32-CAM using an HTTP GET request.

Image Processing:

The image is converted into a format usable by OpenCV.
The frame is processed to:

Convert it to grayscale (required for Haar Cascade detection).
Detect frontal faces and profile faces using the respective classifiers.

Face Detection and Visualization:

For each detected face:

A rectangle is drawn around it:

Red for frontal faces.
Blue for profile faces.

A label ("Frontal Face" or "Profile Face") is added above the rectangle.

The count of detected frontal and profile faces is displayed on the frame.

Display:

The processed frame, with visual indicators and counts, is displayed in a window titled "Face Detector."

User Interaction:

The program continues fetching, processing, and displaying frames until the user presses the 'q' key to quit.

Error Handling:

If the ESP32-CAM fails to provide an image, an error message is printed, and the loop continues.

Cleanup:

Upon exiting the loop, all OpenCV windows are closed to release resources.

Key Workflow Steps:

1. Fetch Image → 2. Convert Image → 3. Detect Faces → 4. Annotate Frame → 5. Display Frame → 6. Repeat Until Exit.

Testing

Power up the ESP32-CAM and connect it to Wi-Fi.
Run the Python script. Make sure that the ESP32-CAM URL is correctly set.
See the result of counting the faces in the display.
You can test with real-life people and photos.

Fig: Face counting

Troubleshooting:

Guru Meditation Error: Ensure stable power to the ESP32-CAM.
No Image Display: Check the IP address and ensure the ESP32-CAM is accessible from your computer.
Library Conflicts: Use a virtual environment to isolate Python dependencies.
Dots at the time of uploading the code: Immediately press the RST button.
Multiple failed upload attempts despite pressing the RST button: Restart your computer and try again.
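
For the "No Image Display" case, a quick connectivity test can tell you whether the computer can reach the camera at all before you look at the Python code. The following is a minimal sketch using the standard library; can_connect is a hypothetical helper, and port 80 matches the WebServer server(80) line in the firmware.

```python
import socket


def can_connect(host: str, port: int = 80, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, timed out, unreachable, bad address
        return False
```

For example, can_connect("192.168.1.104") should return True when the camera is powered, connected to the same network, and the IP address is correct.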

To wrap up

This project demonstrates an effective implementation of a face-counting system using ESP32-CAM and Python. The system uses the ESP32-CAM’s capability to capture and serve high-resolution images over HTTP. The Python client uses OpenCV's Haar cascade classifiers to effectively detect and count frontal and profile faces in each frame. It provides real-time feedback.

This project can be adapted for various applications, such as crowd monitoring, security, and smart building management. It provides an affordable and flexible solution.

Future improvements can be made using advanced face detection algorithms like DNN-based models. This project highlights how simple hardware and software integration can address complex problems in computer vision.

Syed Zain Nasir

I am Syed Zain Nasir, the founder of The Engineering Projects (TEP, https://www.TheEngineeringProjects.com/). I have been a programmer since 2009; before that I just searched for things and made small projects, and now I share my knowledge through this platform. I also work as a freelancer and have done many projects related to programming and electrical circuitry.
