Hello friends. We hope you are doing fine. Today we are back with another interesting project, this time based on image processing. Efficient, low-cost solutions for real-time applications are becoming increasingly important in embedded systems and computer vision. This project uses the ESP32-CAM, a compact and inexpensive camera module built around the ESP32 microcontroller with built-in Wi-Fi. With it, we will create a real-time face detection and counting system.
The ESP32-CAM is the core of the system. It captures images at 800x600 (SVGA) resolution and hosts an HTTP server that returns each capture as a single JPEG image over the local network. Because the camera sensor compresses frames to JPEG itself, the board sends compressed images rather than raw pixels. This keeps transfer times short enough for the client to process frames in near real time.
On the client side, a Python application built on OpenCV fetches image frames from the ESP32-CAM. Using Haar cascade classifiers, it detects faces in each frame and tells whether each face is frontal or in profile.
This project focuses on face detection and counting. The application draws a bounding box around each detected face and counts the frontal and profile faces in every frame it receives.
Applications of this face detection and counting system include smart attendance systems, people-flow monitoring in public spaces, and automation in retail or event management. The project shows how an IoT device like the ESP32-CAM can work together with computer vision algorithms to provide low-cost solutions for real-world problems. Because it focuses only on detection and counting, the system stays simple enough to run on an inexpensive camera module and an ordinary computer.
ESP32-CAM:
Captures images at a resolution of 800x600 (or another resolution set in the firmware).
Serves captured images over an HTTP server at a specific endpoint (e.g., /cam-hi.jpg).
Runs in station (STA) mode and joins an existing Wi-Fi network. The ESP32 can also act as its own access point, but the firmware below uses station mode.
Network Connection:
Wi-Fi provides communication between the ESP32-CAM and the Python application running on a computer.
Computer:
Runs the Python application to process the images and display results.
ESP32-CAM Firmware:
Configures the camera for capturing images.
Sets up a lightweight HTTP server to serve JPEG images to connected clients.
Python Application:
Fetches images from the ESP32-CAM.
Processes images to count and annotate detected faces.
HTTP Protocol:
The ESP32-CAM serves images using HTTP.
The Python application uses HTTP GET requests to fetch the images from the camera.
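The GET request described above can be made with Python's standard library alone. The sketch below is a minimal example that assumes the firmware's /cam-hi.jpg endpoint. The helper names fetch_jpeg and looks_like_jpeg are hypothetical, and the 5-second timeout is an arbitrary choice. A JPEG stream starts with the bytes FF D8 and normally ends with FF D9, so checking both bytes is a cheap way to catch truncated transfers before decoding.

```python
import urllib.request

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_like_jpeg(data):
    """Cheap sanity check: JPEG data begins with SOI and normally ends with EOI."""
    return len(data) >= 4 and data[:2] == JPEG_SOI and data[-2:] == JPEG_EOI


def fetch_jpeg(url, timeout=5.0):
    """Return the JPEG bytes served at url, or None on any network/HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
    except OSError:  # URLError, HTTPError (e.g. 503) and timeouts are all OSError subclasses
        return None
    return data if looks_like_jpeg(data) else None
```

For example, `fetch_jpeg("http://192.168.1.104/cam-hi.jpg")` returns the image bytes when the camera answers with a complete JPEG, and `None` otherwise.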
Image Acquisition:
Python fetches images from the ESP32-CAM endpoint.
Preprocessing:
Converts the fetched image to a format suitable for OpenCV operations (e.g., cv2.imdecode to convert byte data into an image).
Face Detection:
Uses OpenCV's Haar Cascade classifiers to detect:
Frontal Faces: Uses haarcascade_frontalface_default.xml.
Profile Faces: Uses haarcascade_profileface.xml.
Counts the number of faces detected in the current frame.
Annotation:
Draws bounding boxes (rectangles) and labels around detected faces on the image frame.
Adds text overlays to display the count of detected frontal and profile faces.
Visual Output:
Displays the annotated frames with bounding boxes and face counts in a real-time OpenCV window titled "Face Detector."
User Interaction:
Allows the user to terminate the application by pressing the 'q' key.
Image Capture:
ESP32-CAM captures and serves the image.
Image Fetching:
Python retrieves the image via an HTTP GET request.
Processing and Detection:
Haar Cascade classifiers detect faces, count them, and annotate the frame.
Display and Output:
Python displays the processed image in a GUI window with visual feedback for face counts.
Loop and Termination:
The loop continues until the user exits.
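The data flow above can be sketched as one loop with pluggable stages. This is an illustration only, not the project's code. In the real script, fetch would wrap requests.get, decode would wrap cv2.imdecode, detect would call detectMultiScale, and show would call cv2.imshow and cv2.waitKey. The function run_pipeline and its max_failures limit are hypothetical additions that stop the loop if the camera keeps failing.

```python
def run_pipeline(fetch, decode, detect, annotate, show, max_failures=10):
    """Repeat fetch -> decode -> detect -> annotate -> show.

    Stops when show() returns False (e.g. the user pressed 'q') or after
    max_failures consecutive fetch/decode failures. Returns frames shown.
    """
    shown = 0
    failures = 0
    while failures < max_failures:
        raw = fetch()                      # 1. image fetching (None on failure)
        frame = decode(raw) if raw is not None else None
        if frame is None:                  # nothing usable: count it and retry
            failures += 1
            continue
        failures = 0
        faces = detect(frame)              # 2. detection
        keep_going = show(annotate(frame, faces))  # 3. annotation and display
        shown += 1
        if not keep_going:                 # 4. termination requested
            break
    return shown
```

Passing the stages in as functions lets you test the loop logic with fake inputs before the camera is connected.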
| Components | Quantity |
|---|---|
| ESP32-CAM WiFi + Bluetooth Camera Module | 1 |
| FTDI USB to Serial Converter 3V3-5V | 1 |
| Male-to-female jumper wires | 4 |
| Female-to-female jumper wire | 1 |
| MicroUSB data cable | 1 |
The following is the circuit diagram for this project.
Fig: Circuit diagram
| ESP32-CAM WiFi + Bluetooth Camera Module | FTDI USB to Serial Converter 3V3-5V (voltage selector in the 5V position) |
|---|---|
| 5V | VCC |
| GND | GND |
| U0T (UART0 TX) | RX |
| U0R (UART0 RX) | TX |
| IO0 | GND (FTDI or ESP32-CAM), only while uploading |
If it is your first project with any board of the ESP32 series, you need to do the board installation first. If ESP32 boards are already installed in your Arduino IDE, you can skip this installation section. You may also need to install the CP210x USB driver.
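Go to File > Preferences, paste https://dl.espressif.com/dl/package_esp32_index.json into the Additional Boards Manager URLs field, and click OK. Current Espressif documentation lists https://espressif.github.io/arduino-esp32/package_esp32_index.json as the stable URL, so use that one if the older URL fails.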
Fig: Board Installation
Go to Tools > Board > Boards Manager, search for "esp32", and install the esp32 package by Espressif Systems.
Fig: Board Installation
Download the ESP32-CAM library (esp32cam, which provides the esp32cam:: functions used in the code) as a ZIP file from GitHub. The link is given in the reference section. Then install it via Sketch > Include Library > Add .ZIP Library.
In the file dialog, navigate to the downloaded library, select it, and click Open.
Connect the camera board to your computer. Some camera boards come with their own micro USB connector, and you can connect these directly with a micro USB data cable. If the board has no connector, connect the FTDI module to the computer with the data cable instead. If you have never used an FTDI board on your computer, you will need to install the FTDI driver first.
After connecting the camera, go to Tools > Board > esp32 > AI Thinker ESP32-CAM.
Fig: Camera board selection
After selecting the board, make sure IO0 is connected to GND (this puts the board in flashing mode). Then select the appropriate COM port and upload the following code:
#include <esp32cam.h>
#include <WebServer.h>
#include <WiFi.h>

// Replace with the name and password of your Wi-Fi network
const char* WIFI_SSID = "YOUR_SSID";
const char* WIFI_PASS = "YOUR_PASSWORD";

WebServer server(80);  // HTTP server on the default port 80

static auto hiRes = esp32cam::Resolution::find(800, 600);

// Capture one frame and send it to the client as a JPEG
void serveJpg()
{
  auto frame = esp32cam::capture();
  if (frame == nullptr) {
    Serial.println("CAPTURE FAIL");
    server.send(503, "", "");
    return;
  }
  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),
                static_cast<int>(frame->size()));

  server.setContentLength(frame->size());
  server.send(200, "image/jpeg");
  WiFiClient client = server.client();
  frame->writeTo(client);
}

// Handler for /cam-hi.jpg: switch to 800x600, then serve a frame
void handleJpgHi()
{
  if (!esp32cam::Camera.changeResolution(hiRes)) {
    Serial.println("SET-HI-RES FAIL");
  }
  serveJpg();
}

void setup(){
  Serial.begin(115200);
  Serial.println();
  {
    using namespace esp32cam;
    Config cfg;
    cfg.setPins(pins::AiThinker);  // pin map for the AI Thinker board
    cfg.setResolution(hiRes);
    cfg.setBufferCount(2);
    cfg.setJpeg(80);               // JPEG quality setting
    bool ok = Camera.begin(cfg);
    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");
  }

  // Join the Wi-Fi network in station mode
  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }

  // Print the URL to use in the Python script
  Serial.print("http://");
  Serial.println(WiFi.localIP());
  Serial.println(" /cam-hi.jpg");

  server.on("/cam-hi.jpg", handleJpgHi);
  server.begin();
}

void loop()
{
  server.handleClient();  // serve incoming HTTP requests
}
After uploading the code, disconnect the camera's IO0 pin from GND. Then press the RST button and open the Serial Monitor at 115200 baud. Messages like the following will appear.
Fig: Code successfully uploaded to ESP32-CAM
Copy the IP address shown in the Serial Monitor and paste it into the ESP32_CAM_URL line of the Python code below.
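Because the Serial Monitor prints the address and the path on separate lines, a typo is easy to make when combining them by hand. The small helper below builds the URL and rejects malformed addresses. make_cam_url is a hypothetical name, and the code assumes an IPv4 address, as printed by WiFi.localIP().

```python
import ipaddress


def make_cam_url(ip, path="/cam-hi.jpg"):
    """Build the ESP32-CAM image URL from the IP printed on the Serial Monitor."""
    addr = ipaddress.ip_address(ip.strip())  # raises ValueError on typos such as 192.168.1.300
    if addr.version != 4:
        raise ValueError("expected an IPv4 address")
    return f"http://{addr}{path}"
```

Usage: `ESP32_CAM_URL = make_cam_url("192.168.1.104")` gives `"http://192.168.1.104/cam-hi.jpg"`.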
Face detection in this project relies on pre-trained Haar cascade models provided by OpenCV. These models are essential for detecting features like frontal and profile faces in images. Haar cascades are XML files containing trained data for specific object detection tasks. For this project, the following models are used:
Frontal Face Detection Model: haarcascade_frontalface_default.xml
Profile Face Detection Model: haarcascade_profileface.xml
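The Python code needs these files to perform face detection. The opencv-python package already ships them in the folder given by cv2.data.haarcascades, which is where the main script below loads them from. If you prefer to keep local copies next to your script, download them as described below.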
The Haar cascade models can be downloaded directly from OpenCV’s GitHub repository.
Open your web browser and go to the OpenCV GitHub repository for Haar cascades:
https://github.com/opencv/opencv/tree/master/data/haarcascades
Locate the following files in the repository:
haarcascade_frontalface_default.xml
haarcascade_profileface.xml
Click on each file to open its content.
On the file's page, click the Raw button to view the raw XML content.
Right-click and select Save As to download the file. Save it with its original filename (.xml extension) to the directory where your Python script (main.py) is saved.
Save both downloaded files in the same directory as your script, as shown below. No separate folder path is then needed in the code.
project_folder/
├── main.py
├── haarcascade_frontalface_default.xml
└── haarcascade_profileface.xml
To use the local copies, reference the XML files directly without a folder path. Note that a bare filename is resolved relative to the current working directory, so run the script from the project folder:
frontal_face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
profile_face_cascade = cv2.CascadeClassifier("haarcascade_profileface.xml")
Run the Python script from the project folder. Note that cv2.CascadeClassifier does not raise an exception when a file is missing. Instead, it creates an empty classifier, and the error only appears later when detectMultiScale is called. Calling frontal_face_cascade.empty() right after loading catches this early; it returns True if loading failed.
Keeping the files next to the script keeps the setup simple.
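To support both setups (local copies or the copies bundled with opencv-python), you can look for the file next to the script first and fall back to another folder. This is a sketch with a hypothetical helper name. In practice you would pass `Path(__file__).parent` as script_dir and `cv2.data.haarcascades` as fallback_dir. Resolving the path from the script's own folder also avoids the working-directory pitfall mentioned above.

```python
from pathlib import Path


def resolve_cascade_path(filename, script_dir, fallback_dir=None):
    """Prefer a copy next to the script; otherwise use fallback_dir (e.g. cv2.data.haarcascades)."""
    local = Path(script_dir) / filename
    if local.is_file():
        return str(local)
    if fallback_dir is not None:
        bundled = Path(fallback_dir) / filename
        if bundled.is_file():
            return str(bundled)
    raise FileNotFoundError(f"{filename} not found in {script_dir} or {fallback_dir}")
```

After `cascade = cv2.CascadeClassifier(resolve_cascade_path(...))`, a check on `cascade.empty()` still guards against a file that exists but is not a valid cascade.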
Copy the following Python code and save it as main.py in your project folder.
import cv2
import requests
import numpy as np

# Replace with your ESP32-CAM's IP address (printed on the Serial Monitor)
ESP32_CAM_URL = "http://192.168.1.104/cam-hi.jpg"

# Load the Haar cascades that ship with opencv-python
frontal_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_profileface.xml")
if frontal_face_cascade.empty() or profile_face_cascade.empty():
    raise IOError("Could not load the Haar cascade XML files")

def process_frame(frame):
    # Convert to grayscale for detection
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Perform frontal face detection
    frontal_faces = frontal_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))
    # Perform profile face detection
    profile_faces = profile_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))
    # Draw rectangles for detected frontal faces (OpenCV colours are in BGR order)
    for (x, y, w, h) in frontal_faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 0, 255), 2)  # Red for frontal faces
        cv2.putText(frame, "Frontal Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
    # Draw rectangles for detected profile faces
    for (x, y, w, h) in profile_faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)  # Blue for profile faces
        cv2.putText(frame, "Profile Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
    # Add detection counts to the frame
    cv2.putText(frame, f"Frontal Faces: {len(frontal_faces)}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
    cv2.putText(frame, f"Profile Faces: {len(profile_faces)}", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), 2)
    return frame

while True:
    # Fetch an image from the ESP32-CAM; the timeout keeps a lost connection from hanging the loop
    try:
        response = requests.get(ESP32_CAM_URL, timeout=5)
    except requests.RequestException:
        response = None
    if response is not None and response.status_code == 200:
        img_arr = np.asarray(bytearray(response.content), dtype=np.uint8)
        frame = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)
        if frame is None:  # truncated or corrupt JPEG: skip this frame
            continue
        # Process and display the frame
        processed_frame = process_frame(frame)
        cv2.imshow("Face Detector", processed_frame)
        # Quit when 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        print("Failed to fetch image from ESP32-CAM")

cv2.destroyAllWindows()
1) Create a virtual environment:
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
2) Install the required libraries:
pip install opencv-python numpy requests
After setting up the Python environment, run the script with python main.py.
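If the script fails with an import error, it helps to check which packages are missing in the active environment. The sketch below is a small, hypothetical checker. It maps each import name to its pip package name (for example, cv2 is provided by opencv-python) and uses importlib.util.find_spec, which returns None for modules that cannot be found.

```python
import importlib.util

# import name -> pip package name
REQUIRED = {"cv2": "opencv-python", "numpy": "numpy", "requests": "requests"}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose import module cannot be found."""
    return [pkg for module, pkg in required.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

Run it inside the activated virtual environment, so it checks the same interpreter that will run main.py.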
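#include <esp32cam.h>
#include <WebServer.h>
#include <WiFi.h>
esp32cam.h provides the camera functions, WebServer.h the HTTP server, and WiFi.h the Wi-Fi connection.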
const char* WIFI_SSID = "SSID";
const char* WIFI_PASS = "password";
WIFI_SSID and WIFI_PASS: Define the SSID and password of the Wi-Fi network that the ESP32 will connect to.
WebServer server(80);
WebServer server(80): Creates an HTTP server instance that listens on port 80 (default HTTP port).
static auto hiRes = esp32cam::Resolution::find(800, 600);
esp32cam::Resolution::find(800, 600): Looks up the library's resolution setting for 800x600.
hiRes: Stores this high resolution (800x600) for later use.
void serveJpg()
{
  auto frame = esp32cam::capture();
  if (frame == nullptr) {
    Serial.println("CAPTURE FAIL");
    server.send(503, "", "");
    return;
  }
  Serial.printf("CAPTURE OK %dx%d %db\n", frame->getWidth(), frame->getHeight(),
                static_cast<int>(frame->size()));
  server.setContentLength(frame->size());
  server.send(200, "image/jpeg");
  WiFiClient client = server.client();
  frame->writeTo(client);
}
esp32cam::capture: Captures a frame from the camera.
Failure Handling: If no frame is captured, it logs a failure and sends a 503 error response.
Logging Success: Prints the resolution and size of the captured image.
Serving the Image:
Sets the content length and MIME type as image/jpeg.
Writes the image data directly to the client.
void handleJpgHi()
{
  if (!esp32cam::Camera.changeResolution(hiRes)) {
    Serial.println("SET-HI-RES FAIL");
  }
  serveJpg();
}
handleJpgHi: Switches the camera to high resolution using esp32cam::Camera.changeResolution(hiRes) and calls serveJpg.
Error Logging: If the resolution change fails, it logs a failure message to the Serial Monitor.
void setup(){
  Serial.begin(115200);
  Serial.println();
  {
    using namespace esp32cam;
    Config cfg;
    cfg.setPins(pins::AiThinker);
    cfg.setResolution(hiRes);
    cfg.setBufferCount(2);
    cfg.setJpeg(80);
    bool ok = Camera.begin(cfg);
    Serial.println(ok ? "CAMERA OK" : "CAMERA FAIL");
  }
  WiFi.persistent(false);
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
  }
  Serial.print("http://");
  Serial.println(WiFi.localIP());
  Serial.println(" /cam-hi.jpg");
  server.on("/cam-hi.jpg", handleJpgHi);
  server.begin();
}
Serial Initialization:
Initializes the serial port for debugging.
Sets baud rate to 115200.
Camera Configuration:
Sets pins for the AI Thinker ESP32-CAM module.
Sets the initial resolution (hiRes), two frame buffers, and a JPEG quality setting of 80.
Attempts to initialize the camera and logs the status.
Wi-Fi Setup:
Connects to the specified Wi-Fi network in station mode.
Waits for the connection and logs the device's IP address.
Web Server Routes:
Maps the /cam-hi.jpg URL endpoint to the handleJpgHi function.
Server Start:
Starts the web server.
void loop()
{
  server.handleClient();
}
server.handleClient(): Continuously listens for incoming HTTP requests and serves responses based on the defined endpoints.
The ESP32-CAM connects to Wi-Fi and starts a web server.
The URL endpoint /cam-hi.jpg lets the user request images at high resolution.
The camera captures an image and serves it to the client as a JPEG.
The system continuously handles new client requests.
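To test the Python client without the board, you can imitate the firmware's single endpoint on your computer. The following sketch uses only Python's standard library and assumes you supply some sample JPEG bytes; make_fake_cam_server is a hypothetical helper. Like serveJpg(), it answers 200 with Content-Type image/jpeg and a Content-Length header, or 503 when it has no frame to send.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_fake_cam_server(jpeg_bytes, host="127.0.0.1", port=0):
    """Serve jpeg_bytes at /cam-hi.jpg in a background thread; port=0 picks a free port."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/cam-hi.jpg":
                self.send_error(404)
                return
            if not jpeg_bytes:  # mimic "CAPTURE FAIL" -> 503 in the firmware
                self.send_response(503)
                self.send_header("Content-Length", "0")
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Content-Type", "image/jpeg")
            self.send_header("Content-Length", str(len(jpeg_bytes)))
            self.end_headers()
            self.wfile.write(jpeg_bytes)

        def log_message(self, format, *args):
            pass  # keep the console quiet

    server = ThreadingHTTPServer((host, port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point ESP32_CAM_URL at `http://127.0.0.1:<port>/cam-hi.jpg`, where `<port>` is `server.server_address[1]`, and call `server.shutdown()` when you are done.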
import cv2
import requests
import numpy as np
cv2: OpenCV library for image processing.
requests: To fetch the image frames from the ESP32-CAM over HTTP.
numpy (np): For array operations, used here to handle the byte stream received from the ESP32-CAM.
ESP32_CAM_URL = "http://192.168.1.104/cam-hi.jpg"
Replace this URL with the actual IP address of your ESP32-CAM on your local network. The endpoint "/cam-hi.jpg" returns the latest frame captured by the ESP32-CAM.
frontal_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile_face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_profileface.xml")
Haar cascades are pre-trained classifiers provided by OpenCV to detect objects like faces.
haarcascade_frontalface_default.xml: Detects frontal faces.
haarcascade_profileface.xml: Detects side/profile faces. It was trained on faces turned one way, so faces turned the other way may be missed.
def process_frame(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY): Converts the image to grayscale, which is required by Haar cascades for face detection.
frontal_faces = frontal_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))
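detectMultiScale: Detects objects in the image at several scales and returns one (x, y, w, h) rectangle per detection.
scaleFactor=1.1: Each scale step is 1.1 times the previous one, so objects are searched for in size increments of about 10%. Smaller values find more faces but run slower.
minNeighbors=5: Minimum number of overlapping candidate detections a region needs before it is reported. Higher values give fewer false positives but may miss faces.
minSize=(20, 20): Minimum size of detected objects in pixels; smaller faces are ignored.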
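To see what scaleFactor costs, the sketch below lists the detection-window sizes tried at each scale. It is a simplified model, not OpenCV's exact internal loop. It assumes a 24x24 base window (the size stored in haarcascade_frontalface_default.xml) and assumes scanning stops once the window no longer fits in the image, so treat the counts as estimates.

```python
def window_sizes(img_w, img_h, base=24, scale_factor=1.1, min_size=20):
    """Approximate window sizes (pixels) scanned by a multi-scale detector."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    factor = 1.0
    # Grow the window by scale_factor until it no longer fits in the image
    while base * factor <= min(img_w, img_h):
        win = base * factor
        if win >= min_size:  # windows smaller than minSize are skipped
            sizes.append(round(win))
        factor *= scale_factor
    return sizes
```

Under this model, an 800x600 frame is scanned at 34 window sizes with scaleFactor=1.1 but only 13 with 1.3. This is why a larger scaleFactor speeds up detection at the cost of skipping some face sizes.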
profile_faces = profile_face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))
Same as frontal detection but uses the profile cascade for side faces.
for (x, y, w, h) in frontal_faces:
    cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 0, 255), 2)
    cv2.putText(frame, "Frontal Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
Draws a red rectangle around each detected frontal face.
Adds the label "Frontal Face" above the rectangle.
for (x, y, w, h) in profile_faces:
    cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
    cv2.putText(frame, "Profile Face", (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
Draws a blue rectangle for each detected profile face.
Labels it as "Profile Face."
cv2.putText(frame, f"Frontal Faces: {len(frontal_faces)}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
cv2.putText(frame, f"Profile Faces: {len(profile_faces)}", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), 2)
Displays the count of detected frontal and profile faces on the top-left of the frame.
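Because the two cascades run independently, a face turned partly sideways can be reported by both, and it is then counted twice. If you want a single combined count, one common approach (not part of the original script) is to treat two boxes with a high intersection-over-union (IoU) as the same face. The sketch below works on (x, y, w, h) boxes like those returned by detectMultiScale. The function names and the 0.3 threshold are assumptions to tune.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def count_unique_faces(frontal, profile, threshold=0.3):
    """Count frontal faces plus profile faces that do not overlap any frontal face."""
    frontal = [tuple(int(v) for v in box) for box in frontal]
    profile = [tuple(int(v) for v in box) for box in profile]
    extra = [p for p in profile if all(iou(p, f) < threshold for f in frontal)]
    return len(frontal) + len(extra)
```

Inside process_frame, `count_unique_faces(frontal_faces, profile_faces)` could be drawn as a "Total Faces" line under the two existing counters.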
while True:
    response = requests.get(ESP32_CAM_URL, timeout=5)
Continuously fetches images from the ESP32-CAM. The timeout keeps the loop from hanging if the camera drops off the network.
if response.status_code == 200:
    img_arr = np.asarray(bytearray(response.content), dtype=np.uint8)
    frame = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)
Converts the HTTP response to a NumPy array.
Decodes the byte array into an OpenCV image using cv2.imdecode.
processed_frame = process_frame(frame)
cv2.imshow("Face Detector", processed_frame)
Processes the frame using the process_frame function.
Displays the processed frame in a window titled "Face Detector."
if cv2.waitKey(1) & 0xFF == ord('q'):
    break
Checks if the 'q' key is pressed to exit the loop.
else:
    print("Failed to fetch image from ESP32-CAM")
Prints an error message if the ESP32-CAM fails to provide an image.
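After a failed request, the loop immediately tries again. This can flood a camera that is rebooting or out of range with requests. One option, not in the original code, is to wait progressively longer between retries. Below is a minimal sketch of capped exponential backoff; retry_delay is a hypothetical helper, and the 0.5 s base, doubling factor, and 4 s cap are assumptions.

```python
def retry_delay(failures, base=0.5, factor=2.0, cap=4.0):
    """Seconds to wait after `failures` consecutive failed fetches (failures >= 1)."""
    if failures < 1:
        return 0.0
    # base, base*factor, base*factor**2, ... but never more than cap
    return min(cap, base * factor ** (failures - 1))
```

In the else branch you would increment a failure counter and call `time.sleep(retry_delay(failures))`, then reset the counter to 0 after a successful fetch. The delays run 0.5, 1, 2, 4, 4, ... seconds.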
Clean Up
cv2.destroyAllWindows()
Closes all OpenCV windows when the program exits.
Setup:
The code connects to the ESP32-CAM via its IP address to fetch image frames in real time.
It loads pre-trained Haar Cascade classifiers for detecting frontal and profile faces.
Continuous Image Fetching:
The program enters a loop where it fetches a new image frame from the ESP32-CAM using an HTTP GET request.
Image Processing:
The image is converted into a format usable by OpenCV.
The frame is processed to:
Convert it to grayscale (required for Haar Cascade detection).
Detect frontal faces and profile faces using the respective classifiers.
Face Detection and Visualization:
For each detected face:
A rectangle is drawn around it:
Red for frontal faces.
Blue for profile faces.
A label ("Frontal Face" or "Profile Face") is added above the rectangle.
The count of detected frontal and profile faces is displayed on the frame.
Display:
The processed frame, with visual indicators and counts, is displayed in a window titled "Face Detector."
User Interaction:
The program continues fetching, processing, and displaying frames until the user presses the 'q' key to quit.
Error Handling:
If the ESP32-CAM fails to provide an image, an error message is printed, and the loop continues.
Cleanup:
Upon exiting the loop, all OpenCV windows are closed to release resources.
1. Fetch Image → 2. Convert Image → 3. Detect Faces → 4. Annotate Frame → 5. Display Frame → 6. Repeat Until Exit.
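Because every frame goes through this whole cycle, including a separate HTTP request, the achievable frame rate depends on your Wi-Fi link, the JPEG size, and the detection time. If you want to measure it rather than guess, a small rolling counter is enough. FpsMeter is a hypothetical helper, not part of the original script. Its value could be drawn with cv2.putText like the face counts.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30, clock=time.monotonic):
        self._stamps = deque(maxlen=window)  # timestamps of recent frames
        self._clock = clock

    def tick(self):
        """Record one displayed frame and return the current FPS estimate."""
        self._stamps.append(self._clock())
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Call `fps = meter.tick()` once per displayed frame. The rolling window smooths out single slow requests.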
Power up the ESP32-CAM and connect it to Wi-Fi.
Run the Python script. Make sure that the ESP32-CAM URL is correctly set.
See the result of counting the faces in the display.
You can test with real-life people and photos.
Fig: Face counting
Guru Meditation Error: Ensure stable power to the ESP32-CAM.
No Image Display: Check the IP address and ensure the ESP32-CAM is accessible from your computer.
Library Conflicts: Use a virtual environment to isolate Python dependencies.
Only dots ("Connecting.....") appear while uploading the code: Check that IO0 is connected to GND, then immediately press the RST button.
Multiple failed upload attempts despite pressing the RST button: Restart your computer and try again.
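For the "No Image Display" case, it helps to first confirm that the board answers on port 80 from your computer at all, separately from any OpenCV issues. This is a minimal sketch with a hypothetical helper name. It only tests whether a TCP connection can be opened; it does not check the HTTP response.

```python
import socket


def can_reach(host, port=80, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `can_reach("192.168.1.104")` returning False points to a wrong IP address, a different Wi-Fi network, or a board that has not finished booting.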
This project demonstrates a working face-counting system using the ESP32-CAM and Python. The ESP32-CAM captures 800x600 images and serves them over HTTP. The Python client uses OpenCV's Haar cascade classifiers to detect and count frontal and profile faces in each frame and shows the results in real time.
This project can be adapted for various applications, such as crowd monitoring, security, and smart building management. It provides an affordable and flexible solution.
Future improvements can be made using advanced face detection algorithms like DNN-based models. This project highlights how simple hardware and software integration can address complex problems in computer vision.