Image Recognition on an FPGA Using a Neural Network (Ongoing)
For this project, I designed and implemented a custom neural network entirely in VHDL to perform image classification of fruits (apple, banana, orange) using a PYNQ-Z2 FPGA and a Raspberry Pi 5. The goal was to understand neural network hardware implementation from the ground up without relying on prebuilt accelerators or high-level frameworks.
Project Overview
The system was divided into three parts:
Model Training (TensorFlow)
A lightweight CNN was trained using TensorFlow on grayscale 32×32 image patches of fruits. The model included a convolutional layer, ReLU activation, and a fully connected output layer. After training, weights and biases were extracted and converted into fixed-point representations suitable for hardware.
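As a rough illustration of the architecture described above (one convolutional layer, ReLU, fully connected output), a floating-point reference in plain Python might look like the sketch below. The single filter, the kernel size, and all function names are assumptions made for illustration; the actual filter count and kernel size are not stated here, and the trained model itself was built in TensorFlow.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def dense(flat, weights, biases):
    """Fully connected layer; weights[k] is the weight vector for class k."""
    return [
        sum(x * w for x, w in zip(flat, weights[k])) + biases[k]
        for k in range(len(biases))
    ]


def classify(image, kernel, conv_bias, fc_weights, fc_biases,
             labels=("apple", "banana", "orange")):
    """Run conv -> bias -> ReLU -> flatten -> dense and return the top label."""
    fmap = conv2d_valid(image, kernel)
    fmap = relu([[v + conv_bias for v in row] for row in fmap])
    flat = [v for row in fmap for v in row]
    scores = dense(flat, fc_weights, fc_biases)
    return labels[scores.index(max(scores))]
```

A software model like this is mainly useful as a baseline to compare the hardware outputs against once the weights have been converted to fixed point.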
Hardware Implementation (VHDL on PYNQ-Z2)
The entire inference pipeline (convolution, activation, and classification) was implemented in VHDL. I wrote and tested each module by hand and stored the model weights directly on the FPGA as constants or ROMs.
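One way to get quantized weights into VHDL constants is to generate the declaration text with a small script. The sketch below is a hypothetical helper, not the project's actual tooling: the function name, the 8-bit width, and the generated type and constant names are all assumptions, and the output assumes the `signed` type from `ieee.numeric_std`.

```python
def to_vhdl_rom(name, values, width=8):
    """Render signed integers as a VHDL constant array of two's-complement bit strings."""
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    mask = (1 << width) - 1
    entries = []
    for i, v in enumerate(values):
        if not lo <= v <= hi:
            raise ValueError(f"value {v} does not fit in {width} signed bits")
        # Masking maps negative values to their two's-complement bit pattern.
        bits = format(v & mask, f"0{width}b")
        sep = "," if i < len(values) - 1 else ""
        entries.append(f'    {i} => "{bits}"{sep}')
    header = (
        f"type {name}_t is array (0 to {len(values) - 1}) "
        f"of signed({width - 1} downto 0);"
    )
    const = f"constant {name} : {name}_t := ("
    return "\n".join([header, const, *entries, ");"])


if __name__ == "__main__":
    # Hypothetical weights, already quantized to 8-bit integers.
    print(to_vhdl_rom("conv_w", [3, -1, 64, -128]))
```

The printed text can be pasted into a package or architecture declarative region; named association (`0 => ...`) is used so the output stays valid even for a one-element array.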
Preprocessing & Communication (Raspberry Pi 5)
The Pi captured 480p images, converted them to grayscale, and split them into 32×32 patches. The patches were sent to the FPGA over UART, and the FPGA returned a classification result (apple, banana, or orange).
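The grayscale conversion and patch splitting on the Pi can be sketched in plain Python as follows. This is an illustration under assumptions: the ITU-R BT.601 luma weights, the 640×480 frame size, and non-overlapping tiling are my choices for the example, since the exact conversion and patch layout are not specified here. The function names are hypothetical.

```python
def rgb_to_gray(pixel):
    """Convert an (R, G, B) tuple of 0..255 ints to one 0..255 luma value.

    Uses BT.601 weights (0.299, 0.587, 0.114) in integer arithmetic with rounding.
    """
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def to_grayscale(rgb_image):
    """rgb_image is a list of rows, each row a list of (R, G, B) tuples."""
    return [[rgb_to_gray(p) for p in row] for row in rgb_image]


def split_into_patches(gray, size=32):
    """Cut a grayscale image into non-overlapping size x size tiles, row by row.

    Pixels that do not fill a complete tile at the right or bottom edge are dropped.
    """
    rows, cols = len(gray), len(gray[0])
    patches = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            patches.append([row[left:left + size] for row in gray[top:top + size]])
    return patches
```

Under these assumptions a 640×480 frame yields 20 × 15 = 300 patches of 1024 bytes each, which gives a sense of how much data has to cross the UART link per frame.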
Challenges & Solutions
Challenge: Fixed-point arithmetic
Neural networks are typically trained in floating point, but fixed-point arithmetic is much cheaper to implement in FPGA logic. I resolved this by carefully scaling and quantizing the weights and inputs, then measuring the impact on accuracy.
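The scaling and quantization step can be illustrated with a short sketch. It assumes a signed 8-bit format with 7 fractional bits (often written Q1.7); the actual word length and scaling used in the project are not stated here, so treat the parameters and function names as placeholders.

```python
def quantize(x, frac_bits=7, width=8):
    """Scale a float by 2**frac_bits, round, and saturate to a signed width-bit integer."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    q = int(round(x * scale))
    return max(lo, min(hi, q))


def dequantize(q, frac_bits=7):
    """Map a fixed-point integer back to the float it represents."""
    return q / (1 << frac_bits)


def max_abs_error(values, frac_bits=7, width=8):
    """Largest round-trip error over a list of floats, a quick check on precision loss."""
    return max(
        abs(v - dequantize(quantize(v, frac_bits, width), frac_bits))
        for v in values
    )
```

For values inside the representable range, the round-trip error is bounded by half a least-significant bit (1/256 with 7 fractional bits). Values outside the range saturate, which is why weights and inputs generally need to be scaled before quantizing.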
Challenge: VHDL complexity
Implementing CNN logic like convolutions and matrix multiplies in VHDL was non-trivial. I modularized the design (e.g., separate conv, ReLU, FC blocks) and validated each block using simulation testbenches before integration.
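When validating a block in simulation, it helps to have a bit-accurate software model that produces the expected outputs for a testbench to compare against. The sketch below models a fixed-point convolution followed by ReLU. The 7 fractional bits, the truncating right shift, the 8-bit saturation, and the function names are assumptions for illustration and may not match the actual VHDL blocks.

```python
def saturate(v, width):
    """Clamp an integer to the signed range of a width-bit word."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, v))


def conv_relu_fixed(patch, kernel, bias=0, frac_bits=7, out_width=8):
    """Integer-only 'valid' convolution followed by ReLU.

    patch and kernel hold signed integers with frac_bits fractional bits.
    bias is given at accumulator scale (2 * frac_bits fractional bits).
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(patch) - kh + 1):
        row = []
        for c in range(len(patch[0]) - kw + 1):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    acc += patch[r + i][c + j] * kernel[i][j]
            # Arithmetic right shift (rounds toward minus infinity) rescales the
            # product back to frac_bits fractional bits.
            acc >>= frac_bits
            acc = saturate(acc, out_width)
            row.append(max(0, acc))  # ReLU
        out.append(row)
    return out
```

The expected values from a model like this can be written to a text file and read by the testbench, so any mismatch points directly at the module under test.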
Challenge: Communication between Pi and FPGA
Establishing reliable data transfer required synchronizing UART communication and handling delays. I used simple handshaking and buffering mechanisms to ensure data integrity.
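A simple handshake of the kind mentioned above could frame each patch with a start byte, a length, and a checksum, and resend until the receiver acknowledges it. The sketch below is a hypothetical protocol, not the one actually used: the byte values, the frame layout, and the function names are all assumptions. The `link` argument stands in for the serial port so the logic can be exercised without hardware.

```python
START, ACK, NAK = 0xA5, 0x06, 0x15


def build_frame(payload: bytes) -> bytes:
    """Frame layout: START | length (2 bytes, big-endian) | payload | checksum."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 2-byte length field")
    header = bytes([START]) + len(payload).to_bytes(2, "big")
    checksum = sum(payload) & 0xFF  # 8-bit additive checksum over the payload
    return header + payload + bytes([checksum])


def parse_frame(frame: bytes):
    """Return the payload if the frame is well-formed, otherwise None."""
    if len(frame) < 4 or frame[0] != START:
        return None
    length = int.from_bytes(frame[1:3], "big")
    if len(frame) != length + 4:
        return None
    payload, checksum = frame[3:-1], frame[-1]
    return payload if (sum(payload) & 0xFF) == checksum else None


def send_with_retry(payload: bytes, link, max_attempts: int = 3) -> int:
    """Send a frame until ACK is received; return the number of attempts used.

    link is any callable that takes the frame bytes and returns one reply byte.
    """
    frame = build_frame(payload)
    for attempt in range(1, max_attempts + 1):
        if link(frame) == ACK:
            return attempt
    raise IOError("no ACK after %d attempts" % max_attempts)
```

An additive checksum is cheap to compute in hardware but weak against some error patterns; a CRC would be a stronger alternative at the cost of more logic.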
Challenge: Limited FPGA resources
I optimized resource usage by limiting the number of filters, using a single-channel input, and reusing logic wherever possible.
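To see why the filter count matters so much, it helps to count parameters and multiply-accumulate (MAC) operations for a network of this shape. The sketch below assumes a 'valid' convolution with no pooling, 3 output classes, and 8-bit weights; the kernel size and filter counts in the example are illustrative values, not the project's actual configuration.

```python
def network_cost(input_size=32, kernel_size=3, filters=4, classes=3, weight_bits=8):
    """Estimate storage and MAC counts for conv -> ReLU -> fully connected."""
    out_size = input_size - kernel_size + 1           # 'valid' convolution output side
    conv_weights = filters * kernel_size * kernel_size  # single-channel input
    conv_biases = filters
    fc_inputs = filters * out_size * out_size         # flattened feature maps
    fc_weights = fc_inputs * classes
    fc_biases = classes
    params = conv_weights + conv_biases + fc_weights + fc_biases
    conv_macs = filters * out_size * out_size * kernel_size * kernel_size
    fc_macs = fc_weights
    return {
        "params": params,
        "weight_storage_bits": params * weight_bits,
        "conv_macs": conv_macs,
        "fc_macs": fc_macs,
    }


if __name__ == "__main__":
    for f in (2, 4, 8):
        print(f, network_cost(filters=f))
```

Under these assumptions the fully connected layer dominates the parameter count, and both storage and MACs grow linearly with the number of filters, so halving the filter count roughly halves the weight memory.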
Results
The final system successfully classified 32×32 patches of fruit images in real time, entirely in hardware, with the Raspberry Pi acting as the front end. The project served as a strong foundation for understanding hardware-accelerated machine learning and low-level neural network design.