Bachelor Thesis

Abstract

Research Overview

The demand for high-speed and energy-efficient arithmetic circuits is growing rapidly in modern digital systems, particularly in domains such as image processing, machine learning, and signal processing, where exact computation is not always essential. This thesis presents the design and implementation of an approximate 8×8 signed multiplier on an FPGA, aiming to strike a favorable balance between accuracy, performance, and hardware efficiency.

The proposed design features a three-stage architecture. The first stage employs Radix-4 Booth encoding to reduce the number of partial product rows from eight to four, lowering hardware complexity and improving speed. The second stage uses the FPGA's carry chain to compress the top two rows into one, leaving three partial product rows. In the final stage, the FPGA's LUT6_2 primitives are used column-wise to compute the final product directly, eliminating the need for a traditional multi-level adder tree.
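Why Radix-4 Booth encoding halves the row count can be checked with a short behavioral model. The sketch below is not the thesis' VHDL; it is a minimal Python illustration, with hypothetical function names `booth_radix4_digits` and `booth_partial_products`, of the standard Radix-4 recoding: the signed 8-bit multiplier is split into four overlapping 3-bit groups, each mapped to a digit in {-2, -1, 0, +1, +2}, so only four weighted rows (multiples 0, ±a, ±2a) are needed instead of eight.

```python
def booth_radix4_digits(b: int, width: int = 8) -> list[int]:
    """Recode a signed width-bit multiplier b into width // 2 Radix-4 Booth digits.

    Digit k = -2*b[2k+1] + b[2k] + b[2k-1] (with b[-1] = 0), so every digit
    lies in {-2, -1, 0, +1, +2} and sum(d_k * 4**k) == b.
    """
    bits = b & ((1 << width) - 1)  # two's-complement bit pattern of b
    digits = []
    prev = 0  # implicit bit b[-1] = 0
    for pos in range(0, width, 2):
        lo = (bits >> pos) & 1
        hi = (bits >> (pos + 1)) & 1
        digits.append(-2 * hi + lo + prev)
        prev = hi  # this group's high bit is the next group's b[2k-1]
    return digits


def booth_partial_products(a: int, b: int, width: int = 8) -> list[int]:
    """Weighted partial products d_k * a * 4**k: one row per Booth digit."""
    return [d * a * 4**k for k, d in enumerate(booth_radix4_digits(b, width))]


rows = booth_partial_products(-77, 105)
print(len(rows), sum(rows), -77 * 105)  # prints: 4 -8085 -8085
```

Only the selection of 0, ±a or ±2a per row is needed in hardware; the multiplications by 4**k are just wiring shifts.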

The design is described in VHDL and synthesized on an FPGA platform. Simulation and synthesis results reveal significant improvements in area utilization and propagation delay, with acceptable accuracy for approximate computing applications. Notably, the proposed multiplier achieves a maximum error of 3 and an average error of approximately 1.5, making it well-suited for error-tolerant applications in resource-constrained environments.

Keywords: Approximate Multiplier, Radix-4 Booth Encoding, Carry Chain, LUT6_2, FPGA, VHDL, Partial Product Reduction, Signed Multiplication

Thesis Chapters

Chapter 1: Introduction

Overview of multiplication in digital systems, motivation for approximate multipliers, importance of signed multiplication, and role of FPGA in arithmetic design.

Chapter 2: Literature Review

Review of previous work on exact and approximate multiplier design, signed arithmetic architectures and FPGA-based optimization.

Chapter 3: FPGA Architecture

Details of modern FPGA architectures (configurable logic blocks, look-up tables (LUTs), and CARRY4 carry chains) and synthesis techniques.

Chapter 4: Methodology

Design flow overview, partial product generation using Radix-4 Booth encoding, carry-chain-based compression, and LUT6_2-based final product calculation.

Chapter 5: Results & Discussion

Hardware performance results, error analysis, Vivado simulation results, and analysis of the accuracy trade-off.

Chapter 6: Conclusion & Future Work

Summary of findings, conclusions, and future research directions for extending the design.

Key Results

LUT Utilization

Lowest LUT usage among the compared designs (Nagar: 74, Ullah: 54, Rehman: 94 LUTs)

Critical Path Delay

4.42 ns

Competitive delay performance while maintaining low resource utilization
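As a rough, hedged reading of the delay figure: assuming the reported 4.42 ns is the full register-to-register path (the page does not say whether register or I/O overheads are included), it corresponds to a maximum clock frequency of about

```latex
f_{\max} \approx \frac{1}{t_{\mathrm{cp}}} = \frac{1}{4.42\ \mathrm{ns}} \approx 226\ \mathrm{MHz}
```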

Maximum Error

3

Bounded and predictable error profile suitable for error-tolerant applications

Average Error

~1.5

Low average error, making it suitable for image processing and machine learning applications
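For an 8×8 signed multiplier, error figures like these can be obtained exhaustively over all 65,536 input pairs. The sketch below assumes "error" means the absolute difference from the exact product and "average error" the mean absolute error over a uniform sweep of all inputs; the thesis' own definitions may differ. `error_metrics` is a hypothetical helper, and `toy_clear_lsb` is a deliberately simple stand-in (exact product with bit 0 cleared), NOT the proposed design, used only to show the harness working.

```python
from typing import Callable


def error_metrics(approx_mul: Callable[[int, int], int], width: int = 8):
    """Compare approx_mul against exact a * b for every signed width-bit pair.

    Returns (max_abs_error, mean_abs_error). Assumes uniform, exhaustive inputs.
    """
    lo, hi = -(1 << (width - 1)), 1 << (width - 1)
    max_err, total, count = 0, 0, 0
    for a in range(lo, hi):
        for b in range(lo, hi):
            err = abs(approx_mul(a, b) - a * b)
            max_err = max(max_err, err)
            total += err
            count += 1
    return max_err, total / count


def toy_clear_lsb(a: int, b: int) -> int:
    """Hypothetical stand-in approximation (not the thesis design): force bit 0 to 0."""
    return (a * b) & ~1


print(error_metrics(lambda a, b: a * b))  # prints: (0, 0.0)
print(error_metrics(toy_clear_lsb))       # prints: (1, 0.25)
```

The toy case errs by 1 exactly when both operands are odd, which happens for a quarter of all pairs, hence the mean of 0.25. Plugging in a bit-accurate model of the actual multiplier would reproduce its reported figures only under the same error definition.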

Design Methodology

Partial Product Generation

Reduces the number of partial product rows from eight to four using Radix-4 Booth encoding, combined with the Baugh-Wooley algorithm for signed operands
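Signed Booth rows normally need sign extension up to the product width. A common way to avoid it, in the spirit of Baugh-Wooley, is to invert each row's sign bit and add one precomputed constant. The sketch below illustrates that general trick at bit level; the exact bit layout used in the thesis is not given here, so treat the row format (9-bit rows, a separate "neg" bit per row for the +1 of negation) and all names (`rows_without_sign_extension`, `to_signed16`) as assumptions.

```python
MASK9 = (1 << 9) - 1    # each Booth row holds a 9-bit two's-complement multiple of a
MASK16 = (1 << 16) - 1  # 8x8 signed product width


def booth_digits(b: int) -> list[int]:
    """Radix-4 Booth digits (each in {-2..2}) of a signed 8-bit multiplier."""
    bits, prev, digits = b & 0xFF, 0, []
    for pos in range(0, 8, 2):
        lo, hi = (bits >> pos) & 1, (bits >> (pos + 1)) & 1
        digits.append(-2 * hi + lo + prev)
        prev = hi
    return digits


def rows_without_sign_extension(a: int, b: int):
    """Build four unsigned rows, four 'neg' bits and one constant whose
    16-bit sum equals a * b, with no sign-extension bits in any row."""
    rows, negs = [], []
    for k, d in enumerate(booth_digits(b)):
        pattern = (abs(d) * a) & MASK9  # 9-bit pattern of 0, a or 2a
        if d < 0:
            pattern ^= MASK9            # one's complement; the +1 goes into the neg bit
        sign = pattern >> 8
        # Keep the 8 low bits, store the INVERTED sign bit at bit 8, shift by 2k.
        rows.append(((pattern & 0xFF) | ((sign ^ 1) << 8)) << (2 * k))
        negs.append(int(d < 0) << (2 * k))
    # Storing ~s instead of the negatively weighted s over-counts each row by
    # 2**(8+2k); one constant (mod 2**16) removes all four surpluses.
    constant = -sum(1 << (8 + 2 * k) for k in range(4)) & MASK16
    return rows, negs, constant


def to_signed16(x: int) -> int:
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return x - (1 << 16) if x & 0x8000 else x


rows, negs, constant = rows_without_sign_extension(-77, 105)
total = (sum(rows) + sum(negs) + constant) & MASK16
print(hex(constant), to_signed16(total))  # prints: 0xab00 -8085
```

Because every row is now a plain unsigned bit vector, the later reduction stages can add rows without handling sign bits explicitly.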


Partial Product Reduction

Uses the FPGA-native CARRY4 carry chain to compress the top two partial product rows into one
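The carry chain can be pictured with a behavioral model of the CARRY4 primitive as documented for Xilinx 7-series devices: per bit, O = S XOR carry-in, and the carry-out is the carry-in when S = 1, otherwise DI. Driving S = x XOR y from a LUT and DI = x turns a cascade of CARRY4 blocks into a ripple adder for two rows. This is a minimal sketch; which two rows are merged and how they are aligned follow the thesis, and here they are simply assumed to be aligned unsigned bit vectors. `carry4` and `add_two_rows` are hypothetical names.

```python
def carry4(s_bits, di_bits, ci):
    """Behavioral model of one CARRY4 block (4 bits of the carry chain).

    For each bit i (LSB first): O[i] = S[i] XOR c[i] (XORCY) and
    c[i+1] = c[i] if S[i] else DI[i] (MUXCY). Returns (O bits, CO[3]).
    """
    o_bits, c = [], ci
    for s, di in zip(s_bits, di_bits):
        o_bits.append(s ^ c)
        c = c if s else di
    return o_bits, c


def add_two_rows(x: int, y: int, width: int):
    """Add two aligned unsigned rows with cascaded CARRY4 models.

    Per bit, a LUT supplies the propagate signal S = x XOR y, and DI = x
    (when S == 0, x == y, so x is also the generated carry).
    """
    assert width % 4 == 0
    result, carry = 0, 0
    for base in range(0, width, 4):
        xs = [(x >> (base + i)) & 1 for i in range(4)]
        ys = [(y >> (base + i)) & 1 for i in range(4)]
        s = [xi ^ yi for xi, yi in zip(xs, ys)]
        o, carry = carry4(s, xs, carry)  # CO[3] feeds the next block's CI
        for i, bit in enumerate(o):
            result |= bit << (base + i)
    return result, carry                 # (width-bit sum, final carry out)


print(add_two_rows(0xB6D, 0x6C7, 12))  # prints: (564, 1)
```

Here 0xB6D + 0x6C7 = 0x1234, returned as the 12-bit sum 0x234 (564) plus a carry out of 1. The propagate XOR occupies the LUT in front of each chain bit, so merging two rows costs roughly one LUT per bit plus the dedicated carry logic.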


Final Product Calculation

Employs LUT6_2 primitives for column-wise computation of the final product, with approximation in the LSBs
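A LUT6_2 is one 64-bit truth table with two outputs: O6 is indexed by I5..I0 and O5 by I4..I0 (the lower 32 INIT bits). With I5 tied to 1, it therefore implements two independent functions of the same five inputs, which is what makes column-wise logic compact. The sketch below, with hypothetical helpers `make_lut6_2_init` and `lut6_2`, models that behavior and packs the sum and carry of one column of three bits (a full adder) into a single LUT6_2. It only illustrates the dual-output mechanism; the thesis' actual column functions, including the approximated LSB columns, are not reproduced here.

```python
def make_lut6_2_init(f_o6, f_o5) -> int:
    """Return a 64-bit INIT so that, with I5 tied to 1,
    O6 = f_o6(I0..I4) (upper 32 INIT bits) and O5 = f_o5(I0..I4) (lower 32 bits)."""
    init = 0
    for addr in range(32):
        ins = [(addr >> k) & 1 for k in range(5)]  # I0..I4
        if f_o5(*ins):
            init |= 1 << addr
        if f_o6(*ins):
            init |= 1 << (32 + addr)
    return init


def lut6_2(init: int, i0, i1, i2, i3, i4, i5=1):
    """Evaluate a LUT6_2: O6 reads INIT[I5..I0], O5 reads INIT[I4..I0]."""
    addr5 = i0 | (i1 << 1) | (i2 << 2) | (i3 << 3) | (i4 << 4)
    o6 = (init >> (addr5 | (i5 << 5))) & 1
    o5 = (init >> addr5) & 1
    return o6, o5


# Illustrative mapping: one LUT6_2 as a full adder for a column of three bits.
FA_INIT = make_lut6_2_init(
    lambda a, b, c, _i3, _i4: a ^ b ^ c,                    # O6: column sum bit
    lambda a, b, c, _i3, _i4: (a & b) | (a & c) | (b & c),  # O5: carry to next column
)
print(hex(FA_INIT))                     # prints: 0x96969696e8e8e8e8
print(lut6_2(FA_INIT, 1, 1, 0, 0, 0))   # prints: (0, 1)
```

The two spare inputs (I3, I4) are what leave room to fold extra column bits or incoming carries into the same LUT.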

Research Team

Mahbub Hasan Apu

Student ID: 2019338518

B.Sc. in EEE

Sanghapriyo Choudhury

Student ID: 2019338506