Abstract

Matrix multiplication is a frequently used kernel operation in a variety of graphics, image processing, robotics, and signal processing applications. The increasing complexity of FPGAs demands fast and efficient designs. New algorithms and architectures have been developed for matrix multiplication on configurable hardware that reduce both latency and area. The focus of this work is to study a new implementation of matrix multiplication, enhance it, and implement it on an FPGA device.

In this thesis we present a new architecture for matrix multiplication that targets the new Xilinx Virtex-4 device. The architecture makes effective use of the hardware resources across the entire FPGA, including the DSP blocks embedded in the device, and it reduces routing complexity. It can also be implemented for non-square matrix multiplication. Compared with recently published work, the proposed implementation improves both area and latency: using newer FPGAs, it achieves an improvement of over 50% in maximum operating frequency and 20% in area.
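To make the non-square case concrete, the following Python sketch models, at the arithmetic level only, one generic way to organize multiply-accumulate (MAC) units, the operation FPGA DSP blocks are commonly used for. It assumes a hypothetical linear array of p MAC units, one per column of B, with each element of a row of A broadcast to all units in turn. This arrangement is an illustrative assumption, not the architecture proposed in this thesis, and the model ignores pipelining, clocking, and routing. The names `linear_array_matmul` and `steps` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def linear_array_matmul(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Compute C = A x B for an m x n matrix A and an n x p matrix B.

    Hypothetical model: p MAC units, one per column of B. For each row i
    of A, the elements a[i][0..n-1] are broadcast one per step, and every
    unit j accumulates a[i][k] * b[k][j]. Returns C and the number of
    broadcast steps (m * n), ignoring any pipeline latency.
    """
    m = len(a)
    n = len(a[0])
    if any(len(row) != n for row in a) or len(b) != n:
        raise ValueError("A must be m x n and B must be n x p")
    p = len(b[0])
    if any(len(row) != p for row in b):
        raise ValueError("all rows of B must have length p")

    c: Matrix = []
    steps = 0
    for i in range(m):
        acc = [0] * p  # one accumulator register per MAC unit
        for k in range(n):
            x = a[i][k]  # value broadcast to all p units in this step
            for j in range(p):
                acc[j] += x * b[k][j]  # one MAC per unit per step
            steps += 1
        c.append(acc)  # row i of C is complete after n steps
    return c, steps


if __name__ == "__main__":
    # Non-square example: A is 2 x 3, B is 3 x 4, so C is 2 x 4.
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
    c, steps = linear_array_matmul(a, b)
    print(c)      # [[1, 2, 3, 6], [4, 5, 6, 15]]
    print(steps)  # 6
```

In this simplified model the number of MAC units scales with p and the step count with m * n, so rectangular shapes trade hardware for time differently than square ones; a real FPGA design must also account for DSP-block pipeline registers and operand routing.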