# Efficient MAC Unit Design for DSP Processors Using Multiplication and Accumulation Operations

Gayathri V<sup>1</sup> PG Student, Department of ECE KPR Institute of Engineering and Technology, Coimbatore. Kathirvelu M<sup>2</sup> Professor, Department of ECE KPR Institute of Engineering and Technology, Coimbatore. Yogeswari P<sup>3</sup> Research Scholar, Department of ECE, KPR Institute of Engineering and Technology, Coimbatore.

Abstract:- Digital signal processors are essential for intricate operations such as filtering and convolution. Integrating CPU cores into a single Integrated Circuit (IC) is a widespread way to satisfy growing processing needs. Multiply and Accumulate (MAC) units are essential for the repeated addition and multiplication in DSP, and the performance of the MAC unit has a large impact on the overall speed of a DSP algorithm. This paper proposes a high-speed MAC unit built with a pipelined Brent Kung (BK) adder and the Vedic multiplier technique. A comparative study against a 32-bit MAC unit with a standard Brent Kung adder shows that the proposed MAC unit is almost five times faster. The synthesis results highlight the significance of the pipelined Brent Kung adder architecture and the Vedic multiplier technique in improving MAC unit performance for digital signal processing applications. All the designs were implemented in Verilog using the Cadence Genus EDA tool.

*Keywords:-* DSP Processors, Multiply And Accumulate (MAC) Unit, Vedic Multiplier, Pipelined Brent Kung Adder.

## I. INTRODUCTION

In the field of digital signal processing (DSP), the Multiply and Accumulate (MAC) unit is a crucial component, especially in communication applications where real-time processing and efficient arithmetic are critical. The design goals of this work cover several elements: faster processing speed, stronger filtering capability, more efficient use of resources, lower power consumption, backward compatibility with current systems, scalability to meet changing needs, and reliable accuracy. The MAC unit's design is fundamentally shaped by its architecture, with a focus on a pipelined structure. Compatibility, scalability, and smooth integration into larger signal processing systems are also critical considerations. Implementing a pipelined MAC unit therefore means balancing a wide range of design goals to satisfy the demanding requirements of signal processing applications. This paper walks through each step of the implementation and explains how architectural concerns, design decisions, and overall system performance interact.

## II. RELATED WORK

Basavoju Harish and M. S. S. Rukmini (2021) discuss the critical need for improved processing power in DSPs (Digital Signal Processors), which is met by combining multiple CPU cores into a single integrated circuit [1]. They emphasize the importance of multiply and accumulate (MAC) units in DSP, especially for tasks like filtering and convolution. Their paper uses the Vedic multiplier approach and a pipelined Brent Kung adder architecture to provide a high-speed MAC unit for arithmetic applications. The design exhibits a nearly fivefold boost in speed when compared to a 32-bit MAC unit utilizing a standard Brent Kung adder, highlighting the significance of novel techniques in improving MAC unit performance for DSP algorithms.

Shanmukh, S. Sai, and Madhu Sekar G (2020) presented a basic MAC unit consisting of an accumulator, multiplier, and adder for DSP applications. The paper stresses the importance of area and delay efficiency and describes the function of the MAC unit in DSP operations like convolution and signal filtering [2]. By limiting additions and multiplications, the Vedic Multiplier, which was built on the Urdhva Tiryagbhyam Sutra, is intended to minimize the area and latency of the MAC unit. After comparing several adders according to their Look-Up Table (LUT) usage and delay, the study assesses the effectiveness of Vedic and Booth multipliers for the construction of MAC units, with all units implemented in Verilog.

Rashmi Samanth and Vishnumurthy Kedlaya (2019) [3] highlight the critical role that Application-Specific Integrated Circuit (ASIC) designs play in determining the architecture of processors for cutting-edge technology, with a focus on the need for energy-efficient solutions. This work investigates the design of a fixed-point multiply-accumulate unit with the goal of striking the best possible compromise between fast processing and low power consumption. A unique way to improve processing efficiency is shown, whereby a 2D image convolution process is accomplished by use of multiple Multiply-Accumulate (MAC) blocks strategically stacked and combined. Through the lens of ASIC exploration and strategic MAC unit use, this research holds significance in answering the changing demands of modern technologies by providing insights into energy-efficient processor design.

Volume 9, Issue 9, September – 2024

ISSN No:-2456-2165

https://doi.org/10.38124/ijisrt/IJISRT24SEP1652

Ravi Shankar Mishra and Puran Gour (2020) concentrated on creating a low-power Multiply-Accumulate (MAC) architecture, which is crucial for processors used in digital signal processing (DSP). The multiplier, adder, and accumulator that make up the MAC unit are designed to minimize power dissipation in order to maximize energy efficiency. The study compares a traditional Baugh-Wooley multiplier using an existing 2S-T full adder with a low-power multiplier employing a newly proposed 2S-T full adder design [5]. According to Cadence Virtuoso simulations, the proposed Baugh-Wooley multiplier with the new full adder achieves a notable power savings of 32.41 microwatts when compared to its traditional version, which uses 2.743 milliwatts. This study adds important knowledge to the development of power-efficient MAC architectures for DSP processors.

## III. PROPOSED SYSTEM

### A. Brent Kung Adder

A variation on carry look-ahead adders, the Brent-Kung (BK) adder was first presented by Hsiang Te Kung and Richard Peirce Brent. With less wiring complexity than the Kogge-Stone (KS) adder, it takes up less space while maintaining high performance. In the BK adder, prefixes are first calculated for 2-bit groups and then cascade through 4-bit and 8-bit groups. The 16-bit BK adder, which uses $2\log_2 N - 1$ stages, uses 11 Gray Cells and 14 Black Cells, while the KS adder uses 15 Gray Cells and 36 Black Cells. When the BK adder must run at high speed, its performance is further improved by adding pipeline registers in four stages [6-8]. Compared with the regular BK adder, however, Vedic multipliers and Multiply-Accumulate (MAC) units based on the pipelined Brent-Kung adder add extra complexity and area.

Pipelining is an essential digital circuit design approach that improves computing efficiency by dividing a computation into several sequential stages, each of which handles one part of the overall procedure [10]. It is particularly useful in arithmetic circuits, where efficiency and speed are critical, and applying it to the well-known Brent-Kung parallel prefix adder significantly improves overall performance.
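As an illustration of the prefix structure described above, the following is a minimal behavioral sketch in Python, not the Verilog used in this work. The function name `brent_kung_add`, the folding of the carry-in into bit 0, and the up-sweep/down-sweep formulation are assumptions made for the example. It models only the prefix levels and does not attempt to reproduce the gate counts, cell types, or pipeline registers of the synthesized design.

```python
def brent_kung_add(a, b, width=16, carry_in=0):
    """Behavioral model of a Brent-Kung parallel-prefix adder.

    Returns (sum, carry_out, prefix_levels). `width` must be a power of two.
    """
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two, at least 2")
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Per-bit propagate and generate signals.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    # Working copies that the prefix tree combines in place.
    G = g[:]
    P = p[:]
    G[0] = g[0] | (p[0] & carry_in)  # assumption: carry-in folded into bit 0

    levels = 0

    # Up-sweep: combine 2-bit, 4-bit, 8-bit, ... groups (log2(width) levels).
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            G[i] = G[i] | (P[i] & G[i - d])
            P[i] = P[i] & P[i - d]
        d *= 2
        levels += 1

    # Down-sweep: fill in the remaining prefixes (log2(width) - 1 levels).
    d = width // 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            G[i] = G[i] | (P[i] & G[i - d])
            P[i] = P[i] & P[i - d]
        d //= 2
        levels += 1

    # G[i] is now the carry out of bit i; each sum bit is propagate XOR carry-in.
    total = 0
    for i in range(width):
        carry = carry_in if i == 0 else G[i - 1]
        total |= (p[i] ^ carry) << i
    return total, G[width - 1], levels


if __name__ == "__main__":
    print(brent_kung_add(0x1234, 0x0FF0, width=16))
```

For a 16-bit width the sketch passes through 7 prefix levels, which agrees with the $2\log_2 N - 1$ stage count quoted above.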

### B. Vedic Multiplier

The Vedic multiplier is a method of multiplication based on ancient Indian mathematical concepts from the Vedas, specifically from Tirthaji's "Vedic Mathematics." It uses a collection of formulas called sutras to carry out multiplication quickly and effectively. One of the main sutras is "Urdhva Tiryagbhyam," which means "Vertically and Crosswise." This method is known for being quick and easy to use when multiplying large numbers. To calculate the final result, the Urdhva Tiryagbhyam Sutra multiplies digits both vertically and crosswise and then combines the partial products [11]. Compared to earlier methods, it reduces the number of individual multiplications required by breaking the multiplication down into simpler steps.

The Vedic multiplier involves two crucial steps: multiplying digits vertically and then multiplying pairs of digits crosswise. The final product is formed by adding the partial products obtained in these steps [12]. The method's simplicity makes efficient hardware implementation in digital circuits possible. The Vedic multiplier is efficient because it can make use of symmetries and patterns in the multiplication process, which minimizes the number of partial products and speeds up processing. By combining traditional mathematical ideas with contemporary computational requirements, the Vedic multiplier is an attractive solution for digital systems that require high-speed arithmetic. Its ease of use, speed, and suitability for hardware implementation make it an important tool in modern digital design.
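The vertical and crosswise steps can be sketched at the bit level as follows. This is an illustrative Python model under stated assumptions, not the hardware structure used in this work: the function name `urdhva_multiply` is hypothetical, and the sketch uses the column-wise form of Urdhva Tiryagbhyam, whereas hardware Vedic multipliers are often assembled hierarchically from smaller blocks and adders.

```python
def urdhva_multiply(a, b, width=8):
    """Multiply two unsigned `width`-bit integers column by column.

    Column k collects every crosswise product a[i] & b[k - i]; the two
    end columns (k = 0 and k = 2*width - 2) are the purely vertical ones.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` bits")
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    result = 0
    carry = 0
    for k in range(2 * width - 1):
        lo = max(0, k - width + 1)
        hi = min(k, width - 1)
        # Sum of the partial products in this column plus the incoming carry.
        column = carry + sum(a_bits[i] & b_bits[k - i] for i in range(lo, hi + 1))
        result |= (column & 1) << k   # low bit stays in this column
        carry = column >> 1           # the rest moves to the next column
    result |= carry << (2 * width - 1)
    return result


if __name__ == "__main__":
    print(urdhva_multiply(13, 11, width=4))  # 143
```

All columns depend only on the input bits, so in hardware the partial products of every column can be formed in parallel; only the carry chain between columns is sequential in this sketch.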

### C. Proposed 32-Bit MAC Unit Using Brent Kung Adder

The proposed Multiply-Accumulate (MAC) unit combines three architectural components. In the multiplication stage, the design uses a Vedic Multiplier (VM) based on a pipelined Brent-Kung (BK) adder to maximize throughput and speed [13]. A pipelined BK adder performs the addition stage, and a 32-bit accumulator stores the result. Two 32-bit numbers, A and B, are fed to the Vedic Multiplier, and the first product is stored in the accumulator [14].

International Journal of Innovative Science and Research Technology

The delay of the MAC unit can be divided into discrete periods. The pipelined BK adder passes through its four stages in four clock cycles, and the accumulator takes one further clock cycle. Pipelining allows several operations to proceed in parallel and also reduces the total delay [15].
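The cycle counts above can be illustrated with a small cycle-level model. This is a sketch under simplifying assumptions, not the authors' Verilog: the class name `PipelinedMacModel` and the helper `run` are hypothetical, each product is treated as simply spending four clock cycles in the adder pipeline before the accumulator register updates on the following edge, and the feedback path of a real accumulator is not modelled.

```python
from collections import deque


class PipelinedMacModel:
    """Latency model: 4 adder pipeline stages followed by an accumulator register."""

    ADDER_STAGES = 4

    def __init__(self, acc_bits=32):
        # One slot per pipeline register; None marks an empty stage (bubble).
        self.stages = deque([None] * self.ADDER_STAGES, maxlen=self.ADDER_STAGES)
        self.mask = (1 << acc_bits) - 1
        self.acc = 0
        self.cycles = 0

    def clock(self, a=None, b=None):
        """Advance one clock edge; pass operands to issue a new multiply."""
        leaving = self.stages[-1]  # value held in the last stage before the edge
        if leaving is not None:
            self.acc = (self.acc + leaving) & self.mask
        entering = None if a is None else a * b
        self.stages.appendleft(entering)  # maxlen drops the value that just left
        self.cycles += 1
        return self.acc


def run(pairs):
    """Issue one operand pair per cycle, then drain the pipeline."""
    mac = PipelinedMacModel()
    for a, b in pairs:
        mac.clock(a, b)
    for _ in range(PipelinedMacModel.ADDER_STAGES):
        mac.clock()
    return mac.acc, mac.cycles


if __name__ == "__main__":
    print(run([(3, 4)]))
    print(run([(1, 2), (3, 4), (5, 6)]))
```

In this model a single product needs five clock edges to reach the accumulator (four in the adder stages plus one for the accumulator), while a stream of n products needs n + 4 edges, because a new product can enter the pipeline on every cycle.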

A comparison of the proposed 32-bit MAC unit with a pipelined BK adder against its traditional counterpart with a standard BK adder reveals significant benefits. With pipelined Brent-Kung adders, the proposed design operates at a markedly higher speed [16]. Compared with the equivalent design built on a standard Brent-Kung adder, it completes computations about five times faster.



This architectural approach, with its focus on the strategic integration of pipelining, points toward the use of advanced design methods for modern digital signal processing applications. The proposed MAC unit combines a clean architecture with computing efficiency to meet the changing demands of these applications.

## IV. RESULTS AND DISCUSSION

The experiment on the 32-bit MAC unit, which combines a Vedic multiplier with a Brent-Kung adder, is a noteworthy development in digital signal processing. This MAC unit exploits the parallelism and speed of both the Vedic multiplier and the Brent-Kung adder. The Brent-Kung adder speeds up the addition and accumulation stages with its parallel structure, whereas the Vedic multiplier computes the multiplication stage efficiently by decomposing it into simpler operations. Combining these elements produces a MAC unit with enhanced performance, decreased latency, and better resource utilization. Figure 2 shows the simulation of the MAC unit built with a conventional BK adder and Vedic multiplier, whereas Figure 3 shows the schematic diagram of the MAC unit using the regular BK adder.


Fig 2. Simulation of a Conventional BK adder



Fig 3. Schematic Diagram of MAC Unit Using Regular BK Adder

A pipelined BK adder is a digital circuit design that enhances computational efficiency through pipelining. The BK adder is an adder circuit known for its efficiency; in the pipelined version, each stage of the pipeline handles a specific part of the addition operation, and the stages operate concurrently on different sets of data [17].

As a result, the throughput of the adder is increased, allowing faster computation of successive additions. The pipelined BK adder optimizes resource utilization by overlapping the execution of operations. The simulation results of the MAC unit using the pipelined BK adder are shown in Fig 4, and the schematic diagram of the 32-bit MAC unit using the pipelined BK adder is shown in Fig 5.



Fig 4. Simulation Results of MAC Unit Using Pipelined BK Adder



Fig. 5. Schematic Diagram of 32-Bit MAC Unit using Pipelined BK Adder

Breaking down this delay, four clock cycles are attributed to the pipelined BK adder, which operates across four stages, and one additional clock cycle is incurred by the accumulator. This arrangement of clock cycles facilitates parallel processing, where multiple operations proceed simultaneously, and also minimizes the overall delay of the MAC unit. Compared with a traditional MAC unit built with a regular BK adder, the proposed pipelined design operates about five times faster. This speed improvement is particularly important for DSP algorithms, where rapid and efficient arithmetic is imperative for real-time processing. The proposed MAC unit, which integrates a pipelined BK adder-based Vedic Multiplier, a pipelined adder, and a 32-bit accumulator, shows how advanced design techniques can meet the escalating demands of modern digital signal processing applications [18].







Fig. 7. Power Comparison of 32-Bit MAC Unit Design






Pipelining the Brent-Kung adder is an important refinement in digital circuit design, offering reduced critical path delay and enhanced computational efficiency. Figure 5 illustrates how pipelining breaks the computation down into distinct pipeline stages. This refinement can have a large impact on throughput, particularly when consecutive additions are executed. The underlying principle is the segmentation of the computation, which allows different stages to execute concurrently and thereby shortens the critical path delay.

However, pipelining does not come without trade-offs. The introduction of pipelining may require additional registers and interstage logic, potentially resulting in a larger area footprint than a regular Brent-Kung adder. The expansion of the circuit's area must be carefully weighed against the benefits of reduced critical path delay and improved throughput. The diagram in Figure 6 visualizes this area trade-off, emphasizing the need for a judicious balance between computational efficiency and hardware resource utilization in the design process. The degree of pipelining and the choice of the number of pipeline stages are critical considerations influencing overall area efficiency. Different applications and performance requirements may warrant different degrees of pipelining, and finding the right balance is a nuanced task. Figure 7 shows the power comparison and Fig 8 shows the area comparison of the MAC unit.

Table 1 Comparison of 32-Bit BK Adder in MAC Unit Design

| Parameters             | Regular BK Adder Based MAC Unit | Pipelined BK Adder Based MAC Unit |
|------------------------|---------------------------------|-----------------------------------|
| No of Gates            | 803                             | 756                               |
| Timing (ns)            | 2.52                            | 1.75                              |
| Propagation Delay (ns) | 7.324                           | 1.972                             |
| Power Consumption (nW) | 789673.90                       | 515447.965                        |
| Area Utilization (%)   | 74                              | 65                                |

The integration of a pipelined Brent-Kung adder into a multiply-accumulate (MAC) unit improves key performance metrics such as timing, propagation delay, power consumption, and area utilization, as Table 1 shows. The pipeline stages in the pipelined BK adder-based MAC unit enable parallel processing and raise overall efficiency compared with the regular BK adder-based MAC unit.
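To make the comparison concrete, the short Python sketch below computes the ratio and percentage reduction for each row of Table 1. The dictionary keys and the helper `improvement` are names chosen for this example; the numbers are copied directly from the table. The text does not state which metric the "about five times" speed figure is derived from, so the sketch only reports the ratios that follow from the tabulated values.

```python
# (regular BK adder MAC, pipelined BK adder MAC), values taken from Table 1.
TABLE_1 = {
    "gates": (803, 756),
    "timing_ns": (2.52, 1.75),
    "propagation_delay_ns": (7.324, 1.972),
    "power_nw": (789673.90, 515447.965),
    "area_utilization_pct": (74, 65),
}


def improvement(regular, pipelined):
    """Return the regular/pipelined ratio and the percentage reduction."""
    return {
        "ratio": regular / pipelined,
        "reduction_pct": 100.0 * (regular - pipelined) / regular,
    }


if __name__ == "__main__":
    for name, (regular, pipelined) in TABLE_1.items():
        stats = improvement(regular, pipelined)
        print(f"{name}: ratio {stats['ratio']:.2f}x, "
              f"reduction {stats['reduction_pct']:.1f}%")
```

From these values the propagation delay improves by a factor of roughly 3.7, the timing figure by roughly 1.4, and power consumption falls by about 35%.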

## V. CONCLUSION

The synthesis results provide strong support for the proposed 32-bit Multiply-Accumulate (MAC) unit, which combines a pipelined Brent-Kung adder and a Vedic Multiplier. The pipelined architecture operates about five times faster than the standard MAC unit with a regular Brent-Kung adder, which makes the MAC unit an appealing option for high-performance Digital Signal Processing (DSP) applications. Although this speed increase is noteworthy, a thorough investigation of the effects on power consumption and area efficiency is necessary to determine the overall effect of the proposed MAC unit. The synthesis results demonstrate the effectiveness of the selected architectural components as well as the potential of the proposed MAC unit to further the development of digital signal processing technologies.

### REFERENCES

- [1]. Abdelgawad A, Harish I., "Low power multiply accumulate unit (MAC) for future wireless sensor networks," *In Proceedings of the IEEE sensors applications symposium* (pp. 129–132), Galveston, TX, USA, 2020.
- [2]. Abdelgawad, A., & Bayoumi, M., "High speed and area efficient multiply accumulate (MAC) unit for digital signal processing applications," *In Proceedings of the IEEE international symposium on circuits and system* (pp. 3199–3202), New Orleans, LA, USA, 2021.
- [3]. Ahmed, H. O., Ghoneima, M., & Dessouky, M., "Concurrent MAC unit design using VHDL for deep learning networks on FPGA," *In Proceedings of the IEEE symposium on computer applications industrial electronics (ISCAIE)* (pp. 31–36), Penang, Malaysia, 2021.
- [4]. Balasubramanian, P., & Maskell, D. L., "Hardware optimized and error reduced approximate adder," *Electronics*, 8(11), 1212, https://doi.org/10.3390/electronics8111212, 2019.
- [5]. Bansal, Y., Madhu, C., & Kaur, P., "High speed vedic multiplier designs - A review," 2014 Recent Advances in Engineering and Computational Sciences (RAECS), 1-6, https://doi.org/10.1109/RAECS.2014.6799502, 2018.


- [6]. Camus V., Enz, C., & Verhelst, M., "Survey of precision-scalable multiply-accumulate units for neural-network processing," *In Proceedings of the IEEE international conference on artificial intelligence circuits and systems* (AICAS) (pp. 57–61), 2021.
- [7]. Chan, P. K., Schlag, M. D. F., Thompson, C. D., & Oklobdzija, V. G., "Delay optimization of carry skip adders and block carry-lookahead adders using multidimensional dynamic programming," *IEEE Transactions on Computers*, 41(8), 920-930, https://doi.org/10.1109/12.156534, 2020.
- [8]. Gomes SV, Sasipriya P, Bhaaskaran VSK. A low power multiplier using a 24-transistor latch adder. *Indian Journal of Science and Technology*. 2015 Aug; 8(18). DOI: 10.17485/ijst/2015/v8i19/76866.
- [9]. Gupta V., Mohanpatra D., Park S.P., Raghunathan A., and Roy K, "IMPACT: Precise adders for low power approximate computing," *in Proc. Int. Symp. Low Power Electron. Design*, pp. 409-414, 2018.
- [10]. Harish, B., Sivani, K., Rukmini, M. S. S., "Performance comparison of various CMOS full adders," In 2017 international conference on energy, communication, data analytics and soft computing (ICECDS) (pp. 3789–3792), Chennai, https://doi.org/10.1109/ICECDS.2017.8390172, 2021.
- [11]. Hoang, T. T., Sjalander, M., & Larsson-Edefors, P., "A high-speed, energy efficient two-cycle multiply-accumulate (MAC) architecture and its application to a double-throughput MAC unit," *IEEE Transactions on Circuits and Systems I, Regular Papers*, 57(12), 3073–3081, 2021.
- [12]. Kakde S, Khan S, Dakhole P, Badwaik S. Design of area and power aware reduced complexity Wallace tree multiplier. 2015 *International Conference of IEEE, Pervasive Computing (ICPC); Pune*. 2015 Jan 8-10. p. 1–6.
- [13]. Kulkarni P, Gupta P, and Ercegovac M.D., "Trading accuracy for power in a multiplier architecture," *J. Low Power Electron.*, vol. 7, no. 4, pp. 490-501, 2021.
- [14]. Kumar MS, Kumar DA, Samundiswary P. Design and performance analysis of Multiply-Accumulate (MAC) unit. 14th International Conference of IEEE, Circuits Power and Computing Technologies; Nagercoil. 2014 Mar 20-21. p. 1084–9.
- [15]. Liang J, Han H, and Lombardi F, "New metrics for the reliability of approximate and Probabilistic Adders," *IEEE Trans. Computers*, vol. 63, no. 9, pp. 1760-1771, 2021.
- [16]. Mukherjee A, Asati A. Generic modified baugh wooley multiplier. International Conference of IEEE, Circuits, Power and Computing Technologies; Nagercoil. 2013 Mar 20-21. p. 746–51.
- [17]. Rahman SA, Khanna G. Performance metrics analysis of 4-bit Array multiplier circuit using 2 PASCL logic. 2014 International Conference of IEEE, Green Computing Communication and Electrical Engineering (ICGCCEE); Coimbatore. 2014 Mar 6-8. p. 1–5.
- [18]. Senthilpari C, Diwakar K, Singh AK. High speed and high throughput 8x8 bit multiplier using a shannon-based adder cell. TENCON – *IEEE Region 10 Conference; Singapore.* 2009 Jan 23-26. p. 1–5.