BIT-SERIAL MULTIPLIER USING VERILOG HDL
A
Mini Project Report
Submitted in the Partial Fulfillment of the
Requirements
for the Award of the Degree of
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Submitted
By
K.BHARGAV 11885A0401
P.DEVSINGH 11885A0404
Under the Guidance of
Mr. S. RAJENDAR
Associate Professor
Department of ECE
Department of Electronics and Communication Engineering
VARDHAMAN COLLEGE OF ENGINEERING
(AUTONOMOUS)
(Approved by AICTE, Affiliated to JNTUH & Accredited by NBA)
2013- 14
VARDHAMAN COLLEGE OF ENGINEERING
(AUTONOMOUS)
Estd.1999 Shamshabad, Hyderabad – 501218
Kacharam (V), Shamshabad (M), Ranga Reddy (Dist.) – 501 218, Hyderabad, A.P.
Ph: 08413-253335, 253201, Fax: 08413-253482, www.vardhaman.org
Department of Electronics and Communication Engineering
CERTIFICATE
This is to certify that the mini project report entitled “Bit-Serial Multiplier Using
Verilog HDL”, carried out by Mr. K. Bhargav, Roll Number 11885A0401, and Mr. P. Devsingh,
Roll Number 11885A0404, is submitted to the Department of Electronics and Communication
Engineering in partial fulfillment of the requirements for the award of the degree of Bachelor of
Technology in Electronics and Communication Engineering during the year 2013 – 2014.
Name & Signature of the Supervisor
Mr. S. Rajendar
Associate Professor
Name & Signature of the HOD
Dr. J. V. R. Ravindra
Head, ECE
ACKNOWLEDGEMENTS
The satisfaction that accompanies the successful completion of a task would be
incomplete without mention of the people who made it possible, whose constant
guidance and encouragement crown all efforts with success.
I express my heartfelt thanks to Mr. S. Rajendar, Associate Professor, mini
project supervisor, for his suggestions in selecting and carrying out the in-depth study of
the topic. His valuable guidance, encouragement, and critical reviews really helped to
shape this report to perfection.
I wish to express my deep sense of gratitude to Dr. J. V. R. Ravindra, Head of
the Department, for his able guidance and useful suggestions, which helped me in
completing the mini project on time.
I also owe my special thanks to our Director Prof. L. V. N. Prasad for his intense
support, encouragement and for having provided all the facilities and support.
Finally thanks to all my family members and friends for their continuous support
and enthusiastic help.
K.Bhargav 11885A0401
P.Devsingh 11885A0404
ABSTRACT
Bit-serial arithmetic is attractive in view of its smaller pin count, reduced wire
length, and lower floor space requirements in VLSI. In fact, the compactness of the design
may allow us to run a bit-serial multiplier at a clock rate high enough to make the unit
almost competitive with much more complex designs with regard to speed. In addition, in
certain application contexts inputs are supplied bit-serially anyway. In such a case, using
a parallel multiplier would be quite wasteful, since the parallelism may not lead to any
speed benefit. Furthermore, in applications that call for a large number of independent
multiplications, multiple bit-serial multipliers may be more cost-effective than a complex,
highly pipelined unit.
Bit-serial multipliers can be designed as systolic arrays: synchronous arrays of
processing elements that are interconnected by only short, local wires, thus allowing very
high clock rates. We begin by introducing a semisystolic multiplier, so named because
its design involves broadcasting a single bit of the multiplier x to a number of circuit
elements, thus violating the “short, local wires” requirement of pure systolic design.
CONTENTS
Acknowledgements (iii)
Abstract (iv)
List Of Figures (vii)
1 INTRODUCTION 1
1.1 The Context of Computer Arithmetic 1
1.2 What is computer arithmetic? 2
1.3 Multiplication 4
1.4 Organization of report 5
2 VLSI 6
2.1 Introduction 6
2.2 What is VLSI? 7
2.2.1 History of Scale Integration 7
2.3 Advantages of ICs over discrete components 7
2.4 VLSI And Systems 8
2.5 Applications of VLSI 8
2.6 Conclusion 9
3 VERILOG HDL 10
3.1 Introduction 10
3.2 Major Capabilities 11
3.3 Synthesis 12
3.4 Conclusion 12
4 BIT-SERIAL MULTIPLIER 14
4.1 Multiplier 14
4.2 Background 14
4.2.1 Binary Multiplication 15
4.2.2 Hardware Multipliers 15
4.2.3 Array Multipliers 16
4.3 Variations in Multipliers 18
4.4 Bit-serial Multipliers 19
5 IMPLEMENTATION 22
5.1 Tools Used 22
5.2 Coding Steps 22
5.3 Simulation steps 22
5.4 Full adder code 23
5.5 Full adder flowchart 24
5.6 Full adder testbench 24
5.7 Bit-serial multiplier algorithm 25
5.8 Bit-Serial multiplier code 25
5.9 Full adder waveform 26
5.10 Bit-serial multiplier testbench 26
5.11 Bit-serial multiplier waveforms 27
6 CONCLUSIONS 28
REFERENCES 29
LIST OF FIGURES
3.1 Mixed level modeling 11
3.2 Synthesis process 12
3.3 Typical design process 13
4.1 Basic Multiplication Data flow 15
4.2 Two Rows of an Array Multiplier 17
4.3 Data Flow through a Pipelined Array Multiplier 18
4.4 Bit-serial multiplier; 4x4 multiplication in 8 clock cycles 19
4.5 Bit Serial multiplier design in dot notation 21
5.1 Project directory structure 22
5.2 Simulation window 23
5.3 Waveform window 23
5.4 Full adder flowchart 24
5.5 Bit-Serial multiplier flowchart 25
5.6 Full adder output waveforms 26
5.7 Bit serial multiplier input/output waveforms 27
5.8 Bit serial multiplier with intermediate waveforms 27
CHAPTER 1
INTRODUCTION
1.1 The Context of Computer Arithmetic
Advances in computer architecture over the past two decades have allowed the
performance of digital computer hardware to continue its exponential growth, despite
increasing technological difficulty in speed improvement at the circuit level. This
phenomenal rate of growth, which is expected to continue in the near future, would not
have been possible without theoretical insights, experimental research, and tool-building
efforts that have helped transform computer architecture from an art into one of the most
quantitative branches of computer science and engineering. Better understanding of the
various forms of concurrency and the development of reasonably efficient and user-
friendly programming models have been key enablers of this success story.
The downside of exponentially rising processor performance is an unprecedented
increase in hardware and software complexity. The trend toward greater complexity is not
only at odds with testability and verifiability but also hampers adaptability, performance
tuning, and evaluation of the various trade-offs, all of which contribute to soaring
development costs. A key challenge facing current and future computer designers is to
reverse this trend by removing layer after layer of complexity, opting instead for clean,
robust, and easily certifiable designs, while continuing to try to devise novel methods for
gaining performance and ease-of-use benefits from simpler circuits that can be readily
adapted to application requirements.
In the computer designers’ quest for user-friendliness, compactness, simplicity,
high performance, low cost, and low power, computer arithmetic plays a key role. It is
one of the oldest subfields of computer architecture. The bulk of hardware in early digital
computers resided in the accumulator and other arithmetic/logic circuits. Thus, first-
generation computer designers were motivated to simplify and share hardware to the
extent possible and to carry out detailed cost-performance analyses before proposing a
design. Many of the ingenious design methods that we use today have their roots in the
bulky, power-hungry machines of 30-50 years ago.
In fact, computer arithmetic has been so successful that it has, at times, become
transparent. Arithmetic circuits are no longer dominant in terms of complexity; registers,
memory and memory management, instruction issue logic, and pipeline control have
become the dominant consumers of chip area in today’s processors. Correctness and high
performance of arithmetic circuits are routinely expected, and episodes such as the Intel
Pentium division bug are indeed rare.
The preceding context is changing for several reasons. First, at very high clock
rates, the interfaces between arithmetic circuits and the rest of the processor become
critical. Arithmetic units can no longer be designed and verified in isolation. Rather, an
integrated design optimization is required, which makes the development even more
complex and costly. Second, optimizing arithmetic circuits to meet design goals by taking
advantage of the strengths of new technologies, and making them tolerant to the
weaknesses, requires a reexamination of existing design paradigms. Finally, incorporation
of higher-level arithmetic primitives into hardware makes the design, optimization, and
verification efforts highly complex and interrelated.
This is why computer arithmetic is alive and well today. Designers and
researchers in this area produce novel structures with amazing regularity. Carry-
lookahead adders are a case in point. We used to think, in the not so distant past,
that we knew all there was to know about carry-lookahead fast adders. Yet, new designs,
improvements, and optimizations are still appearing. The ANSI/IEEE standard floating-
point format has removed many of the concerns with compatibility and error control in
floating-point computations, thus resulting in new designs and products with mass-market
appeal. Given the arithmetic-intensive nature of many novel application areas (such as
encryption, error checking, and multimedia), computer arithmetic will continue to thrive
for years to come.
1.2 What is computer arithmetic?
A sequence of events, begun in late 1994 and extending into 1995, embarrassed
the world’s largest computer chip manufacturer and put the normally dry subject of
computer arithmetic on the front pages of major newspapers. The events were rooted in
the work of Thomas Nicely, a mathematician at Lynchburg College in Virginia, who
is interested in twin primes (consecutive odd numbers such as 29 and 31 that are both
prime). Nicely’s work involves the distribution of twin primes and, particularly, the sum
of their reciprocals S = 1/5 + 1/7 + 1/11 + 1/13 + 1/17 + 1/19 + 1/29 + 1/31 + ... +
1/p + 1/(p + 2) + .... While it is known that the infinite sum S has a finite value, no one
knows what the value is.
Nicely was using several different computers for his work and in March 1994
added a machine based on the Intel Pentium processor to his collection. Soon he began
noticing inconsistencies in his calculations and was able to trace them back to the values
computed for 1/p and 1/(p + 2) on the Pentium processor. At first, he suspected his own
programs, the compiler, and the operating system, but by October, he became convinced
that the Intel Pentium chip was at fault. This suspicion was confirmed by several other
researchers following a barrage of e-mail exchanges and postings on the Internet. The
diagnosis finally came from Tim Coe, an engineer at Vitesse Semiconductor. Coe built a
model of Pentium’s floating-point division hardware based on the radix-4 SRT algorithm
and came up with an example that produces the worst-case error. Using double-precision
floating-point computation, the ratio c = 4 195 835 / 3 145 727 = 1.333 820 44... is
computed as 1.333 739 06 on the Pentium. This latter result is accurate to only 14 bits;
the error is even larger than that of single-precision floating-point arithmetic and more than
10 orders of magnitude worse than what is expected of double-precision computation.
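As a quick numeric illustration (our addition; the exact machine output varied with chip
stepping), the flaw could be exposed with a one-line check that should return zero under
correct arithmetic:
4 195 835 - (4 195 835 / 3 145 727) x 3 145 727 = 0 (correct arithmetic)
Flawed Pentiums famously returned 256 for this expression, consistent with the erroneous
quotient 1.333 739 06 quoted above.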
The rest, as they say, is history. Intel at first dismissed the severity of the problem
and admitted only a “subtle flaw,” with a probability of 1 in 9 billion, or once in 27,000
years for the average spreadsheet user, of leading to computational errors. It nevertheless
published a “white paper” that described the bug and its potential consequences and
announced a replacement policy for the defective chips based on “customer need”; that is,
customers had to show that they were doing a lot of mathematical calculations to get a
free replacement. Under heavy criticism from customers, manufacturers using the
Pentium chip in their products, and the on-line community, Intel later revised its policy to
no-questions-asked replacement.
Whereas supercomputing, microchips, computer networks, advanced applications
(particularly chess-playing programs), and many other aspects of computer technology
have made the news regularly in recent years, the Intel Pentium bug was the first instance
of arithmetic (or anything inside the CPU for that matter) becoming front-page news.
While this can be interpreted as a sign of pedantic dryness, it is more likely an indicator
of stunning technological success. Glaring software failures have come to be routine
events in our information-based society, but hardware bugs are rare and newsworthy.
Within the hardware realm, we will be dealing with both general-purpose
arithmetic/logic units (ALUs), of the type found in many commercially available
processors, and special-purpose structures for solving specific application problems. The
differences in the two areas are minor as far as the arithmetic algorithms are concerned.
However, in view of the specific technological constraints, production volumes, and
performance criteria, hardware implementations tend to be quite different. General-
purpose processor chips that are mass-produced have highly optimized custom designs.
Implementations of low-volume, special-purpose systems, on the other hand, typically
rely on semicustom and off-the-shelf components. However, when critical and strict
requirements, such as extreme speed, very low power consumption, and miniature size,
preclude the use of semicustom or off-the-shelf components, the much higher cost of a
custom design may be justified even for a special-purpose system.
1.3 Multiplication
Multiplication (often denoted by the cross symbol "×", or by the absence of a
symbol) is the third basic mathematical operation of arithmetic, the others being addition,
subtraction, and division (division is counted fourth because it requires multiplication
to be defined). The multiplication of two whole numbers is equivalent to adding
one of them to itself as many times as the value of the other one; for example, 3
multiplied by 4 (often said as "3 times 4") can be calculated by adding 4 copies of 3
together: 3 × 4 = 3 + 3 + 3 + 3 = 12. Here 3 and 4 are the "factors" and 12 is the
"product". One of the main properties of multiplication is that the result does not depend
on the order of the factors (the commutative property): 3 multiplied by 4 can also be
calculated by adding 3 copies of 4 together, 3 × 4 = 4 + 4 + 4 = 12. The multiplication
of integers (including negative numbers), rational numbers (fractions), and real numbers
is defined by a systematic generalization of this basic definition. Multiplication can also
be visualized as counting objects arranged in a rectangle (for whole numbers) or as
finding the area of a rectangle whose sides have given lengths. The area of a rectangle
does not depend on which side is measured first, which again illustrates the commutative
property. In general, multiplying two measurements gives a new type of quantity,
depending on the measurements; for instance, 2.5 meters × 4.5 meters = 11.25 square
meters, and 11 meters/second × 9 seconds = 99 meters. The inverse operation of
multiplication is division: for example, since 4 multiplied by 3 equals 12, 12 divided by
3 equals 4. Multiplication by 3, followed by division by 3, yields the original number
(since the division of a number other than 0 by itself equals 1). Multiplication is also
defined for other types of numbers, such as complex numbers, and for more abstract
constructs, like matrices. For these more abstract constructs, the order in which the
operands are multiplied sometimes does matter.
Multiplication, often realized by k cycles of shifting and adding, is a heavily used
arithmetic operation that figures prominently in signal processing and scientific
applications. In this part, after examining shift/add multiplication schemes and their
various implementations, we note that there are but two ways to speed up the underlying
multioperand addition: reducing the number of operands to be added leads to high-radix
multipliers, and devising hardware multioperand adders that minimize the latency and/or
maximize the throughput leads to tree and array multipliers. Of course, speed is not the
only criterion of interest. Cost, VLSI area, and pin limitations favor bit-serial designs,
while the desire to use available building blocks leads to designs based on additive
multiply modules. Finally, the special case of squaring is of interest as it leads to
considerable simplification.
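To make the k-cycle shift/add scheme concrete, the following behavioral Verilog sketch
(our illustration, not the bit-serial design developed later in this report; the module
and signal names are ours) realizes the add-and-shift loop:
module shift_add_mult #(parameter K = 4)
                       (input  [K-1:0]     a,   // multiplicand
                        input  [K-1:0]     x,   // multiplier
                        output reg [2*K-1:0] p); // product
  integer i;
  always @(a, x) begin
    p = 0;
    for (i = 0; i < K; i = i + 1)   // one "cycle" per multiplier bit
      if (x[i])
        p = p + (a << i);           // add the shifted multiplicand
  end
endmodule
Each loop iteration stands in for one clock cycle of a sequential shift/add multiplier;
a hardware version would keep p in a register and shift instead of indexing.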
1.4 Organization of report
This report starts with an introduction to computer arithmetic and then introduces
multiplication. It then explains the implementation of one such multiplier, the bit-serial
multiplier.
Chapter 1: Introduction – This chapter explains the importance of computer arithmetic and
multiplication in computations.
Chapter 2: VLSI – This chapter focuses on VLSI and its evolution, along with its applications
and advantages.
Chapter 3: Verilog HDL – This chapter explains how HDLs shorten the design cycle in VLSI
and how automation enables faster implementation.
Chapter 4: Bit-serial multiplier – This chapter explains multipliers and their types, and
how the bit-serial multiplier is useful.
Chapter 5: Implementation – This chapter explains the implementation flow of the bit-serial
multiplier, its Verilog code, and the output waveforms.
Chapter 6: Conclusions – This chapter summarizes the bit-serial multiplier and its possible
future improvements.
CHAPTER 2
VLSI
2.1 Introduction
Very-large-scale integration (VLSI) is the process of creating integrated
circuits by combining thousands of transistor-based circuits into a single chip. VLSI
began in the 1970s when complex semiconductor and communication technologies
were being developed. The microprocessor is a VLSI device. The term is no longer as
common as it once was, as chips have increased in complexity into the hundreds of
millions of transistors.
The first semiconductor chips held one transistor each. Subsequent advances
added more and more transistors, and, as a consequence, more individual functions or
systems were integrated over time. The first integrated circuits held only a few
devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it
possible to fabricate one or more logic gates on a single device. Now known
retrospectively as "small-scale integration" (SSI), improvements in technique led to
devices with hundreds of logic gates, known as medium-scale integration (MSI), and then
to large-scale integration (LSI), i.e., systems with at least a thousand logic gates.
Current technology has moved far past this mark, and today's microprocessors have many
millions of gates and hundreds of millions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-
scale integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were
used. But the huge number of gates and transistors available on common devices has
rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of
integration are no longer in widespread use. Even VLSI is now somewhat quaint,
given the common assumption that all microprocessors are VLSI or better.
As of early 2008, billion-transistor processors are commercially available, an
example of which is Intel's Montecito Itanium chip. This is expected to become more
commonplace as semiconductor fabrication moves from the current generation of 65 nm
processes to the next 45 nm generations (while experiencing new challenges such as
increased variation across process corners).
This microprocessor is unique in that its 1.4 billion transistors, capable of a
teraflop of performance, are almost entirely dedicated to logic (Itanium's
transistor count is largely due to its 24 MB L3 cache). Current designs, as opposed to
the earliest devices, use extensive design automation and automated logic synthesis to
lay out the transistors, enabling higher levels of complexity in the resulting logic
functionality. Certain high-performance logic blocks like the SRAM cell, however, are
still designed by hand to ensure the highest efficiency (sometimes by bending or
breaking established design rules to obtain the last bit of performance by trading
stability).
2.2 What is VLSI?
VLSI stands for "Very Large Scale Integration". This is the field which involves
packing more and more logic devices into smaller and smaller areas.
• Simply we say Integrated circuit is many transistors on one chip.
• Design/manufacturing of extremely small, complex circuitry using modified
semiconductor material
• Integrated circuit (IC) may contain millions of transistors, each a few µm or less in size
• Applications wide ranging: most electronic logic devices
2.2.1 History of Scale Integration
• late 40s: Transistor invented at Bell Labs
• late 50s: First IC (JK-FF by Jack Kilby at TI)
• early 60s: Small-Scale Integration (SSI) – 10s of transistors on a chip
• late 60s: Medium-Scale Integration (MSI) – 100s of transistors on a chip
• early 70s: Large-Scale Integration (LSI) – 1000s of transistors on a chip
• early 80s: VLSI – 10,000s of transistors on a chip (later 100,000s and now 1,000,000s)
• Ultra LSI (ULSI) is sometimes used for 1,000,000s
2.3 Advantages of ICs over discrete components
While we will concentrate on integrated circuits, the properties of integrated
circuits (what we can and cannot efficiently put in an integrated circuit) largely
determine the architecture of the entire system. Integrated circuits improve system
characteristics in several critical ways. ICs have three key advantages over digital
circuits built from discrete components:
Size: Integrated circuits are much smaller: both transistors and wires are shrunk to
micrometer sizes, compared to the millimeter or centimeter scales of discrete
components. Small size leads to advantages in speed and power consumption, since
smaller components have smaller parasitic resistances, capacitances, and inductances.
Speed: Signals can be switched between logic 0 and logic 1 much more quickly within a chip
than they can between chips. Communication within a chip can occur hundreds of
times faster than communication between chips on a printed circuit board. The high
speed of circuits on-chip is due to their small size: smaller components and wires have
smaller parasitic capacitances to slow down the signal.
Power consumption: Logic operations within a chip also take much less power. Once
again, lower power consumption is largely due to the small size of circuits on the chip:
smaller parasitic capacitances and resistances require less power to drive them.
2.4 VLSI And Systems
These advantages of integrated circuits translate into advantages at the system
level:
Smaller physical size: Smallness is often an advantage in itself: consider portable
televisions or handheld cellular telephones.
Lower power consumption: Replacing a handful of standard parts with a single chip
reduces total power consumption. Reducing power consumption has a ripple effect on
the rest of the system: a smaller, cheaper power supply can be used; since less power
consumption means less heat, a fan may no longer be necessary; and a simpler cabinet with
less shielding against electromagnetic interference may be feasible, too.
Reduced cost: Reducing the number of components, the power supply requirements,
cabinet costs, and so on, will inevitably reduce system cost. The ripple effect of
integration is such that the cost of a system built from custom ICs can be less, even
though the individual ICs cost more than the standard parts they replace.
Understanding why integrated circuit technology has such profound influence
on the design of digital systems requires understanding both the technology of IC
manufacturing and the economics of ICs and digital systems.
2.5 Applications of VLSI
Electronic systems now perform a wide variety of tasks in daily life. Electronic
systems in some cases have replaced mechanisms that operated mechanically,
hydraulically, or by other means; electronics are usually smaller, more flexible, and
easier to service. In other cases electronic systems have created totally new applications.
Electronic systems perform a variety of tasks, some of them visible, some more hidden.
Electronic systems in cars operate stereo systems and displays; they also control fuel
injection systems, adjust suspensions to varying terrain, and perform the control
functions required for anti-lock braking (ABS) systems.
• Digital electronics compress and decompress video, even at high-definition data
rates, on-the-fly in consumer electronics.
• Low-cost terminals for Web browsing still require sophisticated electronics,
despite their dedicated function.
• Personal computers and workstations provide word-processing, financial analysis,
and games. Computers include both central processing units (CPUs) and special-
purpose hardware for disk access, faster screen display, etc.
• Medical electronic systems measure bodily functions and perform complex
processing algorithms to warn about unusual conditions. The availability of these
complex systems, far from overwhelming consumers, only creates demand for
even more complex systems.
2.6 Conclusion
The growing sophistication of applications continually pushes the design and
manufacturing of integrated circuits and electronic systems to new levels of complexity.
And perhaps the most amazing characteristic of this collection of systems is its variety:
as systems become more complex, we build not a few general-purpose computers but
an ever wider range of special-purpose systems. Our ability to do so is a testament to
our growing mastery of both integrated circuit manufacturing and design, but the
increasing demands of customers continue to test the limits of design and
manufacturing.
CHAPTER 3
VERILOG HDL
3.1 Introduction
Verilog HDL is a hardware description language that can be used to model a
digital system at many levels of abstraction ranging from the algorithmic-level to the
gate-level to the switch-level. The complexity of the digital system being modeled
could vary from that of a simple gate to a complete electronic digital system, or
anything in between. The digital system can be described hierarchically and timing
can be explicitly modeled within the same description.
The Verilog HDL language includes capabilities to describe the behavioral
nature of a design, the dataflow nature of a design, a design's structural composition,
delays, and a waveform generation mechanism, including aspects of response monitoring
and verification, all modeled using one single language. In addition, the language
provides a programming language interface through which the internals of a design can
be accessed during simulation, including control of a simulation run.
The language not only defines the syntax but also defines very clear simulation
semantics for each language construct. Therefore, models written in this language
can be verified using a Verilog simulator. The language inherits many of its operator
symbols and constructs from the C programming language. Verilog HDL provides an
extensive range of modeling capabilities, some of which are quite difficult to
comprehend initially. However, a core subset of the language is quite easy to learn and
use. This is sufficient to model most applications.
The Verilog HDL language was first developed by Gateway Design Automation
in 1983 as a hardware modeling language for their simulator product. At that time, it
was a proprietary language. Because of the popularity of the simulator product, Verilog
HDL gained acceptance as a usable and practical language by a number of designers.
In an effort to increase the popularity of the language, it was placed in the public
domain in 1990.
Open Verilog International (OVI) was formed to promote Verilog. In 1992 OVI
decided to pursue standardization of Verilog HDL as an IEEE standard. This effort was
successful and the language became an IEEE standard in 1995. The complete standard is
described in the Verilog hardware description language reference manual. The standard
is called IEEE Std 1364-1995.
3.2 Major Capabilities
Listed below are the major capabilities of the Verilog hardware description language:
• Primitive logic gates, such as and, or, and nand, are built into the language.
• Flexibility of creating a user-defined primitive (UDP). Such a primitive could
either be a combinational logic primitive or a sequential logic primitive.
• Switch-level modeling primitives, such as pmos and nmos, are also built into
the language.
• A design can be modeled in three different styles or in a mixed style. These
styles are: behavioral style, modeled using procedural constructs; dataflow style,
modeled using continuous assignments; and structural style, modeled using
gate and module instantiations (the sketch following this list shows the same
circuit in all three styles).
• There are two data types in Verilog HDL: the net data type and the register
data type. The net type represents a physical connection between structural
elements, while a register type represents an abstract data storage element.
• Figure 3.1 shows the mixed-level modeling capability of Verilog HDL; that is, in
one design, each module may be modeled at a different level.
Figure 3.1 Mixed level modeling
• Verilog HDL also has built-in logic functions such as & (bitwise-and) and |
(bitwise-or).
• High-level programming language constructs such as conditionals, case
statements, and loops are available in the language.
• Notion of concurrency and time can be explicitly modeled.
• Powerful file read and write capabilities are provided.
• The language is non-deterministic under certain situations, that is, a model may
produce different results on different simulators; for example, the ordering of
events on an event queue is not defined by the standard.
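As an illustration of the three modeling styles listed above (our sketch; the module
names are ours), the same 1-bit full adder can be written in dataflow, behavioral, and
structural style:
module fa_dataflow(output sum, cout, input a, b, cin);       // dataflow style
  assign {cout, sum} = a + b + cin;                          // continuous assignment
endmodule
module fa_behavioral(output reg sum, cout, input a, b, cin); // behavioral style
  always @(a, b, cin)                                        // procedural block
    {cout, sum} = a + b + cin;
endmodule
module fa_structural(output sum, cout, input a, b, cin);     // structural style
  wire w1, w2, w3;
  xor g1(w1, a, b);                                          // gate instantiations
  xor g2(sum, w1, cin);
  and g3(w2, w1, cin);
  and g4(w3, a, b);
  or  g5(cout, w2, w3);
endmodule
All three simulate identically; the choice of style trades abstraction for control over
the resulting structure.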
3.3 Synthesis
Synthesis is the process of constructing a gate-level netlist from a register-
transfer level model of a circuit described in Verilog HDL. Figure 3.2 shows such a
process. A synthesis system may, as an intermediate step, generate a netlist comprising
register-transfer level blocks such as flip-flops, arithmetic-logic units,
and multiplexers, interconnected by wires. In such a case, a second program called the
RTL module builder is necessary. The purpose of this builder is to build, or acquire
from a library of predefined components, each of the required RTL blocks in the user-
specified target technology.
Figure 3.2 Synthesis process
The above figure shows the basic elements of Verilog HDL and the elements
used in hardware. A mapping or construction mechanism has to be provided that
translates the Verilog HDL elements into their corresponding hardware elements, as
shown in Figure 3.3.
3.4 Conclusion
The Verilog HDL language includes capabilities to describe the behavioral
nature of a design, the dataflow nature of a design, a design's structural composition,
delays, and a waveform generation mechanism, including aspects of response monitoring
and verification, all modeled using one single language. The language not only defines
the syntax but also defines very clear simulation semantics for each language construct.
Therefore, models written in this language can be verified using a Verilog simulator.
Figure 3.3: Typical design process
CHAPTER 4
BIT-SERIAL MULTIPLIER
4.1 Multiplier
Multipliers are key components of many high-performance systems such as FIR
filters, microprocessors, digital signal processors, etc. A system’s performance is
generally determined by the performance of the multiplier, because the multiplier is
generally the slowest element in the system. Furthermore, it is generally the most area
consuming. Hence, optimizing the speed and area of the multiplier is a major design
issue. However, area and speed are usually conflicting constraints, so that improving
speed results mostly in larger areas. As a result, a whole spectrum of multipliers with
different area-speed trade-offs has been designed, ranging from fully serial to fully
parallel processing. In between are digit-serial multipliers, where single digits consisting
of several bits are operated on. These multipliers have moderate performance in both
speed and area. However, existing digit-serial multipliers have been plagued by
complicated switching systems and/or irregularities in design. Radix-2^n multipliers,
which operate on digits in a parallel fashion instead of bits, bring the pipelining to the
digit level and avoid most of the above problems. They were introduced by M. K.
Ibrahim in 1993. These structures are iterative and modular. The pipelining done at the
digit level brings the benefit of constant operation speed irrespective of the size of the
multiplier. The clock speed is determined only by the digit size, which is fixed before
the design is implemented.
The growing market for fast floating-point co-processors, digital signal processing
chips, and graphics processors has created a demand for high-speed, area-efficient
multipliers. Current architectures range from small, low-performance shift-and-add
multipliers to large, high-performance array and tree multipliers. Conventional linear
array multipliers achieve high performance in a regular structure, but require large
amounts of silicon. Tree structures achieve even higher performance than linear arrays
but the tree interconnection is more complex and less regular, making them even larger
than linear arrays. Ideally, one would want the speed benefits of a tree structure, the
regularity of an array multiplier, and the small size of a shift and add multiplier.
4.2 Background
Webster’s dictionary defines multiplication as “a mathematical operation that at
its simplest is an abbreviated process of adding an integer to itself a specified number of
times”. A number (multiplicand) is added to itself a number of times as specified by
another number (multiplier) to form a result (product). In elementary school, students
learn to multiply by placing the multiplicand on top of the multiplier. The multiplicand is
then multiplied by each digit of the multiplier beginning with the rightmost, Least
Significant Digit (LSD). Intermediate results (partial-products) are placed one atop the
other, offset by one digit to align digits of the same weight. The final product is
determined by summation of all the partial-products. Although most people think of
multiplication only in base 10, this technique applies equally to any base, including
binary. Figure 4.1 shows the data flow for the basic multiplication technique just
described. Each black dot represents a single digit.
Figure 4.1: Basic Multiplication Data flow
4.2.1 Binary Multiplication
In the binary number system the digits, called bits, are limited to the set {0, 1}. The
result of multiplying any binary number by a single binary bit is either 0, or the original
number. This makes forming the intermediate partial-products simple and efficient.
Summing these partial-products is the time-consuming task for binary multipliers. One
logical approach is to form the partial-products one at a time and sum them as they are
generated. Often implemented by software on processors that do not have a hardware
multiplier, this technique works fine, but is slow because at least one machine cycle is
required to sum each additional partial-product.
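For example (our illustration, using the operands that reappear in the testbench of
Chapter 5), multiplying a = 1101 (13) by x = 1001 (9) proceeds as follows:
        1 1 0 1        multiplicand a = 13
      x 1 0 0 1        multiplier   x = 9
      -----------
        1 1 0 1        partial product for x0 = 1 (a)
      0 0 0 0          partial product for x1 = 0
    0 0 0 0            partial product for x2 = 0
  1 1 0 1              partial product for x3 = 1 (a shifted left 3)
  -------------
  1 1 1 0 1 0 1        product = 117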
For applications where this approach does not provide enough performance,
multipliers can be implemented directly in hardware.
4.2.2 Hardware Multipliers
Direct hardware implementations of shift and add multipliers can increase
performance over software synthesis, but are still quite slow. The reason is that as each
additional partial-product is summed, a carry must be propagated from the least
significant bit (LSB) to the most significant bit (MSB). This carry propagation is time
consuming, and must be repeated for each partial product to be summed.
One method to increase multiplier performance is by using encoding techniques to
reduce the number of partial products to be summed. Just such a technique was first
proposed by Booth. The original Booth’s algorithm skips over contiguous strings of 1’s by
using the property that: 2^n + 2^(n-1) + 2^(n-2) + ... + 2^(n-m) = 2^(n+1) - 2^(n-m);
for example, 0111 (= 2^2 + 2^1 + 2^0) can be recoded as 1000 - 1. Although
Booth’s algorithm produces at most N/2 encoded partial products from an N-bit operand,
the number of partial products produced varies. This has caused designers to use modified
versions of Booth’s algorithm for hardware multipliers. Modified 2-bit Booth encoding
halves the number of partial products to be summed.
Since the resulting encoded partial-products can then be summed using any
suitable method, modified 2-bit Booth encoding is used on most modern floating-point
chips [LU 88, MCA 86]. A few designers have even turned to modified 3-bit Booth
encoding, which reduces the number of partial products to be summed by a factor of
three [BEN 89]. The problem with 3-bit encoding is that the carry-propagate addition
required to form the 3X multiples often overshadows the potential gains of 3-bit Booth
encoding.
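For reference, standard radix-4 (modified 2-bit) Booth recoding examines overlapping
groups of three multiplier bits x(2i+1), x(2i), x(2i-1), with x(-1) = 0, and maps each
group to one signed digit (this table is our addition, following the usual formulation):
x(2i+1) x(2i) x(2i-1) | encoded digit
   0      0      0    |      0
   0      0      1    |     +1
   0      1      0    |     +1
   0      1      1    |     +2
   1      0      0    |     -2
   1      0      1    |     -1
   1      1      0    |     -1
   1      1      1    |      0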
To achieve even higher performance, advanced hardware multiplier architectures
search for faster and more efficient methods of summing the partial-products. Most
increase performance by eliminating the time-consuming carry-propagate additions. To
accomplish this, they sum the partial-products in a redundant number representation. The
advantage of a redundant representation is that two numbers, or partial-products, can be
added together without propagating a carry across the entire width of the number. Many
redundant number representations are possible. One commonly used representation is
known as carry-save form. In this redundant representation two bits, known as the carry
and sum, are used to represent each bit position. When two numbers in carry-save form
are added together any carries that result are never propagated more than one bit position.
This makes adding two numbers in carry-save form much faster than adding two normal
binary numbers where a carry may propagate. One common method that has been
developed for summing rows of partial products using a carry-save representation is the
array multiplier.
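A carry-save adder row reduces to one full adder per bit position with no horizontal
carry chain. A minimal Verilog sketch of this idea (our illustration; the module name
and width parameter are ours):
module csa_row #(parameter W = 8)
               (input  [W-1:0] x, y, z,
                output [W-1:0] s,    // "sum" bits
                output [W-1:0] c);   // "carry" bits, weighted by 2 when consumed
  assign s = x ^ y ^ z;                      // per-bit sum, no carry propagation
  assign c = (x & y) | (y & z) | (x & z);    // per-bit carry, kept in place
endmodule
The pair (s, c) satisfies x + y + z = s + 2c, so three operands are reduced to two in
constant time regardless of the word width.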
4.2.3 Array Multipliers
Conventional linear array multipliers consist of rows of carry-save adders (CSA).
A portion of an array multiplier with the associated routing can be seen in Figure 4.2.
Figure 4.2: Two Rows of an Array Multiplier
In a linear array multiplier, as the data propagates down through the array, each
row of CSA’s adds one additional partial-product to the partial sum. Since the
intermediate partial sum is kept in a redundant, carry-save form there is no carry
propagation. This means that the delay of an array multiplier is only dependent upon the
depth of the array, and is independent of the partial-product width. Linear array
multipliers are also regular, consisting of replicated rows of CSA’s. Their high
performance and regular structure have perpetuated the use of array multipliers for VLSI
math co-processors and special purpose DSP chips.
The biggest problem with full linear array multipliers is that they are very large.
As operand sizes increase, linear arrays grow in size at a rate equal to the square of the
operand size. This is because the number of rows in the array is equal to the length of the
multiplier, with the width of each row equal to the width of multiplicand. The large size
of full arrays typically prohibits their use, except for small operand sizes, or on special
purpose math chips where a major portion of the silicon area can be assigned to the
multiplier array.
Another problem with array multipliers is that the hardware is underutilized. As
the sum is propagated down through the array, each row of CSA’s computes a result only
once, when the active computation front passes that row. Thus, the hardware is doing
useful work only a very small percentage of the time. This low hardware utilization in
conventional linear array multipliers makes performance gains possible through increased
efficiency. For example, by overlapping calculations, pipelining can achieve a large gain
in throughput. Figure 4.3 shows a full array pipelined after each row of CSA’s. Once the
partial sum has passed the first row of CSA’s, represented by the shaded row of CSA’s in
cycle 1, a subsequent multiply can be started on the next cycle. In cycle 2, the first partial
sum has passed to the second row of CSA’s, and the second multiply, represented by the
cross-hatched row of CSA’s, has begun. Although pipelining a full array can greatly
increase throughput, both the size and latency are increased due to the additional latches.
While high throughput is desirable, for general-purpose computers size and latency tend
to be more important; thus, fully pipelined linear array multipliers are seldom found.
Figure 4.3: Data Flow through a Pipelined Array Multiplier
4.3 Variations in Multipliers
We do not always synthesize our multipliers from scratch but may desire, or be
required, to use building blocks such as adders, small multipliers, or lookup tables.
Furthermore, limited chip area and/or pin availability may dictate the use of bit-serial
designs. In this chapter, we discuss such variations and also deal with modular
multipliers, the special case of squaring, and multiply-accumulators.
• Divide-and-Conquer Designs
• Additive Multiply Modules
• Bit-Serial Multipliers
• Modular Multipliers
• The Special Case of Squaring
• Combined Multiply-Add Units
4.4 Bit-serial Multipliers
Bit-serial arithmetic is attractive in view of its smaller pin count, reduced wire
length, and lower floor space requirements in VLSI. In fact, the compactness of the
design may allow us to run a bit-serial multiplier at a clock rate high enough to make the
unit almost competitive with much more complex designs with regard to speed. In
addition, in certain application contexts inputs are supplied bit-serially anyway. In such a
case, using a parallel multiplier would be quite wasteful, since the parallelism may not
lead to any speed benefit. Furthermore, in applications that call for a large number of
independent multiplications, multiple bit-serial multipliers may be more cost-effective
than a complex highly pipelined unit.
Figure 4.4: Bit-serial multiplier; 4x4 multiplication in 8 clock cycles
Bit-serial multipliers can be designed as systolic arrays: synchronous arrays of
processing elements that are interconnected by only short, local wires thus allowing very
high clock rates. Let us begin by introducing a semisystolic multiplier, so named because
its design involves broadcasting a single bit of the multiplier x to a number of circuit
elements, thus violating the “short, local wires” requirement of pure systolic design.
Figure 4.4 shows a semisystolic 4 x 4 multiplier. The multiplicand a is supplied in
parallel from above and the multiplier x is supplied bit-serially from the right, with its
least significant bit arriving first. Each bit xi of the multiplier is multiplied by a and the
result added to the cumulative partial product, kept in carry-save form in the carry and
sum latches. The carry bit stays in its current position, while the sum bit is passed on to
the neighboring cell on the right. This corresponds to shifting the partial product to the
right before the next addition step (normally the sum bit would stay put and the carry bit
would be shifted to the left). Bits of the result emerge serially from the right as they
become available.
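In equation form (our paraphrase of the standard right-shift multiplication recurrence),
the cumulative partial product p evolves as
p(0) = 0
p(j+1) = ( p(j) + x_j * a * 2^k ) / 2
where the division by 2 is the right shift, and the bit shifted off in step j is product
bit j; after 2k steps, with x padded by k zeros, all 2k product bits have emerged.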
A k-bit unsigned multiplier x must be padded with k zeros to allow the carries to
propagate to the output, yielding the correct 2k-bit product. Thus, the semisystolic
multiplier of Figure 4.4 can perform one k x k unsigned integer multiplication every 2k
clock cycles. If k-bit fractions need to be multiplied, the first k output bits are discarded
or used to properly round the most significant k bits.
To make the multiplier of Figure 4.4 fully systolic, we must remove the
broadcasting of the multiplier bits. This can be accomplished by a process known as
systolic retiming, which is briefly explained below.
Consider a synchronous (clocked) circuit, with each line between two functional
parts having an integral number of unit delays (possibly 0). Then, if we cut the circuit into
two parts CL and CR, we can delay (advance) all the signals going in one direction and
advance (delay) the ones going in the opposite direction by the same amount without
affecting the correct functioning or external timing relations of the circuit. Of course, the
primary inputs and outputs to the two parts CL and CR must be correspondingly advanced
or delayed, too.
For the retiming to be possible, all the signals that are advanced by d must have
had original delays of d or more (negative delays are not allowed). Note that all the
signals going into CL have been delayed by d time units. Thus, CL will work as before,
except that everything, including output production, occurs d time units later than before
retiming. Advancing the outputs by d time units will keep the external view of the circuit
unchanged.
We apply the preceding process to the multiplier circuit of Figure 4.4 in three
successive steps corresponding to cuts 1, 2, and 3, each time delaying the left-moving
signal by one unit and advancing the right-moving signal by one unit. Verifying that the
retimed multiplier works correctly is left as an exercise. This new version of our
multiplier does not have the fan-out problem of the design in Figure 4.4, but it suffers
from long signal propagation delay through the four FAs in each clock cycle, leading to
inferior operating speed. Note that the culprits are zero-delay lines that lead to signal
propagation through multiple circuit elements.
One way of avoiding zero-delay lines in our design is to begin by doubling all the
delays in Figure 4.4. This is done by simply replacing each of the sum and carry flip-flops
with two cascaded flip-flops before retiming is applied. Since the circuit is now operating
at half its original speed, the multiplier x must also be applied on alternate clock cycles.
The resulting design is fully systolic, inasmuch as signals move only between adjacent
cells in each clock cycle. However, twice as many cycles are needed.
The easiest way to derive a multiplier with both inputs entering bit-serially is to
allow k clock ticks for the multiplicand bits to be put into place in a shift register and then
use the design of Figure 4.4 to compute the product. This increases the total delay by k
cycles.
Figure 4.5 uses dot notation to show the justification for the bit-serial multiplier
design above, depicting the meanings of the various partial operands and results.
Figure 4.5: Bit Serial multiplier design in dot notation
CHAPTER 5
IMPLEMENTATION
5.1 Tools Used
1) PC installed with the Linux operating system
2) Installed Cadence tools:
• ncvlog – for compiling the code and checking for errors
• ncverilog – for running the simulation
• SimVision – for viewing waveforms
5.2 Coding Steps
1) Create the directory structure for the project as shown below
Figure 5.1: Project directory structure
2) Write the RTL code in a text file and save it with a .v extension in the RTL directory
3) Write the code for the testbench and store it in the TB directory
5.3 Simulation steps
The commands used in Cadence for execution are:
1) Initially, mount the server using “mount -a”.
2) Go to the C shell environment with the command “csh”.
3) Source the setup file with the command “source /root/cshrc”.
4) The next command is to go to the directory cadence_digital_labs:
#cd .../../cadence_digital_labs/
5) Then check the file for errors with the command “ncvlog ../rtl/filename.v -mess”.
6) Then run the simulation using “ncverilog +access+rwc ../rtl/filename.v ../tb/file_tb.v
+nctimescale+1ns/1ps”
(rwc – read/write/connectivity access; GUI – graphical user interface)
7) After running the program, open the simulation window with the command “simvision
&”.
Figure 5.2: Simulation window
8) After the simulation, the waveforms are shown in the other window.
Figure 5.3: Waveform window
5.4 Full adder code
module fulladder(output reg cout, sum, input a, b, cin, rst);
  // Combinational sum: {cout,sum} holds the 2-bit result of a+b+cin
  always @(a, b, cin)
    {cout, sum} = a + b + cin;
  // Asynchronous clear of the stored outputs on the rising edge of rst
  always @(posedge rst)
    begin
      sum  <= 0;
      cout <= 0;
    end
endmodule
5.5 Full adder flowchart
Figure 5.4: Full adder flowchart
5.6 Full adder testbench
module full_adder_tb;
  wire cout, sum;
  reg a, b, cin, rst;
  parameter period = 10;  // missing in the original listing; needed by #(period/2)
  // dut
  fulladder fa(cout, sum, a, b, cin, rst);
  initial
    begin
      #2 rst = 1'b1;
      #(period/2) rst = 1'b0;
      a   = 1'b1;
      b   = 1'b0;
      cin = 1'b1;
      #5 a   = 1'b0;
         b   = 1'b1;
         cin = 1'b1;
      #5 $finish;  // delay added so the last vector is visible before finishing
    end
endmodule
5.7 Bit-serial multiplier algorithm
Figure 5.5: Bit-Serial multiplier flowchart
5.8 Bit-Serial multiplier code
module serial_mult(output product, input [3:0] a, input b, clk, rst);
  wire s1, s2, s3;
  reg  s1o, s2o, s3o;        // latches for sum at various stages
  wire c0, c1, c2, c3;
  reg  c0o, c1o, c2o, c3o;   // latches for carry at various stages
  wire a3o, a2o, a1o, a0o;   // gated multiplicand bits (a[i] AND b)
  reg  s;                    // constant 0 fed into the leftmost adder stage
  // Chain of full adders: carries stay in place, sums move one cell to the right
  fulladder fa0(c0, product, a0o, s1o, c0o, rst);
  fulladder fa1(c1, s1,      a1o, s2o, c1o, rst);
  fulladder fa2(c2, s2,      a2o, s3o, c2o, rst);
  fulladder fa3(c3, s3,      a3o, s,   c3o, rst);
  // Multiply each multiplicand bit by the current serial multiplier bit b
  and n0(a0o, a[0], b);
  and n1(a1o, a[1], b);
  and n2(a2o, a[2], b);
  and n3(a3o, a[3], b);
  always @(posedge clk, posedge rst)
    begin
      if (rst)
        begin  // clear the carry-save state
          s   <= 0;
          c0o <= 1'b0; c1o <= 1'b0; c2o <= 1'b0; c3o <= 1'b0;
          s1o <= 1'b0; s2o <= 1'b0; s3o <= 1'b0;
        end
      else     // move all sums and carries into the latches
        begin
          c0o <= c0; c1o <= c1; c2o <= c2; c3o <= c3;
          s1o <= s1; s2o <= s2; s3o <= s3;
        end
    end
endmodule
5.9 Full adder waveform
Figure 5.6: Full adder output waveforms
5.10 Bit-serial multiplier testbench
module serial_mult_tb;
  reg  [3:0] a;
  reg  b;
  wire product;
  reg  clk, rst;
  parameter period = 10;
  serial_mult dut(product, a, b, clk, rst);  // dut
  // clock: toggles every 'period' time units, i.e. a clock cycle of 2*period
  initial clk = 0;
  always #period clk = ~clk;
  initial
    begin
      #2 rst = 1'b1;
      #(period/2) rst = 1'b0;
      a = 4'b1101;            // multiplicand, supplied in parallel
      b = 1;                  // multiplier x = 1001, supplied LSB first...
      @(posedge clk) b = 0;
      @(posedge clk) b = 0;
      @(posedge clk) b = 1;
      @(posedge clk) b = 0;   // ...followed by four 0s of padding
      @(posedge clk) b = 0;
      @(posedge clk) b = 0;
      @(posedge clk) b = 0;
      #period $finish;
    end
endmodule
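For these stimuli (our note), the multiplicand is a = 1101 (13) and the serially applied
multiplier is x = 1001 (9), entered LSB first and padded with four zero bits, as required
for the carries to propagate out. The expected product is 13 x 9 = 117 = 01110101, whose
bits should appear on the product line LSB first over the eight clock cycles.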
5.11 Bit-serial multiplier waveforms
Figure 5.7: Bit serial multiplier input/output waveforms
Figure 5.8: Bit serial multiplier with intermediate waveforms
CHAPTER 6
CONCLUSIONS
Multipliers play an important role in today’s digital signal processing and various
other applications. With advances in technology, many researchers have tried, and are
trying, to design multipliers that offer one or more of the following design targets: high
speed, low power consumption, and regularity of layout (and hence less area), or even a
combination of these in one multiplier, making them suitable for various high-speed,
low-power, and compact VLSI implementations. The common multiplication method is
the “add and shift” algorithm. In parallel multipliers, the number of partial products to be
added is the main parameter that determines the performance of the multiplier. To reduce
the number of partial products to be added, the Modified Booth algorithm is one of the
most popular choices. To achieve speed improvements, the Wallace tree algorithm can be
used to reduce the number of sequential adding stages. Further, by combining both the
Modified Booth algorithm and the Wallace tree technique, we can obtain the advantages
of both algorithms in one multiplier. However, with increasing parallelism, the amount of
shifting between the partial products and intermediate sums to be added will increase,
which may result in reduced speed, an increase in silicon area due to irregularity of
structure, and increased power consumption due to the increase in interconnect resulting
from complex routing. On the other hand, “serial-parallel” multipliers compromise speed
to achieve better performance in area and power consumption. The selection of a parallel
or serial multiplier actually depends on the nature of the application.
A key challenge facing current and future computer designers is to reverse the
trend toward complexity by removing layer after layer of it, opting instead for clean,
robust, and easily certifiable designs, while continuing to devise novel methods for
gaining performance and ease-of-use benefits from simpler circuits that can be readily
adapted to application requirements. Bit-serial multipliers are one way of achieving this.
REFERENCES
[1] Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford
University Press, 2009.
[2] Sadiq M. Sait, Gerhard Beckoff, “A Novel Technique for Fast Multiplication,”
Proc. IEEE Fourteenth Annual International Phoenix Conference on Computers and
Communications, pp. 109-114, 1995.
[3] Ghest, C., “Multiplying Made Easy for Digital Assemblies,” Electronics, Vol. 44,
pp. 56-61, November 22, 1971.
[4] Ienne, P., and M. A. Viredaz, “Bit-Serial Multipliers and Squarers,” IEEE Trans.
Computers, Vol. 43, No. 12, pp. 1445-1450, 1994.
[5] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Prentice
Hall Professional, 2003.

  • 5. v CONTENTS Acknowledgements (iii) Abstract (iv) List Of Figures (vii) 1 INTRODUCTION 1 1.1 The Context of Computer Arithmetic 1 1.2 What is computer arithmetic 2 1.3 Multiplication 4 1.4 Organization of report 5 2 VLSI 6 2.1 Introduction 6 2.2 What is VLSI? 7 2.2.1 History of Scale Integration 7 2.3 Advantages of ICs over discrete components 7 2.4 VLSI And Systems 8 2.5 Applications of VLSI 8 2.6 Conclusion 9 3 VERILOG HDL 10 3.1 Introduction 10 3.2 Major Capabilities 11 3.3 SYNTHESIS 12 3.4 Conclusion 12 4 BIT-SERIAL MULTIPLIER 14 4.1 Multiplier 14 4.2 Background 14 4.2.1 Binary Multiplication 15
  • 6. vi 4.2.2 Hardware Multipliers 15 4.2.3 Array Multipliers 16 4.3 Variations in Multipliers 18 4.4 Bit-serial Multipliers 19 5 IMPLEMENTATION 22 5.1 Tools Used 22 5.2 Coding Steps 22 5.3 Simulation steps 22 5.4 Full adder code 23 5.5 Full adder flowchart 24 5.6 Full adder testbench 24 5.7 Bit-serial multiplier algorithm 25 5.8 Bit-Serial multiplier code 25 5.9 Full adder waveform 26 5.10 Bit-serial multiplier testbench 26 5.11 Bit-serial multiplier waveforms 27 6 CONCLUSIONS 28 REFERENCES 29
  • 7. vii LIST OF FIGURES 3.1 Mixed level modeling 11 3.2 Synthesis process 12 3.3 Typical design process 13 4.1 Basic Multiplication Data flow 15 4.2 Two Rows of an Array Multiplier 17 4.3 Data Flow through a Pipelined Array Multiplier 18 4.4 Bit-serial multiplier; 4x4 multiplication in 8 clock cycles 19 4.5 Bit Serial multiplier design in dot notation 21 5.1 Project directory structure 22 5.2 Simulation window 23 5.3 Waveform window 23 5.4 Full adder flowchart 24 5.5 Bit-Serial multiplier flowchart 25 5.6 Full adder output waveforms 26 5.7 Bit serial multiplier input/output waveforms 27 5.8 Bit serial multiplier with intermediate waveforms 27
  • 8. 1 CHAPTER 1 INTRODUCTION 1.1 The Context of Computer Arithmetic Advances in computer architecture over the past two decades have allowed the performance of digital computer hardware to continue its exponential growth, despite increasing technological difficulty in speed improvement at the circuit level. This phenomenal rate of growth, which is expected to continue in the near future, would not have been possible without theoretical insights, experimental research, and tool-building efforts that have helped transform computer architecture from an art into one of the most quantitative branches of computer science and engineering. Better understanding of the various forms of concurrency and the development of a reasonably efficient and user- friendly programming model has been key enablers of this success story. The downside of exponentially rising processor performance is an unprecedented increase in hardware and software complexity. The trend toward greater complexity is not only at odds with testability and verifiability but also hampers adaptability, performance tuning, and evaluation of the various trade-offs, all of which contribute to soaring development costs. A key challenge facing current and future computer designers is to reverse this trend by removing layer after layer of complexity, opting instead for clean, robust, and easily certifiable designs, while continuing to try to devise novel methods for gaining performance and ease-of-use benefits from simpler circuits that can be readily adapted to application requirements. In the computer designers’ quest for user-friendliness, compactness, simplicity, high performance, low cost, and low power, computer arithmetic plays a key role. It is one of oldest subfields of computer architecture. The bulk of hardware in early digital computers resided in accumulator and other arithmetic/logic circuits. Thus, first- generation computer designers were motivated to simplify and share hardware to the extent possible and to carry out detailed cost- performance analyses before proposing a design. Many of the ingenious design methods that we use today have their roots in the bulky, power-hungry machines of 30-50 years ago. In fact computer arithmetic has been so successful that it has, at times, become transparent. Arithmetic circuits are no longer dominant in terms of complexity; registers, memory and memory management, instruction issue logic, and pipeline control have become the dominant consumers of chip area in today’s processors. Correctness and high performance of arithmetic circuits is routinely expected, and episodes such as the Intel
  • 9. 2 Pentium division bug are indeed rare. The preceding context is changing for several reasons. First, at very high clock rates, the interfaces between arithmetic circuits and the rest of the processor become critical. Arithmetic units can no longer be designed and verified in isolation. Rather, an integrated design optimization is required, which makes the development even more complex and costly. Second, optimizing arithmetic circuits to meet design goals by taking advantage of the strengths of new technologies, and making them tolerant to the weaknesses, requires a reexamination of existing design paradigms. Finally, incorporation of higher-level arithmetic primitives into hardware makes the design, optimization, and verification efforts highly complex and interrelated. This is why computer arithmetic is alive and well today. Designers and researchers in this area produce novel structures with amazing regularity. Carry- lookahead adders comprise a case in point. We used to think, in the not so distant past, that we knew all there was to know about carry-lookahead fast adders. Yet, new designs, improvements, and optimizations are still appearing. The ANSI/IEEE standard floating- point format has removed many of the concerns with compatibility and error control in floating-point computations, thus resulting in new designs and products with mass-market appeal. Given the arithmetic-intensive nature of many novel application areas (such as encryption, error checking, and multimedia), computer arithmetic will continue to thrive for years to come. 1.2 What is computer arithmetic A sequence of events, begun in late 1994 and extending into 1995, embarrassed the world’s largest computer chip manufacturer and put the normally dry subject of computer arithmetic on the front pages of major newspapers. The events were rooted in the work of Thomas Nicely, a mathematician at the Lynchburg College in Virginia, who is interested in twin primes (consecutive odd numbers such as 29 and 31 that are both prime). Nicely’s work involves the distribution of twin primes and, particularly, the sum of their reciprocals S = 1/5 + 1/7 1/11+1/13 +1/17 +1/19+1/29+1/31+-+1/P +1/(p +2) + - - -. While it is known that the infinite sum S has a finite value, no one knows what the value is. Nicely was using several different computers for his work and in March 1994 added a machine based on the Intel Pentium processor to his collection. Soon he began noticing inconsistencies in his calculations and was able to trace them back to the values computed for 1 / p and 1 / (p + 2) on the Pentium processor. At first, he suspected his own programs, the compiler, and the operating system, but by October, he became convinced
  • 10. 3 that the Intel Pentium chip was at fault. This suspicion was confirmed by several other researchers following a barrage of e-mail exchanges and postings on the Internet. The diagnosis finally came from Tim Coe, an engineer at Vitesse Semiconductor. Coe built a model of Pentium’s floating-point division hardware based on the radix-4 SRT algorithm and came up with an example that produces the worst-case error. Using double-precision floating- point computation, the ratio c = 4 195 835/3 145 727 = 1.333 820 44- - - is computed as 1.333 739 06 on the Pentium. This latter result is accurate to only 14 bits; the error is even larger than that of single-precision floating-point and more than 10 orders of magnitude worse that what is expected of double-precision computation. The rest, as they say, is history. Intel at first dismissed the severity of the problem and admitted only a “subtle flaw,” with a probability of 1 in 9 billion, or once in 27,000 years for the average spreadsheet user, of leading to computational errors. It nevertheless published a “white paper” that described the bug and its potential consequences and announced a replacement policy for the defective chips based on “customer need”; that is, customers had to show that they were doing a lot of mathematical calculations to get a free replacement. Under heavy criticism from customers, manufacturers using the Pentium chip in their products, and the on-line community, Intel later revised its policy to no-questions-asked replacement. Whereas supercomputing, microchips, computer networks, advanced applications (particularly chess-playing programs), and many other aspects of computer technology have made the news regularly in recent years, the Intel Pentium bug was the first instance of arithmetic (or anything inside the CPU for that matter) becoming front-page news. While this can be interpreted as a sign of pedantic dryness, it is more likely an indicator of stunning technological success. Glaring software failures have come to be routine events in our information-based society, but hardware bugs are rare and newsworthy. Within the hardware realm, we will be dealing with both general-purpose arithmetic/logic units (ALUS), of the type found in many commercially available processors, and special-purpose structures for solving specific application problems. The differences in the two areas are minor as far as the arithmetic algorithms are concerned. However, in view of the specific technological constraints, production volumes, and performance criteria, hardware implementations tend to be quite different. General- purpose processor chips that are mass-produced have highly optimized custom designs. Implementations of 1ow-volume, special-purpose systems, on the other hand, typically rely on semicustom and off-the-shelf components. However, when critical and strict requirements, such as extreme speed, very low power consumption, and miniature size,
  • 11. 4 preclude the use of semicustom or off-the shelf components, the much higher cost of a custom design may be justified even for a special-purpose system. 1.3 Multiplication Multiplication (often denoted by the cross symbol "×", or by the absence of symbol) is the third basic mathematical operation of arithmetic, the others being addition, subtraction and division (the division is the fourth one, because it requires multiplication to be defined). The multiplication of two whole numbers is equivalent to the addition of one of them with itself as many times as the value of the other one; for example, 3 multiplied by 4 (often said as "3 times 4") can be calculated by adding 4 copies of 3 together: 3 times 4 = 3 + 3 + 3 + 3 = 12 Here 3 and 4 are the "factors" and 12 is the "product". One of the main properties of multiplication is that the result does not depend on the place of the factor that is repeatedly added to it (commutative property). 3 multiplied by 4 can also be calculated by adding 3 copies of 4 together: 3 times 4 = 4 + 4 + 4 = 12. The multiplication of integers (including negative numbers), rational numbers (fractions) and real numbers is defined by a systematic generalization of this basic definition. Multiplication can also be visualized as counting objects arranged in a rectangle (for whole numbers) or as finding the area of a rectangle whose sides have given lengths. The area of a rectangle does not depend on which side is measured first, which illustrates the commutative property. In general, multiplying two measurements gives a new type, depending on the measurements. For instance: 2.5 meters times 4.5 meters = 11.25 square meters 11 meters/second times 9 seconds = 99 meters The inverse operation of the multiplication is the division. For example, since 4 multiplied by 3 equals 12, then 12 divided by 3 equals 4. Multiplication by 3, followed by division by 3, yields the original number (since the division of a number other than 0 by itself equals 1). Multiplication is also defined for other types of numbers, such as complex numbers, and more abstract constructs, like matrices. For these more abstract constructs, the order that the operands are multiplied sometimes does matter. Multiplication often realized by k cycles of shifting and adding, is a heavily used arithmetic operation that figures prominently in signal processing and scientific applications. In this part, after examining shift/add multiplication schemes and their various implementations, we note that there are but two ways to speed up the underlying multi operand addition: reducing the number of operands to be added leads to high-radix multipliers, and devising hardware multi operand adders that minimize the latency and/or maximize the throughput leads to tree and array multipliers. Of course, speed is not the only criterion of interest. Cost, VLSI area, and pin limitations favor bit-serial designs,
  • 12. 5 while the desire to use available building blocks leads to designs based on additive multiply modules. Finally, the special case of squaring is of interest as it leads to considerable simplification 1.4 Organization of report This report starts with introduction to computer arithmetic and then introduces multiplication. Then it explains implementation of one of the multiplier bit serial multiplier. Chapter 1: Introduction – This chapter explains importance of computer arithmetic and multiplication in computations. Chapter 2: VLSI – This chapter focuses on VLSI and its evolution, also its applications and advantages Chapter 3: Verilog HDL – This chapter explains how HDL’s reduce design cycle in VLSI and automation makes faster implementation. Chapter 4: Bit-serial multiplier – This chapter explains about multiplier and its types and how bit serial multiplier is useful. Chapter 5: Implementation – This chapter explains Implementation flow of Bit-serial multiplier its Verilog code and output waveforms. Chapter 6: Conclusions – This chapter summarizes Bit-serial multiplier and its future improvements.
  • 13. 6 CHAPTER 2 VLSI 2.1 Introduction Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining thousands of transistor-based circuits into a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device. The term is no longer as common as it once was, as chips have increased in complexity into the hundreds of millions of transistors. The first semiconductor chips held one transistor each. Subsequent advances added more and more transistors, and, as a consequence, more individual functions or systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a single device. Now known retrospectively as "small-scale integration" (SSI), improvements in technique led to devices with hundreds of logic gates, known as large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past this mark and today's microprocessors have many millions of gates and hundreds of millions of individual transistors. At one time, there was an effort to name and calibrate various levels of large- scale integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used. But the huge number of gates and transistors available on common devices has rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of integration are no longer in widespread use. Even VLSI is now somewhat quaint, given the common assumption that all microprocessors are VLSI or better. As of early 2008, billion-transistor processors are commercially available, an example of which is Intel's Montecito Itanium chip. This is expected to become more commonplace as semiconductor fabrication moves from the current generation of 65 nm processes to the next 45 nm generations (while experiencing new challenges such as increased variation across process corners). This microprocessor is unique in the fact that its 1.4 Billion transistor count, capable of a teraflop of performance, is almost entirely dedicated to logic (Itanium's transistor count is largely due to the 24MB L3 cache). Current designs, as opposed to the earliest devices, use extensive design automation and automated logic synthesis to
  • 14. 7 lay out the transistors, enabling higher levels of complexity in the resulting logic functionality. Certain high-performance logic blocks like the SRAM cell, however, are still designed by hand to ensure the highest efficiency (sometimes by bending or breaking established design rules to obtain the last bit of performance by trading stability). 2.2 What is VLSI? VLSI stands for "Very Large Scale Integration". This is the field which involves packing more and more logic devices into smaller and smaller areas. • Simply we say Integrated circuit is many transistors on one chip. • Design/manufacturing of extremely small, complex circuitry using modified semiconductor material • Integrated circuit (IC) may contain millions of transistors, each a few mm in size • Applications wide ranging: most electronic logic devices 2.2.1 History of Scale Integration • late 40s Transistor invented at Bell Labs • late 50s First IC (JK-FF by Jack Kilby at TI) • early 60s Small Scale Integration (SSI) o 10s of transistors on a chip o late 60s Medium Scale Integration (MSI) o 100s of transistors on a chip • early 70s Large Scale Integration (LSI) o 1000s of transistor on a chip • early 80s VLSI 10,000s of transistors on a chip (later 100,000s & now 1,000,000s) • Ultra LSI is sometimes used for 1,000,000s 2.3 Advantages of ICs over discrete components While we will concentrate on integrated circuits, the properties of integrated circuits-what we can and cannot efficiently put in an integrated circuit- largely determine the architecture of the entire system. Integrated circuits improve system characteristics in several critical ways. ICs have three key advantages over digital circuits built from discrete components: Size: Integrated circuits are much smaller-both transistors and wires are shrunk to micrometer sizes, compared to the millimeter or centimeter scales of discrete
  • 15. 8 components. Small size leads to advantages in speed and power consumption, since smaller components have smaller parasitic resistances, capacitances, and inductances. Speed: Signals can be switched between logic 0 and logic 1 much quicker within a chip than they can between chips. Communication within a chip can occur hundreds of times faster than communication between chips on a printed circuit board. The high speed of circuits on- chip is due to their small size-smaller components and wires have smaller parasitic capacitances to slow down the signal. Power consumption: Logic operations within a chip also take much less power. Once again, lower power consumption is largely due to the small size of circuits on the chip- smaller parasitic capacitances and resistances require less power to drive them 2.4 VLSI And Systems These advantages of integrated circuits translate into advantages at the system level: Smaller physical size: Smallness is often an advantage in itself- consider portable televisions or handheld cellular telephones. Lower power consumption: Replacing a handful of standard parts with a single chip reduces total power consumption. Reducing power consumption has a ripple effect on the rest of the system: a smaller, cheaper power supply can be used; since less power consumption means less heat, a fan may no longer be necessary; a simpler cabinet with less shielding for electromagnetic shielding may be feasible, too. Reduced cost: Reducing the number of components, the power supply requirements, cabinet costs, and so on, will inevitably reduce system cost. The ripple effect of integration is such that the cost of a system built from custom ICs can be less, even though the individual ICs cost more than the standard parts they replace. Communication within a chip can occur hundreds of times faster than communication between chips on a printed circuit board. Understanding why integrated circuit technology has such profound influence on the design of digital systems requires understanding both the technology of IC manufacturing and the economics of ICs and digital systems. 2.5 Applications of VLSI Electronic systems now perform a wide variety of tasks in daily life. Electronic systems in some cases have replaced mechanisms that operated mechanically, hydraulically, or by other means; electronics are usually smaller, more flexible, and easier to service. In other cases electronic systems have created totally new applications.
  • 16. 9 Electronic systems perform a variety of tasks, some of them visible, some more hidden. Electronic systems in cars operate stereo systems and displays; they also control fuel injection systems, adjust suspensions to varying terrain, and perform the control functions required for anti-lock braking (ABS) systems. • Digital electronics compress and decompress video, even at high-definition data rates, on-the-fly in consumer electronics. • Low-cost terminals for Web browsing still require sophisticated electronics, despite their dedicated function. • Personal computers and workstations provide word-processing, financial analysis, and games. Computers include both central processing units (CPUs) and special- purpose hardware for disk access, faster screen display, etc. • Medical electronic systems measure bodily functions and perform complex processing algorithms to warn about unusual conditions. The availability of these complex systems, far from overwhelming consumers, only creates demand for even more complex systems. 2.6 Conclusion The growing sophistication of applications continually pushes the design and manufacturing of integrated circuits and electronic systems to new levels of complexity. And perhaps the most amazing characteristic of this collection of systems is its variety- as systems become more complex, we build not a few general-purpose computers but an ever wider range of special-purpose systems. Our ability to do so is a testament to our growing mastery of both integrated circuit manufacturing and design, but the increasing demands of customers continue to test the limits of design and manufacturing.
  • 17. 10 CHAPTER 3 VERILOG HDL 3.1 Introduction Verilog HDL is a hardware description language that can be used to model a digital system at many levels of abstraction ranging from the algorithmic-level to the gate-level to the switch-level. The complexity of the digital system being modeled could vary from that of a simple gate to a complete electronic digital system, or anything in between. The digital system can be described hierarchically and timing can be explicitly modeled within the same description. The Verilog HDL language includes capabilities to describe the behavior-al nature of a design, the dataflow nature of a design, a design's structural composition, delays and a waveform generation mechanism including aspects of response monitoring and verification, all modeled using one single language. In addition, the language provides a programming language interface through which the internals of a design can be accessed during simulation including the control of a simulation run. The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a Verilog simulator. The language inherits many of its operator symbols and constructs from the C programming language. Verilog HDL provides an extensive range of modeling capabilities, some of which are quite difficult to comprehend initially. However, a core subset of the language is quite easy to learn and use. This is sufficient to model most applications. The Verilog HDL language was first developed by Gateway Design Automation in 1983 as hardware are modeling language for their simulator product, At that time ,it was a proprietary language. The Verilog HDL language includes capabilities to describe the behavior-al nature of a design, the dataflow nature of a design, a design's structural Because of the popularity of the, simulator product, Verilog HDL gained acceptance as a usable and practical language by a number of designers. In an effort to increase the popularity of the language, the language was placed in the public domain in 1990. Open Verilog International (OVI) was formed to promote Verilog. In 1992 OVI decided to pursue standardization of Verilog HDL as an IEEE standard. This effort was successful and the language became an IEEE standard in 1995. The complete standard is described in the Verilog hardware description language reference manual. The standard is called std. 1364-1995.
  • 18. 11 3.2 Major Capabilities Listed below are the major capabilities of the Verilog hardware description: • Primitive logic gates, such as and, or and nand, are built-in into the language. • Flexibility of creating a user-defined primitive (UDP). Such a primitive could either be a combinational logic primitive or a sequential logic primitive. • Switch-level modeling primitive gates, such as pmos and nmos, are also built- in into the language. • A design can be modeled in three different styles or in a mixed style. These styles are: behavioral style modeled using procedural constructs; dataflow style - modeled using continuous assignments; and structural style modeled using gate and module instantiations. • There are two data types in Verilog HDL; the net data type and the register data type. The net type represents a physical connection between structural elements while a register type represents an abstract data storage element. • Figure.3-1 shows the mixed-level modeling capability of Verilog HDL, that is, in one design; each module may be modeled at a different level. Figure 3.1 Mixed level modeling • Verilog HDL also has built-in logic functions such as & (bitwise-and) and I (bitwise-or). • High-level programming language constructs such as conditionals, case statements, and loops are available in the language. • Notion of concurrency and time can be explicitly modeled. • Powerful file read and write capabilities fare provided. • The language is non-deterministic under certain situations, that is, a model may produce different results on different simulators; for example, the ordering of events on an event queue is not defined by the standard.
  • 19. 12 3.3 SYNTHESIS Synthesis is the process of constructing a gate level netlist from a register- transfer level model of a circuit described in Verilog HDL. Figure.3-2 shows such a process. A synthesis system may as an intermediate step, generate a netlist that is comprised of register-transfer level blocks such as flip-flops, arithmetic-logic-units, and multiplexers, interconnected by wires. In such a case, a second program called the RTL module builder is necessary. The purpose of this builder is to build, or acquire from a library of predefined components, each of the required RTL blocks in the user- specified target technology. Figure 3.2 Synthesis process The above figure shows the basic elements of Verilog HDL and the elements used in hardware. A mapping mechanism or a construction mechanism has to be provided that translates the Verilog HDL elements into their corresponding hardware elements as shown in figure.3-3 3.4 Conclusion The Verilog HDL language includes capabilities to describe the behavior-al nature of a design, the dataflow nature of a design, a design's structural composition, delays and a waveform generation mechanism including aspects of response monitoring and verification, all modeled using one single language. The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a Verilog simulator. The Verilog HDL language includes capabilities to describe the behavior-al nature of a design, the dataflow nature of a design, a design's structural composition, delays.
  • 20. 13 Figure 3.3: Typical design process
  • 21. 14 CHAPTER 4 BIT-SERIAL MULTIPLIER 4.1 Multiplier Multipliers are key components of many high performance systems such as FIR filters, microprocessors, digital signal processors, etc. A system’s performance is generally determined by the performance of the multiplier because the multiplier is generally the slowest clement in the system. Furthermore, it is generally the most area consuming. Hence, optimizing the speed and area of the multiplier is a major design issue. However, area and speed are usually conflicting constraints so that improving speed results mostly in larger areas. As a result, whole spectrums of multipliers with different area-speed constraints are designed with fully parallel processing. In between are digit serial multipliers where single digits consisting of several bits are operated on. These multipliers have moderate performance in both speed and area. However, existing digit serial multipliers have been plagued by complicated switching systems and/or irregularities in design. Radix 2^n multipliers which operate on digits in a parallel fashion instead of bits bring the pipelining to the digit level and avoid most of the above problems. They were introduced by M. K. Ibrahim in 1993. These structures are iterative and modular. The pipelining done at the digit level brings the benefit of constant operation speed irrespective of the size of’ the multiplier. The clock speed is only determined by the digit size which is already fixed before the design is implemented. The growing market for fast floating-point co-processors, digital signal processing chips, and graphics processors has created a demand for high speed, area-efficient multipliers. Current architectures range from small, low-performance shift and add multipliers, to large, high performance array and tree multipliers. Conventional linear array multipliers achieve high performance in a regular structure, but require large amounts of silicon. Tree structures achieve even higher performance than linear arrays but the tree interconnection is more complex and less regular, making them even larger than linear arrays. Ideally, one would want the speed benefits of a tree structure, the regularity of an array multiplier, and the small size of a shift and add multiplier. 4.2 Background Webster’s dictionary defines multiplication as “a mathematical operation that at its simplest is an abbreviated process of adding an integer to itself a specified number of times”. A number (multiplicand) is added to itself a number of times as specified by
  • 22. 15 another number (multiplier) to form a result (product). In elementary school, students learn to multiply by placing the multiplicand on top of the multiplier. The multiplicand is then multiplied by each digit of the multiplier beginning with the rightmost, Least Significant Digit (LSD). Intermediate results (partial-products) are placed one atop the other, offset by one digit to align digits of the same weight. The final product is determined by summation of all the partial-products. Although most people think of multiplication only in base 10, this technique applies equally to any base, including binary. Figure 1.2.1 shows the data flow for the basic multiplication technique just described. Each black dot represents a single digit. Figure 4.1: Basic Multiplication Data flow 4.2.1 Binary Multiplication In the binary number system the digits, called bits, are limited to the set. The result of multiplying any binary number by a single binary bit is either 0, or the original number. This makes forming the intermediate partial-products simple and efficient. Summing these partial- products is the time consuming task for binary multipliers. One logical approach is to form the partial-products one at a time and sum them as they are generated. Often implemented by software on processors that do not have a hardware multiplier, this technique works fine, but is slow because at least one machine cycle is required to sum each additional partial-product. For applications where this approach does not provide enough performance, multipliers can be implemented directly in hardware. 4.2.2 Hardware Multipliers Direct hardware implementations of shift and add multipliers can increase performance over software synthesis, but are still quite slow. The reason is that as each additional partial- product is summed a carry must be propagated from the least significant bit (LSB) to the most significant bit (MSB). This carry propagation is time
  • 23. 16 consuming, and must be repeated for each partial product to be summed. One method to increase multiplier performance is by using encoding techniques to reduce the number of partial products to be summed. Just such a technique was first proposed by Booth. The original Booth’s algorithm ships over contiguous strings of l’s by using the property that: 2” + 2(n-1) + 2(n-2) + . . . + 2hm) = 2(n+l) - 2(n-m). Although Booth’s algorithm produces at most N/2 encoded partial products from an N bit operand, the number of partial products produced varies. This has caused designers to use modified versions of Booth’s algorithm for hardware multipliers. Modified 2-bit Booth encoding halves the number of partial products to be summed. Since the resulting encoded partial-products can then be summed using any suitable method, modified 2 bit Booth encoding is used on most modern floating-point chips LU 881, MCA 861. A few designers have even turned to modified 3 bit Booth encoding, which reduces the number of partial products to be summed by a factor of three IBEN 891. The problem with 3 bit encoding is that the Carry-propagate addition required to form the 3X multiples often overshadows the potential gains of 3 bit Booth encoding. To achieve even higher performance advanced hardware multiplier architectures search for faster and more efficient methods for summing the partial-products. Most increase performance by eliminating the time consuming carry propagate additions. To accomplish this, they sum the partial-products in a redundant number representation. The advantage of a redundant representation is that two numbers, or partial-products, can be added together without propagating a carry across the entire width of the number. Many redundant number representations are possible. One commonly used representation is known as carry-save form. In this redundant representation two bits, known as the carry and sum, are used to represent each bit position. When two numbers in carry-save form are added together any carries that result are never propagated more than one bit position. This makes adding two numbers in carry-save form much faster than adding two normal binary numbers where a carry may propagate. One common method that has been developed for summing rows of partial products using a carry-save representation is the array multiplier. 4.2.3 Array Multipliers Conventional linear array multipliers consist of rows of carry-save adders (CSA). A portion of an array multiplier with the associated routing can be seen in Figure 4.2.
4.2.3 Array Multipliers

Conventional linear array multipliers consist of rows of carry-save adders (CSAs). A portion of an array multiplier with the associated routing can be seen in Figure 4.2.

Figure 4.2: Two Rows of an Array Multiplier

In a linear array multiplier, as the data propagates down through the array, each row of CSAs adds one additional partial product to the partial sum. Since the intermediate partial sum is kept in a redundant, carry-save form, there is no carry propagation. This means that the delay of an array multiplier depends only on the depth of the array and is independent of the partial-product width. Linear array multipliers are also regular, consisting of replicated rows of CSAs. Their high performance and regular structure have perpetuated the use of array multipliers for VLSI math coprocessors and special-purpose DSP chips. One such row can be sketched in Verilog as shown below.
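The following is a minimal sketch of one array-multiplier row (our own illustration, not part of the project code; the one-bit weight alignment between rows is handled by the routing between rows and is omitted here). Each row forms its partial product with AND gates and adds it to the incoming carry-save partial sum:

module array_row #(parameter K = 4)
                 (input  [K-1:0] a,                  // multiplicand
                  input          xi,                 // one multiplier bit
                  input  [K-1:0] sum_in, carry_in,   // partial sum, carry-save form
                  output [K-1:0] sum_out, carry_out);
  wire [K-1:0] pp = a & {K{xi}};  // partial product for this row
  assign sum_out   = pp ^ sum_in ^ carry_in;
  assign carry_out = (pp & sum_in) | (sum_in & carry_in) | (pp & carry_in);
endmodule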
The biggest problem with full linear array multipliers is that they are very large. As operand sizes increase, linear arrays grow in size at a rate equal to the square of the operand size. This is because the number of rows in the array is equal to the length of the multiplier, with the width of each row equal to the width of the multiplicand. The large size of full arrays typically prohibits their use, except for small operand sizes, or on special-purpose math chips where a major portion of the silicon area can be assigned to the multiplier array. Another problem with array multipliers is that the hardware is underutilized. As the sum is propagated down through the array, each row of CSAs computes a result only once, when the active computation front passes that row. Thus, the hardware is doing useful work only a very small percentage of the time. This low hardware utilization in conventional linear array multipliers makes performance gains possible through increased efficiency.

For example, by overlapping calculations, pipelining can achieve a large gain in throughput. Figure 4.3 shows a full array pipelined after each row of CSAs. Once the partial sum has passed the first row of CSAs, represented by the shaded row of CSAs in cycle 1, a subsequent multiply can be started on the next cycle. In cycle 2, the first partial sum has passed to the second row of CSAs, and the second multiply, represented by the cross-hatched row of CSAs, has begun. Although pipelining a full array can greatly increase throughput, both the size and the latency are increased due to the additional latches. While high throughput is desirable, for general-purpose computers size and latency tend to be more important; thus, fully pipelined linear array multipliers are seldom found.

Figure 4.3: Data Flow through a Pipelined Array Multiplier

4.3 Variations in Multipliers

We do not always synthesize our multipliers from scratch but may desire, or be required, to use building blocks such as adders, small multipliers, or lookup tables. Furthermore, limited chip area and/or pin availability may dictate the use of bit-serial designs. In this chapter, we discuss such variations and also deal with modular multipliers, the special case of squaring, and multiply-accumulators.

• Divide-and-Conquer Designs
• Additive Multiply Modules
• Bit-Serial Multipliers
• Modular Multipliers
• The Special Case of Squaring
• Combined Multiply-Add Units
4.4 Bit-serial Multipliers

Bit-serial arithmetic is attractive in view of its smaller pin count, reduced wire length, and lower floor-space requirements in VLSI. In fact, the compactness of the design may allow us to run a bit-serial multiplier at a clock rate high enough to make the unit almost competitive with much more complex designs with regard to speed. In addition, in certain application contexts inputs are supplied bit-serially anyway. In such a case, using a parallel multiplier would be quite wasteful, since the parallelism may not lead to any speed benefit. Furthermore, in applications that call for a large number of independent multiplications, multiple bit-serial multipliers may be more cost-effective than a complex, highly pipelined unit.

Figure 4.4: Bit-serial multiplier; 4x4 multiplication in 8 clock cycles

Bit-serial multipliers can be designed as systolic arrays: synchronous arrays of processing elements that are interconnected by only short, local wires, thus allowing very high clock rates. Let us begin by introducing a semisystolic multiplier, so named because its design involves broadcasting a single bit of the multiplier x to a number of circuit elements, thus violating the "short, local wires" requirement of pure systolic design. Figure 4.4 shows a semisystolic 4 x 4 multiplier. The multiplicand a is supplied in parallel from above and the multiplier x is supplied bit-serially from the right, with its least significant bit arriving first. A small worked example follows.
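As a worked example (our own, chosen for illustration), let a = 1101 (13) and let the multiplier bits arrive LSB first as 1, 0, 1, 0, i.e., x = 0101 (5), followed by four padding zeros:

          1 1 0 1            multiplicand a = 13
        x 0 1 0 1            multiplier   x = 5 (bits arrive LSB first)
        ---------
          1 1 0 1            x0 = 1: partial product is a
        0 0 0 0              x1 = 0: partial product is 0
      1 1 0 1                x2 = 1: partial product is a
    0 0 0 0                  x3 = 0: partial product is 0
    ---------------
    0 1 0 0 0 0 0 1          product = 65 = 13 x 5

The product bits 1, 0, 0, 0, 0, 0, 1, 0 emerge from the right one per clock cycle, LSB first, over 8 cycles (four multiplier bits followed by four zero padding bits).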
Each bit xi of the multiplier is multiplied by a and the result added to the cumulative partial product, kept in carry-save form in the carry and sum latches. The carry bit stays in its current position, while the sum bit is passed on to the neighboring cell on the right. This corresponds to shifting the partial product to the right before the next addition step (normally the sum bit would stay put and the carry bit would be shifted to the left). Bits of the result emerge serially from the right as they become available. A k-bit unsigned multiplier x must be padded with k zeros to allow the carries to propagate to the output, yielding the correct 2k-bit product. Thus, the semisystolic multiplier of Figure 4.4 can perform one k x k unsigned integer multiplication every 2k clock cycles. If k-bit fractions need to be multiplied, the first k output bits are discarded or used to properly round the most significant k bits.

To make the multiplier of Figure 4.4 fully systolic, we must remove the broadcasting of the multiplier bits. This can be accomplished by a process known as systolic retiming, which is briefly explained below. Consider a synchronous (clocked) circuit, with each line between two functional parts having an integral number of unit delays (possibly 0). Then, if we cut the circuit into two parts CL and CR, we can delay (advance) all the signals going in one direction and advance (delay) the ones going in the opposite direction by the same amount without affecting the correct functioning or external timing relations of the circuit. Of course, the primary inputs and outputs to the two parts CL and CR must be correspondingly advanced or delayed, too. For the retiming to be possible, all the signals that are advanced by d must have had original delays of d or more (negative delays are not allowed). Note that all the signals going into CL have been delayed by d time units. Thus, CL will work as before, except that everything, including output production, occurs d time units later than before retiming. Advancing the outputs by d time units will keep the external view of the circuit unchanged.

We apply the preceding process to the multiplier circuit of Figure 4.4 in three successive steps corresponding to cuts 1, 2, and 3, each time delaying the left-moving signal by one unit and advancing the right-moving signal by one unit. Verifying that the resulting retimed multiplier works correctly is left as an exercise. This new version of our multiplier does not have the fan-out problem of the design in Figure 4.4, but it suffers from long signal propagation delay through the four FAs in each clock cycle, leading to inferior operating speed. Note that the culprits are zero-delay lines that lead to signal propagation through multiple circuit elements.
One way of avoiding zero-delay lines in our design is to begin by doubling all the delays in Figure 4.4. This is done by simply replacing each of the sum and carry flip-flops with two cascaded flip-flops before retiming is applied. Since the circuit is now operating at half its original speed, the multiplier x must also be applied on alternate clock cycles. The resulting design is fully systolic, inasmuch as signals move only between adjacent cells in each clock cycle. However, twice as many cycles are needed. The easiest way to derive a multiplier with both inputs entering bit-serially is to allow k clock ticks for the multiplicand bits to be put into place in a shift register and then use the design of Figure 4.4 to compute the product. This increases the total delay by k cycles; a sketch of such an input shift register is given after Figure 4.5. Figure 4.5 uses dot notation to show the justification for the bit-serial multiplier design above, depicting the meanings of the various partial operands and results.

Figure 4.5: Bit-serial multiplier design in dot notation
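A minimal sketch of such an input shift register follows (our own illustration, assuming the multiplicand bits arrive least significant bit first; the module and signal names are not from the project code):

module shift_in #(parameter K = 4)
                (input clk, rst, bit_in,        // serial multiplicand in
                 output reg [K-1:0] a);         // parallel multiplicand out
  always @(posedge clk or posedge rst)
    if (rst)
      a <= 0;
    else
      a <= {bit_in, a[K-1:1]};  // shift right: after K cycles the first
                                // (least significant) bit received is in a[0]
endmodule

After K clock ticks, the parallel output a can drive the multiplicand input of the multiplier of Figure 4.4.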
CHAPTER 5

IMPLEMENTATION

5.1 Tools Used

1) PC installed with the Linux operating system
2) Installed Cadence tools:
   • ncvlog – for compiling the code and checking for errors
   • ncverilog – for executing the code
   • simvision – for viewing waveforms

5.2 Coding Steps

1) Create the directory structure for the project as below.

Figure 5.1: Project directory structure

2) Write the RTL code in a text file and save it with a .v extension in the rtl directory.
3) Write the code for the testbench and store it in the tb directory.

5.3 Simulation Steps

The commands used in Cadence for execution are:

1) Initially, mount the server using "mount -a".
2) Enter the C shell environment with the command "csh".
3) Source the setup script with the command "source /root/cshrc".
4) Change to the cadence_digital_labs directory: cd .../../cadence_digital_labs/
5) Check the file for errors with the command "ncvlog ../rtl/filename.v -mess".
6) Execute the design with "ncverilog +access+rwc ../rtl/filename.v ../tb/file_tb.v +nctimescale+1ns/1ps"
   (+access+rwc grants read/write/connectivity access to simulation objects, allowing the waveforms to be viewed in the graphical user interface).
7) After running the program, open the simulation window with the command "simvision &".
Figure 5.2: Simulation window

8) After the simulation, the waveforms are shown in the waveform window.

Figure 5.3: Waveform window

5.4 Full adder code

module fulladder(output reg cout, sum, input a, b, cin, rst);

  // combinational full adder: cout and sum are recomputed
  // whenever any input changes
  always @(a, b, cin)
    {cout, sum} = a + b + cin;

  // asynchronous clear: force both outputs to 0 on the rising edge
  // of rst (used to initialize the bit-serial multiplier state)
  always @(posedge rst)
  begin
    sum  <= 0;
    cout <= 0;
  end

endmodule
5.5 Full adder flowchart

Figure 5.4: Full adder flowchart

5.6 Full adder testbench

module full_adder_tb;

  wire cout, sum;
  reg  a, b, cin, rst;
  parameter period = 10;  // added: 'period' is referenced below but was not
                          // declared in the original listing; 10 matches the
                          // serial multiplier testbench

  // device under test
  fulladder fa(cout, sum, a, b, cin, rst);

  initial
  begin
    #2 rst = 1'b1;            // apply reset pulse
    #(period/2) rst = 1'b0;   // release reset, then drive the inputs
    a = 1'b1; b = 1'b0; cin = 1'b1;
    #5 a = 1'b0; b = 1'b1; cin = 1'b1;
    #5 $finish;               // delay added so the last vector is observed
  end

endmodule
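For reference (our own check, not part of the original report): both stimulus vectors add two 1's and one 0, so in each case the adder should produce cout = 1 and sum = 0, since 1 + 0 + 1 = 0 + 1 + 1 = 2 = 10 in binary. This is what the waveforms in Figure 5.6 should show after the reset pulse.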
5.7 Bit-serial multiplier algorithm

Figure 5.5: Bit-Serial multiplier flowchart

5.8 Bit-Serial multiplier code

module serial_mult(output product, input [3:0] a, input b, clk, rst);

  wire s1, s2, s3;
  reg  s1o, s2o, s3o;        // latches for sum at various stages
  wire c0, c1, c2, c3;
  reg  c0o, c1o, c2o, c3o;   // latches for carry at various stages
  wire a3o, a2o, a1o, a0o;   // gated multiplicand bits (a[i] AND b)
  reg  s;                    // constant 0 input to the last adder

  // chain of full adders holding the partial product in carry-save
  // form; the sum output of fa0 is the serial product bit
  fulladder fa0(c0, product, a0o, s1o, c0o, rst);
  fulladder fa1(c1, s1, a1o, s2o, c1o, rst);
  fulladder fa2(c2, s2, a2o, s3o, c2o, rst);
  fulladder fa3(c3, s3, a3o, s,   c3o, rst);

  // form the partial product: each bit of the multiplicand a is
  // ANDed with the current serial multiplier bit b
  and n0(a0o, a[0], b);
  and n1(a1o, a[1], b);
  and n2(a2o, a[2], b);
  and n3(a3o, a[3], b);
  always @(posedge clk, posedge rst)
  begin
    if (rst)
    begin                    // clear all carry-save state
      s   <= 0;
      c0o <= 1'b0; c1o <= 1'b0; c2o <= 1'b0; c3o <= 1'b0;
      s1o <= 1'b0; s2o <= 1'b0; s3o <= 1'b0;
    end
    else
    begin                    // latch the sums and carries for the next
                             // cycle; passing each sum one stage to the
                             // right shifts the partial product right
      c0o <= c0; c1o <= c1; c2o <= c2; c3o <= c3;
      s1o <= s1; s2o <= s2; s3o <= s3;
    end
  end

endmodule

5.9 Full adder waveform

Figure 5.6: Full adder output waveforms

5.10 Bit-serial multiplier testbench

module serial_mult_tb;

  reg  [3:0] a;
  reg  b;
  wire product;
  reg  clk, rst;
  parameter period = 10;

  serial_mult dut(product, a, b, clk, rst);  // device under test
  // clock generation: 'period' time units per half cycle
  initial clk = 0;
  always #period clk = ~clk;

  initial
  begin
    #2 rst = 1'b1;
    #(period/2) rst = 1'b0;
    a = 4'b1101;              // multiplicand = 13
    b = 1;                    // multiplier bits, LSB first: 1, 0, 0, 1
    @(posedge clk) b = 0;
    @(posedge clk) b = 0;
    @(posedge clk) b = 1;
    @(posedge clk) b = 0;     // pad with zeros so the carries
    @(posedge clk) b = 0;     // propagate to the output
    @(posedge clk) b = 0;
    @(posedge clk) b = 0;
    #period $finish;
  end

endmodule

5.11 Bit-serial multiplier waveforms

Figure 5.7: Bit-serial multiplier input/output waveforms

Figure 5.8: Bit-serial multiplier with intermediate waveforms
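As a cross-check (our own calculation, not part of the original report): the stimulus applies multiplicand a = 1101 (13) and the serial multiplier bits 1, 0, 0, 1 (LSB first), i.e., x = 1001 (9). The expected product is 13 x 9 = 117 = 01110101 in binary, whose bits should emerge on the product output LSB first, one per clock cycle, over 8 cycles: 1, 0, 1, 0, 1, 1, 1, 0. This is the behavior the waveforms in Figures 5.7 and 5.8 should exhibit.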
CHAPTER 6

CONCLUSIONS

Multipliers play an important role in today's digital signal processing and various other applications. With advances in technology, many researchers have tried, and are trying, to design multipliers that offer one or more of the following design targets: high speed, low power consumption, and regularity of layout (and hence less area), or some combination of these in a single multiplier, making them suitable for various high-speed, low-power, and compact VLSI implementations.

The common multiplication method is the "add and shift" algorithm. In parallel multipliers, the number of partial products to be added is the main parameter that determines the performance of the multiplier. To reduce the number of partial products to be added, the Modified Booth algorithm is one of the most popular choices. To achieve further speed improvements, the Wallace tree algorithm can be used to reduce the number of sequential adding stages, and by combining the Modified Booth algorithm with the Wallace tree technique we can gain the advantages of both algorithms in one multiplier. However, with increasing parallelism, the amount of shifting between the partial products and intermediate sums to be added increases, which may result in reduced speed, an increase in silicon area due to irregularity of structure, and increased power consumption due to the additional interconnect resulting from complex routing. On the other hand, "serial-parallel" multipliers compromise speed to achieve better area and power consumption. The selection of a parallel or serial multiplier thus depends on the nature of the application.

A key challenge facing current and future computer designers is to reverse the trend toward complexity by removing layer after layer of it, opting instead for clean, robust, and easily certifiable designs, while continuing to devise novel methods for gaining performance and ease-of-use benefits from simpler circuits that can be readily adapted to application requirements. Bit-serial multipliers are one way of achieving this.
REFERENCES

[1] Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, 2009.
[2] Sadiq M. Sait and Gerhard Beckoff, "A Novel Technique for Fast Multiplication," Proc. IEEE Fourteenth Annual International Phoenix Conference on Computers and Communications, pp. 109-114, 1995.
[3] C. Ghest, "Multiplying Made Easy for Digital Assemblies," Electronics, Vol. 44, pp. 56-61, November 22, 1971.
[4] P. Ienne and M. A. Viredaz, "Bit-Serial Multipliers and Squarers," IEEE Trans. Computers, Vol. 43, No. 12, pp. 1445-1450, 1994.
[5] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Prentice Hall Professional, 2003.