
ADSD Fall 2011

Lecture # 10

Dr. Rehan Hafiz

<rehan.hafiz@seecs.edu.pk>

Course Website for ADSD Fall 2011



http://lms.nust.edu.pk/
Acknowledgement: Material from the following sources has been consulted/used in these slides:
1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
2. [SHO] Digital Design of Signal Processing Systems, Dr. Shoab A. Khan
3. [STV] Advanced FPGA Design, Steve Kilts
4. Ercegovac's book: Digital Arithmetic, 2004
5. Dr. Shoab A. Khan's CASE lectures on Advanced Digital System Design
Material from these slides CAN be used with the following citation: Dr. Rehan Hafiz: Advanced Digital System Design 2010
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Lectures: Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm
Contact: By appointment/Email
Office: VISpro Lab above SEECS Library

Lecture Overview

Last Lecture

Signed/unsigned number representation, sign extension, truncation, fixed-point addition

This Lecture
Adders: Ripple Carry Adder (RCA), Pipelined Adder, Bit Serial Adder
Fast Adders: Carry Select Adders (CSA), Group CLAs, Conditional Sum Adders

Basic Adder Review



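Logic equations: half adder (HA): $C = xy$, $S = x \oplus y$. A full adder (FA) built from two HAs and an OR gate: $C_{out} = xy + (x \oplus y)c_{in}$, $S = x \oplus y \oplus c_{in}$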

$t_c = t_{xor} + t_{and} + t_{or}$, $t_s = 2t_{xor}$. Critical path: $\max(t_c, t_s)$
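To check the logic equations, here is a small Python behavioral model (not synthesizable HDL) of a half adder and of a full adder composed of two half adders plus an OR gate. This composition is an assumption. It is consistent with the delay expressions for tc and ts: the sum passes through two XORs, and the carry passes through XOR, then AND, then OR. The function names are illustrative.

```python
def half_adder(x, y):
    """Half adder: returns (carry, sum) with C = x AND y, S = x XOR y."""
    return x & y, x ^ y


def full_adder(x, y, z):
    """Full adder from two half adders and an OR gate: returns (carry, sum)."""
    c1, s1 = half_adder(x, y)   # first HA: s1 ready after one XOR delay
    c2, s = half_adder(s1, z)   # second HA: sum ready after two XOR delays (t_s)
    return c1 | c2, s           # carry path: XOR -> AND -> OR (t_c)


# Exhaustive check: 2*carry + sum must equal the arithmetic sum of the bits
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            c, s = full_adder(x, y, z)
            assert 2 * c + s == x + y + z
```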

Ripple Carry Adder



Delay: grows linearly with the word length, since the carry must ripple through every full adder.

assign {cout, sum} = a + b + c_in;


[Figure: 6-bit ripple-carry adder. Six full adders (FA) are chained from cin (c0) through carries c1 to c5 to cout; stage i adds a[i], b[i] and its carry in to produce S[i].]
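Below is a bit-level Python model of the ripple-carry adder: a chain of full adders in which each carry out feeds the next stage's carry in. For an n-bit result it computes the same value as the `assign` statement. The default width of 6 matches the 6-bit case on the slide. The function names are illustrative.

```python
def full_adder(a, b, cin):
    """One full-adder stage: returns (carry_out, sum)."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return cout, s


def ripple_carry_add(a, b, cin=0, n=6):
    """Add two n-bit unsigned integers with a chain of n full adders.

    Returns (cout, n-bit sum), i.e. {cout, sum} = a + b + cin.
    """
    carry = cin
    total = 0
    for i in range(n):              # bit 0 first: the carry ripples upward
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        carry, si = full_adder(ai, bi, carry)
        total |= si << i
    return carry, total
```

For example, `ripple_carry_add(45, 27)` adds 101101 and 011011. The true sum is 72, so the model returns a carry out of 1 and the 6-bit sum 8.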

RCA Characteristics

Implements the conventional way of adding two numbers.
Slowest parallel adder, but takes minimum area.
N full adders are required to add two N-bit operands.
Delay grows linearly with word length: O(N).
[Figure: Carry delays for a 4-bit RCA]
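A rough arrival-time model shows the O(N) growth. It assumes each full adder is built as p = a XOR b, g = a AND b, s = p XOR cin, cout = g OR (p AND cin), and that every gate has a fixed delay (unit delays by default). This is an illustrative model, not a timing-accurate one. With unit delays, a single stage gives tc = 3 and ts = 2, matching the expressions for tc and ts.

```python
def rca_arrival_times(n, t_xor=1, t_and=1, t_or=1):
    """Worst-case arrival times (cout, sum MSB) of an n-bit RCA.

    Assumed FA structure: p = a ^ b, g = a & b, s = p ^ cin,
    cout = g | (p & cin). All operand bits and cin arrive at time 0.
    """
    t_carry = 0                      # arrival time of the carry into bit 0
    t_sum = 0
    for _ in range(n):
        t_p = t_xor                  # propagate signal of this stage
        t_g = t_and                  # generate signal of this stage
        t_sum = max(t_p, t_carry) + t_xor
        t_pc = max(t_p, t_carry) + t_and
        t_carry = max(t_g, t_pc) + t_or
    return t_carry, t_sum
```

Under these assumptions, each extra bit adds one AND and one OR delay to the carry path. With unit delays, a 4-bit RCA produces its carry out at 9 and its sum MSB at 8.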

Optimization

So how can we optimize for:
Throughput
Area
Timing/Latency

Recall: high-throughput pipelining using the Delay Transfer Theorem
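Here is a minimal cycle-level sketch of a two-stage pipelined adder in Python. It assumes the word is split into two equal halves with one pipeline register between them. It is a generic example, not a design derived with the Delay Transfer Theorem, and Python `+` stands in for each half-width adder. A new addition can enter every clock, and each result appears one clock after its operands.

```python
def pipelined_add(pairs, n=8):
    """Cycle-level model of a 2-stage pipelined n-bit adder (n even).

    Stage 1 adds the low n/2 bits and loads a pipeline register with the
    low sum, the low carry and the two high operand halves. In the next
    clock, stage 2 adds the high halves plus the registered carry.
    Returns a list of (clock cycle, n-bit sum, carry out) per result.
    """
    half = n // 2
    mask = (1 << half) - 1
    pipe_reg = None                      # register between the two stages
    results = []
    for cycle, item in enumerate(list(pairs) + [None], start=1):
        # Stage 2: works on what the register captured in the last clock
        if pipe_reg is not None:
            lo_sum, lo_carry, a_hi, b_hi = pipe_reg
            hi = a_hi + b_hi + lo_carry
            results.append((cycle, ((hi & mask) << half) | lo_sum, hi >> half))
        # Stage 1: new operands enter (None is a bubble that flushes the pipe)
        if item is None:
            pipe_reg = None
        else:
            a, b = item
            lo = (a & mask) + (b & mask)
            pipe_reg = (lo & mask, lo >> half, a >> half, b >> half)
    return results
```

For example, `pipelined_add([(200, 100), (15, 1)])` yields results in clocks 2 and 3: sum 44 with carry 1 (300 = 256 + 44), then 16. One result per clock is the throughput gain, and the extra clock of latency is its cost.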



Recall: area-efficient design by reusing resources

Bit Serial Adder


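[Figure: Bit-serial adder. N-bit shift registers A and B (loaded via Load regA / Load regB) shift one bit per clock into a single full adder (FA); the FA carry out is stored in a flip-flop (FF) and fed back as the next carry in, and the sum bit is shifted into shift register C.]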
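A cycle-level Python sketch of the bit-serial adder follows. It models one full adder reused every clock, a carry flip-flop, and three shift registers. It assumes LSB-first shifting and that the carry FF is cleared on load. The names are illustrative.

```python
def bit_serial_add(a, b, n):
    """Cycle-level model of a bit-serial adder for n-bit operands.

    Shift registers A and B present one bit per clock (LSB first) to a
    single full adder; the carry out is held in a flip-flop and fed back
    as the next carry in; each sum bit is shifted into register C.
    Takes n clocks. Returns (final carry, n-bit sum).
    """
    reg_a, reg_b, reg_c = a, b, 0     # "Load regs": parallel load
    carry_ff = 0                      # carry flip-flop, cleared at load
    for _ in range(n):                # one clock per bit
        x, y = reg_a & 1, reg_b & 1   # LSBs shifted out of A and B
        s = x ^ y ^ carry_ff
        carry_ff = (x & y) | ((x ^ y) & carry_ff)   # uses the old carry
        reg_a >>= 1
        reg_b >>= 1
        reg_c = (reg_c >> 1) | (s << (n - 1))       # shift sum bit in at MSB
    return carry_ff, reg_c
```

For example, `bit_serial_add(11, 6, 4)` returns carry 1 and sum 1 (17 = 16 + 1) after four clocks. The area is one full adder regardless of n, but the time grows to n clocks.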

Bit Serial Adder (Two adders)



Fast Adders
The pipelined adder is great, BUT it increases the latency.

Ways to low-latency adders:
Do we really need to wait for the carries to start processing the data?
Pre-compute the carries, OR at least start processing the data without waiting for them.

Some observations - RCA



The ripple-carry adder introduces too much delay into a system: the longest path through the adder runs from the inputs of the least significant full adder to the outputs of the most significant full adder. However, the process of summing the inputs at each bit position is relatively fast (a small two-level circuit suffices).

Carry Look Ahead Adder (CLA)



Generate all incoming carries in advance.
Idea: a carry is either generated or propagated.
The carry into position i depends only on the carry and the inputs at position i-1, not on the previous sum.

Carry Look Ahead Adder (CLA)



$P_i = a_i \oplus b_i$, $G_i = a_i b_i$

The sum and carry out can be re-expressed in terms of generate/propagate ($\oplus$ = XOR): $C_{i+1} = G_i + P_i C_i$, $S_i = C_i \oplus P_i$
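The generate/propagate form can be checked with a small Python model. When the recurrence C(i+1) = G(i) + P(i)C(i) is evaluated in a loop like this, it still ripples. It only restates the RCA in terms of G and P, which is the starting point for expanding the carries. The function names are illustrative.

```python
def gp_signals(a, b, n):
    """Per-bit generate/propagate: G_i = a_i AND b_i, P_i = a_i XOR b_i."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p


def add_with_gp(a, b, c0, n):
    """Carries by C_{i+1} = G_i + P_i C_i, sums by S_i = C_i XOR P_i."""
    g, p = gp_signals(a, b, n)
    c = [c0]
    for i in range(n):
        c.append(g[i] | (p[i] & c[i]))
    s = sum((p[i] ^ c[i]) << i for i in range(n))
    return c[n], s                     # (carry out, n-bit sum)
```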

Parallel Look Ahead Generation of all carries



CLA lookahead equations:

$C_{i+1} = G_i + P_i C_i$, $S_i = C_i \oplus P_i$

Lookahead carries:
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + P_1 C_1 = G_1 + P_1(G_0 + P_0 C_0) = G_1 + P_1 G_0 + P_1 P_0 C_0$
$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$
$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$
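Below, the expanded 4-bit lookahead equations are written as flat AND-OR expressions in a Python sketch. Every carry depends only on G, P and C0, so no carry waits for another carry. The exhaustive check confirms the expansion matches ordinary addition. The function names are illustrative.

```python
def lookahead_carries_4(g, p, c0):
    """Two-level (AND-OR) lookahead carries C0..C4 for a 4-bit block."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]


def cla4_add(a, b, c0=0):
    """4-bit carry lookahead adder: returns (C4, 4-bit sum)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # G_i = a_i b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # P_i = a_i XOR b_i
    c = lookahead_carries_4(g, p, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(4))       # S_i = C_i XOR P_i
    return c[4], s
```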

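1 gate delay*: $P_i = a_i \oplus b_i$, $G_i = a_i b_i$

2 gate delays (AND-OR), with $c_0 = 0$:
$c_1 = G_0$
$c_2 = G_1 + P_1 c_1 = G_1 + P_1 G_0$
$c_3 = G_2 + P_2 G_1 + P_2 P_1 c_1 = G_2 + P_2 G_1 + P_2 P_1 G_0$
$c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 c_1 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$

2 gate delays for a full adder

* Gate delays assume 1 gate delay for an XOR gate.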

Final Result

Each of the carry equations can be implemented with two-level logic. All inputs are now derived directly from the data inputs rather than from intermediate carries, which allows the computation of all sum outputs to proceed in parallel.

Carry Lookahead Adder


The maximum gate delay for carry generation is only 3 (1 for $P_i$/$G_i$ plus 2 for the AND-OR carry logic). The full adders introduce two more gate delays, so the worst-case path, until the final sum bit is generated, is 5 gate delays.

In general, the maximum fan-in/out of any gate in an n-bit CLA is n. Thus, the maximum fan-in of any gate in a 16-bit CLA is 16.

Fan IN/OUT Effects


[Figure source: Stephen D. Brown, Fundamentals of Digital Logic with Verilog Design]

CLA
As n increases, fan-in/fan-out becomes an issue. Options:
Ripple the carry across blocks (groups) of CLA adders of limited size, or
Pre-compute the group carry of each block in parallel, again using lookahead.

Group Carry Look-ahead Adder


A 16-bit GCLA is composed of four 4-bit CLAs, with additional logic that generates the carries between the 4-bit groups:
$GG_0 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$
$GP_0 = P_3 P_2 P_1 P_0$
$c_4 = GG_0 + GP_0 c_0$

No carries are required to generate the group generate (GG) and group propagate (GP) signals; only the bit-level G and P signals (a single XOR gate delay) are needed. GG/GP are therefore ready after 3 gate delays in total, and the group carries follow from GG and GP in 2 more gate delays:
$c_8 = GG_1 + GP_1 c_4 = GG_1 + GP_1 GG_0 + GP_1 GP_0 c_0$
$c_{12} = GG_2 + GP_2 c_8 = GG_2 + GP_2 GG_1 + GP_2 GP_1 GG_0 + GP_2 GP_1 GP_0 c_0$
$c_{16} = GG_3 + GP_3 c_{12} = GG_3 + GP_3 GG_2 + GP_3 GP_2 GG_1 + GP_3 GP_2 GP_1 GG_0 + GP_3 GP_2 GP_1 GP_0 c_0$

Using the recursive form of each equation (e.g. $c_8 = GG_1 + GP_1 c_4$) gives a ripple-based group CLA; using the fully expanded form gives a CLA-based GCLA.
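Here is a Python sketch of the 16-bit GCLA. It forms GG/GP for each 4-bit group, computes c4, c8, c12 and c16 from the expanded (CLA-based) group equations using only GG, GP and c0, and then lets each 4-bit CLA form its internal carries from its group carry in. The splitting into helper functions and their names are illustrative assumptions.

```python
def block_gp(g, p):
    """Group generate/propagate of one 4-bit block (bits 0..3 of g, p)."""
    gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    gp = p[3] & p[2] & p[1] & p[0]
    return gg, gp


def block_carries(g, p, cin):
    """Lookahead carries into bits 0..3 of a 4-bit block."""
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    return [cin, c1, c2, c3]


def gcla16_add(a, b, c0=0):
    """16-bit group CLA: returns (c16, 16-bit sum)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(16)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]
    gg, gp = zip(*(block_gp(g[4 * j:4 * j + 4], p[4 * j:4 * j + 4]) for j in range(4)))
    # Group carries from GG, GP and c0 only (expanded form, no rippling)
    c4 = gg[0] | (gp[0] & c0)
    c8 = gg[1] | (gp[1] & gg[0]) | (gp[1] & gp[0] & c0)
    c12 = (gg[2] | (gp[2] & gg[1]) | (gp[2] & gp[1] & gg[0])
           | (gp[2] & gp[1] & gp[0] & c0))
    c16 = (gg[3] | (gp[3] & gg[2]) | (gp[3] & gp[2] & gg[1])
           | (gp[3] & gp[2] & gp[1] & gg[0]) | (gp[3] & gp[2] & gp[1] & gp[0] & c0))
    s = 0
    for j, cin in enumerate((c0, c4, c8, c12)):   # each 4-bit CLA
        c = block_carries(g[4 * j:4 * j + 4], p[4 * j:4 * j + 4], cin)
        for k in range(4):
            s |= (p[4 * j + k] ^ c[k]) << (4 * j + k)
    return c16, s
```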

16-Bit Group Carry Lookahead Adder


Each CLA has a longest path of 5 gate delays

In the group carry lookahead logic (GCLL) section, the GG and GP signals are generated in 3 gate delays and the carry signals in 2 more, so the carry out of each GCLA group takes 5 gate delays; the worst-case path is 10 gate delays (and ends at s15, not c16).

FAN in / FAN out



In general, the maximum fan-in of any gate in an n-bit CLA is n. Thus, the maximum fan-in of any gate in a 16-bit CLA is 16. In comparison, the maximum fan-in for a 16-bit GCLA is five (for generating c16). The fan-outs for both cases are the same as the fan-ins.

Carry Select Adder



Partition the adder into K groups.
For each adder group, two values of the sum are precomputed, one with cin = 0 and one with cin = 1.
The actual sum is selected with a 2-to-1 MUX driven by the carry of the previous group.
This allows the possible results to be computed in parallel.
Each block still needs an internal carry scheme, e.g. ripple.
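A behavioral Python sketch of the carry select scheme follows. Every group is added twice, once with cin = 0 and once with cin = 1. A 2-to-1 MUX driven by the previous group's carry picks both the group sum and the group carry. Python `+` stands in for each group's internal adder (an RCA on the slides). The `groups` parameter is an illustrative assumption; it allows both uniform and non-uniform group widths, listed LSB group first.

```python
def carry_select_add(a, b, cin=0, groups=(4, 4, 4)):
    """Carry select adder with the given group widths (LSB group first).

    Returns (cout, sum) where sum has sum(groups) bits.
    """
    total, carry, pos = 0, cin, 0
    for width in groups:
        mask = (1 << width) - 1
        ga, gb = (a >> pos) & mask, (b >> pos) & mask
        r0 = ga + gb                  # precomputed with cin = 0
        r1 = ga + gb + 1              # precomputed with cin = 1
        chosen = r1 if carry else r0  # 2-to-1 MUX selected by previous carry
        total |= (chosen & mask) << pos
        carry = chosen >> width       # selected group carry out
        pos += width
    return carry, total
```

With `groups=(3, 4, 5)`, `carry_select_add(0b111110111100, 0b111111010011)` reproduces the non-uniform example: carry out 1 and sum 111110001111.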

Carry Select Adder



The adder is split into three partitions of 4 bits each. The outputs of every 4-bit adder block, including the Cout of the first adder, are ready simultaneously.
[Figure: 12-bit carry select adder. Each 4-bit group has two 4-bit adders, one with Cin = 0 and one with Cin = 1; a 4-bit 2-to-1 MUX selects the group sum (SUM[3-0], SUM[7-4], SUM[11-8]) and a 2-to-1 MUX selects the group carry (Cout[3], Cout[7], Cout[11]), each driven by the carry from the previous group; the first group is driven by Carry in.]

Non-Uniform Group Carry Select Adder

Delay: approx. one 5-bit RCA delay + one 2-to-1 MUX delay
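Why non-uniform groups help can be seen with a rough delay model. It assumes each group's two candidate results are ready after width × t_bit (an RCA). It also assumes each group's carry MUX switches t_mux after both its data and its select (the previous group's carry) are ready. As in the 12-bit figure, the LSB group is duplicated like the others. All delay values are illustrative unit assumptions.

```python
def csa_delay(groups, t_bit=1.0, t_mux=1.0):
    """Rough carry-out arrival time of a carry select adder.

    groups: group widths, LSB group first.
    """
    t_carry = 0.0                     # external carry in is ready at time 0
    for width in groups:
        t_ready = width * t_bit       # both RCA versions of this group done
        t_carry = max(t_ready, t_carry) + t_mux   # carry MUX output
    return t_carry
```

Under these assumptions, groups of 3, 4 and 5 bits give 6 time units (a 5-bit RCA plus one MUX). Each larger group finishes exactly when its select carry arrives. Uniform 4-bit groups give 7 units, and a single 12-bit group gives 13.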

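CSA: Example

Non-uniform groups of 5, 4 and 3 bits (bits [11:7], [6:3], [2:0]); each entry is (group carry out, group sum):

(a)          11111      0111      100
(b)          11111      1010      011
cin = 0:     1 11110    1 0001    0 111
cin = 1:     1 11111    1 0010    1 000

Selection with carry in = 0: the LSB group gives 111 with carry 0; that carry selects the cin = 0 result of the middle group (0001, carry 1); that carry selects the cin = 1 result of the upper group (11111, carry 1).
Result: carry out 1, sum 11111 0001 111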

If we keep reducing the number of bits per adder group, we arrive at the conditional sum adder.
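Here is a behavioral Python sketch of the conditional sum idea, taking the carry select scheme down to 1-bit groups. Every bit position first computes its (sum, carry) for both possible carry-ins. Neighbouring blocks are then merged pairwise: the lower block's carry selects (MUX) between the upper block's two candidates, so the block width doubles each level. This is an illustrative model assuming n is a power of two, not a gate-level design from the slides.

```python
def conditional_sum_add(a, b, n, cin=0):
    """Conditional sum adder sketch for n-bit operands (n a power of two).

    Returns (cout, n-bit sum) after log2(n) levels of MUX merging.
    """
    # Level 0: blocks[k] = {carry_in: (sum_bits, carry_out)} for bit k
    blocks = []
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        blocks.append({c: (x ^ y ^ c, (x & y) | ((x ^ y) & c)) for c in (0, 1)})
    width = 1
    while len(blocks) > 1:
        merged = []
        for lo, hi in zip(blocks[0::2], blocks[1::2]):   # (lower, upper) pairs
            entry = {}
            for c in (0, 1):
                lo_sum, lo_carry = lo[c]
                hi_sum, hi_carry = hi[lo_carry]          # MUX select by lower carry
                entry[c] = ((hi_sum << width) | lo_sum, hi_carry)
            merged.append(entry)
        blocks = merged
        width *= 2
    s, cout = blocks[0][cin]
    return cout, s
```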

Conditional sum adder



References & Further Reading



Ercegovac's book: Digital Arithmetic, 2004
Another useful link:


http://www.aoki.ecei.tohoku.ac.jp/arith/mg/algorithm.html
