ADSD Fall2011 10 Adders

ADSD Fall 2011
Lecture # 10
Dr. Rehan Hafiz
<rehan.hafiz@seecs.edu.pk>
Course Website for ADSD Fall 2011: http://lms.nust.edu.pk/
Acknowledgement: Material from the following sources has been consulted/used in these slides:
1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
2. [SHO] Digital Design of Signal Processing Systems, Dr. Shoab A. Khan
3. [STV] Advanced FPGA Design, Steve Kilts
4. Ercegovac's book: Digital Arithmetic, 2004
5. Dr. Shoab A. Khan's CASE lectures on Advanced Digital System Design
Material from these slides CAN be used with the following citing reference: Dr. Rehan Hafiz: Advanced Digital System Design 2010
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Lectures: Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm
Contact: By appointment/Email
Office: VISpro Lab above SEECS Library
Lecture Overview
Last Lecture
  Signed/Unsigned Number Representation
  Sign Extension, Truncation, Fixed Point Addition
This Lecture
  Adders
    Ripple Carry Adder (RCA)
    Pipelined Adder
    Bit Serial Adder
  Fast Adders
    Carry Select Adders (CSA)
    Group CLAs
    Conditional Sum Adders
Basic Adder Review

Logic equations, half adder (HA):
  C = x y (AND)
  S = x ^ y (^ = xor)
Full adder (FA) timing:
  tc = txor + tand + tor
  ts = 2 txor
  Critical path: max(tc, ts)
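As a rough illustration of the equations above, here is a minimal Python sketch of a half adder and of a full adder built from two half adders plus an OR gate. The function names are made up for this example and are not from the slides. The carry path runs through one XOR, one AND and one OR, which is where tc = txor + tand + tor comes from; the sum path runs through two XORs.

```python
def half_adder(x, y):
    """Return (carry, sum) for two 1-bit inputs: C = x AND y, S = x XOR y."""
    return x & y, x ^ y


def full_adder(x, y, cin):
    """Full adder from two half adders and an OR gate; returns (cout, sum)."""
    c1, s1 = half_adder(x, y)      # first XOR / AND
    c2, s = half_adder(s1, cin)    # second XOR (sum path), AND (carry path)
    return c1 | c2, s              # OR merges the two carry sources


# Exhaustive check against integer addition of three bits.
for x in (0, 1):
    for y in (0, 1):
        for cin in (0, 1):
            cout, s = full_adder(x, y, cin)
            assert 2 * cout + s == x + y + cin
```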
Ripple Carry Adder

Delay:
assign {cout, sum} = a + b + c_in;

[Figure: 6-bit ripple carry adder. Six full adders (FA) take a[5:0], b[5:0] and cin; the carry output C of each stage feeds the next stage, producing S[5:0] and cout.]
RCA Characteristics
Implements the conventional way of adding two numbers
Slowest parallel adder, but takes minimum area
N full adders are required to add two N-bit operands
Delay is linear in the word length: O(N)
[Figure: Carry delays for a 4-bit RCA]
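The ripple structure can be sketched as a bit-level Python model. This is a behavioural illustration rather than HDL, and the names ripple_carry_add and n are assumptions made for this example. The loop makes the O(N) behaviour visible: stage i cannot finish until the carry from stage i-1 is known.

```python
def full_adder(x, y, cin):
    """1-bit full adder; returns (cout, sum)."""
    s = x ^ y ^ cin
    cout = (x & y) | ((x ^ y) & cin)
    return cout, s


def ripple_carry_add(a, b, cin=0, n=6):
    """Add two n-bit unsigned integers bit by bit; returns (cout, sum)."""
    carry, total = cin, 0
    for i in range(n):                       # one full adder per bit position
        carry, s = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i                      # place sum bit S[i]
    return carry, total


# Usage: 6-bit addition that overflows into cout.
print(ripple_carry_add(0b111111, 0b000001))
```

The final carry corresponds to cout in the continuous assignment `{cout, sum} = a + b + c_in`.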
Optimization
So how can we optimize for:
Throughput
Area
Timing/Latency
Remember: high throughput through pipelining, using the Delay Transfer Theorem

Remember: area efficiency through reusing resources
Bit Serial Adder

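[Figure: bit-serial adder. Shift registers A and B are loaded in parallel with the N-bit operands and shift out one bit per clock into a single full adder (FA); a flip-flop (FF) holds the carry between clock cycles; the 1-bit sum is shifted into register C.]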
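The bit-serial datapath can be sketched as a cycle-by-cycle Python simulation. This is a minimal model under assumptions: the names bit_serial_add, reg_a, reg_b, reg_c and carry_ff are chosen for this example, and the carry flip-flop is assumed to be cleared when the registers are loaded. It uses one full adder for N clock cycles, trading latency for area.

```python
def bit_serial_add(a, b, n):
    """Simulate n clock cycles of a bit-serial adder; returns (carry_ff, reg_c)."""
    reg_a, reg_b, reg_c = a, b, 0        # parallel load of the shift registers
    carry_ff = 0                         # assumed cleared on load
    for _ in range(n):                   # one loop iteration = one clock cycle
        x, y = reg_a & 1, reg_b & 1      # LSBs feed the single full adder
        s = x ^ y ^ carry_ff
        carry_ff = (x & y) | ((x ^ y) & carry_ff)
        reg_a >>= 1                      # shift the operands right
        reg_b >>= 1
        reg_c = (reg_c >> 1) | (s << (n - 1))   # sum bit enters at the MSB end
    return carry_ff, reg_c


# Usage: 4-bit operands take 4 clock cycles.
print(bit_serial_add(5, 3, 4))
```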
Bit Serial Adder (Two adders)

Fast Adders
Pipelined adder is great, BUT it increases the latency

Way to low latency adders:
Do we really need to wait for the carries?
Pre-compute the carries
OR we can at least start processing the data
Some observations - RCA

The ripple-carry adder introduces too much delay into a system. The longest path through the adder is from the inputs of the least significant full adder to the outputs of the most significant full adder.
However, the process of summing the inputs at each bit position is relatively fast (a small two-level circuit suffices).
Carry Look Ahead Adder (CLA)

Generate all incoming carries in advance
Idea: a carry is either generated or propagated
The carry at the ith location depends on the carry and inputs at the (i-1)th location, not on the previous sum
Carry Look Ahead Adder (CLA)

Pi = ai ^ bi
Gi = ai bi
Sum and Cout can be re-expressed in terms of generate/propagate (^ = xor):
Ci+1 = Gi + Pi Ci
Si = Ci ^ Pi
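To make the generate/propagate formulation concrete, the following Python sketch evaluates Pi, Gi, the carry recurrence and the sum bits for small operands. The helper names pg_signals and add_with_pg are hypothetical. It still computes the carries one after another; it only shows that the sum never feeds back into the carry chain.

```python
def pg_signals(a, b, n):
    """Per-bit propagate (Pi = ai ^ bi) and generate (Gi = ai bi), LSB first."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    return p, g


def add_with_pg(a, b, c0=0, n=4):
    """Add using C(i+1) = Gi + Pi Ci and Si = Ci ^ Pi; returns (cout, sum)."""
    p, g = pg_signals(a, b, n)
    c = [c0]
    for i in range(n):
        c.append(g[i] | (p[i] & c[i]))                 # carry recurrence
    total = sum((c[i] ^ p[i]) << i for i in range(n))  # sum bits
    return c[n], total


# Usage: 9 + 7 overflows a 4-bit adder.
print(add_with_pg(9, 7))
```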
Parallel Look Ahead Generation of all carries

CLA look-ahead equations:
Ci+1 = Gi + Pi Ci
Si = Ci ^ Pi

Look-ahead carries:
C1 = G0 + P0C0
C2 = G1 + P1C1 = G1 + P1(G0 + P0C0) = G1 + P1G0 + P1P0C0
C3 = G2 + P2G1 + P2P1G0 + P2P1P0C0
C4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0C0
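The expanded equations can be checked with a short Python sketch in which every carry is written directly as a two-level AND-OR expression of Pi, Gi and C0, with no carry depending on another carry. The function names cla4_carries and cla4_add are assumptions made for this example.

```python
def cla4_carries(p, g, c0):
    """Two-level look-ahead carries for a 4-bit block (bit lists, LSB first)."""
    p0, p1, p2, p3 = p
    g0, g1, g2, g3 = g
    c1 = g0 | (p0 & c0)
    c2 = g1 | (p1 & g0) | (p1 & p0 & c0)
    c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0)
    c4 = (g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
          | (p3 & p2 & p1 & p0 & c0))
    return [c0, c1, c2, c3, c4]


def cla4_add(a, b, c0=0):
    """4-bit carry look-ahead addition; returns (c4, sum)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # Pi = ai ^ bi
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # Gi = ai bi
    c = cla4_carries(p, g, c0)
    total = sum((c[i] ^ p[i]) << i for i in range(4))   # Si = Ci ^ Pi
    return c[4], total


# Exhaustive check against integer addition.
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            assert cla4_add(a, b, c0) == ((a + b + c0) >> 4, (a + b + c0) & 15)
```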
1 gate delay:
Pi = ai ^ bi
Gi = ai bi
2 gate delays (with c0 = 0):
c1 = G0
c2 = G1 + P1c1
c3 = G2 + P2G1 + P2P1c1
c4 = G3 + P3G2 + P3P2G1 + P3P2P1c1
2 gate delays for a full adder
Notes: please correct the gate notation in the figure. Gate delays assume 1 gate delay for an XOR gate.
Final Result
Each of the carry equations can be implemented with two-level logic
All inputs are now directly derived from the data inputs and not from intermediate carries
This allows computation of all sum outputs to proceed in parallel
Carry Lookahead Adder

The maximum gate delay for the carry generation is only 3. The full adders introduce two more gate delays. The worst-case path is therefore 5 gate delays (the time for the final sum bit to be generated).
In general, the maximum fan-in/out of any gate in an n-bit CLA is n. Thus, the maximum fan-in of any gate in a 16-bit CLA is 16.
Fan IN/OUT Effects

Source: Fundamentals of Digital Logic with Verilog Design, by Stephen D. Brown
CLA
As n increases, fan-in/fan-out becomes an issue
Options:
Ripple the carry across blocks (groups) of CLA adders of limited size
Or we may again pre-compute, in parallel, the group carry of each block
Group Carry Look-ahead Adder

A 16-bit GCLA is composed of four 4-bit CLAs, with additional logic that generates the carries between the four-bit groups.
GG0 = G3 + P3G2 + P3P2G1 + P3P2P1G0
GP0 = P3P2P1P0
c4 = GG0 + GP0c0
No carries are required to generate group G and group P; we just need the G and P signals, which are available after a single XOR gate delay.
Total delay = 3 gate delays for GG/GP.
To generate the carries, just use group G and group P, with 2 more gate delays.
c8 = GG1 + GP1c4 = GG1 + GP1GG0 + GP1GP0c0
c12 = GG2 + GP2c8 = GG2 + GP2GG1 + GP2GP1GG0 + GP2GP1GP0c0
c16 = GG3 + GP3c12 = GG3 + GP3GG2 + GP3GP2GG1 + GP3GP2GP1GG0 + GP3GP2GP1GP0c0
The first form of each equation, written in terms of the previous group carry (red on the slide), gives a ripple-based group CLA. The fully expanded form (black on the slide) gives a CLA-based GCLA.
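The group signals can be illustrated with a Python sketch of a 16-bit adder made of four 4-bit groups. This is a functional model under assumptions: the names group_pg and gcla16_add are invented for this example, the group carries use the form c4(k+1) = GGk + GPk c4k, and the carries inside each group use the plain recurrence, which is logically equivalent to the look-ahead equations but says nothing about gate delay.

```python
def group_pg(p, g):
    """Group generate GG and group propagate GP for one 4-bit group (LSB first)."""
    gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    gp = p[3] & p[2] & p[1] & p[0]
    return gg, gp


def gcla16_add(a, b, c0=0):
    """16-bit addition from four 4-bit groups; returns (c16, sum)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(16)]
    total, group_cin = 0, c0
    for k in range(4):
        pk, gk = p[4 * k:4 * k + 4], g[4 * k:4 * k + 4]
        gg, gp = group_pg(pk, gk)                    # needs no carry at all
        c = group_cin
        for i in range(4):                           # sum bits inside the group
            total |= (c ^ pk[i]) << (4 * k + i)
            c = gk[i] | (pk[i] & c)
        group_cin = gg | (gp & group_cin)            # c4, c8, c12, c16
    return group_cin, total


# Usage: carry propagates through all four groups.
print(gcla16_add(0xFFFF, 0x0001))
```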
16-Bit Group Carry Lookahead Adder

Each CLA has a longest path of 5 gate delays.
In the GCLL section, the GG and GP signals are generated in 3 gate delays and the carry signals in 2 more, giving 5 gate delays to generate the carry out of each GCLA group and 10 gate delays on the worst-case path (which ends at s15, not c16).
FAN in / FAN out

In general, the maximum fan-in of any gate in an n-bit CLA is n. Thus, the maximum fan-in of any gate in a 16-bit CLA is 16. In comparison, the maximum fan-in for a 16-bit GCLA is five (for generating c16). The fan-outs for both cases are the same as the fan-ins.
Carry Select Adder

Partition the adder into K groups
Two values of the sum, for cin = 0 and cin = 1, are precomputed for each adder group
The actual sum is selected by a 2-to-1 MUX driven by the carry of the previous group
Allows computation of the possible results in parallel
Requires an internal carry scheme within the blocks, e.g. ripple
Carry Select Adder

Three partitions have been made, of 4 bits each
The outputs of each 4-bit adder block would be ready simultaneously, including the Cout of the first adder
[Figure: 12-bit carry select adder. Each 4-bit group has one 4-bit adder with Cin = 0 and one with Cin = 1; 2-to-1 muxes driven by the incoming carry select SUM[3-0], SUM[7-4], SUM[11-8] and Cout[3], Cout[7], Cout[11].]
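The selection scheme can be sketched in Python as follows. This is a behavioural model under assumptions: the names rca_block and carry_select_add are hypothetical, and each block adder is stood in for by integer addition instead of an explicit ripple chain. Both candidate results of every group depend only on the operands, so they could all be computed at once; only the mux selection is sequential.

```python
def rca_block(a, b, cin, width):
    """Behavioural stand-in for a width-bit adder block; returns (cout, sum)."""
    total = a + b + cin
    return total >> width, total & ((1 << width) - 1)


def carry_select_add(a, b, cin=0, groups=(4, 4, 4)):
    """Carry select addition; groups lists the block widths, LSB group first."""
    total, shift, carry = 0, 0, cin
    for width in groups:
        mask = (1 << width) - 1
        xa, xb = (a >> shift) & mask, (b >> shift) & mask
        c0, s0 = rca_block(xa, xb, 0, width)          # precomputed for cin = 0
        c1, s1 = rca_block(xa, xb, 1, width)          # precomputed for cin = 1
        carry, s = (c1, s1) if carry else (c0, s0)    # 2-to-1 muxes
        total |= s << shift
        shift += width
    return carry, total


# Usage: three 4-bit groups, then a non-uniform grouping of the same 12 bits.
print(carry_select_add(0b111110111100, 0b111111010011))
print(carry_select_add(0b111110111100, 0b111111010011, groups=(2, 3, 3, 4)))
```

Passing unequal widths in groups gives a non-uniform grouping; the result is the same, only the timing of a hardware implementation would differ.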
Non Uniform Group Carry Select Adder

Delay: Approx. 5RCA Delay + 2-to-1 Mux Delay
CSA: Example
a = 1 1 1 1 1 0 1 1 1 1 0 0
b = 1 1 1 1 1 1 0 1 0 0 1 1
[Figure: worked example for the two 12-bit operands a and b, showing the sums and carries precomputed for cin = 0 and cin = 1 and the values selected at each stage]
If we keep reducing the number of bits per adder group, we reach the conditional sum adder.
Conditional sum adder
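The slide content for this adder is not preserved in the text, so the following Python sketch shows only the usual conditional-sum idea, under assumptions: the name conditional_sum_add is hypothetical and the word length n is assumed to be a power of two. Every bit first produces a (carry, sum) pair for cin = 0 and for cin = 1; neighbouring blocks are then merged level by level, with the carry of the lower block selecting the result of the upper block.

```python
def conditional_sum_add(a, b, cin=0, n=8):
    """Conditional sum addition for n a power of two; returns (cout, sum)."""
    # Level 0: per bit, the (carry, sum) pair for each possible carry-in.
    blocks = []
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        blocks.append({0: (x & y, x ^ y), 1: (x | y, 1 ^ x ^ y)})
    width = 1
    while len(blocks) > 1:                     # log2(n) merge levels
        merged = []
        for lo, hi in zip(blocks[0::2], blocks[1::2]):
            entry = {}
            for c in (0, 1):
                lo_c, lo_s = lo[c]
                hi_c, hi_s = hi[lo_c]          # mux: low carry selects high result
                entry[c] = (hi_c, lo_s | (hi_s << width))
            merged.append(entry)
        blocks, width = merged, 2 * width
    return blocks[0][cin]                      # the real carry-in picks the answer


# Usage: 8-bit addition with overflow.
print(conditional_sum_add(200, 100))
```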

References & Further Reading

Ercegovac's book: Digital Arithmetic, 2004
Another useful link: http://www.aoki.ecei.tohoku.ac.jp/arith/mg/algorithm.html
