Lecture # 10
<rehan.hafiz@seecs.edu.pk>
http://lms.nust.edu.pk/
Acknowledgement: Material from the following sources has been consulted/used in these slides:
1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
2. [SHO] Digital Design of Signal Processing Systems, Dr. Shoab A. Khan
3. [STV] Advanced FPGA Design, Steve Kilts
4. Ercegovac's book: Digital Arithmetic, 2004
5. Dr. Shoab A. Khan's CASE lectures on Advanced Digital System Design
Material from these slides CAN be used with the following citation: Dr. Rehan Hafiz: Advanced Digital System Design 2010
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm
By appointment/Email
VISpro Lab above SEECS Library
Lecture Overview
Last Lecture
This Lecture
Adders
Ripple Carry Adder (RCA)
Pipelined Adder
Bit Serial Adder
Fast Adders
Logic Equations (HA): C = xy (AND), S = x ^ y (XOR)
Delay:
[Figure: 6-bit ripple carry adder. A chain of full adders (FA) from cin to cout; inputs a[5] b[5] down to bit 0; internal carries c0 to c5; sum outputs S[5] to S[0].]
RCA Characteristics
Implements the conventional way of adding two numbers
Slowest parallel adder, but takes minimum area
N full adders are required to add two N-bit operands
Delay grows linearly with word length, O(N)
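The RCA behaviour described above can be sketched in software. This is a minimal bit-level model, not a hardware description; the helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are illustrative assumptions and do not come from the slides.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists given LSB first.

    The carry ripples from bit 0 upward, one full adder per bit,
    which is why the delay grows linearly with the word length.
    """
    assert len(a_bits) == len(b_bits)
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    """Unsigned integer to a list of bits, LSB first (hypothetical helper)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) back to an unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


s, cout = ripple_carry_add(to_bits(45, 6), to_bits(27, 6))
print(from_bits(s), cout)  # prints "8 1" since 45 + 27 = 72 = 64 + 8
```

The loop visits the bits strictly in order because each stage needs the carry of the stage before it; that serial dependency is what the fast adders later in the lecture try to break.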
Optimization..
Latency
[Figure: bit-serial adder. Labels: FF, FA, clk, Shift reg B, Load regB, Sum, Shift reg C, Reg C, Load regs.]
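A bit-serial adder can be modelled as one full adder (FA) reused every clock cycle, with a flip-flop (FF) holding the carry between cycles. The sketch below is a software model under that assumption; the function name `bit_serial_add` and the way the shift registers are represented as plain integers are illustrative choices, not taken from the slides.

```python
def bit_serial_add(a, b, width):
    """Add two width-bit unsigned integers one bit per clock cycle.

    Returns (sum, carry_out). The latency is `width` cycles, but only
    a single full adder and one carry flip-flop are needed.
    """
    carry_ff = 0   # models the flip-flop that stores the carry
    result = 0     # models the shift register collecting the sum
    for cycle in range(width):        # one iteration = one clock cycle
        a_bit = (a >> cycle) & 1      # operands are shifted out LSB first
        b_bit = (b >> cycle) & 1
        s = a_bit ^ b_bit ^ carry_ff
        carry_ff = (a_bit & b_bit) | (carry_ff & (a_bit ^ b_bit))
        result |= s << cycle
    return result, carry_ff


print(bit_serial_add(11, 6, 5))  # prints "(17, 0)"
```

The trade-off against the RCA is area for time: the hardware cost is constant in the word length, while the number of clock cycles grows with it.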
Fast Adders
latency
Pre-compute Carries
OR we can at least
The ripple-carry adder introduces too much delay into a system. The longest path through the adder is from the inputs of the least significant full adder to the outputs of the most significant full adder. However, the process of summing the inputs at each bit position is relatively fast (a small two-level circuit suffices).
Generate all incoming carries in advance
Idea: A carry is either generated or propagated
The carry at the ith location depends on the carry and inputs at the (i-1)th location, not on the previous sum
Pi = ai ^ bi (propagate)
Gi = ai bi (generate)
Look-ahead carries:
C1 = G0 + P0C0
C2 = G1 + P1C1 = G1 + P1(G0 + P0C0) = G1 + P1G0 + P1P0C0
C3 = G2 + P2G1 + P2P1G0 + P2P1P0C0
C4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0C0
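The look-ahead carry equations can be checked with a short software model. This is a sketch for a 4-bit block only; the function names `cla_carries_4bit` and `cla_add_4bit` are assumptions made for illustration. Each carry is written as a flat sum of products of Pi, Gi and C0, so no carry depends on another carry.

```python
def cla_carries_4bit(a_bits, b_bits, c0):
    """Return [C1, C2, C3, C4] for 4-bit operands given LSB first."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # Pi = ai ^ bi
    g = [a & b for a, b in zip(a_bits, b_bits)]  # Gi = ai bi
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla_add_4bit(a, b, c0=0):
    """4-bit add using the precomputed carries; returns (sum, C4)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    carries = [c0] + cla_carries_4bit(a_bits, b_bits, c0)
    # Each sum bit needs only its own inputs and its precomputed carry.
    s_bits = [a_bits[i] ^ b_bits[i] ^ carries[i] for i in range(4)]
    return sum(bit << i for i, bit in enumerate(s_bits)), carries[4]


print(cla_add_4bit(0b1011, 0b0110))  # prints "(1, 1)" since 11 + 6 = 17
```

In hardware the four carry expressions would be evaluated concurrently; the sequential Python statements only model the logic, not the timing.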
1 gate delay: Pi = ai ^ bi, Gi = ai bi
$ Please correct the gate notation.
* Gate delays assume 1 gate delay for an XOR gate.
Final Result
Each of the carry equations can be implemented with two-level logic.
All inputs are now derived directly from the data inputs and not from intermediate carries; this allows computation of all sum outputs to proceed in parallel.
In general, the maximum fan-in/out of any gate in an n-bit CLA is n. Thus, the maximum fan-in of any gate in a 16-bit CLA is 16.
CLA
the carry across blocks (groups) of CLA adders of limited size
Or we may again pre-compute, in parallel, the group carry of each block
c8 = GG1 + GP1c4 = GG1 + GP1GG0 + GP1GP0c0
c12 = GG2 + GP2c8 = GG2 + GP2GG1 + GP2GP1GG0 + GP2GP1GP0c0
The red part constitutes a ripple-based group CLA.
The black part results in a CLA-based GCLA.
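The group carry equations can be exercised with a small model. This sketch assumes the usual definitions of group generate and group propagate for a 4-bit block (GG = G3 + P3G2 + P3P2G1 + P3P2P1G0, GP = P3P2P1P0) and assumes c4 = GG0 + GP0c0 by the same pattern as c8 and c12; the function names are hypothetical. It evaluates the fully expanded form, where every group carry depends only on GG, GP and c0.

```python
def group_pg(a, b):
    """Group generate/propagate (GG, GP) of one 4-bit block (assumed definitions)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]
    gg = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]))
    gp = p[3] & p[2] & p[1] & p[0]
    return gg, gp


def group_carries(a, b, c0=0):
    """Return (c4, c8, c12) for 16-bit a and b from the expanded equations."""
    (gg0, gp0), (gg1, gp1), (gg2, gp2) = [
        group_pg((a >> shift) & 0xF, (b >> shift) & 0xF) for shift in (0, 4, 8)
    ]
    c4 = gg0 | (gp0 & c0)  # assumed by analogy with c8 and c12
    c8 = gg1 | (gp1 & gg0) | (gp1 & gp0 & c0)
    c12 = gg2 | (gp2 & gg1) | (gp2 & gp1 & gg0) | (gp2 & gp1 & gp0 & c0)
    return c4, c8, c12


print(group_carries(0x0FFF, 0x0001))  # prints "(1, 1, 1)"
```

Replacing the expanded expressions with `c8 = gg1 | (gp1 & c4)` and `c12 = gg2 | (gp2 & c8)` gives the ripple form of the same equations, with fewer gate inputs but a longer carry path.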
In the GCLL section, the GG and GP signals are generated in 3 gate delays and the carry signals in 2 more gate delays. This results in 5 gate delays to generate the carry out of each GCLA group and 10 gate delays on the worst-case path (which ends at s15, not c16).
In general, the maximum fan-in of any gate in an n-bit CLA is n. Thus, the maximum fan-in of any gate in a 16-bit CLA is 16. In comparison, the maximum fan-in for a 16-bit GCLA is five (for generating c16). The fan-outs for both cases are the same as the fan-ins.
Partition the adder into K groups
Two values of the sum, one for cin = 0 and one for cin = 1, are precomputed for each adder group
The actual sum is selected using a 2-to-1 MUX controlled by the carry of the previous group
Allows computation of the possible results in parallel
Requires an internal carry scheme within the blocks, e.g. ripple
Three partitions of 4 bits each have been made. The outputs of each 4-bit adder block are ready simultaneously, including the Cout of the first adder.
[Figure: 12-bit carry-select adder. Each of the three 4-bit groups has two 4-bit adders, one with Cin = 0 and one with Cin = 1; 2-to-1 muxes select between their outputs to produce SUM[3-0], SUM[7-4], SUM[11-8] and Cout[3], Cout[7], Cout[11].]
CSA: Example
a = 1111 1011 1100
b = 1111 1101 0011
Block sums with cin = 0: 1110 1000 1111 (block carry-outs 1, 1, 0)
Block sums with cin = 1: 1111 1001 0000 (block carry-outs 1, 1, 1)
Selected sum (c0 = 0): 1111 1000 1111, Cout = 1
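The carry-select scheme with three 4-bit partitions can be reproduced with a short model, using the operands labelled (a) and (b) in the example. This is a sketch: `block_add` and `carry_select_add` are hypothetical names, and Python's integer addition stands in for whatever adder is used inside each block (ripple, per the slides).

```python
def block_add(a, b, cin, width=4):
    """Add one block; stand-in for the block's internal adder."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, blocks=3, width=4):
    """Carry-select addition; returns (sum, carry_out)."""
    mask = (1 << width) - 1
    result, carry = 0, cin
    for k in range(blocks):
        a_blk = (a >> (k * width)) & mask
        b_blk = (b >> (k * width)) & mask
        # Both candidate results are precomputed (in hardware, in parallel).
        sum0, cout0 = block_add(a_blk, b_blk, 0, width)
        sum1, cout1 = block_add(a_blk, b_blk, 1, width)
        # 2-to-1 mux: the carry of the previous group selects the real result.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << (k * width)
    return result, carry


a = 0b111110111100
b = 0b111111010011
total, cout = carry_select_add(a, b)
print(format(total, "012b"), cout)  # prints "111110001111 1"
```

Only the mux selection depends on the previous group, so the carry path crosses one mux per group instead of a full block adder.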
If we keep reducing the number of bits per adder block, we arrive at the conditional sum adder.