ADSD Fall2011 06 Optimizing Area

Lecture # 06
Dr. Rehan Hafiz
<rehan.hafiz@seecs.edu.pk>
Course Website for ADSD Fall 2011: http://lms.nust.edu.pk/
Acknowledgement: Material from the following sources has been consulted/used in these slides:
1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
2. [SHO] Digital Design of Signal Processing Systems, Dr. Shoab A. Khan
3. [STV] Advanced FPGA Design, Steve Kilts
4. [ECEN 248] Some slides from ECEN 248, Dr. Shi
Lectures: Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm
Contact: By appointment/Email
Office: VISpro Lab above SEECS Library
Course Outline
Week 1: Introduction (Outline & Introduction, Initial Assessment of students, Digital design methodology & design flow)
Week 2: Verilog + Combinational Logic (Combinational Logic Review + Verilog Introduction, Combinational Building Blocks in Verilog)
Week 3: Verilog + Sequential Logic (Sequential Common Structures in Verilog (LFSR/CRC + Counters + RAMs), Sequential Logic in Verilog)
Week 4: Synthesis in Verilog (Synthesis of Blocking/Non-Blocking Statements)
Week 5: Micro-Architecture <Micro-Coded-Machines> (Design Partitioning + RISC Microprocessor + Micro-architecture Document)
Week 6: Optimizing Speed (Architecting Speed in Digital System Design: [Throughput, Latency, Timing])
Week 7: Optimizing Area (Architecting Area in Digital System Design: [Area Optimization])
Week 8: FIR Implementation (FIR Implementations + Pipelining & Parallelism in Non-Recursive DFGs)
Week 10: CDC Issues (Cross-Clock Domain Issues & RESET circuits)
Week 11: Fixed-Point Arithmetic (Arithmetic Operations: Review, Fixed-Point Representation)
Week 12: Adders (Adders & Fast Adders, Multi-Operand Addition)
Week 13: Multipliers (Multiplication, Multiplication by Constants + BOOTH Multipliers)
Week 13: CORDIC (sine, cosine, magnitude, division, etc.; CORDIC in HW)
Week 14: Algorithmic Transformations for System Design (DFG representation of DSP Algorithms, Iteration Bound & Retiming)
Week 15: Algorithmic Transformations (Unfolding, Look-ahead transformations)
Week 16: Project (Course Review & Project Presentations)
Week 17: Project (Project Presentations)
Optimizing Logic Area

Basic idea: REUSE the logic resources.
Reuse may come at the cost of speed/throughput.
Reuse requires additional control circuitry to implement the hardware sharing.

Reuse Factor
Let T_sample = n * T_CLK, where n is an integer; n is the reuse factor.
Rule: if n = 1, one-to-one mapping is the only option. If n > 1, there are opportunities to save hardware.
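As a worked example of the rule above, the sketch below computes n from a clock rate and a sample rate, then the number of hardware units an algorithm would need. It assumes (our assumption, not the slide's) that each unit performs one operation per clock and that control overhead is ignored; the function names `reuse_factor` and `units_needed` are hypothetical.

```c
#include <stdint.h>

/* Reuse factor n = T_sample / T_CLK = f_clk / f_sample.
   Assumes f_clk is an integer multiple of f_sample, as the rule requires. */
unsigned reuse_factor(uint32_t f_clk_hz, uint32_t f_sample_hz) {
    return (unsigned)(f_clk_hz / f_sample_hz);
}

/* Hypothetical sizing rule: with `ops` operations per sample and one
   operation per unit per clock, ceil(ops / n) units suffice. */
unsigned units_needed(unsigned ops, unsigned n) {
    return (ops + n - 1u) / n;
}
```

For instance, with a 100 MHz clock and a 25 MHz sample rate, n = 4, so an algorithm needing 8 operations per sample could in principle run on 2 units instead of 8; with n = 1 it still needs all 8.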
Algorithm Mapping
[Figure: algorithm mapping for the cases n = 1, n > 1, and recursive]
Techniques for Optimizing Logic Area
Time Folding: rolling up the pipeline (logic reuse)
Function Multiplexing: control-based logic reuse
Resource Sharing: intelligently minimizing the required components
<Technique-1> Time Folding
Share logic resources that are repeated across pipeline stages.
Useful for recursive dataflow.
So how can we roll up the pipeline?
[STV]
Unfolding (Loop Unrolling) vs. Folding (Rolling-Up)

Loop form:
  XPower = 1;
  for (i = 0; i < 3; i++) XPower = X * XPower;

Loop unrolling (one multiplication per iteration, each its own multiplier in hardware):
  XPower1 = X * 1;
  XPower2 = X * XPower1;
  XPower3 = X * XPower2;

Rolling-up (a single multiplier whose output is fed back):
  XPower = 1;
  <F> XPower = X * XPower;   // feed back to <F> 3 times
[STV]
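A minimal C model of the two hardware forms above. The names `xpower_unrolled` and `xpower_rolled` are hypothetical; in the rolled version each loop iteration stands for one clock cycle of the single shared multiplier and its feedback register.

```c
#include <stdint.h>

/* Loop unrolling: three multiplications, i.e. three multipliers in hardware. */
uint32_t xpower_unrolled(uint32_t x) {
    uint32_t xpower1 = x * 1u;
    uint32_t xpower2 = x * xpower1;
    uint32_t xpower3 = x * xpower2;
    return xpower3;
}

/* Rolling up: one multiplier whose registered output is fed back to its
   input (<F>). Each iteration models one clock cycle. */
uint32_t xpower_rolled(uint32_t x, unsigned iterations, unsigned *clocks) {
    uint32_t xpower = 1u;            /* feedback register, initialized to 1 */
    unsigned clk = 0;
    for (unsigned i = 0; i < iterations; i++) {
        xpower = x * xpower;         /* the single shared multiplier */
        clk++;
    }
    *clocks = clk;
    return xpower;
}
```

Both compute X^3; the unrolled form spends three multipliers to finish in one pass, whereas the rolled-up form spends three clocks on one multiplier.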
<Technique-1> Rolling-Up Pipeline Example: 8-bit Multiplier
A multiplier may be architected as an accumulator that adds a shifted version of A depending on the bits of B.
No special control signals are needed, only a counter to tell when to stop the shift-and-add.
This is a very compact multiplier, but it requires 8 clocks to complete a multiplication.
[Figure: partial-product arrays (a_i b_j terms for 4-bit operands A = a3..a0 and B = b3..b0), labeled A*A, 2*A*B, and B*B]
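The rolled-up multiplier described above can be modeled cycle by cycle as follows. This is a sketch, not the slide's circuit: the struct and function names are hypothetical, and each call to `mult_clock` represents one clock edge in which the single adder conditionally adds the shifted A, with a counter as the only control.

```c
#include <stdint.h>

typedef struct {
    uint16_t acc;      /* 16-bit product accumulator */
    uint16_t a_shift;  /* multiplicand A, shifted left by one bit each clock */
    uint8_t  b_shift;  /* multiplier B, shifted right each clock; its LSB gates the add */
    uint8_t  count;    /* 0..8 iteration counter: the only control logic */
} shift_add_mult_t;

void mult_load(shift_add_mult_t *m, uint8_t a, uint8_t b) {
    m->acc = 0;
    m->a_shift = a;
    m->b_shift = b;
    m->count = 0;
}

/* One clock edge: add shifted A if the current bit of B is 1, then shift.
   Returns 1 once all 8 bits of B have been consumed. */
int mult_clock(shift_add_mult_t *m) {
    if (m->count < 8u) {
        if (m->b_shift & 1u)
            m->acc = (uint16_t)(m->acc + m->a_shift);
        m->a_shift = (uint16_t)(m->a_shift << 1);
        m->b_shift = (uint8_t)(m->b_shift >> 1);
        m->count++;
    }
    return m->count == 8u;
}

/* Runs the multiplier to completion and reports how many clocks it took. */
uint16_t mult_run(uint8_t a, uint8_t b, unsigned *clocks) {
    shift_add_mult_t m;
    int done = 0;
    unsigned clk = 0;
    mult_load(&m, a, b);
    while (!done) {
        done = mult_clock(&m);
        clk++;
    }
    *clocks = clk;
    return m.acc;
}
```

For example, `mult_run(200, 123, &c)` yields 24600 with `c == 8`: one adder, eight clocks.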
<Technique-2> Function Multiplexing: Control-Based Logic Reuse
Used when there is no natural flow/sequence.
Special control signals are needed to determine which elements are input to the particular (shared) structure.
An ALU is a good example as well.
[STV]
[STV]
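To make the ALU example concrete, here is a sketch of control-based reuse in which a control signal `op` decides what is steered into a single shared 8-bit adder. The add/subtract pair, the enum encoding, and all names are illustrative assumptions; subtraction reuses the same adder by feeding it ~b with a carry-in of 1 (two's complement).

```c
#include <stdint.h>

typedef enum { OP_ADD, OP_SUB } alu_op_t;  /* hypothetical control encoding */

/* The single shared resource: one 8-bit adder with carry-in. */
uint8_t shared_adder(uint8_t x, uint8_t y, unsigned cin) {
    return (uint8_t)(x + y + cin);
}

/* Control logic: `op` selects which operands are steered into the adder. */
uint8_t alu(alu_op_t op, uint8_t a, uint8_t b) {
    if (op == OP_SUB)
        return shared_adder(a, (uint8_t)~b, 1u);  /* a - b = a + ~b + 1 */
    return shared_adder(a, b, 0u);
}
```

There is no natural sequence here: which function the adder performs in a given cycle depends entirely on the control input.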
Area Optimized FIR
Can afford this design only when T_o = T_s/3, i.e., with a reuse factor of n = 3.
This is confusing.
[STV]
Time Multiplexed Single MAC FIR
[REF-Required-to-be-added]
Shift the sample memory ONLY on the arrival of a new sample.
During every clock cycle, compute one product.
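The single-MAC scheme can be sketched as below. The tap count of 3 (chosen to match the T_o = T_s/3 case) and all names are assumptions for illustration. The sample memory shifts only in `fir_new_sample`, once per input; `fir_compute` then spends one modeled clock per tap on the single multiply-accumulate unit. In hardware the shift would be a shift register or circular buffer; the C loop merely models it.

```c
#include <stddef.h>
#include <stdint.h>

#define NTAPS 3  /* hypothetical tap count, matching the n = 3 case */

typedef struct {
    int32_t coeff[NTAPS];   /* coefficient memory */
    int32_t sample[NTAPS];  /* sample memory (delay line); sample[0] is newest */
} single_mac_fir_t;

/* Called once per input sample: the only place the sample memory shifts. */
void fir_new_sample(single_mac_fir_t *f, int32_t x) {
    for (size_t k = NTAPS - 1; k > 0; k--)
        f->sample[k] = f->sample[k - 1];
    f->sample[0] = x;
}

/* NTAPS clock cycles, one multiply-accumulate per cycle on the single MAC. */
int64_t fir_compute(const single_mac_fir_t *f, unsigned *cycles) {
    int64_t acc = 0;
    unsigned clk = 0;
    for (size_t k = 0; k < NTAPS; k++) {
        acc += (int64_t)f->coeff[k] * f->sample[k];  /* one product per cycle */
        clk++;
    }
    *cycles = clk;
    return acc;
}
```

With coefficients {1, 2, 3} and inputs 1, 2, 3, 4, the outputs are 1, 4, 10, 16, each taking 3 MAC cycles.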
<Technique-3> Resource Sharing
Higher-level, architectural resource sharing.
Can be used whenever there are functional blocks that can be used in other areas of the design, or even in different modules.

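100 MHz clock (10 ns period); counter terminal count 6FFh = 1791 (decimal); 55.8 kHz
A counter running from 0 through 6FFh wraps every 1792 clocks, so it ticks at 100 MHz / 1792 ≈ 55.8 kHz.
Any idea? (55.8 kHz)

System timer? (55.8 kHz)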
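One reading of the "System Timer?" question, sketched under assumptions: a single counter counting 0 through 6FFh generates the 55.8 kHz tick once, and every module that needs that rate uses the shared tick instead of instantiating its own 11-bit counter. The two consumer modules and all names are hypothetical.

```c
#include <stdint.h>

#define CLK_HZ       100000000u  /* 100 MHz system clock, 10 ns period */
#define TERMINAL_CNT 0x6FFu      /* 1791 decimal */

typedef struct { uint16_t count; } system_timer_t;

/* One clock edge of the shared timer; returns 1 on the cycle it wraps. */
int timer_clock(system_timer_t *t) {
    if (t->count == TERMINAL_CNT) {
        t->count = 0;
        return 1;
    }
    t->count++;
    return 0;
}

/* Tick frequency: the counter wraps every TERMINAL_CNT + 1 = 1792 clocks. */
double tick_hz(void) {
    return (double)CLK_HZ / (double)(TERMINAL_CNT + 1u);
}

/* Two hypothetical consumer modules both use the one shared tick
   instead of each instantiating its own counter. */
void simulate_shared_timer(unsigned clocks, unsigned *ticks_a, unsigned *ticks_b) {
    system_timer_t t = {0};
    *ticks_a = 0;
    *ticks_b = 0;
    for (unsigned i = 0; i < clocks; i++) {
        if (timer_clock(&t)) {
            (*ticks_a)++;   /* module A's periodic event */
            (*ticks_b)++;   /* module B's periodic event */
        }
    }
}
```

The saving grows with the number of consumers: N modules share one counter rather than carrying N copies of it.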
