
Lecture # 06

Dr. Rehan Hafiz

<rehan.hafiz@seecs.edu.pk>

Course Website for ADSD Fall 2011: http://lms.nust.edu.pk/
Acknowledgement: Material from the following sources has been consulted/used in these slides:
1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
2. [SHO] Digital Design of Signal Processing Systems, Dr. Shoab A. Khan
3. [STV] Advanced FPGA Design, Steve Kilts
4. Some slides from [ECEN 248, Dr. Shi]

Lectures: Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm
Contact: By appointment/Email
Office: VISpro Lab above SEECS Library

Course Outline
No. | Topic | Details
1 | Introduction | Outline & Introduction, Initial Assessment of students, Digital design methodology & design flow
2 | Verilog + Combinational Logic | Combinational Logic Review + Verilog Introduction, Combinational Building Blocks in Verilog
3 | Verilog + Sequential Logic | Common Sequential Structures in Verilog (LFSR/CRC + Counters + RAMs), Sequential Logic in Verilog
4 | Synthesis in Verilog | Synthesis of Blocking/Non-Blocking Statements
5 | Micro-Architecture <Micro-Coded Machines> | Design Partitioning + RISC Microprocessor + Micro-architecture Document
6 | Optimizing Speed | Architecting Speed in Digital System Design: [Throughput, Latency, Timing]
7 | Optimizing Area | Architecting Area in Digital System Design: [Area Optimization]
8 | FIR Implementation | FIR Implementations + Pipelining & Parallelism in Non-Recursive DFGs
10 | CDC Issues | Cross-Clock Domain Issues & RESET Circuits
11 | Fixed-Point Arithmetic | Arithmetic Operations: Review, Fixed-Point Representation
12 | Adders | Adders & Fast Adders, Multi-Operand Addition
13 | Multipliers | Multiplication, Multiplication by Constants + Booth Multipliers
13 | CORDIC | CORDIC (sine, cosine, magnitude, division, etc.), CORDIC in HW
14 | Algorithmic Transformations for System Design | DFG Representation of DSP Algorithms, Iteration Bound & Retiming
15 | Algorithmic Transformations | Unfolding, Look-Ahead Transformations
16 | Project | Course Review & Project Presentations
17 | Project | Project Presentations

Optimizing Logic Area


Basic Idea: REUSE the logic resources
- May come at the cost of speed/throughput
- Requires additional control circuitry to implement hardware reuse

Reuse Factor
Let Tsample = n × TCLK, n being an integer; n is the reuse factor!

Rule
If n = 1, one-to-one mapping is the only option. If n > 1, there are opportunities to save hardware.

Algorithm Mapping
[Figure: algorithm mapping for n = 1 and n > 1 (recursive)]

Techniques for Optimizing Logic Area
- Time Folding: rolling up the pipeline
- Function Multiplexing: control-based logic reuse
- Resource Sharing: intelligently minimizing the required components

<Technique-1> Time Folding

Sharing logic resources that are repeated across pipeline stages. Useful for recursive dataflow. So how can we roll up?

[STV]

Unfolding (Loop Unrolling) vs. Folding (Rolling-Up)

LOOP UNROLLING:
    XPower = 1;
    XPower1 = X * 1;
    XPower2 = X * XPower1;
    XPower3 = X * XPower2;

LOOP:
    XPower = 1;
    for (i = 0; i < 3; i++)
        XPower = X * XPower;

ROLLING-UP:
    XPower = 1;
    <F>: XPower = X * XPower;   (feed back to <F> 3 times)

[STV]

<Technique-1> Rolling-Up Pipeline Example: 8-bit Multiplier

A multiplier may be architected with an accumulator that adds a shifted version of A depending on the bits of B.
- No special control signals are needed; only a counter to tell when to stop the shift-and-add.
- A very compact multiplier, but it requires 8 clocks to complete a multiplication.

[Figure: partial-product array for squaring the 8-bit operand a3 a2 a1 a0 b3 b2 b1 b0, with the partial products grouped as B*B, 2*A*B and A*A]

<Technique-2> Function Multiplexing: Control-Based Logic Reuse

Used when there is no natural flow/sequence. Special control signals are needed to determine which elements are input to the particular structure.

An ALU is a good example as well.

[STV]


[STV]

Area Optimized FIR


This design is affordable only when To = Ts/3, i.e., when three clock cycles are available per sample period.

This is confusing

[STV]

Time Multiplexed Single MAC FIR

[REF-Required-to-be-added]

Shift the sample memory ONLY on arrival of a new sample. During every clock cycle, compute one product.

<Technique-3> Resource Sharing

Higher-level, architectural resource sharing. It can be used whenever there are functional blocks that can be used in other areas of the design, or even in different modules.


100 MHz clock (10 ns period); a counter running from 0 to 6FFh (1791 decimal) wraps every 1792 clocks: 100 MHz / 1792 ≈ 55.8 kHz

Any idea?


System Timer?

55.8 kHz
