
Lecture # 04

Dr. Rehan Hafiz

<rehan.hafiz@seecs.edu.pk>

Course Website for ADSD Fall 2011



http://lms.nust.edu.pk/
Acknowledgement: Material from the following sources has been consulted/used in these slides:
1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
2. [SHO] Digital Design of Signal Processing Systems, Dr. Shoab A. Khan
3. [STV] Advanced FPGA Design, Steve Kilts

Material from these slides CAN be used with the following citation: Dr. Rehan Hafiz: Advanced Digital System Design 2010, Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Lectures: Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm
Contact: By appointment/Email
Office: VISpro Lab above SEECS Library

This Lecture

ASM: Algorithmic State Machine
Understanding Design Partitioning
Controllers
   FSM: Finite State Machines (Mealy & Moore)
   Micro-Programmed

Algorithmic State Machine

ASMs: usually the first step in mapping an algorithm to hardware. FSMs: more controller-oriented.

ASM (Algorithmic State Machine) Example

Up/Down Counter [CIL]

Implicit Coding: Up/Down Counter

Understanding Design Partitioning


Systematically Porting an Algorithm to H/W

Greatest Common Divisor



Steps: Swap, Check, Process

Slides from MIT Course 6.375 Complex Digital Systems http://csg.csail.mit.edu/6.375/

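GCD - Algorithm

Steps: (s) swap if A < B; (c) check if B != 0; (p) process: A = A - B

A = 100, B = 60: (s) no swap; (c) B != 0; (p) A = 40, B = 60
A = 40, B = 60: (s) A = 60, B = 40; (c) B != 0; (p) A = 20, B = 40
A = 20, B = 40: (s) A = 40, B = 20; (c) B != 0; (p) A = 20, B = 20
A = 20, B = 20: (s) no swap; (c) B != 0; (p) A = 0, B = 20
A = 0, B = 20: (s) A = 20, B = 0; (c) B == 0, so stop: GCD = A = 20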

GCD Behavioral Example

module gcd_behavioral #( parameter width = 16 )
  ( input      [width-1:0] A_in, B_in,
    output reg [width-1:0] Y );

  reg [width-1:0] A, B, swap;
  integer done;

  always @( A_in or B_in ) begin
    done = 0;
    A = A_in;
    B = B_in;
    while ( !done ) begin
      if ( A < B ) begin       // swap
        swap = A;
        A = B;
        B = swap;
      end
      else if ( B != 0 )       // check, then process
        A = A - B;
      else
        done = 1;
    end
    Y = A;
  end
endmodule

We start by identifying the DATA processing elements & the controlling signals!

6.884 Spring 2005 02/04/05 L02 Verilog

Reference Slides
Slides 11-46 are from MIT Course 6.375 Complex Digital Systems http://csg.csail.mit.edu/6.375/

Summary

1. Define a higher-level block diagram
2. Define its interface
3. Decompose into smaller blocks if required
4. Decompose into Datapath & Controller
5. Use different modules to implement the Datapath & Controller; define their interface
6. Connect them in the higher-level block
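The summary steps can be applied to the GCD example. Below is a minimal sketch, assuming a registered (multi-cycle) implementation with a start/done handshake that the behavioral model does not have. The module and signal names (gcd_datapath, gcd_control, gcd_top, load, swap, sub, a_lt_b, b_zero) are hypothetical. The datapath holds A and B and exposes two status signals; the controller only sequences swap/check/process.

```verilog
// Hypothetical datapath/controller split of the GCD example (names are illustrative).
module gcd_datapath #(parameter W = 16) (
  input          clk,
  input  [W-1:0] A_in, B_in,
  input          load, swap, sub,      // control signals from the controller
  output         a_lt_b, b_zero,       // status signals to the controller
  output [W-1:0] Y
);
  reg [W-1:0] A, B;
  always @(posedge clk) begin
    if (load)      begin A <= A_in; B <= B_in; end
    else if (swap) begin A <= B;    B <= A;    end   // (s)
    else if (sub)        A <= A - B;                 // (p)
  end
  assign a_lt_b = (A < B);
  assign b_zero = (B == 0);                          // (c)
  assign Y      = A;
endmodule

module gcd_control (
  input      clk, reset, start, a_lt_b, b_zero,
  output reg load, swap, sub, done
);
  localparam IDLE = 2'd0, CALC = 2'd1, DONE = 2'd2;
  reg [1:0] state, next_state;

  always @(posedge clk)                              // sequential part
    if (reset) state <= IDLE; else state <= next_state;

  always @(*) begin                                  // combinational part
    {load, swap, sub, done} = 4'b0000;
    next_state = state;
    case (state)
      IDLE: if (start) begin load = 1'b1; next_state = CALC; end
      CALC: if (a_lt_b)       swap = 1'b1;
            else if (!b_zero) sub  = 1'b1;
            else              next_state = DONE;
      DONE: begin
              done = 1'b1;
              if (start) begin load = 1'b1; next_state = CALC; end
            end
      default: next_state = IDLE;
    endcase
  end
endmodule

module gcd_top #(parameter W = 16) (
  input          clk, reset, start,
  input  [W-1:0] A_in, B_in,
  output [W-1:0] Y,
  output         done
);
  wire load, swap, sub, a_lt_b, b_zero;
  gcd_datapath #(.W(W)) dp (.clk(clk), .A_in(A_in), .B_in(B_in), .load(load), .swap(swap),
                            .sub(sub), .a_lt_b(a_lt_b), .b_zero(b_zero), .Y(Y));
  gcd_control ctrl (.clk(clk), .reset(reset), .start(start), .a_lt_b(a_lt_b), .b_zero(b_zero),
                    .load(load), .swap(swap), .sub(sub), .done(done));
endmodule
```

Because the controller only sees a_lt_b and b_zero, the datapath width can change without touching the controller, which is the re-use argument made for partitioning.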

Controller Vs. Data-path Partitioning

Design Partitioning

Data path: the pipe that carries the data from the input of the design to the output and performs the necessary operations on the data (ALUs, storage registers & logic for moving data).

Controller: determines the sequence of operations and configures the data path for the various operations.

Data path and control blocks should be partitioned into different modules. This:
1. allows module re-use,
2. allows controller updates without requiring datapath updates,
3. allows dedicated floor planning for the timing-critical datapath logic.

2002 Dr. James P. Davis

Logic systems consist of two basic elements: control logic, consisting of state machines (FSMs), and datapath logic, consisting of functions like counters, arithmetic, multiplexers, decoders and memory (Wired Connected Datapaths).

Finite State Machines


Moore Vs. Mealy Machine

Moore Machine:
Output is a function of the present state only
May have more states
Synchronous outputs: no glitching
One cycle delay
Full cycle of stable output

Mealy Machine:
Output is a function of both the present state & the input
May have fewer states
Asynchronous outputs: if the input glitches, so does the output
Output is immediately available
Output may not be stable long enough to be useful

ASMs for Moore Machines: no ovals, i.e., no conditional output lists

Example: Output a ONE after detecting FOUR 1s in a binary sequence

State Transition Graph: what would its Moore equivalent look like?

[SHO]

ASM Mealy Vs. Moore

Architectures of Mealy & Moore Machines!

The choice between Mealy and Moore machine implementations is usually up to the designer. When some of the inputs are expected to glitch and the outputs are required to be stable for one complete cycle, a MOORE machine is the best choice [SHO].

// This module implements an FSM for the detection of four ones in a serial input stream of data
module fsm_mealy (
  input      clk,           // system clock
  input      reset,         // system reset
  input      data_in,       // 1-bit input stream
  output reg four_ones_det  // 1-bit output: indicates whether four ones are detected
);

  // Internal variables
  reg [1:0] current_state,  // 2-bit current state register
            next_state;     // 2-bit next state register

  // State tags assigned using binary encoding
  parameter STATE_0 = 2'b00,
            STATE_1 = 2'b01,
            STATE_2 = 2'b10,
            STATE_3 = 2'b11;

  // Always block for state assignment
  always @(posedge clk)
  begin
    if (reset) current_state <= STATE_0;
    else       current_state <= next_state;
  end

  // Next-state assignment block: the combinational cloud of next-state (and output) logic
  always @(*)
  begin
    case (current_state)
      STATE_0: begin
        if (data_in) begin next_state = STATE_1; four_ones_det = 1'b0; end // transition to next state
        else         begin next_state = STATE_0; four_ones_det = 1'b0; end // retain same state
      end
      STATE_1: begin
        if (data_in) begin next_state = STATE_2; four_ones_det = 1'b0; end
        else         begin next_state = STATE_1; four_ones_det = 1'b0; end
      end
      STATE_2: begin
        if (data_in) begin next_state = STATE_3; four_ones_det = 1'b0; end
        else         begin next_state = STATE_2; four_ones_det = 1'b0; end
      end
      STATE_3: begin
        if (data_in) begin next_state = STATE_0; four_ones_det = 1'b1; end // fourth one detected
        else         begin next_state = STATE_3; four_ones_det = 1'b0; end
      end
    endcase
  end
endmodule

To make this machine MOORE, the output should be a function of current_state only, rather than being assigned together with next_state inside the input-dependent branches.
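A minimal Moore sketch of the same four-ones detector, assuming the same counting behaviour as the Mealy code (zeros keep the state, ones advance it). Because the output is decoded from current_state alone, a fifth state is needed (STATE_4 and the module name fsm_moore are introduced here for illustration), and detection appears one clock cycle later than in the Mealy version.

```verilog
// Illustrative Moore version of the four-ones detector.
module fsm_moore (
  input  clk, reset, data_in,
  output four_ones_det
);
  localparam STATE_0 = 3'd0, STATE_1 = 3'd1, STATE_2 = 3'd2,
             STATE_3 = 3'd3, STATE_4 = 3'd4;   // STATE_4: four ones seen
  reg [2:0] current_state, next_state;

  always @(posedge clk)
    if (reset) current_state <= STATE_0;
    else       current_state <= next_state;

  always @(*)
    case (current_state)
      STATE_0: next_state = data_in ? STATE_1 : STATE_0;
      STATE_1: next_state = data_in ? STATE_2 : STATE_1;
      STATE_2: next_state = data_in ? STATE_3 : STATE_2;
      STATE_3: next_state = data_in ? STATE_4 : STATE_3;
      STATE_4: next_state = data_in ? STATE_1 : STATE_0; // behaves like STATE_0
      default: next_state = STATE_0;
    endcase

  // Moore output: depends only on current_state
  assign four_ones_det = (current_state == STATE_4);
endmodule
```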

State Encoding Schemes

One Hot: very light on next-state decoding logic (at the cost of one flip-flop per state). In fact, a sequence can be generated using a simple shift register.

Binary-coded counter sequences often change multiple bits on one count transition, which can lead to decoding glitches. Gray codes minimize glitches, since just one bit changes per transition.

Illegal states need special care with One Hot encoding.

It is important to handle illegal states by checking whether more than one bit of the state register is 1 (or whether no bit is 1).
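Two small illustrative helpers (module names are hypothetical). The first flags an illegal one-hot state using the identity that x & (x - 1) clears the lowest set bit, so a non-zero result means two or more bits were set; the all-zero code is also flagged. The second is the standard binary-to-Gray conversion, where consecutive binary counts map to Gray codes that differ in exactly one bit.

```verilog
// Flags an illegal one-hot state: no bit set, or more than one bit set.
module onehot_check #(parameter N = 4) (
  input  [N-1:0] state,
  output         illegal
);
  // x & (x - 1) clears the lowest set bit; non-zero means >= 2 bits were set
  assign illegal = (state == {N{1'b0}}) ||
                   ((state & (state - 1'b1)) != {N{1'b0}});
endmodule

// Binary to Gray conversion: consecutive binary counts differ in one Gray bit.
module bin2gray #(parameter N = 4) (
  input  [N-1:0] bin,
  output [N-1:0] gray
);
  assign gray = bin ^ (bin >> 1);
endmodule
```

In a real controller, illegal would typically force the state register back to a safe state (for example the reset state).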

Guidelines - Summary

Design Partitioning in Datapath and Controller
Datapath and control parts have different design objectives, so keep them in different blocks!
The datapath is usually synthesized for better timing; the controller is synthesized for minimum area.

FSM Coding in Procedural Blocks
Two always blocks are preferred: one implements the sequential part that assigns the next state to the state register, and the second implements the combinational logic that computes the next state. The designer can include the output computations for Mealy or Moore machines in the same combinational block. Alternatively, if the output is easy to compute, it can be computed separately in a continuous assignment outside the combinational procedural block.

State Encoding
Use meaningful tags, via `define or parameter statements, for all possible states.
Select the best encoding scheme.

Detect a pair of 1's or 0's in a single-bit input: the input is a series of ones and zeros. If two ones or two zeros come one after another, the output should go high; otherwise, the output should be low.

http://electrosofts.com/verilog/fsm.html
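One possible Moore solution, sketched under an assumed interpretation: pairs may overlap (so 0,0,0 keeps the output high), and the output is registered through the state. The module name pair_detect and the state names are hypothetical; this is not the solution from the linked page.

```verilog
// Illustrative Moore FSM: output high while the last two input bits were equal.
module pair_detect (
  input  clk, reset, din,
  output dout
);
  localparam IDLE = 3'd0, ONE = 3'd1, ZERO = 3'd2, PAIR1 = 3'd3, PAIR0 = 3'd4;
  reg [2:0] state, next_state;

  always @(posedge clk)
    if (reset) state <= IDLE;
    else       state <= next_state;

  always @(*)
    case (state)
      IDLE:    next_state = din ? ONE   : ZERO;
      ONE:     next_state = din ? PAIR1 : ZERO;
      ZERO:    next_state = din ? ONE   : PAIR0;
      PAIR1:   next_state = din ? PAIR1 : ZERO;   // overlapping pairs allowed
      PAIR0:   next_state = din ? ONE   : PAIR0;
      default: next_state = IDLE;
    endcase

  assign dout = (state == PAIR1) || (state == PAIR0);
endmodule
```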


Micro-programmed State Machines

In hardwired state machine based designs, the controller is implemented as a Mealy or Moore finite state machine (FSM). This makes the design rigid. What can we do if updates to the algorithm or sequencing are expected?

Make the controller programmable.

How? Idea:

We do NOT implement the next-state logic; we simply store the outputs & next state for the current state (and input) in a memory, just like a lookup table. The combinational logic is replaced by a sequence of control signals that are stored in program memory (PM).

The PM may be a read-only memory (ROM) or a random access memory (RAM). The address of the contents in the memory is determined by the current state and the input to the FSM.

General Architecture

The designer evaluates all possible state transitions based on inputs and the current state and tabulates the outputs and next states as micro coding for PM. These values are placed in the PM such that the inputs and the current state provide the index or address to the PM.
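As an illustration only, the four-ones Mealy detector can be micro-coded this way: {data_in, current_state} forms the PM address and each PM word stores {next_state, four_ones_det}. The module name fsm_mealy_upm and the word layout are assumptions. The PM is filled in an initial block for simulation; a real flow might instead load it with $readmemb or use a vendor ROM/RAM.

```verilog
// Micro-programmed version of the four-ones Mealy detector (sketch).
module fsm_mealy_upm (
  input  clk, reset, data_in,
  output four_ones_det
);
  reg  [1:0] current_state;
  reg  [2:0] pm [0:7];                   // word = {next_state[1:0], four_ones_det}
  wire [2:0] addr = {data_in, current_state};
  wire [2:0] word = pm[addr];            // lookup replaces next-state logic

  initial begin
    // data_in = 0: stay in the same state, output 0
    pm[3'b0_00] = {2'b00, 1'b0};
    pm[3'b0_01] = {2'b01, 1'b0};
    pm[3'b0_10] = {2'b10, 1'b0};
    pm[3'b0_11] = {2'b11, 1'b0};
    // data_in = 1: advance; from state 3 return to state 0 and assert output
    pm[3'b1_00] = {2'b01, 1'b0};
    pm[3'b1_01] = {2'b10, 1'b0};
    pm[3'b1_10] = {2'b11, 1'b0};
    pm[3'b1_11] = {2'b00, 1'b1};
  end

  always @(posedge clk)
    if (reset) current_state <= 2'b00;
    else       current_state <= word[2:1];

  assign four_ones_det = word[0];        // Mealy: depends on data_in too
endmodule
```

Changing the detected pattern now means rewriting the PM contents, not the logic.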

Example (MEALY)

Verilog Code

Micro Programmed MOORE



The micro-program memory is split into two parts: combinational logic I and logic II are replaced by PM I and PM II.

The input and the current state constitute the address for PM I. The memory contents of PM I are filled to generate the next state according to the ASM chart. The width of PM I equals the size of the current-state register, whereas its depth is 2^b, where b is the number of bits in {input, current state}. Only the current state acts as the address for PM II; the contents of PM II generate the output signals for the datapath.
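A sketch of the split, reusing a five-state Moore four-ones detector as an assumed example (the module name fsm_moore_upm is hypothetical). With a 3-bit state and a 1-bit input, PM I is addressed by 4 bits, so it is 16 words deep and 3 bits wide; PM II is addressed by the state alone (8 words, 1-bit output here).

```verilog
// Micro-programmed Moore machine: PM I = next state, PM II = outputs (sketch).
module fsm_moore_upm (
  input  clk, reset, data_in,
  output four_ones_det
);
  reg [2:0] pm1 [0:15];  // PM I : {data_in, state} -> next state (width = state bits)
  reg       pm2 [0:7];   // PM II: state -> output for the datapath
  reg [2:0] current_state;
  integer k;

  initial begin
    for (k = 0; k < 16; k = k + 1) pm1[k] = 3'd0;  // unused codes fall back to state 0
    for (k = 0; k < 8;  k = k + 1) pm2[k] = 1'b0;
    // data_in = 0: hold the state (state 4 returns to 0)
    pm1[{1'b0,3'd0}] = 3'd0; pm1[{1'b0,3'd1}] = 3'd1;
    pm1[{1'b0,3'd2}] = 3'd2; pm1[{1'b0,3'd3}] = 3'd3;
    pm1[{1'b0,3'd4}] = 3'd0;
    // data_in = 1: count the one (state 4 starts a new group)
    pm1[{1'b1,3'd0}] = 3'd1; pm1[{1'b1,3'd1}] = 3'd2;
    pm1[{1'b1,3'd2}] = 3'd3; pm1[{1'b1,3'd3}] = 3'd4;
    pm1[{1'b1,3'd4}] = 3'd1;
    pm2[4] = 1'b1;         // output asserted only in state 4
  end

  always @(posedge clk)
    if (reset) current_state <= 3'd0;
    else       current_state <= pm1[{data_in, current_state}];

  assign four_ones_det = pm2[current_state];   // Moore: state only
endmodule
```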

Example

Variations: Counter based State Machines



Many controller designs do not depend on external inputs; they only need to generate a sequence of control signals.

To read this sequence, the design only needs to generate addresses to the PM.

Simply use counters! Keep in mind the difference between a microprocessor and these micro-programmed state machines for the upcoming slides.

Variations : Adding Jumps Loadable Counter based State Machines



State machines also have jumps, and may also have explicit jumps decided at runtime!

The controller should be capable of jumping to a new address in the PM and generating control signals from there.
Make the branch address part of the micro-code: unconditional branching.
The load bit provides a programmable way of deciding whether a JUMP is associated with a particular state.
Branch_addr provides the jump address.
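A minimal loadable-counter sequencer, assuming a PM word layout of {load, branch_addr, ctrl}; the layout, widths (AW, CW) and the module name upc_sequencer are illustrative assumptions. Without the load bit this is the plain counter-based machine; with it, any word can carry an unconditional jump.

```verilog
// Sketch of a loadable-counter sequencer; the word layout is a hypothetical choice.
module upc_sequencer #(parameter AW = 4, CW = 4) (
  input           clk, reset,
  output [CW-1:0] ctrl                     // control signals to the datapath
);
  reg  [AW-1:0]     upc;                   // micro program counter
  reg  [AW+CW:0]    pm [0:(1<<AW)-1];      // word = {load, branch_addr, ctrl}
  wire [AW+CW:0]    word        = pm[upc];
  wire              load        = word[AW+CW];
  wire [AW-1:0]     branch_addr = word[AW+CW-1:CW];

  always @(posedge clk)
    if (reset)     upc <= {AW{1'b0}};
    else if (load) upc <= branch_addr;     // unconditional jump
    else           upc <= upc + 1'b1;      // plain counter

  assign ctrl = word[CW-1:0];
endmodule
```

The PM is left uninitialized here; it would be loaded from a file or, as in a RAM-based design, written at run time.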

Variations Loadable Counter based State Machines with Conditional Branch Support

Algorithms may require conditional jump support, for example as a result of some ALU operation.

Some sort of Status and Control Register (SCR) may be used.

It is a good idea to have a centralized status register in your controller.

Not all status signals are always useful.

We increase the number of load bits to have a programmable way of testing various options from the available status bits.
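A sketch of the widened load field, assuming a 2-bit select whose encoding is purely illustrative: 00 = next address, 01 = always jump, 10 = jump if status[0], 11 = jump if status[1]. The module name upc_cond_sequencer and the two status bits (e.g. zero/negative flags from an ALU) are assumptions.

```verilog
// Conditional-branch sequencer sketch: micro-code selects which status bit to test.
module upc_cond_sequencer #(parameter AW = 4, CW = 4) (
  input           clk, reset,
  input  [1:0]    status,                 // assumed status bits from the datapath
  output [CW-1:0] ctrl
);
  reg  [AW+CW+1:0] pm [0:(1<<AW)-1];      // word = {sel[1:0], branch_addr, ctrl}
  reg  [AW-1:0]    upc;
  wire [AW+CW+1:0] word        = pm[upc];
  wire [1:0]       sel         = word[AW+CW+1:AW+CW];
  wire [AW-1:0]    branch_addr = word[AW+CW-1:CW];
  // sel: 00 next, 01 always jump, 10 jump if status[0], 11 jump if status[1]
  wire take = (sel == 2'b01) |
              ((sel == 2'b10) & status[0]) |
              ((sel == 2'b11) & status[1]);

  always @(posedge clk)
    if (reset)     upc <= {AW{1'b0}};
    else if (take) upc <= branch_addr;
    else           upc <= upc + 1'b1;

  assign ctrl = word[CW-1:0];
endmodule
```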

Example Design Scenario



Variation : Register-based Controllers


Similar to the PC (Program Counter) approach

Adding Subroutine Support



A subroutine needs to return to the micro-code instruction that follows the CALL, so we need to store the return address in a register.

The state machine saves the contents of micro PC in a special register called the subroutine return address (SRA) register.

Load SRA on CALL to subroutine



Automatically updates the next PC

Parity bits are sometimes added to check for false conditions. Again, this helps keep the datapath as independent as possible, and allows us to branch on both true and false states in a programmable way.

Next-address MUX select (PC ADDr, RET ADDr, JMP ADDr):
(00) PC address
(01) JMP address; load SRA on CALL
(10) RET address: select the SRA address
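A sketch of the next-address logic with a single subroutine return address (SRA) register, following the 00/01/10 select encoding above. To keep it short, the micro-code fields next_sel and jmp_addr arrive as ports instead of coming from a PM; the module and port names are hypothetical.

```verilog
// Next-address logic with one subroutine return address (SRA) register (sketch).
module upc_subroutine #(parameter AW = 4) (
  input               clk, reset,
  input  [1:0]        next_sel,   // micro-code field: 00 next, 01 CALL, 10 RET
  input  [AW-1:0]     jmp_addr,   // micro-code field: subroutine address
  output reg [AW-1:0] upc
);
  reg  [AW-1:0] sra;                                   // subroutine return address
  wire [AW-1:0] upc_inc = upc + 1'b1;

  always @(posedge clk)
    if (reset) begin
      upc <= {AW{1'b0}};
      sra <= {AW{1'b0}};
    end else case (next_sel)
      2'b01:   begin sra <= upc_inc; upc <= jmp_addr; end  // CALL: save return, jump
      2'b10:   upc <= sra;                                 // RET: resume after the CALL
      default: upc <= upc_inc;                             // sequential
    endcase
endmodule
```

A single SRA supports only one level of calls; a nested CALL would overwrite it.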

Adding Nested Sub Routine Support


Add a STACK! How many levels of nesting should it support?

Logic for Subroutine Address Stack


On CALL, write is enabled to save the RET address, and the correct LIFO address is selected based on the MUX value (a simple increment is fine for stack addressing).

Read_lifo_addr points to the top of the stack; Write_lifo_addr points to top+1. No error handling is assumed.
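A minimal return-address LIFO, assuming push on CALL and pop on RET. Here wr_ptr plays the role of Write_lifo_addr (top+1) and rd_ptr the role of Read_lifo_addr (top); as on the slide, overflow and underflow are not handled. The module name ret_stack and the depth parameter are illustrative.

```verilog
// Return-address stack for nested subroutines (sketch, no overflow/underflow checks).
module ret_stack #(parameter AW = 4, DEPTH_LOG2 = 2) (
  input           clk, reset,
  input           push,             // asserted on CALL
  input           pop,              // asserted on RET
  input  [AW-1:0] ret_in,           // upc + 1 at the CALL
  output [AW-1:0] ret_out           // top of stack, used on RET
);
  reg  [AW-1:0]         mem [0:(1<<DEPTH_LOG2)-1];
  reg  [DEPTH_LOG2-1:0] wr_ptr;                    // Write_lifo_addr: top + 1
  wire [DEPTH_LOG2-1:0] rd_ptr = wr_ptr - 1'b1;    // Read_lifo_addr: top

  always @(posedge clk)
    if (reset) wr_ptr <= {DEPTH_LOG2{1'b0}};
    else if (push) begin
      mem[wr_ptr] <= ret_in;
      wr_ptr      <= wr_ptr + 1'b1;
    end else if (pop)
      wr_ptr <= wr_ptr - 1'b1;

  assign ret_out = mem[rd_ptr];
endmodule
```

The depth (2^DEPTH_LOG2 entries) fixes the maximum nesting level.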

Complete System !

LOOPs in State Machines Example: Filtering!

State 1: Reset
State 2: Wait for Data
State 3: Wait for Complete Data Packet
State 4: Start Processing: Repeat States 5-6, 256 times
State 5: Convolve filter with data at location x,y
State 6: x++, y++
State 7: End

What if you want to apply a cascaded filter?

LOOPs in State Machines

State 1: Reset
State 2: Wait for Data
State 3: Wait for Complete Data Packet
State 3.5: Start Filtering: Repeat State 4, 2 times (for two filters)
State 4: Start Processing: Repeat States 5-6, 256 times
State 5: Convolve filter with data at location x,y
State 6: x++, y++
State 7: End

Need Nested LOOP Support! Imagine doing this for a Hard Wired State Machine!

Adding LOOP Support



Consider a LOOP instruction: now we need a counter.

The loop counter loads its value on the loop command.

End address in a loop instance reached: why do we need this?

LOOP ended: why do we need this?
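A single-level loop sketch, assuming the LOOP word supplies the last address of the loop body and an iteration count (at least 1), and that the body starts at the next address. The "end address reached" and "LOOP ended" signals from the slide appear as end_reached and loop_done. The module name upc_loop and its ports are hypothetical; nesting would need these registers replaced by stacks.

```verilog
// Single-level LOOP support for a micro-PC (sketch).
module upc_loop #(parameter AW = 4, NW = 8) (
  input               clk, reset,
  input               loop_cmd,      // micro-code: start a loop after this word
  input  [AW-1:0]     loop_end_in,   // micro-code: last address of the loop body
  input  [NW-1:0]     loop_cnt_in,   // micro-code: number of iterations (>= 1)
  output reg [AW-1:0] upc
);
  reg [AW-1:0] loop_start, loop_end;
  reg [NW-1:0] loop_cnt;
  reg          active;
  wire end_reached = active && (upc == loop_end);      // end address of this loop instance
  wire loop_done   = end_reached && (loop_cnt == 1);   // LOOP ended

  always @(posedge clk)
    if (reset) begin
      upc <= 0; active <= 1'b0; loop_cnt <= 0; loop_start <= 0; loop_end <= 0;
    end else if (loop_cmd) begin
      loop_start <= upc + 1'b1;                         // body starts at the next word
      loop_end   <= loop_end_in;
      loop_cnt   <= loop_cnt_in;
      active     <= 1'b1;
      upc        <= upc + 1'b1;
    end else if (end_reached && !loop_done) begin
      loop_cnt <= loop_cnt - 1'b1;                      // another pass through the body
      upc      <= loop_start;
    end else begin
      if (loop_done) active <= 1'b0;                    // fall through after the last pass
      upc <= upc + 1'b1;
    end
endmodule
```

With loop_end_in = 2 and loop_cnt_in = 3 issued at address 0, the micro-PC visits 0, 1, 2, 1, 2, 1, 2, 3.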

Adding NESTED LOOP Support



Add STACKs to your architecture! Good thing: all stacks need the same global address logic controller! Why?

Adding NESTED LOOP Support



Complete System !
LOOP & Subroutine Address Stack


Design Example I Microcoded Machine FIFO/LIFO

Example Design-1 LIFO/FIFO Architecture



A traditional four-deep FIFO would require 4 states. Working:

WRITE: gets the new value from IN_BUS into the next available space
DEL: updates the read address for OUT_BUS
ERROR: any error condition, for example DEL on empty
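For reference, the WRITE/DEL/ERROR behaviour can be written as ordinary RTL; this is a behavioral sketch to check the micro-code against, not the micro-coded design itself. The module name fifo4, the 8-bit default width and treating WRITE on full as an error are assumptions.

```verilog
// Behavioral four-deep FIFO with WRITE, DEL and ERROR (reference sketch).
module fifo4 #(parameter W = 8) (
  input          clk, reset,
  input          write,        // WRITE: store IN_BUS in the next free slot
  input          del,          // DEL: advance the read address
  input  [W-1:0] in_bus,
  output [W-1:0] out_bus,      // oldest entry
  output         error         // DEL on empty, or WRITE on full (assumed)
);
  reg [W-1:0] mem [0:3];
  reg [1:0]   rd_addr, wr_addr;
  reg [2:0]   count;           // 0..4 entries
  wire empty = (count == 0);
  wire full  = (count == 4);
  assign error   = (del && empty) || (write && full);
  assign out_bus = mem[rd_addr];

  always @(posedge clk)
    if (reset) begin
      rd_addr <= 0; wr_addr <= 0; count <= 0;
    end else begin
      if (write && !full) begin mem[wr_addr] <= in_bus; wr_addr <= wr_addr + 1'b1; end
      if (del && !empty)  rd_addr <= rd_addr + 1'b1;
      count <= count + (write && !full) - (del && !empty);
    end
endmodule
```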

Micro-Code for FIFO


Micro-Code for LIFO



Design Example-II: Design for Block Based Motion Estimation!

Example-2 Design for Block based Motion Estimation



Image Source: http://www-sipl.technion.ac.il/Info/News&Events_1_e.php?id=373

Exhaustive block-based motion estimation searches for a block over the whole image and computes a similarity measure at each position, e.g. the Sum of Absolute Differences (SAD).

Example-2 Design for Block based Motion Estimation



Raster Scanning

[SHO]Fig 10.22

Sample Design

[SHO]

System Design for a Complex System !!



Where shall I start? Follow a top-down hierarchical model with iterative refinement:

1. Define the interface with the external world (other components, memory, etc.)!
2. The memory arrangement can be tricky, but again we identify it incrementally.
3. Define the major functional blocks & reiterate Steps 1-3 for each of them until you constitute your complete data path.

Consider Block based Motion Estimation



Let's assume we wish to have a micro-coded design, with the flexibility to change the raster scan direction! The FUN part: let's start the design right now. Divide & Conquer!
[SHO]Fig 10.22

Developing a RASTER Machine !!!!! Consider Block based Motion Estimation


Considerations:
Describe what your block shall do: it shall read an image and a reference block, both from memory, raster scan the target image completely, and report the x,y with the lowest computed SAD.
Define its I/O & draw the block diagram!
Any particular specs? The customer wants it programmable and may change the raster style & starting position in future!
Study your algorithm to go deeper into the design:
   It requires four nested loops, so you need a nice controller with loop support!
   It requires an ALU to do the real data crunching!
   It requires a register file to store the data read from memory.
   RASTER MACHINE: needs a lot of address logic to generate the right addresses depending on the current state!
   It needs to store tx, ty.

Block diagram components:
Controller (Micro-Coded, Supporting Nested Loops)
RASTER Control: controls the address generation logic
Address Generator for: Tx, Ty, ALU, RAMs, Register File. Inputs: Current State, Tx, Ty
tX,tY Address Generator (TAG)
Tx & Ty Register
Block Address Generator (BAG): generates addresses for memory access during initial loading; handles row-major addressing. Input: from TAG. Output: to Reg File & Memory
Address Generator for Extra Column/Row (EAG)
Address Generator for the ALU in the processing state. To: ALU. From: Reg File
Target & REF Register File: addressing controlled by the address generators
Target RAM (Single Ported)
Reference RAM (Single Ported)
ALU: performs SAD on each cycle & stores the corresponding tx & ty with the minimum SAD
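A sketch of the ALU block described in the list: one |S - R| term accumulated per cycle, plus tracking of the minimum SAD and its (tx, ty). The control inputs clear, valid and done, the 8-bit pixel and coordinate widths, the 16-bit accumulator and the module name sad_unit are assumptions (16 bits holds 256 x 255 for a 16 x 16 block).

```verilog
// Hypothetical SAD ALU sketch: one |S - R| per cycle, plus minimum tracking.
module sad_unit #(parameter PW = 8, AW = 16) (
  input               clk, reset,
  input               clear,          // start accumulating a new candidate position
  input               valid,          // an (s, r) pixel pair is presented this cycle
  input               done,           // the SAD of the current position is complete
  input  [PW-1:0]     s, r,           // target and reference pixels from the register file
  input  [7:0]        tx, ty,         // current candidate position
  output reg [AW-1:0] sad,
  output reg [AW-1:0] min_sad,
  output reg [7:0]    best_tx, best_ty
);
  wire [PW-1:0] absdiff = (s > r) ? (s - r) : (r - s);

  // Accumulate |S(k,l) - R(k,l)| for the current block position
  always @(posedge clk)
    if (reset || clear) sad <= {AW{1'b0}};
    else if (valid)     sad <= sad + absdiff;

  // Keep the smallest SAD seen so far and where it occurred
  always @(posedge clk)
    if (reset) begin
      min_sad <= {AW{1'b1}};
      best_tx <= 8'd0;
      best_ty <= 8'd0;
    end else if (done && (sad < min_sad)) begin
      min_sad <= sad;
      best_tx <= tx;
      best_ty <= ty;
    end
endmodule
```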

Row Major Addressing for Matrices

Row Major: Linear Address = (Row * C) + Col, where C = number of columns.

Example (4 x 4 matrix, C = 4):
a b c d
e f g h
i j k l
m n o p

Linear Address: 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
Data:           a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p
ith row:        0    0    0    0    1    1    1    1    2    2    2    2    3    3    3    3
jth col:        0    1    2    3    0    1    2    3    0    1    2    3    0    1    2    3

Suppose your loop is over i, j, where i is the loop index for the current row and j is the loop index for the current column.

How can you implement, for a square image, (1) a Row-Major to Linear Address mapper and (2) a Linear to Row-Major mapper? Solution: Concatenation & De-Concatenation! (This works directly when C is a power of two, e.g. 256: Linear Address = {Row, Col}.)

Here i = tx and j = ty.
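The concatenation trick in Verilog, as a minimal sketch assuming a power-of-two number of columns (C = 2^CB) so that Row * C + Col is just {Row, Col}. For a 256 x 256 image, RB = CB = 8. The module and port names are hypothetical.

```verilog
// Row-major <-> linear address mapping by (de-)concatenation (sketch).
module rowmajor_map #(parameter RB = 2, CB = 2) (  // C = 2**CB columns
  input  [RB-1:0]    row,
  input  [CB-1:0]    col,
  output [RB+CB-1:0] lin_addr,      // row * C + col
  input  [RB+CB-1:0] lin_in,
  output [RB-1:0]    row_out,       // linear -> row-major
  output [CB-1:0]    col_out
);
  assign lin_addr           = {row, col};   // concatenation
  assign {row_out, col_out} = lin_in;       // de-concatenation
endmodule
```

With the 4 x 4 example, row 2 and column 1 give linear address 1001 (data j), and linear address 1110 splits back into row 3, column 2 (data o). No multiplier or divider is needed.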

For i = 0 : (255 - (N-1))
   For j = 0 : (255 - (N-1))
      SAD(i,j) = 0
      For k = 0 : (N-1)
         For l = 0 : (N-1)
            SAD(k,l) = |S(k,l) - R(k,l)|
            SAD(i,j) = SAD(i,j) + SAD(k,l)
         End
      End
      If (SAD(i,j) < Min_SAD) Min_SAD = SAD(i,j)
   End
End

We need to get data from RAM assuming row-major order, so a Row-Major to Linear converter is needed if required. Once the blocks are loaded, the ALU (SAD block) only requires a simple one-to-one mapping (address generation)! N = elements per row or column, assuming a square block.

Raster Algo!

For i = 0 : 2 : (255 - (N-1))
   For j = 0 : (255 - (N-1))                   (traverse RIGHT)
      SAD(i,j) = 0
      For k = 0 : (N-1)
         For l = 0 : (N-1)
            SAD(k,l) = |S(k,l) - R(k,l)|
            SAD(i,j) = SAD(i,j) + SAD(k,l)
         End
      End
      If (SAD(i,j) < Min_SAD) Min_SAD = SAD(i,j)
   End
   tx = tx + 1                                 (move one row down)
   For j = (255 - (N-1)) : -1 : 0              (traverse LEFT)
      SAD(i,j) = 0
      For k = 0 : (N-1)
         For l = 0 : (N-1)
            SAD(k,l) = |S(k,l) - R(k,l)|
            SAD(i,j) = SAD(i,j) + SAD(k,l)
         End
      End
      If (SAD(i,j) < Min_SAD) Min_SAD = SAD(i,j)
   End
End

Rastering efficiently !

ALU In-Depth

(Sample) Controller In-Depth

LOOP & Subroutine Address Stack

Sample micro-code table (columns: Instruction/State, State, Value, Loop Start, Loop End, Comments).

Instructions in sequence: Reset, Set tx, Set ty, RASTER, Lp InitBlk, Lp R, Lp C, Lc+Pr, Pr, Pr_dne, Update_ty, SHIFT, LpC_Dne, RASTER, Update_tx, Load R/C, SHIFT; states S0 to S7.

Values column (in order): 0, 0, RIGHT, block size, (256)/2 - 8, 256 - 8 = c size.

Comments:
Reset everything
Initialize tx (starting x co-ordinate)
Initialize ty (starting y co-ordinate)
Tell the processor you are traversing RIGHT initially
Load initial blocks for REF & TARGET; this takes clocks equal to the number of elements
Run till state Lp R_Dne, equal to half of the number of rows
Run till state LpC_Dne, equal to the number of columns, for each row traversed in the RIGHT direction
Process & load the RIGHT/LEFT column due to the RASTER value
Process only
Store result
Update ty based upon the previous RASTER direction (LEFT/RIGHT, DOWN/UP)
Shift left / shift right: take the extra column to the other end!
Done with one row (over all the columns): the block needs to move down, as defined by the previous RASTER
Load row due to the RASTER value
Update the register files!
This step can be avoided by XORing a predefined bit of the counter: useful for RASTER!

Designing your own microprocessor Datapath Vs. Control Logic Partitioning


Micro-Architecture Documentation Case Study: RISC-SPM (A mini RISC Stored Program Machine)
<Legacy slides from last year!>

Design Spec or Micro-Architecture

Partitioning of functions into blocks, clock/reset requirements, pipelining of registers, memory buffers, state machines and interface details.

Micro Architecture Documents Template

Part 1: Block name, Owner, Version control
Part 2: Overview
Part 3: Functional/Requirement Specifications: operation details, interfacing signals
Part 4: Detailed functional description of key circuitry with drawings
Part 5: Verification: list of assertions, formal verification rules, etc.
Part 6: Comments

Micro-Architecture Template

Part 1 Block name, Owner, Version control

Block Name: A mini RISC Stored Program Machine, Dual Port RAM

Version Control
Version | Modification | Author/s | Date | Remarks
1.0 | Initial Draft | Ossama | 10th Aug, 09 |
2.0 | Updated FSM for Bulk Transfer, Page No | Saad | 13th Aug, 09 | It was found that ..

Micro-Architecture Template

Part 2 Overview

Describe what your block is supposed to do, e.g. "A mini RISC Stored Program Machine that performs basic arithmetic." Give enough information for people to recognize the functionality at a glance. It should list:
Abbreviations
References

Micro Architecture Template

Part 3 Functional/Requirement Specification: what are the functional demands/requirements/constraints of your block?
Examples:
The mini RISC SPM should operate at 2.5 GHz
Interface with the external world: the interfacing signals, any specific interface

Instruction Set

Interface with the external world

RISC SPM interface signals: Rst, Clk, Int

Micro Architecture Template

Part 3 Functional/Requirement Specification: Interface Signal List

Every interface signal should be listed. Don't forget comments, for example whether a system clock is gated low.

Remember to fill in information that is helpful to the designers interfacing with you.

Micro Architecture Template

Part 4 Detailed Functional description

(a) Block Level Diagram (hierarchical) (b) Datapath (c) Controller

Block Level Diagram

Identify your major functional blocks (figure: Processor, Controller, Memory; Rst, Clk).

Micro Architecture Template

Part 4 Detailed Functional description: a) Block level diagram

If your block is top level, you can go gradually to lower levels.
Block diagram/Macro-Architecture highlighting the flow of data and control signals.
Draw the control path and data path for each ground-level block. For each & every block specify:
Overview,
Interfacing Signals,


Moving further down into design

Further add the functionality: show how your block is structured. Don't necessarily draw every wire; take a qualitative approach. All interface signals should be present on your drawing. Show all storage elements/registers/pipeline stages.

Block Level Diagram

Identify your major functional blocks (figure: Processor, Register File, ALU, Instruction Reg., Program Counter, Controller, Memory; Rst, Clk).

(b) Datapath

(c) Controller

Control Signals Generation: Finite State Machines, ASM Charts

(d) Timing waveforms of interfacing signals, e.g. interfacing with external RAM

(e) Memory Map: status registers, defined I/O ports, etc.

Micro Architecture Template

Part 5 Verification

Describe the rules for the correct behaviour of your block; take your time describing them.
Example:

2 cycles after signal A goes down, signal B should also go down.
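Such a rule can be turned into an executable check. This is a plain-Verilog checker sketch (in practice SystemVerilog assertions are the usual tool for this). It assumes "goes down" means A is sampled low on a clock edge after being high, and that B must be sampled low two clock edges later. The module name and the sticky fail flag are illustrative choices.

```verilog
// Checker sketch for: "2 cycles after signal A goes down, signal B should also go down."
module check_b_follows_a (
  input      clk, reset, a, b,
  output reg fail               // sticky: set on the first violation
);
  reg       a_d;                // a delayed by one cycle, to detect the falling edge
  reg [1:0] fell;               // fell[1] is set two cycles after a falls

  always @(posedge clk)
    if (reset) begin
      a_d <= 1'b0; fell <= 2'b00; fail <= 1'b0;
    end else begin
      a_d  <= a;
      fell <= {fell[0], a_d & ~a};
      if (fell[1] && b) fail <= 1'b1;   // b still high two cycles after a fell
    end
endmodule
```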

Micro Architecture Document Summary

Part 1: Block name, Owner, Version control
Part 2: Overview
Part 3: Functional/Requirement Specifications: operation details/requirements, interfacing signals
Part 4: Detailed functional description: state diagrams for control & data path, & waveforms
Part 5: Verification: list of assertions, formal verification rules, etc.
Part 6: Comments
