<rehan.hafiz@seecs.edu.pk>
http://lms.nust.edu.pk/
Acknowledgement: Material from the following sources has been consulted/used in these slides:
1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
2. [SHO] Digital Design of Signal Processing Systems, Dr. Shoab A. Khan
3. [STV] Advanced FPGA Design, Steve Kilts
Material/Slides from these slides CAN be used with following citing reference: Dr. Rehan Hafiz: Advanced Digital System Design 2010 Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm By appointment/Email VISpro Lab above SEECS Library
This Lecture .
Micro
ASMs: Usually the first step in mapping an algorithm to hardware
FSMs: More controller oriented
Up/Down Counter
[CIL]
Implicit Coding
Up/Down Counter
Steps:
Swap
Check
Process
GCD -Algorithm
Steps: (s) swap if required, (c) check if B != 0, (p) process (A = A - B)
Start: A = 100, B = 60
(s) A = 100, B = 60    (c) B != 0    (p) A = 40, B = 60
(s) A = 60, B = 40     (c) B != 0    (p) A = 20, B = 40
(s) A = 40, B = 20     (c) B != 0    (p) A = 20, B = 20
(s) A = 20, B = 20     (c) B != 0    (p) A = 0, B = 20
(s) A = 20, B = 0      (c) B = 0, so the loop ends with GCD = A = 20
while ( !done ) begin
    if ( A < B ) begin
        swap = A;
        A = B;
        B = swap;
    end
    else if ( B != 0 )
        A = A - B;
    else
        done = 1;
end
Y = A;
end
endmodule
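The behavioral loop above runs to completion in zero simulation time. As a rough sketch of how the same swap/check/process steps might be spread over clock cycles (one step per clock), the following module could be used. The module name, the port names (start, in_a, in_b, y, done) and the default width are assumptions made for illustration and do not come from the referenced slides.

```verilog
// Sketch only: names and the 16-bit default width are assumptions.
module gcd_seq #(parameter W = 16) (
    input              clk,
    input              reset,   // synchronous, active high
    input              start,   // load the operands when high
    input  [W-1:0]     in_a,
    input  [W-1:0]     in_b,
    output reg [W-1:0] y,       // result, valid while done is high
    output reg         done
);
    reg [W-1:0] a, b;
    reg         busy;

    always @(posedge clk) begin
        if (reset) begin
            busy <= 1'b0;
            done <= 1'b0;
        end else if (start) begin
            a    <= in_a;
            b    <= in_b;
            busy <= 1'b1;
            done <= 1'b0;
        end else if (busy) begin
            if (a < b) begin            // swap
                a <= b;
                b <= a;
            end else if (b != 0) begin  // process
                a <= a - b;
            end else begin              // b == 0: finished
                y    <= a;
                busy <= 1'b0;
                done <= 1'b1;
            end
        end
    end
endmodule
```

With in_a = 100 and in_b = 60 the registers step through the same values as the trace above and y settles at 20.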
6.884 Spring 2005 02/04/05 L02 Verilog 10
Reference Slides
Slides from MIT Course 6.375 Complex Digital Systems http://csg.csail.mit.edu/6.375/
Slides 11-46
Summary
Define the higher-level block diagram
Define its interface
Decompose into smaller blocks if required
Decompose into Datapath & Controller
Use different modules to implement Datapath & Controller
Define their interface
Design Partitioning
Data path:
The pipe that carries the data from the input of the design to the output and performs the necessary operations on the data.
ALUs, storage registers & logic for moving data
Controller:
Determines the sequence
Configures the data path for various operations
Data path and control blocks should be partitioned into different modules.
Allows module re-use
Allows controller updates without requiring updates to the datapath
Datapath: critical timing
Logic systems consist of two basic elements:
Control logic consists of state machines (FSMs)
Datapath logic consists of functions like counters, arithmetic, multiplexers, decoders and memory (wired, connected datapaths)
Moore Machine
Output is a function only of the present state
May have more states
Synchronous outputs: no glitching, one cycle delay, full cycle of stable output

Mealy Machine
Output is a function of both the present state and the input
May have fewer states
Asynchronous outputs: if the input glitches, so does the output; output immediately available; output may not be stable long enough to be useful
[SHO]
The choice between Mealy and Moore machine implementations is usually left to the designer. When some of the inputs are expected to glitch and the outputs are required to be stable for one complete cycle, Moore is the best choice [SHO]
// This module implements an FSM for the detection of four ones in a serial input stream of data
module fsm_mealy(
    input      clk,           // system clock
    input      reset,         // system reset
    input      data_in,       // 1-bit serial input stream
    output reg four_ones_det  // 1-bit output to indicate 4 ones are detected or not
);
    // Internal variables
    reg [1:0] current_state,  // 2-bit current state register
              next_state;     // 2-bit next state register

    // State tags assigned using binary encoding
    parameter STATE_0 = 2'b00,
              STATE_1 = 2'b01,
              STATE_2 = 2'b10,
              STATE_3 = 2'b11;

    // State assignment block
    // This block implements the combinational cloud of next-state assignment logic
    always @(*) begin
        case (current_state)
            STATE_0: begin
                if (data_in) begin
                    // transition to next state
                    next_state    = STATE_1;
                    four_ones_det = 1'b0;
                end else begin
                    // retain same state
                    next_state    = STATE_0;
                    four_ones_det = 1'b0;
                end
            end
            STATE_1: begin
                if (data_in) begin
                    // transition to next state
                    next_state    = STATE_2;
                    four_ones_det = 1'b0;
                end else begin
                    // retain same state
                    next_state    = STATE_1;
                    four_ones_det = 1'b0;
                end
            end
            STATE_2: begin
                if (data_in) begin
                    // transition to next state
                    next_state    = STATE_3;
                    four_ones_det = 1'b0;
                end else begin
                    // retain same state
                    next_state    = STATE_2;
                    four_ones_det = 1'b0;
                end
            end
            STATE_3: begin
                if (data_in) begin
                    // fourth one detected: return to the first state
                    next_state    = STATE_0;
                    four_ones_det = 1'b1;
                end else begin
                    // retain same state
                    next_state    = STATE_3;
                    four_ones_det = 1'b0;
                end
            end
        endcase
    end

    // State register block: synchronous reset to STATE_0 (assumed)
    always @(posedge clk) begin
        if (reset)
            current_state <= STATE_0;
        else
            current_state <= next_state;
    end
endmodule
To make this machine a Moore machine, the output should be a function of current_state only, not of next_state or the input.
One-hot: one flip-flop per state, so the decoding logic is very light. In fact, a simple sequence can be implemented with a plain shift register.
Binary-coded counter sequences often change multiple bits on one count transition, which can lead to decoding glitches. Gray codes keep glitches to a minimum since just one bit changes per transition.
With one-hot encoding it is important to handle illegal states by checking whether more than one bit of the state register is 1.
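As a minimal sketch of such a check, the module below flags a one-hot state register that has more than one bit set. It also flags the all-zero case, which is an addition beyond what the slide asks for. The module and signal names are hypothetical.

```verilog
// Sketch only: module and signal names are assumptions.
module onehot_check #(parameter N = 4) (
    input  [N-1:0] state,    // one-hot state register
    output         illegal   // high when state is not a legal one-hot code
);
    // state & (state - 1) clears the lowest set bit; the result is
    // non-zero only when more than one bit is set.
    wire multiple_hot = |(state & (state - 1'b1));
    // All-zero is also not a legal one-hot code (extra check, see lead-in).
    wire none_hot     = ~|state;

    assign illegal = multiple_hot | none_hot;
endmodule
```

The illegal flag could be used to force the state register back to its reset state.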
Guidelines - Summary
Datapath and control parts have different design objectives, so keep them in different blocks!
Datapath usually synthesized for better timing; controller synthesized to take minimum area.
Two always blocks are preferred: one implements the sequential part that assigns the next state to the state register, and the second implements the combinational logic that computes the next state. The designer can include the output computations for Mealy or Moore machines in the same combinational block. Alternatively, if the outputs are easy to compute, they can be computed separately in a continuous assignment outside the combinational procedural block.
State Encoding
Use meaningful tags, defined with `define or parameter statements, for all possible states.
Select the best encoding scheme
Detect a pair of 1's or 0's in the single-bit input. That is, the input will be a series of ones and zeros. If two ones or two zeros come one after another, the output should go high. Otherwise the output should be low.
http://electrosofts.com/verilog/fsm.html
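One possible way to code this detector, following the guidelines above (a sequential block, a combinational next-state block, and a continuous assignment for the output), is sketched below. It is not the code from the linked page. The names, the synchronous reset and the choice to report overlapping pairs (1,1,1 gives two detections) are assumptions.

```verilog
// Sketch only: names, reset style and overlap behavior are assumptions.
module pair_detect (
    input  clk,
    input  reset,     // synchronous, active high
    input  data_in,   // serial input bit
    output pair_det   // Mealy output: high when data_in repeats the previous bit
);
    parameter IDLE = 2'b00,  // no bit seen yet
              GOT0 = 2'b01,  // previous bit was 0
              GOT1 = 2'b10;  // previous bit was 1

    reg [1:0] current_state, next_state;

    // Sequential block: state register
    always @(posedge clk) begin
        if (reset) current_state <= IDLE;
        else       current_state <= next_state;
    end

    // Combinational block: the next state only records the incoming bit
    always @(*) begin
        next_state = data_in ? GOT1 : GOT0;
    end

    // Output computed in a continuous assignment
    assign pair_det = (current_state == GOT1 &&  data_in) ||
                      (current_state == GOT0 && !data_in);
endmodule
```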
In hardwired state-machine-based designs, the controller is implemented as a Mealy or Moore finite state machine (FSM).
This makes the design rigid.
What can we do if updates to the algorithm or sequencing are expected?
How
Idea
We DO NOT implement the logic for the next state. We simply store the outputs and the next state for the current state in a memory, just like a lookup table. The combinational logic is replaced by a sequence of control signals that are stored in program memory (PM).
The PM may be a read-only memory (ROM) or a random-access memory (RAM). The address of the contents in the memory is determined by the current state and the input to the FSM.
General Architecture
The designer evaluates all possible state transitions based on inputs and the current state and tabulates the outputs and next states as micro coding for PM. These values are placed in the PM such that the inputs and the current state provide the index or address to the PM.
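To illustrate the idea, here is a minimal sketch of the four-ones detector from earlier in these slides rewritten as a micro-coded Mealy machine: the case statement is replaced by a small program memory addressed by {data_in, current_state}. The word layout {next_state, output}, the module name and the use of an initial block to fill the memory are assumptions, not the slide's own listing.

```verilog
// Sketch only: word layout and names are assumptions.
module fsm_mealy_ucode (
    input  clk,
    input  reset,          // synchronous, active high
    input  data_in,
    output four_ones_det
);
    reg  [1:0] current_state;
    reg  [2:0] pm [0:7];   // program memory word = {next_state[1:0], output}

    // Address = {input, current state}
    wire [2:0] pm_word = pm[{data_in, current_state}];

    initial begin
        // data_in = 0: retain the same state, output 0
        pm[3'b0_00] = 3'b00_0;
        pm[3'b0_01] = 3'b01_0;
        pm[3'b0_10] = 3'b10_0;
        pm[3'b0_11] = 3'b11_0;
        // data_in = 1: advance; from state 3 wrap to 0 and assert the output
        pm[3'b1_00] = 3'b01_0;
        pm[3'b1_01] = 3'b10_0;
        pm[3'b1_10] = 3'b11_0;
        pm[3'b1_11] = 3'b00_1;
    end

    always @(posedge clk) begin
        if (reset) current_state <= 2'b00;
        else       current_state <= pm_word[2:1];
    end

    assign four_ones_det = pm_word[0];
endmodule
```

Changing the sequencing now only means changing the memory contents, which is the flexibility the micro-coded approach is after.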
Example (MEALY)
Verilog Code
The micro program memory is split into two parts: combinational logic I and logic II are replaced by PM I and PM II.
The input and the current state constitute the address for PM I.
The memory contents of PM I are filled to appropriately generate the next state according to the ASM chart. The width of PM I is equal to the size of the current state register, whereas its depth is 2^n, where n is the number of bits in {input, current state}.
Only the current state acts as the address for PM II. The contents of PM II generate output signals for the datapath.
Example
Many controller designs do not depend on the external inputs. May require a sequence of control signals
State machines also have jumps & may also have explicit jumps decided at runtime !!!
Controller should be capable of jumping to start generating control signals from a new address in the PM.
Make branching address part of micro-code ! Unconditional Branching
Load bit provides a programmable way of deciding if JUMP should be associated with a particular state
Branch_addr provides the address
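A minimal sketch of such a counter-based controller with unconditional branching is shown below. The micro-word layout {ctrl, load, branch_addr}, the four-word example program and all names are assumptions chosen for illustration.

```verilog
// Sketch only: word layout, program contents and names are assumptions.
module ucode_counter (
    input        clk,
    input        reset,    // synchronous, active high
    output [3:0] ctrl      // control signals for the datapath
);
    // micro-word = {ctrl[3:0], load, branch_addr[1:0]}
    reg [6:0] pm [0:3];
    reg [1:0] pc;          // micro program counter

    initial begin
        pm[0] = {4'b0001, 1'b0, 2'b00};
        pm[1] = {4'b0010, 1'b0, 2'b00};
        pm[2] = {4'b0100, 1'b1, 2'b01}; // load = 1: jump back to address 1
        pm[3] = {4'b0000, 1'b1, 2'b00}; // unused; jumps to 0 if ever reached
    end

    wire [6:0] word        = pm[pc];
    wire       load        = word[2];
    wire [1:0] branch_addr = word[1:0];

    assign ctrl = word[6:3];

    always @(posedge clk) begin
        if (reset)     pc <= 2'b00;
        else if (load) pc <= branch_addr;  // unconditional branch
        else           pc <= pc + 1'b1;    // next sequential micro-word
    end
endmodule
```

After reset this program issues the control words of addresses 0, 1, 2 and then loops over addresses 1 and 2.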
Variations Loadable Counter based State Machines with Conditional Branch Support
Algorithms may require conditional Jump support as a result of for example some ALU operation
We increase the load bits to have a programmable way to test various options from the available status bits
A subroutine needs to return to the next micro-code instruction, so we need to store the return address in a register.
The state machine saves the contents of the micro PC in a special register called the subroutine return address (SRA) register.
Parity bits are sometimes added to check false conditions. Again, this helps keep the datapath as independent as possible. It allows us to branch on both true and false states, and it is programmable.
Next-address MUX inputs: PC address, RET address, JMP address
PC address (00)
JMP address, and load SRA on CALL (01)
RET address (10): select the SRA address
On CALL, write is enabled to save the RET address, and the correct LIFO address is selected based upon the MUX value (a simple increment is fine for stack addressing).
Read_lifo_addr points to the top of the stack.
Write_lifo_addr points to top+1 of the stack.
No error handling is assumed.
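The LIFO addressing described above can be sketched as follows. Port names, the default sizes and the priority of call over ret are assumptions. As stated above, there is no error handling, so the pointers simply wrap on overflow or underflow.

```verilog
// Sketch only: names, sizes and call-over-ret priority are assumptions.
module return_stack #(
    parameter AW = 8,          // width of a micro PC address
    parameter PW = 2           // stack pointer width (depth = 2**PW)
) (
    input           clk,
    input           reset,     // synchronous, active high
    input           call,      // push ret_addr
    input           ret,       // pop
    input  [AW-1:0] ret_addr,  // address to resume at after the subroutine
    output [AW-1:0] top        // return address currently on top of the stack
);
    reg  [AW-1:0] lifo [0:(1<<PW)-1];
    reg  [PW-1:0] write_lifo_addr;                            // top + 1
    wire [PW-1:0] read_lifo_addr = write_lifo_addr - 1'b1;    // top

    assign top = lifo[read_lifo_addr];

    always @(posedge clk) begin
        if (reset) begin
            write_lifo_addr <= {PW{1'b0}};
        end else if (call) begin
            lifo[write_lifo_addr] <= ret_addr;         // save return address
            write_lifo_addr <= write_lifo_addr + 1'b1; // simple increment
        end else if (ret) begin
            write_lifo_addr <= write_lifo_addr - 1'b1;
        end
    end
endmodule
```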
Complete System !
[Figure: state sequence State 1 to State 7. Annotations: Start Processing: repeat States 5-6 256 times; convolve filter with data at location x,y; x++, y++; End.]
[Figure: state sequence State 1, State 2, State 3, State 3.5, State 4, State 5, State 6, State 7.]
stacks need the same global address logic controller !!! Why ?
Complete System !
LOOP & Subroutine Address Stack
WRITE: gets the new value from IN_BUS into the next available space
DEL: updates the read address for the OUT_BUS
ERROR: any error condition, for example DEL on empty
Block-based exhaustive motion estimation searches for a block in the whole image and computes some similarity measure, e.g. the Sum of Absolute Differences (SAD).
Raster Scanning
[SHO]Fig 10.22
Sample Design
[SHO]
From where shall I start?
1. Follow a top-down hierarchical model with iterative refinement.
2. Define the interface with the external world (other components, memory, etc.).
   The way your memory is arranged can be tricky, but again we identify it incrementally.
3. Define major functional blocks and reiterate Steps 1-3 for each of them until you constitute your complete datapath.
Let's assume we wish to have a micro-coded design, with the flexibility to change the raster scan direction. The fun part: let's start the design right now. Divide & conquer.
[SHO]Fig 10.22
Considerations:
Describe what your block shall do: it shall read an image and a reference block, both from memory, raster scan the target image completely, and report the x,y for the lowest computed SAD.
Define its I/O and draw the block diagram.
Any particular specs? The customer wants it programmable and may change the raster style and starting position in future.
Start studying your algorithm to go deeper into the design:
   Requires four nested loops, so you need a controller with loop support.
   Requires some ALU to do the real data crunching.
   Requires a register file to store data read from the memory.
   RASTER MACHINE: needs a lot of address logic to generate the right addresses depending upon the current state.
   Needs to store tx, ty.
Blocks in the diagram:
Address Generator for Tx, Ty, ALU, RAMs and Register File. Inputs: current state, Tx, Ty.
Tx & Ty Register
Target & REF Register File: addressing controlled by the Address Generator (above)
Address Generator for the ALU, for the processing state. To: ALU. From: Reg File.
Block Address Generator (BAG): generates addresses for memory access during initial loading. Needs to take care of row-major addressing. Input: from TAG. Output: to Reg File & Memory.
Target RAM (single-ported)
ALU: performs SAD on each cycle and stores the corresponding tx & ty with minimum SAD
Row major: linear address = (Row * C) + Col
C = number of columns. Suppose your loop is over i,j, where i is the loop index for the current row and j is the loop index for the current column (i = tx, j = ty).

4 x 4 example:
a b c d
e f g h
i j k l
m n o p

Row  Col  Linear address  Data
0    0    [0000]          a
0    1    [0001]          b
0    2    [0010]          c
0    3    [0011]          d
1    0    [0100]          e
1    1    [0101]          f
1    2    [0110]          g
1    3    [0111]          h
2    0    [1000]          i
2    1    [1001]          j
2    2    [1010]          k
2    3    [1011]          l
3    0    [1100]          m
3    1    [1101]          n
3    2    [1110]          o
3    3    [1111]          p

How can you implement, for a square image:
1 - A row-major to linear address mapper
1 - A linear to row-major mapper
Solution: concatenation & de-concatenation!
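The concatenation trick can be sketched as below. It relies on the number of columns C being a power of two (C = 2**CB), which is an assumption that holds for the 4 x 4 example; the module and port names are hypothetical.

```verilog
// Sketch only: valid when the number of columns is 2**CB.
module rowmajor_map #(
    parameter RB = 2,                 // bits in the row index
    parameter CB = 2                  // bits in the column index
) (
    // Row-major to linear
    input  [RB-1:0]    row,
    input  [CB-1:0]    col,
    output [RB+CB-1:0] lin_addr,
    // Linear to row-major
    input  [RB+CB-1:0] lin_in,
    output [RB-1:0]    row_out,
    output [CB-1:0]    col_out
);
    // (row * C) + col with C = 2**CB is just a concatenation
    assign lin_addr = {row, col};

    // De-concatenation recovers the two indices
    assign {row_out, col_out} = lin_in;
endmodule
```

For row = 2 and col = 1 this gives lin_addr = 4'b1001, the location of element j in the 4 x 4 example. No multiplier or adder is needed.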
Need to get data from RAM assuming row-major order; a row-major to linear converter is used if required.
Once the blocks are loaded, only a simple one-to-one mapping (address generation) is required for the ALU (SAD block).
N = elements per row or column, assuming a square block.

j = 0 : (255 - (N-1))
    For k = 0 : (N-1)
        For l = 0 : (N-1)
        End
    End
Raster Algo !
    For k = 0 : (N-1)
        For l = 0 : (N-1)
            SAD(k,l) = |S(k,l) - R(k,l)|
            SAD(i,j) = SAD(i,j) + SAD(k,l)
        End
    End
    tx = tx + 1
    For j = (255 - (N-1)) : 0
        For k = 0 : (N-1)
            For l = 0 : (N-1)
                SAD(k,l) = |S(k,l) - R(k,l)|
                SAD(i,j) = SAD(i,j) + SAD(k,l)
            End
        End
    End
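The inner two loops accumulate one absolute difference per pixel pair. A minimal hardware sketch of that accumulation, processing one pair per clock, could look like this. The module name, ports and default widths are assumptions; a 16-bit sum is enough for up to 256 pairs of 8-bit pixels.

```verilog
// Sketch only: names and default widths are assumptions.
module sad_acc #(
    parameter PW = 8,            // pixel width
    parameter SW = 16            // accumulator width
) (
    input               clk,
    input               clear,   // start of a new block: zero the sum
    input               enable,  // a valid pixel pair is present
    input  [PW-1:0]     s,       // target pixel S(k,l)
    input  [PW-1:0]     r,       // reference pixel R(k,l)
    output reg [SW-1:0] sad      // running SAD(i,j) for the current block
);
    // |S(k,l) - R(k,l)| without signed arithmetic
    wire [PW-1:0] abs_diff = (s > r) ? (s - r) : (r - s);

    always @(posedge clk) begin
        if (clear)       sad <= {SW{1'b0}};
        else if (enable) sad <= sad + abs_diff;
    end
endmodule
```

The controller would pulse clear once per block position and keep enable high for the N*N pixel pairs of that block.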
Rastering efficiently !
ALU-In Depth
Instruction/State: Reset, Set tx, Set ty, RASTER, Lp InitBlk, Lp R, Lp C, Lc+Pr, Pr, Pr_dne, Update_ty, SHIFT, LpC_Dne, RASTER, Update_tx, Load R/C, SHIFT
State S0
Value
Loop
Start
End
0 0 RIGHT S1 S2 S3 S4 S5 Block size (256)/2-8 256-8 = c size b c size Lp InitBlk Lp C Lc+Pr Lc+Pr Lp InitBlk LpR_Dne LpC_Dne Lc+Pr
Initialize tx (starting x coordinate).
Initialize ty (starting y coordinate).
Tell the processor you are traversing right initially.
Load initial blocks for REF & TARGET. Will take clocks equal to the number of elements.
Run till state LpR_Dne, equal to half the number of rows.
Run till state LpC_Dne, equal to the number of columns for each row traversed in the RIGHT direction.
Process & load RIGHT/LEFT column due to RASTER value.
Process only.
Pr
Pr
Store Result
Update ty based upon previous RASTER Direction LEFT S7 DOWN Shift Left Done with one row --- (over all the coulmns) Block needs to move down ! As defined by previous RASTER ! = c size UP Lc+Pr Lc+Pr Load Row Due to RASTER value
RASTER
Lp C Lc+Pr Pr Pr_dne Update_ty SHIFT S3 S4 S5
LEFT
RIGHT
LpC_Dne
RASTER Update_tx Load R/C SHIFT RASTER Lp R_Dne
S7
DOWN
= c size UP RIGHT
Lc+Pr
Lc+Pr
Micro Architecture Documenting Case Study: RISC-SPM (A mini RISC Stored Program Machine)
<Legacy Slides from last year !>
Partitioning of functions into blocks, clock/reset requirements, pipelining of registers, memory buffers, state machines and interface details.
Part 1 Block name, Owner, Version control
Part 2 Overview
Part 3 Functional/Requirement Specifications
   Operation
Part 4 Detailed functional description of key circuitry with drawings
Part 5 Verification: list of assertions, formal verification rules, etc.
Part 6 Comments
Micro-Architecture Template
Version Control
Version   Modification    Author/s   Date   Remarks
1.0       Initial Draft   Ossama
2.0
Micro-Architecture Template
Part 2 Overview
Describe what your block is supposed to do, e.g. "A mini RISC Stored Program Machine that performs basic arithmetic."
Give enough information for people to recognize the functionality at a glance.
Should list:
   Abbreviations
   References
Part 3 Functional/Requirement Specification
What are the functional demands / requirements / constraints of your block?
Examples:
   mini RISC SPM should operate at 2.5 GHz
   Interface with the external world
   Interfacing
   The Instruction Set
[Block diagram: Processor, Controller and Memory, with Int, Rst and Clk signals]
If your block is top level, you can go gradually to lower levels.
Block diagram / macro-architecture, highlighting signals
Draw the control path and datapath for each ground-level block.
For each and every block specify: Overview, Interfacing Signals,
[Block diagram: Processor, Controller and Memory, with Rst and Clk signals]
Further add the functionality.
Show how your block is structured.
Don't necessarily draw every wire; take a qualitative approach instead.
All interface signals should be present on your drawing.
Show all storage elements, registers and pipeline stages.
[Block diagram: Processor, Register File, Controller, ALU and Memory, with Rst and Clk signals]
(b) Datapath
(c) Controller
Part 5 Verification
Describe the rules for the correct behaviour of your block Take your time and describe rules
Example:
Part 1 Block name, Owner, Version control
Part 2 Overview
Part 3 Functional/Requirement Specifications
   Operation
Part 5 Verification: list of assertions, formal verification rules, etc.
Part 6 Comments