Lecture 4
Introduction to Digital Signal Processors (DSPs)

Dr. Konstantinos Tatas
ACOE343 - Embedded Real-Time Processor Systems -
Frederick University
2
Outline/objectives
Identify the most important DSP processor
architecture features and how they relate
to DSP applications
Understand the types of code appropriate
for DSP implementation
3
What is a DSP?
A specialized microprocessor for real-
time DSP applications
Digital filtering (FIR and IIR)
FFT
Convolution, Matrix Multiplication etc
[Figure: signal chain. ANALOG INPUT -> ADC -> DIGITAL INPUT -> DSP -> DIGITAL OUTPUT -> DAC -> ANALOG OUTPUT]
4
Hardware used in DSP
                   ASIC       FPGA    GPP     DSP
Performance        Very high  High    Medium  Medium-high
Flexibility        Very low   High    High    High
Power consumption  Very low   Low     Medium  Low-medium
Development time   Long       Medium  Short   Short
5
Common DSP features
Harvard architecture
Dedicated single-cycle Multiply-Accumulate
(MAC) instruction (hardware MAC units)
Single Instruction, Multiple Data (SIMD) / Very
Long Instruction Word (VLIW) architecture
Pipelining
Saturation arithmetic
Zero overhead looping
Hardware circular addressing
Cache
DMA

6
Harvard Architecture
Physically separate
memories and paths
for instruction and
data

[Figure: CPU connected by separate paths to PROGRAM MEMORY and DATA MEMORY]
7
Single-Cycle MAC unit
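[Figure: a Multiplier forms each product a_i x_i and feeds an Adder; the Adder output is held in a Register and fed back, so the running sum a_{i-1} x_{i-1} + a_i x_i builds up step by step]
$\sum_{i=0}^{n} a_i x_i$
Can compute a sum of n products in n cycles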
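The accumulation performed by the MAC unit can be sketched in plain C. This is only an illustration, assuming 16-bit inputs and a 32-bit accumulator; the function name `mac_sum` is hypothetical and does not come from the lecture. On a DSP with a hardware MAC unit, each loop iteration would correspond to one single-cycle multiply-accumulate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of products: acc = a[0]*x[0] + ... + a[n-1]*x[n-1].
   Each loop iteration is one multiply-accumulate (MAC) step. */
int32_t mac_sum(const int16_t *a, const int16_t *x, size_t n)
{
    assert(a != NULL && x != NULL);       /* both input arrays must exist */
    int32_t acc = 0;                      /* plays the role of the accumulator register */
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * x[i];      /* multiply, then add to the running sum */
    }
    return acc;
}
```

The accumulator is wider than the inputs so that intermediate sums do not overflow for short vectors; real DSP accumulators are typically wider than the operands for the same reason.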
8
Single Instruction - Multiple Data
(SIMD)
A technique for data-level parallelism by
employing a number of processing
elements working in parallel
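As a rough illustration of data-level parallelism, the sketch below applies one operation (addition) to four data elements. It is written in portable C, so the lanes are processed by a loop; on SIMD hardware all four additions would be issued by a single instruction. The lane count `LANES` and the function name `simd_add4` are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4   /* assumed number of processing elements */

/* Apply the same operation (add) to every lane: out[k] = b[k] + c[k].
   A SIMD unit would perform all LANES additions at once. */
void simd_add4(const int16_t b[LANES], const int16_t c[LANES], int16_t out[LANES])
{
    assert(b && c && out);
    for (int k = 0; k < LANES; k++) {
        out[k] = (int16_t)(b[k] + c[k]);   /* same instruction, different data */
    }
}
```

The pattern works because every lane performs the same operation and no lane depends on another lane's result.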

9
Very Long Instruction Word (VLIW)
A technique for
instruction-level
parallelism by executing
instructions without
dependencies (known at
compile-time) in parallel
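Example of a single VLIW instruction:
F=a+b; c=e/g; d=x&y; w=z*h;
[Figure: one VLIW instruction with four slots (F=a+b, c=e/g, d=x&y, w=z*h); each slot is issued to its own processing unit (PU), which reads its two operands and produces its result]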
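Whether two operations may share a VLIW word comes down to a dependency check at compile time. The sketch below is a simplified, conservative version of such a check, assuming each operation has one destination and two sources named by single characters; the type `op` and the function `can_share_word` are hypothetical, and real VLIW compilers apply more detailed, architecture-specific rules.

```c
#include <stdbool.h>

/* One operation of the form dst = src1 <operator> src2. */
typedef struct {
    char dst;
    char src1;
    char src2;
} op;

/* Conservative test: true only if the two operations touch
   no common register in a way that orders them. */
bool can_share_word(op first, op second)
{
    /* second reads a value that first writes */
    bool raw = (second.src1 == first.dst) || (second.src2 == first.dst);
    /* both write the same destination */
    bool waw = (second.dst == first.dst);
    /* second overwrites an input of first */
    bool war = (second.dst == first.src1) || (second.dst == first.src2);
    return !(raw || waw || war);
}
```

With this check, `F=a+b` and `c=e/g` can be packed together, whereas `a=b+c` followed by `f=d&a` cannot, because the second operation reads the result of the first.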
10
CISC vs. RISC vs. VLIW

11
Pipelining
DSPs commonly feature deep pipelines
TMS320C6x processors have 3 pipeline stages
with a number of phases (cycles):
  Fetch
    Program Address Generate (PG)
    Program Address Send (PS)
    Program ready wait (PW)
    Program receive (PR)
  Decode
    Dispatch (DP)
    Decode (DC)
  Execute
    6 to 10 phases
12
Saturation Arithmetic
Operations such as addition and multiplication
are confined to a fixed range
Instead of wrapping around, overflow and underflow
produce the maximum and minimum allowed value,
respectively
Associativity and distributivity no longer apply
One-byte signed saturation arithmetic examples:
64 + 69 = 127
-127 - 5 = -128
(64 + 70) - 25 = 102, but 64 + (70 - 25) = 109
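One-byte signed saturation can be emulated in C by computing in a wider type and clamping the result. This is a minimal sketch, assuming the 8-bit two's complement range [-128, 127]; the helper names `saturate8`, `sat_add8` and `sat_sub8` are hypothetical. A DSP performs the clamping in hardware as part of the arithmetic instruction.

```c
#include <stdint.h>

/* Clamp a wide intermediate result to the signed 8-bit range [-128, 127]. */
static int8_t saturate8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;   /* overflow sticks at the maximum */
    if (v < INT8_MIN) return INT8_MIN;   /* underflow sticks at the minimum */
    return (int8_t)v;
}

/* Saturating add and subtract on one signed byte. */
int8_t sat_add8(int8_t a, int8_t b) { return saturate8((int32_t)a + b); }
int8_t sat_sub8(int8_t a, int8_t b) { return saturate8((int32_t)a - b); }
```

The loss of associativity shows up directly: `sat_sub8(sat_add8(64, 70), 25)` gives 102 because the inner sum is clamped to 127 first, while `sat_add8(64, sat_sub8(70, 25))` gives 109 because no intermediate value leaves the range.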
13
Examples
Perform the following operations using
one-byte saturation arithmetic
0x77 + 0x99 =
0x4 * 0x42 =
0x3 * 0x51 =
14
Zero Overhead Looping
Hardware support for loops with a
constant number of iterations using
hardware loop counters and loop buffers
No branching
No loop overhead
No pipeline stalls or branch prediction
No need for loop unrolling
15
Hardware Circular Addressing
A data structure
implementing a fixed
length queue of fixed size
objects where objects are
added to the head of the
queue while items are
removed from the tail of
the queue.
Requires at least 2
pointers (head and tail)
Extensively used in digital
filtering
$y[n] = a_0 x[n] + a_1 x[n-1] + \dots + a_k x[n-k]$
[Figure: a four-slot buffer holding X[n], X[n-1], X[n-2], X[n-3], drawn at Cycle1 and again at Cycle2, with the Head and Tail pointers marked]
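The filter equation and the circular buffer can be combined in a short software sketch. It assumes a 4-tap filter with integer coefficients; `TAPS`, `fir_state` and `fir_step` are names chosen for this example. Only a head index is stored, since in a full fixed-length buffer the tail is always the slot that the next sample will overwrite. The modulo operations stand in for what hardware circular addressing does without extra instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 4   /* assumed filter length (k + 1 coefficients) */

typedef struct {
    int32_t x[TAPS];   /* the last TAPS input samples */
    size_t  head;      /* index of the most recent sample x[n] */
} fir_state;

/* Store one new sample and return
   y[n] = a[0]*x[n] + a[1]*x[n-1] + ... + a[TAPS-1]*x[n-TAPS+1]. */
int32_t fir_step(fir_state *s, const int32_t a[TAPS], int32_t sample)
{
    assert(s != NULL && a != NULL);
    s->head = (s->head + 1) % TAPS;        /* advance head, wrapping around */
    s->x[s->head] = sample;                /* newest sample overwrites the oldest */

    int32_t y = 0;
    size_t idx = s->head;
    for (size_t j = 0; j < TAPS; j++) {
        y += a[j] * s->x[idx];             /* a[j] * x[n-j] */
        idx = (idx + TAPS - 1) % TAPS;     /* step back one sample, wrapping */
    }
    return y;
}
```

Starting from a zeroed state and feeding a single 1 followed by zeros returns the coefficients one by one, which is the impulse response of the filter. No samples are ever copied or shifted; only the index moves.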
16
Direct Memory Access (DMA)
The feature that allows peripherals to access
main memory without the intervention of the
CPU
Typically, the CPU initiates a DMA transfer, does
other operations while the transfer is in
progress, and receives an interrupt from the
DMA controller once the operation is complete.
Can create cache coherency problems (the data
in the cache may be different from the data in
the external memory after DMA)
Requires a DMA controller

17
Cache memory
Separate instruction and data L1 caches
(Harvard architecture)
Cache coherence protocols required,
since most systems use DMA

18
DSP vs. Microcontroller
DSP                                    Microcontroller
Harvard architecture                   Mostly von Neumann architecture
VLIW/SIMD (parallel execution units)   Single execution unit
No bit-level operations                Flexible bit-level operations
Hardware MACs                          No hardware MACs
DSP applications                       Control applications
19
Examples
Estimate how long the following code
fragment will take to execute on:
A general-purpose processor with a 1 GHz operating
frequency, five-stage pipelining, 5 cycles required
for multiplication and 1 cycle for addition
A DSP running at 500 MHz with zero overhead looping,
6 independent ALUs and 2 independent single-cycle
MAC units

for (i = 0; i < 8; i++)
{
    a[i] = 2*i + 3;
    b[i] = 3*i + 5;
}
20
Review Questions
Which of the following code fragments is
appropriate for SIMD implementation?
Fragment 1:
a[0]=b[0]+c[0];
a[2]=b[2]+c[2];
a[4]=b[4]+c[4];
a[6]=b[6]+c[6];
Fragment 2:
a[0]=b[0]&c[0];
a[0]=b[0]%c[0];
a[0]=b[0]+c[0];
a[0]=b[0]/c[0];
Can the following instructions be merged into
one VLIW instruction? If not, how many VLIW
instructions are needed?
a=b+c;
d=c/e;
f=d&a;
g=b%c;
21
Review Questions
Which of the following is not a typical DSP
feature?
Dedicated multiplier/MAC
Von Neumann memory architecture
Pipelining
Saturation arithmetic
Which implementation would you choose for
lowest power consumption?
ASIC
FPGA
General-Purpose Processor
DSP
22
Examples
How many VLIW instructions does the following program
fragment require if there are two independent data paths
(a, b), with 3 ALUs and 1 MAC available in each, and 8
instructions/word? How many cycles will it take to
execute if they are the first instructions in the program
and all instructions require 1 cycle, assuming the
pipelining architecture of slide 11 with 6 phases of
execution?
ADD a1,a2,a3 ;a3 = a1+a2
SUB b1,b3,b4 ;b4 = b1-b3
MUL a2,a3,a5 ;a5 = a2*a3
MUL b3,b4,b2 ;b2 = b3*b4
AND a7,a0,a1 ;a1 = a7 AND a0
MUL a3,a4,a5 ;a5 = a3*a4
OR a6,a3,a2 ;a2 = a6 OR a3

23
References
R. Chassaing, DSP Applications Using
C and the TMS320C6x DSK, Wiley, 2002
Texas Instruments, TMS320C64x
datasheets
Analog Devices, ADSP-21xx Processors
