Graphics Processing Unit

GRAPHICS PROCESSING UNIT
PRESENTED BY LEKSHMI P A ROLL NO:19

2/2/2013 1
Presentation Overview
Definition
Comparison with CPU
Architecture
GPU-CPU Interaction
GPU Memory
Why GPU?
To provide separate, dedicated graphics resources, including a graphics processor and memory.
To relieve some of the burden on the main system resources, namely the Central Processing Unit, main memory, and the system bus, which would otherwise be saturated with graphics operations and I/O requests.
Enter the GPU.
What is a GPU?
A Graphics Processing Unit or GPU (also occasionally called a Visual Processing Unit or VPU) is a dedicated processor that is efficient at manipulating and displaying computer graphics. Like the CPU (Central Processing Unit), it is a single-chip processor.
However, the abstract goal of a GPU is to enable a representation of a 3D world that is as realistic as possible. GPUs are therefore designed to provide additional computational power customized specifically for these 3D tasks.
GPU vs CPU
A GPU is tailored for highly parallel operation, while a CPU executes programs serially.
For this reason, GPUs have many parallel execution units, while CPUs have few execution units.
GPUs have significantly faster and more advanced memory interfaces, as they need to move far more data than CPUs.
GPUs have much deeper pipelines (several thousand stages vs 10-20 for CPUs).
BRIEF HISTORY
First-Generation GPUs
Up to 1998; Nvidia's TNT2, ATI's Rage, and 3dfx's Voodoo3; DX6 feature set.
Second-Generation GPUs
1999-2000; Nvidia's GeForce256 and GeForce2, ATI's Radeon 7500, and S3's Savage3D; T&L; OpenGL and DX7; configurable.
Third-Generation GPUs
2001; GeForce3/4Ti, Radeon 8500, Microsoft's Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.
Fourth-Generation GPUs
2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13 micron process, 125M T/C, 200M T/S.
Fifth-Generation GPUs
GeForce 8X; DirectX 10.
GPU Architecture
How many processing units? Lots.
How many ALUs? Hundreds.
Do you need a cache? Sort of.
What kind of memory? Very fast.
The difference
[Figure: the same system without a GPU and with a GPU]
The GPU pipeline

The GPU receives geometry information from the CPU as input and produces a picture as output. Let's see how that happens, stage by stage:
host interface
vertex processing
triangle setup
pixel processing
memory interface
Details..
Host Interface
The host interface is the communication bridge between the CPU and the GPU. It receives commands from the CPU and also pulls geometry information from system memory. It outputs a stream of vertices in object space with all their associated information (texture coordinates, per-vertex color, etc.).
Vertex Processing
The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space.
This may be a simple linear transformation or a complex operation involving morphing effects.
No new vertices are created in this stage, and no vertices are discarded (input and output have a 1:1 mapping).
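As a rough illustration of the object-space to screen-space step, here is a minimal Python sketch. The function names, the row-major 4x4 matrix layout, and the viewport convention (y pointing down) are assumptions made for this example, not something the slides specify.

```python
def transform_vertex(v, mvp, width, height):
    """Map one object-space vertex (x, y, z) to screen space (sx, sy, depth)."""
    x, y, z = v
    # Multiply by a 4x4 row-major matrix, treating the vertex as (x, y, z, 1).
    cx, cy, cz, cw = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in mvp)
    # Perspective divide: clip space to normalized device coordinates.
    ndc_x, ndc_y, ndc_z = cx / cw, cy / cw, cz / cw
    # Viewport mapping: [-1, 1] to pixel coordinates (assumed: y points down).
    sx = (ndc_x + 1) * 0.5 * width
    sy = (1 - ndc_y) * 0.5 * height
    return (sx, sy, ndc_z)


def process_vertices(vertices, mvp, width, height):
    """One output vertex per input vertex: none created, none discarded."""
    return [transform_vertex(v, mvp, width, height) for v in vertices]


# Hypothetical input: the identity matrix stands in for a real transform.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
screen = process_vertices([(0, 0, 0), (1, 1, 0), (-1, -1, 0)], IDENTITY, 640, 480)
```

The list comprehension makes the 1:1 mapping explicit: the output always has exactly as many vertices as the input.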
Triangle setup
In this stage, geometry information becomes raster information (screen-space geometry is the input, pixels are the output).
Prior to rasterization, triangles that are back-facing or located outside the viewing frustum are rejected.
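The rejection step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: triangles are already 2D screen-space points, a positive signed area is taken to mean front-facing, and a bounding-box test against the viewport stands in for a full frustum test. All function names are invented for the example.

```python
def signed_area(a, b, c):
    """Twice the signed area of a 2D triangle; the sign encodes the winding."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_backfacing(a, b, c):
    # Assumption: positive area means front-facing; zero area is degenerate.
    return signed_area(a, b, c) <= 0


def outside_viewport(tri, width, height):
    # Simplified stand-in for frustum rejection: the triangle's bounding
    # box lies entirely off one side of the screen.
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return max(xs) < 0 or min(xs) >= width or max(ys) < 0 or min(ys) >= height


def cull(triangles, width, height):
    """Keep only the triangles that go on to rasterization."""
    return [t for t in triangles
            if not is_backfacing(*t) and not outside_viewport(t, width, height)]


front = ((0, 0), (10, 0), (0, 10))
back = ((0, 0), (0, 10), (10, 0))            # same points, opposite winding
offscreen = ((-30, 0), (-20, 0), (-30, 10))  # entirely left of the screen
kept = cull([front, back, offscreen], 100, 100)
```

Rejecting triangles here is cheap compared with the per-pixel work they would otherwise trigger in the later stages.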
Triangle Setup (cont..)

A pixel is generated if and only if its center is inside the triangle.
Every pixel generated has its attributes computed as the perspective-correct interpolation of the attributes of the three vertices that make up the triangle.
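A minimal Python sketch of both rules follows. It assumes each vertex is a tuple (sx, sy, w, attr) with a single scalar attribute and the clip-space w carried along; that layout and the function names are choices made for this example. Pixel centers that fall exactly on an edge are accepted here, whereas real hardware applies a tie-breaking fill rule that this sketch does not model.

```python
def edge(a, b, p):
    """Cross product of (b - a) and (p - a); its sign tells which side p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Yield (x, y, attr) for each pixel whose center lies inside the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # the pixel center
            # Barycentric weights; dividing by area handles either winding.
            b0 = edge(v1, v2, p) / area
            b1 = edge(v2, v0, p) / area
            b2 = edge(v0, v1, p) / area
            if b0 < 0 or b1 < 0 or b2 < 0:
                continue  # center is outside the triangle
            # Perspective-correct: interpolate attr/w and 1/w, then divide.
            inv_w = b0 / v0[2] + b1 / v1[2] + b2 / v2[2]
            num = b0 * v0[3] / v0[2] + b1 * v1[3] / v1[2] + b2 * v2[3] / v2[2]
            yield (x, y, num / inv_w)


# Hypothetical triangle on a 4x4 pixel grid; all w = 1 gives plain linear
# interpolation.
tri = ((0.0, 0.0, 1.0, 0.0), (4.0, 0.0, 1.0, 1.0), (0.0, 4.0, 1.0, 0.0))
flat = {(x, y): a for x, y, a in rasterize(*tri, 4, 4)}
```

When the vertices have different w values, the result differs from plain linear interpolation in screen space, which is the point of the perspective correction.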
Pixel Processing
Each pixel provided by triangle setup is fed into pixel processing as a set of attributes, which are used to compute the final color for this pixel.
The computations taking place here include texture mapping and math operations.
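To make "texture mapping and math operations" concrete, here is a small Python sketch. The attribute names (`uv`, `color`), nearest-neighbor sampling, and the modulate operation are assumptions chosen for the example; real pixel processing supports many more attributes, filters, and operations.

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbor lookup in a texture stored as rows of RGB tuples.

    Assumes u and v are in [0, 1].
    """
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade_pixel(attrs, texture):
    """Compute a final RGB color from the attributes of one pixel."""
    texel = sample_nearest(texture, *attrs["uv"])    # texture mapping
    # Math operation: modulate the texel by the interpolated vertex color,
    # then clamp each channel to [0, 1].
    return tuple(min(1.0, max(0.0, t * c)) for t, c in zip(texel, attrs["color"]))


# Hypothetical 2x2 checkerboard texture.
WHITE, BLACK = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
checker = [[WHITE, BLACK], [BLACK, WHITE]]
lit = shade_pixel({"uv": (0.1, 0.1), "color": (0.5, 1.0, 0.25)}, checker)
dark = shade_pixel({"uv": (0.9, 0.1), "color": (0.5, 1.0, 0.25)}, checker)
```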
Memory Interface
Pixel colors provided by the previous stage are written to the framebuffer.
This stage used to be the biggest bottleneck before pixel processing took over.
Before the final write occurs, some pixels are rejected by the z-buffer.
On modern GPUs, z is compressed to reduce framebuffer bandwidth (but not size).
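The z-buffer rejection before the final write can be sketched in Python like this. The class and method names are invented for the example, it assumes that a smaller z means closer to the viewer, and it does not model z compression.

```python
class FrameBuffer:
    """Color buffer plus z-buffer with a depth-tested write."""

    def __init__(self, width, height, clear_color=(0, 0, 0)):
        self.color = [[clear_color] * width for _ in range(height)]
        # Start every depth at infinity so the first write always passes.
        self.depth = [[float("inf")] * width for _ in range(height)]

    def write(self, x, y, z, color):
        """Store the pixel only if it is closer than what is already there.

        Returns True if the pixel was written, False if the z-buffer
        rejected it.
        """
        if z >= self.depth[y][x]:
            return False
        self.depth[y][x] = z
        self.color[y][x] = color
        return True


fb = FrameBuffer(2, 2)
first = fb.write(0, 0, 0.5, (255, 0, 0))   # nothing there yet: accepted
hidden = fb.write(0, 0, 0.8, (0, 255, 0))  # farther away: rejected
closer = fb.write(0, 0, 0.2, (0, 0, 255))  # closer: replaces the red pixel
```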
Programmability in GPU pipeline

In current state-of-the-art GPUs, vertex and pixel processing are programmable.
The programmer can write programs that are executed for every vertex as well as for every pixel.
This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications.
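The idea of plugging programs into fixed pipeline stages can be sketched in Python as below. Everything here is hypothetical: `draw`, the toy `point_rasterizer` (one pixel per vertex), and the two small programs are invented for illustration. Real vertex and pixel programs are written in a shading language such as HLSL and run on the GPU itself.

```python
def point_rasterizer(vertices):
    """Toy fixed-function stage: emit one pixel per (x, y, intensity) vertex."""
    for x, y, intensity in vertices:
        yield int(x), int(y), intensity


def draw(vertices, rasterize, vertex_program, pixel_program):
    """Run a user-supplied program per vertex and another per pixel."""
    transformed = [vertex_program(v) for v in vertices]  # once per vertex
    return {(x, y): pixel_program(attrs)                 # once per pixel
            for x, y, attrs in rasterize(transformed)}


def shift_right(v):
    """Example vertex program: move the vertex one pixel to the right."""
    return (v[0] + 1, v[1], v[2])


def invert(intensity):
    """Example pixel program: invert the intensity."""
    return 1.0 - intensity


image = draw([(0, 0, 0.25), (2, 1, 1.0)], point_rasterizer, shift_right, invert)
```

Swapping in a different `vertex_program` or `pixel_program` changes the geometry or the shading without touching the rest of the pipeline.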
GPU Pipelined Architecture (simplified view)
[Figure: the CPU sends vertices to the GPU, where they pass through Vertex Setup, Vertex Shader, Rasterizer, Pixel Shader, and Frame buffer, with Texture Storage + Filtering alongside; pixels come out at the end.]
One unit can limit the speed of the whole pipeline: overall throughput is no higher than that of the slowest stage.
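A tiny Python sketch of the bottleneck idea follows. The stage rates are invented numbers in arbitrary units, used only to show the calculation.

```python
def pipeline_throughput(stage_rates):
    """Return (slowest_stage, rate): the pipeline runs at its slowest stage's rate."""
    slowest = min(stage_rates, key=stage_rates.get)
    return slowest, stage_rates[slowest]


# Hypothetical rates (items per unit time), not measurements of any real GPU.
rates = {
    "vertex setup": 400,
    "vertex shader": 300,
    "rasterizer": 500,
    "pixel shader": 150,
    "frame buffer": 250,
}
bottleneck = pipeline_throughput(rates)
```

With these numbers, speeding up any stage other than the pixel shader would not make the pipeline faster.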

CPU/GPU interaction
The CPU and GPU inside the PC work in parallel with each other. There are two threads of execution, one on the CPU and one on the GPU, which communicate through a command buffer:
[Figure: command buffer. The CPU writes commands at one end, the GPU reads commands from the other, and pending GPU commands sit in between.]
CPU/GPU interaction (cont)

If this command buffer is drained empty, we are CPU limited and the GPU will spin, waiting for new input. All the GPU power in the universe isn't going to make your application faster!
If the command buffer fills up, the CPU will spin, waiting for the GPU to consume it, and we are effectively GPU limited.
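The two cases can be reproduced with a small tick-based simulation in Python. This is an idealized model under stated assumptions: every command costs the same fixed number of ticks to build and to execute, and the function name and parameters are invented for the example.

```python
from collections import deque


def simulate(cpu_cost, gpu_cost, capacity, ticks):
    """Return (cpu_stall_ticks, gpu_idle_ticks) for a bounded command buffer."""
    buf = deque()
    cpu_busy = gpu_busy = 0      # ticks left on the command in progress
    cpu_stall = gpu_idle = 0
    for _ in range(ticks):
        # GPU side: take the next command, or sit idle if the buffer is empty.
        if gpu_busy == 0:
            if buf:
                buf.popleft()
                gpu_busy = gpu_cost
            else:
                gpu_idle += 1
        if gpu_busy > 0:
            gpu_busy -= 1
        # CPU side: start building a command only if there is room for it.
        if cpu_busy == 0:
            if len(buf) < capacity:
                cpu_busy = cpu_cost
            else:
                cpu_stall += 1
        if cpu_busy > 0:
            cpu_busy -= 1
            if cpu_busy == 0:
                buf.append("cmd")
    return cpu_stall, gpu_idle


# Slow CPU, fast GPU: the buffer keeps running empty (CPU limited).
cpu_limited = simulate(cpu_cost=4, gpu_cost=1, capacity=4, ticks=100)
# Fast CPU, slow GPU: the buffer fills up (GPU limited).
gpu_limited = simulate(cpu_cost=1, gpu_cost=4, capacity=4, ticks=100)
```

In the first run the GPU is idle most of the time and the CPU never stalls; in the second the roles are reversed.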
Synchronization issues
In the figure below, the CPU must not overwrite the data in the yellow block until the GPU is done with the black command, which references that data:
[Figure: command buffer in which one command references a separate data block]
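One common way to enforce this rule is a fence: a counter that tells the CPU how far the GPU has progressed. The slides do not name a mechanism, so the Python sketch below is an assumption, with invented class and function names, meant only to show the ordering constraint.

```python
class Gpu:
    """Toy GPU that executes submitted commands in order."""

    def __init__(self):
        self.pending = []    # referenced data of commands not yet executed
        self.submitted = 0   # number of commands submitted so far
        self.completed = 0   # number of commands finished so far

    def submit(self, data_ref):
        """Queue a command that references data_ref; return its fence value."""
        self.pending.append(data_ref)
        self.submitted += 1
        return self.submitted

    def execute_one(self):
        """Run the oldest pending command; the data is read only now."""
        data_ref = self.pending.pop(0)
        result = list(data_ref)
        self.completed += 1
        return result


def safe_to_overwrite(gpu, fence):
    """The CPU may reuse the data once the GPU has passed the fence."""
    return gpu.completed >= fence


gpu = Gpu()
data = [1, 2, 3]
fence = gpu.submit(data)
before = safe_to_overwrite(gpu, fence)   # False: command still pending
consumed = gpu.execute_one()
after = safe_to_overwrite(gpu, fence)    # True: the data can now be reused
```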
Inlining data
One way to avoid these problems is to inline all data into the command buffer and avoid references to separate data.
However, this is also bad for performance, since we may need to copy several megabytes of data instead of merely passing around a pointer.
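The trade-off can be seen in a few lines of Python. The function names and the list-based command buffer are invented for the example: a stored reference is cheap but stays tied to the CPU's data, while an inlined copy is independent but costs a full copy of the data.

```python
def submit_by_reference(command_buffer, data):
    """Store only a reference: cheap, but the CPU must not modify data yet."""
    command_buffer.append(("draw", data))


def submit_inline(command_buffer, data):
    """Copy the data into the command: safe to modify data right away."""
    command_buffer.append(("draw", bytes(data)))  # copies every byte


command_buffer = []
vertex_data = bytearray(b"abc")
submit_by_reference(command_buffer, vertex_data)
submit_inline(command_buffer, vertex_data)

# The CPU overwrites its data before the GPU has run either command.
vertex_data[0] = ord("z")
by_reference = command_buffer[0][1]  # sees the overwritten data
inlined = command_buffer[1][1]       # still holds the original bytes
```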
GPU readbacks
The output of a GPU is a rendered image on the screen. What happens if the CPU tries to read it?
[Figure: command buffer with pending GPU commands waiting at the GPU read position]
The GPU must be synchronized with the CPU, i.e. it must drain its entire command buffer, and the CPU must wait while this happens.
GPU readbacks (cont)

We lose all parallelism, since first the CPU waits for the GPU, then the GPU waits for the CPU (because the command buffer has been drained).
Both CPU and GPU performance take a nosedive.
Bottom line: the image the GPU produces is for your eyes, not for the CPU (treat the CPU -> GPU highway as a one-way street).
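A back-of-the-envelope model in Python shows why the cost is so high. It is an idealized steady-state assumption, not a measurement: when CPU and GPU overlap, a frame takes as long as the slower of the two, and with a readback every frame their times add up. The function name and the millisecond figures are hypothetical.

```python
def frame_time(cpu_ms, gpu_ms, readback):
    """Idealized time per frame, in milliseconds."""
    if readback:
        # Serialized: the CPU waits for the GPU, then the GPU waits for the CPU.
        return cpu_ms + gpu_ms
    # Overlapped: both work in parallel, so the slower one sets the pace.
    return max(cpu_ms, gpu_ms)


# Hypothetical workload: 10 ms of CPU work and 12 ms of GPU work per frame.
parallel = frame_time(10, 12, readback=False)
serialized = frame_time(10, 12, readback=True)
```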
About GPU memory..
Memory Hierarchy
[Figure: CPU and GPU Memory Hierarchy]
Disk
CPU Main Memory
CPU side: CPU Caches, CPU Registers
GPU side: GPU Video Memory, GPU Caches, GPU Constant Registers, GPU Temporary Registers
Where is GPU Data Stored?

Vertex buffer
Frame buffer
Texture
[Figure: pipeline of Vertex Buffer, Vertex Processor, Rasterizer, Fragment Processor, and Frame Buffer(s), with Texture as a further data store]
CPU memory vs GPU memory

              CPU                 GPU
Registers     Read/write          Read/write
Local Mem     Read/write stack    None
Global Mem    Read/write heap     Read-only during computation; write-only at end (to pre-computed address)
Disk          Read/write disk     None
It looks like..
Some applications..
Computer-generated holography using a graphics processing unit.
Improving the performance of CAD tools.
Computer graphics in games.
New..
NVIDIA's new graphics processing unit, the GeForce 8X ULTRA, is said to represent the very latest in visual effects technology.
THANK YOU
