The Architecture of Computer Hardware and Systems Software:: CPU and Memory: Design, Implementation, and Enhancement

Chapter 8
CPU and Memory:

Design, Implementation, and Enhancement
The Architecture of Computer

Hardware and Systems Software:
An Information Technology Approach
3rd Edition, Irv Englander
John Wiley and Sons 2003
Wilson Wong, Bentley College
Linda Senne, Bentley College
CPU Architecture Overview
CISC Complex Instruction Set Computer

RISC Reduced Instruction Set Computer
CISC vs. RISC Comparisons
VLIW Very Long Instruction Word
EPIC Explicitly Parallel Instruction
Computer
Chapter 8: CPU and Memory:
8-2
CISC Architecture
Examples
Intel x86, IBM Z-Series Mainframes, older
CPU architectures
Characteristics
Few general purpose registers
Many addressing modes
Large number of specialized, complex
instructions
Instructions are of varying sizes
8-3
Limitations of CISC Architecture

Complex instructions are infrequently
used by programmers and compilers
Memory references, loads and stores,
are slow and account for a significant
fraction of all instructions
Procedure and function calls are a
major bottleneck
Passing arguments
Storing and retrieving values in registers
8-4
RISC Features
Examples
Power PC, Sun Sparc, Motorola 68000
Limited and simple instruction set

Fixed length, fixed format instruction words
Enable pipelining, parallel fetches and executions
Limited addressing modes

Reduce complicated hardware
Register-oriented instruction set

Reduce memory accesses
Large bank of registers

Reduce memory accesses
Efficient procedure calls
8-5
CISC vs. RISC Processing
8-6
Circular Register Buffer
8-7
Circular Register Buffer

- After Procedure Call
8-8
CISC vs. RISC Performance

Comparison
RISC Simpler instructions
more instructions
more memory accesses
RISC more bus traffic and
increased cache memory misses
More registers would improve CISC
performance but no space available for them
Modern CISC and RISC architectures are
becoming similar
8-9
VLIW Architecture
Transmeta Crusoe CPU
128-bit instruction bundle = molecule
4 32-bit atoms (atom = instruction)
Parallel processing of 4 instructions
64 general purpose registers

Code morphing layer
Translates instructions written for other CPUs into
molecules
Instructions are not written directly for the Crusoe
CPU
8-10
EPIC Architecture
Intel Itanium CPU
128-bit instruction bundle
3 41-bit instructions
5 bits to identify type of instructions in bundle
128 64-bit general purpose registers

128 82-bit floating point registers
Intel X86 instruction set included
Programmers and compilers follow guidelines
to ensure parallel execution of instructions
8-11
Paging
Managed by the operating system
Built into the hardware
Independent of application
8-12
Logical vs. Physical Addresses

Logical addresses are relative
locations of data, instructions and
branch target and are separate from
physical addresses
Logical addresses mapped to physical
addresses
Physical addresses do not need to be
consecutive
8-13
Logical vs. Physical Address
8-14
Page Address Layout
8-15
Page Translation Process
8-16
Logical vs. Physical Addresses

Spazio fisico non deve essere uguale a
quello logico!!!
Piu grande spazio logico rispetto a
quello fisico!
Pagine caricate da disco in memoria
quando servono ..
Di piu nel cap 15
8-17
Memory Enhancements
Memory is slow compared to CPU processing
speeds!
2Ghz CPU = 1 cycle in of a billionth of a second
70ns DRAM = 1 access in 70 millionth of a second
Methods to improvement memory accesses

Wide Path Memory Access
Retrieve multiple bytes instead of 1 byte at a time
Memory Interleaving
Partition memory into subsections, each with its own

address register and data register
Cache Memory
8-18
Memory Interleaving
8-19
Why Cache?
Accesso in memoria spreca cicli di
clock!
Accesso cache 2 nanosecondi: 1 access in
2 millionth of a second
Guadagno velocita
Es:
Trovare tempo accesso alle memorio
Dram, ddram sdram, sram,
8-20
Cache Memory
Blocks: 8 or 16 bytes
Tags: location in main memory (memoria
fisica!)
Cache controller
hardware that checks tags
Cache Line
Unit of transfer between storage and cache memory
Hit Ratio: ratio of hits out of total requests

Synchronizing cache and memory
Write through
Write back
8-21
Step-by-Step Use of Cache
8-22
Step-by-Step Use of Cache
8-23
Performance Advantages
Hit ratios of 90% common
50%+ improved execution speed
Locality of reference is why caching works
Most memory references confined to small region of
memory at any given time
Well-written program in small loop, procedure or
function
Data likely in array
Variables stored together
8-24
Two-level Caches
Why do the sizes of the caches have to be
different?
8-25
Cache sui multiprocessor

Snoopy bus
8-26
E la cache del disco?

Cache per accesso al disco memoria
tra la ram (memoria principale) e il
disco gestita dal device.
Controller-cache
8-27
Cache vs. Virtual Memory

Cache speeds up memory access
Virtual memory increases amount of
perceived storage
independence from the configuration and
capacity of the memory system
low cost per bit
8-28
Modern CPU Processing Methods
Timing Issues
Separate Fetch/Execute Units
Pipelining
Scalar Processing
Superscalar Processing
8-29
Timing Issues
Computer clock used for timing purposes

MHz million steps per second
GHz billion steps per second
Instructions can (and often) take more than one step
Data word width can require multiple steps
8-30
Separate Fetch-Execute Units

Fetch Unit
Instruction fetch unit
Instruction decode unit
Determine opcode
Identify type of instruction and operands
Several instructions are fetched in parallel and

held in a buffer until decoded and executed
IP Instruction Pointer register
Execute Unit
Receives instructions from the decode unit
Appropriate execution unit services the instruction
8-31
Alternative CPU Organization
8-32
Instruction Pipelining
Assembly-line technique to allow overlapping between fetch-execute cycles of sequences of instructions
Only one instruction is being executed to completion at a time
Scalar processing
Average instruction execution is approximately equal to the clock speed of the CPU
Problems from stalling

Instructions have different numbers of steps
Problems from branching
8-33
Branch Problem Solutions

Separate pipelines for both possibilities
Probabilistic approach
Requiring the following instruction to not
be dependent on the branch
Instruction Reordering (superscalar
processing)
8-34
Pipelining Example
8-35
Superscalar Processing
Process more than one instruction per
clock cycle
Separate fetch and execute cycles as
much as possible
Buffers for fetch and decode phases
Parallel execution units
8-36
Superscalar CPU Block Diagram
8-37
Scalar vs. Superscalar Processing
8-38
Superscalar Issues
Out-of-order processing dependencies
(hazards)
Data dependencies
Branch (flow) dependencies and speculative
execution
Parallel speculative execution or branch
prediction
Branch History Table
Register access conflicts
Logical registers
8-39
Hardware Implementation
Hardware operations are
implemented by logic gates
Advantages
Speed
RISC designs are simple and typically
implemented in hardware
8-40
Microprogrammed Implementation
Microcode are tiny programs stored in
ROM that replace CPU instructions
Advantages
More flexible
Easier to implement complex instructions
Can emulate other CPUs
Disadvantage
Requires more clock cycles
8-41
Copyright 2003 John Wiley & Sons

All rights reserved. Reproduction or translation of this
work beyond that permitted in Section 117 of the 1976
United States Copyright Act without express permission
of the copyright owner is unlawful. Request for further
information should be addressed to the permissions
Department, John Wiley & Songs, Inc. The purchaser
may make back-up copies for his/her own use only and
not for distribution or resale. The Publisher assumes no
responsibility for errors, omissions, or damages caused
by the use of these programs or from the use of the
information contained herein.
8-42

The Architecture of Computer Hardware and Systems Software:: CPU and Memory: Design, Implementation, and Enhancement

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

The Architecture of Computer Hardware and Systems Software:: CPU and Memory: Design, Implementation, and Enhancement

Uploaded by

Copyright:

Available Formats

Chapter 8

CPU and Memory:

The Architecture of Computer

CPU Architecture Overview

CISC Complex Instruction Set Computer

Chapter 8: CPU and Memory:

Limitations of CISC Architecture

Limited and simple instruction set

Limited addressing modes

Register-oriented instruction set

Large bank of registers

Chapter 8: CPU and Memory:

CISC vs. RISC Processing

Chapter 8: CPU and Memory:

Circular Register Buffer

Chapter 8: CPU and Memory:

Circular Register Buffer

Chapter 8: CPU and Memory:

CISC vs. RISC Performance

64 general purpose registers

128 64-bit general purpose registers

Chapter 8: CPU and Memory:

Chapter 8: CPU and Memory:

Logical vs. Physical Addresses

Logical vs. Physical Address

Chapter 8: CPU and Memory:

Page Address Layout

Chapter 8: CPU and Memory:

Page Translation Process

Chapter 8: CPU and Memory:

Logical vs. Physical Addresses

Chapter 8: CPU and Memory:

Methods to improvement memory accesses

Retrieve multiple bytes instead of 1 byte at a time

Partition memory into subsections, each with its own

Chapter 8: CPU and Memory:

Hit Ratio: ratio of hits out of total requests

Step-by-Step Use of Cache

Chapter 8: CPU and Memory:

Step-by-Step Use of Cache

Chapter 8: CPU and Memory:

Chapter 8: CPU and Memory:

Cache sui multiprocessor

Chapter 8: CPU and Memory:

E la cache del disco?

Chapter 8: CPU and Memory:

Cache vs. Virtual Memory

Chapter 8: CPU and Memory:

Modern CPU Processing Methods

Chapter 8: CPU and Memory:

Computer clock used for timing purposes

Chapter 8: CPU and Memory:

Separate Fetch-Execute Units

Several instructions are fetched in parallel and

Alternative CPU Organization

Chapter 8: CPU and Memory:

Problems from stalling

Problems from branching

Chapter 8: CPU and Memory:

Branch Problem Solutions

Chapter 8: CPU and Memory:

Chapter 8: CPU and Memory: