
Computer Architecture

Chapter 8
Multiprocessors
Shared Memory Architectures
Prof. Jerry Breecher
CSCI 240
Fall 2003

Chapter Overview
We're going to cover only one section of this chapter: the part about
how caches belonging to multiple processors interact with each other.
8.1 Introduction: the big picture
8.3 Centralized Shared Memory Architectures

Chap. 8 - Multiproces

The Big Picture: Where Are We Now?

The major issue is this:

We've taken copies of the contents of main memory and put them in
caches closer to the processors. But what happens to those copies if
someone else wants to use the main memory data?
How do we keep all copies of the data in sync with each other?

The Multiprocessor Picture

[Figure: example Pentium system organization, showing the processor/memory bus, the PCI bus, and the I/O busses.]

Shared Memory Multiprocessor

[Figure: four processors, each with its own registers and caches, connected through a chipset to memory and to disk and other I/O.]

Memory: centralized, with uniform memory access time (UMA), bus interconnect, and I/O
Examples: Sun Enterprise 6000, SGI Challenge, Intel SystemPro

Shared Memory Multiprocessor

Several processors share one address space
  conceptually a shared memory
  often implemented just like a multicomputer
    address space distributed over private memories
Communication is implicit
  read and write accesses to shared memory locations
Synchronization
  via shared memory locations
    spin waiting for non-zero
  barriers

[Figure: conceptual model, with processors (P) connected through a network/bus to memory (M).]
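The implicit communication and spin waiting described above can be sketched with two threads. This is only an illustration: the `shared` dictionary stands in for shared memory locations, and the names `producer`, `consumer`, and `run` are made up for this example. It relies on CPython performing the two writes in program order.

```python
import threading
import time

# Hypothetical stand-in for two shared memory locations.
shared = {"data": None, "flag": 0}


def producer():
    shared["data"] = 42   # ordinary write to a shared location
    shared["flag"] = 1    # signal that the data is ready


def consumer(result):
    while shared["flag"] == 0:   # spin waiting for non-zero
        time.sleep(0)            # yield so the producer can run
    result.append(shared["data"])


def run():
    shared["data"], shared["flag"] = None, 0
    result = []
    c = threading.Thread(target=consumer, args=(result,))
    p = threading.Thread(target=producer)
    c.start()
    p.start()
    c.join()
    p.join()
    return result[0]
```

No explicit send or receive appears anywhere: the consumer learns the value only by reading the same locations the producer wrote.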

Message Passing Multicomputers

Computers (nodes) connected by a network
  Fast network interface
    Send, receive, barrier
  Nodes are no different from a regular PC or workstation
Cluster conventional workstations or PCs with a fast network
  cluster computing
  Berkeley NOW
  IBM SP2

[Figure: nodes, each containing a processor (P), connected by a network.]
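For contrast with shared memory, here is a minimal sketch of explicit send and receive between two nodes. The queues are an assumption standing in for network links, and `node` and `run` are illustrative names, not part of any real cluster API.

```python
import queue
import threading


def node(inbox, outbox):
    """A node with private memory: it learns values only by receiving them."""
    value = inbox.get()      # receive (blocks until a message arrives)
    outbox.put(value * 2)    # send the reply


def run():
    # Two queues stand in for the network links between the nodes.
    to_node, from_node = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=node, args=(to_node, from_node))
    worker.start()
    to_node.put(21)            # send
    reply = from_node.get()    # receive
    worker.join()
    return reply
```

Every exchange of data is an explicit message, so there are no shared copies to keep coherent.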

Large-Scale MP Designs
Memory: distributed, with nonuniform memory access time (NUMA)
and scalable interconnect (distributed memory)

[Figure: large-scale multiprocessor; labels: 1 cycle, 40 cycles, 100 cycles, Low Latency, High Reliability.]

Shared Memory Architectures

In this section we will look at the issues around:

  Sharing one memory space among several processors.
  Maintaining coherence among several copies of a data item.

The Problem of Cache Coherency

[Figure: three snapshots of a CPU, its cache, memory, and I/O, each holding values for A and B.]

                        Cache        Memory
Case                    A     B      A     B      I/O activity
a) coherent             100   200    100   200    none
b) A incoherent         550   200    100   200    Output of A gives 100
c) B incoherent         100   200    100   440    Input 440 to B

a) Cache and memory coherent: the cached A and B equal the A and B in memory.
b) Cache and memory incoherent: the cached A differs from the A in memory.
c) Cache and memory incoherent: the cached B differs from the B in memory.
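The three cases can be reproduced with a toy model. This is a sketch under stated assumptions: `WriteBackCache` is a hypothetical class, memory is a plain dictionary, and the I/O device is modeled as direct access to that dictionary.

```python
class WriteBackCache:
    """Toy write-back cache: keeps private copies, never updates memory here."""

    def __init__(self, memory):
        self.memory = memory   # shared dict: address -> value
        self.lines = {}        # cached copies: address -> value

    def read(self, addr):
        if addr not in self.lines:              # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value                # memory is NOT updated


memory = {"A": 100, "B": 200}
cache = WriteBackCache(memory)
cache.read("A")
cache.read("B")                     # a) cache and memory agree

cache.write("A", 550)               # b) CPU updates A in the cache only
io_output_of_a = memory["A"]        # I/O reads memory and gets the old A

memory["B"] = 440                   # c) I/O inputs 440 into B in memory
cpu_view_of_b = cache.read("B")     # CPU hits in the cache and gets the old B
```

In case b the device sees a stale value; in case c the CPU does.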

Some Simple Definitions

Write Back
  How it works: Write modified data from cache to memory only when necessary.
  Performance: Good, because it doesn't tie up memory bandwidth.
  Coherency issues: Can have problems with various copies containing different values.

Write Through
  How it works: Write modified data from cache to memory immediately.
  Performance: Not so good; uses a lot of memory bandwidth.
  Coherency issues: Modified values are always written to memory; data always matches.
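The bandwidth difference between the two policies can be made concrete by counting the writes that reach memory. This is a rough sketch: `memory_writes` is a hypothetical helper, and it assumes a single cache, no evictions, and one flush of all dirty lines at the end.

```python
def memory_writes(policy, stores):
    """Count writes reaching memory for a list of (address, value) stores.

    Assumptions: one cache, no evictions, dirty lines flushed once at the end.
    """
    count = 0
    dirty = set()
    for addr, _value in stores:
        if policy == "write_through":
            count += 1           # every store goes to memory immediately
        elif policy == "write_back":
            dirty.add(addr)      # only mark the line as modified
        else:
            raise ValueError(f"unknown policy: {policy}")
    return count + len(dirty)    # final flush of the dirty lines


# 100 stores to x and one store to y.
stores = [("x", i) for i in range(100)] + [("y", 1)]
through = memory_writes("write_through", stores)
back = memory_writes("write_back", stores)
```

Under these assumptions write through performs one memory write per store, while write back performs one per modified line.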

What Does Coherency Mean?

Informally:
  Any read must return the most recent write
  Too strict and too difficult to implement
Better:
  Any write must eventually be seen by a read
  All writes are seen in proper order (serialization)
Two rules to ensure this:
  If P writes x and P1 reads it, P's write will be seen by P1 if the
  read and write are sufficiently far apart
  Writes to a single location are serialized:
  seen in one order
    Latest write will be seen
    Otherwise could see writes in illogical order
    (could see older value after a newer value)
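The serialization rule can be phrased as a small check on what one processor observes. The function name and the assumption that written values are distinct are both made up for this sketch.

```python
def sees_writes_in_order(write_order, observed_reads):
    """Return True if a reader never sees an older value after a newer one.

    write_order: values written to ONE location, in serialization order
                 (assumed distinct so each value identifies one write).
    observed_reads: values one processor read from that location, in order.
    """
    position = {value: i for i, value in enumerate(write_order)}
    newest_seen = -1
    for value in observed_reads:
        if position[value] < newest_seen:
            return False         # older value after a newer value
        newest_seen = position[value]
    return True
```

For example, with writes of 10, 20, 40 in that order, reading 10, 10, 20, 40 is acceptable, while reading 20 and then 10 is the illogical order the slide warns about.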

There Are Different Types of Memory in the Cache

What kinds of memory are there in the cache?

  Test_and_set(lock);
  shared_data = xyz;
  Clear(lock);

Type             Shared?     Writable   How Kept Coherent
Code             Shared      No         No need
Private Data     Exclusive   Yes        Write Back
Shared Data      Shared      Yes        Write Back *
Interlock Data   Shared      Yes        Write Through **

* Write Back gives good performance here; using write through instead
  would degrade performance.
** Write through here means the lock state is seen immediately.
   You want a write through here to flush the cache.
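The Test_and_set / Clear sequence above can be sketched as a spin lock. This is an assumption-laden illustration: Python has no hardware test-and-set, so a `threading.Lock` stands in for the atomicity the hardware would provide, and `SpinLock`, `worker`, and `run` are hypothetical names.

```python
import threading
import time


class SpinLock:
    def __init__(self):
        self._flag = 0
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def test_and_set(self):
        """Atomically set the flag to 1 and return its previous value."""
        with self._atomic:
            old = self._flag
            self._flag = 1
            return old

    def acquire(self):
        while self.test_and_set() == 1:   # spin until the old value was 0
            time.sleep(0)                 # yield to other threads

    def clear(self):
        with self._atomic:
            self._flag = 0


shared_data = 0
lock = SpinLock()


def worker(iterations):
    global shared_data
    for _ in range(iterations):
        lock.acquire()        # Test_and_set(lock)
        shared_data += 1      # update the shared data
        lock.clear()          # Clear(lock)


def run(num_threads=2, iterations=200):
    global shared_data
    shared_data = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_data
```

The flag plays the role of the interlock data in the table: every processor must see its latest value right away, which is why the slide asks for write through on it.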

Potential HW Coherency Solutions

Snooping Solution (Snoopy Bus):
  Send all requests for data to all processors
  Processors snoop to see if they have a copy and respond accordingly
  Requires broadcast, since caching information is at processors
  Works well with bus (natural broadcast medium)
  Dominates for small-scale machines (most of the market)
Directory-Based Schemes
  Keep track of what is being shared in one centralized place
  Distributed memory => distributed directory for scalability
  (avoids bottlenecks)
  Send point-to-point requests to processors via network
  Scales better than snooping
  Actually existed BEFORE snooping-based schemes

An Example Snoopy Protocol

Maintained by hardware

Invalidation protocol, write-back cache
Each block of memory is in one state:
  Clean in all caches and up-to-date in memory (Shared)
  OR Dirty in exactly one cache (Exclusive)
  OR Not in any caches
Each cache block is in one state (track these):
  Shared: block can be read
  OR Exclusive: cache has the only copy; it's writable and dirty
  OR Invalid: block contains no data
Read misses: cause all caches to snoop bus
Writes to clean line are treated as misses

Snoopy-Cache State Machine-I

State machine for CPU requests, for each cache block.
Applies to write-back data.

[Figure: state diagram with three cache block states: Invalid, Shared (read only), and Exclusive (read/write).]

Current state  CPU request     Bus action                                        Next state
Invalid        CPU read        Place read miss on bus                            Shared
Invalid        CPU write       Place write miss on bus                           Exclusive
Shared         CPU read hit    none                                              Shared
Shared         CPU read miss   Place read miss on bus                            Shared
Shared         CPU write       Place write miss on bus                           Exclusive
Exclusive      CPU read hit    none                                              Exclusive
Exclusive      CPU write hit   none                                              Exclusive
Exclusive      CPU read miss   Write back block; place read miss on bus          Shared
Exclusive      CPU write miss  Write back cache block; place write miss on bus   Exclusive
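The CPU-request state machine can be written down as a lookup table. This is a sketch, not the hardware's implementation: the event strings and the `cpu_request` helper are names chosen for this example.

```python
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"

# (current state, CPU request) -> (next state, bus actions in order)
CPU_TRANSITIONS = {
    (INVALID, "read"): (SHARED, ["read miss"]),
    (INVALID, "write"): (EXCLUSIVE, ["write miss"]),
    (SHARED, "read hit"): (SHARED, []),
    (SHARED, "read miss"): (SHARED, ["read miss"]),
    (SHARED, "write"): (EXCLUSIVE, ["write miss"]),
    (EXCLUSIVE, "read hit"): (EXCLUSIVE, []),
    (EXCLUSIVE, "write hit"): (EXCLUSIVE, []),
    (EXCLUSIVE, "read miss"): (SHARED, ["write back", "read miss"]),
    (EXCLUSIVE, "write miss"): (EXCLUSIVE, ["write back", "write miss"]),
}


def cpu_request(state, event):
    """Return (next_state, bus_actions) for a CPU request on one cache block."""
    return CPU_TRANSITIONS[(state, event)]
```

Note that a write to a Shared block places a write miss on the bus even though the data is present, matching the rule that writes to a clean line are treated as misses.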

Snoopy-Cache State Machine-II

State machine for bus requests, for each cache block.
Appendix E gives details of bus requests.

[Figure: state diagram with states Invalid, Shared (read only), and Exclusive (read/write).]

Current state  Bus request                Action                                   Next state
Shared         Write miss for this block  none                                     Invalid
Exclusive      Write miss for this block  Write back block (abort memory access)   Invalid
Exclusive      Read miss for this block   Write back block (abort memory access)   Shared
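The bus-request side can be sketched the same way. The `snoop` helper and event strings are hypothetical; combinations the diagram does not show (for example, a read miss seen by a Shared block) are assumed here to leave the state unchanged.

```python
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"

# (current state, bus request for this block) -> (next state, actions)
BUS_TRANSITIONS = {
    (SHARED, "write miss"): (INVALID, []),
    (EXCLUSIVE, "write miss"): (INVALID, ["write back"]),
    (EXCLUSIVE, "read miss"): (SHARED, ["write back"]),
}


def snoop(state, bus_event):
    """Return (next_state, actions) when this block's address is seen on the bus.

    Assumption: any pair not listed above keeps its state and does nothing.
    """
    return BUS_TRANSITIONS.get((state, bus_event), (state, []))
```

The write back in the Exclusive rows is what lets the requester obtain the up-to-date data instead of the stale copy in memory.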

Example

Steps:
  P1: Write 10 to A1
  P1: Read A1
  P2: Read A1
  P2: Write 20 to A1
  P2: Write 40 to A2

Each step is traced in a table with these columns:
  step
  P1: State, Addr, Value
  P2: State, Addr, Value
  Bus: Action, Proc., Addr, Value
  Memory: Addr, Value

Bus actions: WrMs = write miss, RdMs = read miss, WrBk = write back, RdDa = read data.

Assumes initial cache state is invalid and A1 and A2 map to the same
cache block, but A1 ≠ A2.

[Figure: state diagram for the cache of P1, combining CPU and bus requests.]
  Invalid -> Shared: read miss on bus
  Invalid -> Exclusive: write miss on bus
  Shared -> Shared: CPU read hit
  Shared -> Invalid: remote write or miss
  Shared -> Exclusive: CPU write; place write miss on bus
  Exclusive -> Exclusive: CPU read hit, CPU write hit
  Exclusive -> Shared: remote read; write back
  Exclusive -> Invalid: remote write or miss; write back

Example: Step 1

                     P1                  P2                  Bus                         Memory
step                 State  Addr  Value  State  Addr  Value  Action  Proc.  Addr  Value  Addr  Value
P1: Write 10 to A1   Excl.  A1    10                         WrMs    P1     A1
P1: Read A1
P2: Read A1
P2: Write 20 to A1
P2: Write 40 to A2

[Figure: state diagram; highlighted transition for P1: Invalid -> Exclusive, write miss on bus.]

Example: Step 2

                     P1                  P2                  Bus                         Memory
step                 State  Addr  Value  State  Addr  Value  Action  Proc.  Addr  Value  Addr  Value
P1: Write 10 to A1   Excl.  A1    10                         WrMs    P1     A1
P1: Read A1          Excl.  A1    10
P2: Read A1
P2: Write 20 to A1
P2: Write 40 to A2

Assumes initial cache state is invalid and A1 and A2 map to the same
cache block, but A1 ≠ A2.

[Figure: state diagram; highlighted transition for P1: Exclusive -> Exclusive, CPU read hit.]

Example: Step 3

                     P1                  P2                  Bus                         Memory
step                 State  Addr  Value  State  Addr  Value  Action  Proc.  Addr  Value  Addr  Value
P1: Write 10 to A1   Excl.  A1    10                         WrMs    P1     A1
P1: Read A1          Excl.  A1    10
P2: Read A1                              Shar.  A1           RdMs    P2     A1
                     Shar.  A1    10                         WrBk    P1     A1    10     A1    10
                                         Shar.  A1    10     RdDa    P2     A1    10     A1    10
P2: Write 20 to A1
P2: Write 40 to A2

Assumes initial cache state is invalid and A1 and A2 map to the same
cache block, but A1 ≠ A2.

[Figure: state diagram; highlighted transitions: Invalid -> Shared, read miss on bus (P2); Exclusive -> Shared, remote read, write back (P1).]

Example: Step 4

                     P1                  P2                  Bus                         Memory
step                 State  Addr  Value  State  Addr  Value  Action  Proc.  Addr  Value  Addr  Value
P1: Write 10 to A1   Excl.  A1    10                         WrMs    P1     A1
P1: Read A1          Excl.  A1    10
P2: Read A1                              Shar.  A1           RdMs    P2     A1
                     Shar.  A1    10                         WrBk    P1     A1    10     A1    10
                                         Shar.  A1    10     RdDa    P2     A1    10     A1    10
P2: Write 20 to A1   Inv.                Excl.  A1    20     WrMs    P2     A1           A1    10
P2: Write 40 to A2

Assumes initial cache state is invalid and A1 and A2 map to the same
cache block, but A1 ≠ A2.

[Figure: state diagram; highlighted transition for P1: Shared -> Invalid, remote write.]

Example: Step 5

                     P1                  P2                  Bus                         Memory
step                 State  Addr  Value  State  Addr  Value  Action  Proc.  Addr  Value  Addr  Value
P1: Write 10 to A1   Excl.  A1    10                         WrMs    P1     A1
P1: Read A1          Excl.  A1    10
P2: Read A1                              Shar.  A1           RdMs    P2     A1
                     Shar.  A1    10                         WrBk    P1     A1    10     A1    10
                                         Shar.  A1    10     RdDa    P2     A1    10     A1    10
P2: Write 20 to A1   Inv.                Excl.  A1    20     WrMs    P2     A1           A1    10
P2: Write 40 to A2                                           WrMs    P2     A2           A1    10
                                         Excl.  A2    40     WrBk    P2     A1    20     A1    20

Assumes initial cache state is invalid and A1 and A2 map to the same
cache block, but A1 ≠ A2.
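The five steps of the example can be replayed with a small simulation. This is a sketch under stated assumptions, not the slides' own code: each cache holds a single block (so A1 and A2 conflict), the class and function names are hypothetical, and on a conflicting miss the bus log records the miss before the victim's write back, following the order of the trace rather than of the state diagram.

```python
class Cache:
    def __init__(self, name):
        self.name, self.state, self.addr, self.value = name, "Invalid", None, None


class Bus:
    def __init__(self):
        self.memory = {}     # address -> value
        self.caches = []
        self.log = []        # (action, processor, address)

    def write_back(self, cache):
        self.log.append(("WrBk", cache.name, cache.addr))
        self.memory[cache.addr] = cache.value

    def miss(self, kind, requester, addr):
        """Broadcast "RdMs" or "WrMs"; every other cache snoops it."""
        self.log.append((kind, requester.name, addr))
        for c in self.caches:
            if c is requester or c.addr != addr or c.state == "Invalid":
                continue
            if c.state == "Exclusive":
                self.write_back(c)          # owner supplies the dirty block
            c.state = "Shared" if kind == "RdMs" else "Invalid"


def evict_if_dirty(bus, cache, addr):
    # The single block is reused: a dirty block for another address goes back.
    if cache.state == "Exclusive" and cache.addr != addr:
        bus.write_back(cache)


def read(bus, cache, addr):
    if cache.addr == addr and cache.state != "Invalid":
        return cache.value                  # read hit, no bus traffic
    bus.miss("RdMs", cache, addr)
    evict_if_dirty(bus, cache, addr)
    bus.log.append(("RdDa", cache.name, addr))
    cache.state, cache.addr, cache.value = "Shared", addr, bus.memory.get(addr)
    return cache.value


def write(bus, cache, addr, value):
    if not (cache.addr == addr and cache.state == "Exclusive"):
        bus.miss("WrMs", cache, addr)       # writes to clean lines are misses
        evict_if_dirty(bus, cache, addr)
    cache.state, cache.addr, cache.value = "Exclusive", addr, value


def run_example():
    bus, p1, p2 = Bus(), Cache("P1"), Cache("P2")
    bus.caches = [p1, p2]
    write(bus, p1, "A1", 10)
    read(bus, p1, "A1")
    read(bus, p2, "A1")
    write(bus, p2, "A1", 20)
    write(bus, p2, "A2", 40)
    return bus, p1, p2
```

Calling `run_example()` and inspecting `bus.log` gives the same sequence of bus actions as the Step 5 table, with P1 ending Invalid, P2 holding A2 = 40 in Exclusive state, and memory holding A1 = 20.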

Summary
8.1 Introduction: the big picture
8.3 Centralized Shared Memory Architectures
We've looked at what happens to caches when we have multiple
processors or devices looking at memory.
