
User-Level Interprocess Communication

for Shared Memory Multiprocessors


By Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy.
Presented by Damon Tyman
(all figures/tables from the paper)
The progression
• LRPC (Lightweight Remote Procedure Call; Feb 1990)
▫ Optimize RPC performance for the “common case” of cross-domain (same-machine) communication.
• URPC (User-Level Remote Procedure Call; May 1991)
▫ Eliminate the involvement of the kernel in cross-address
space communication and thread management, utilizing
shared memory and user-level threads.
Introduction
• Efficient interprocess communication encourages system decomposition
across address space boundaries.
• Advantages of decomposed systems:
▫ Failure isolation
▫ Extensibility
▫ Modularity
• Unfortunately, when cross-address space communication is slow, these
advantages are usually traded for better system performance.
Kernel-level interprocess communication

• Traditionally, interprocess communication has been the responsibility of the kernel.
• Problems with kernel-based communication:
▫ Architectural performance barriers
Performance is limited by the cost of invoking the kernel and reallocating the processor to a different address space.
In the authors’ earlier work (LRPC), 70% of the overhead can be attributed to kernel mediation.
▫ Interaction with high-performance user-level threads
For satisfactory performance, medium- and fine-grained parallel
applications need user-level thread management.
The performance and system-complexity costs of partitioning strongly interdependent communication and thread management across protection boundaries are high.
User-level cross-address space communication
• With a shared memory multiprocessor, the kernel can be eliminated from cross-address space communication.
• User-level managed cross-address space communication:
▫ Shared memory can be utilized as the data transfer channel.
▫ Messages sent between address spaces directly—no kernel intervention.
▫ The frequency of processor reallocation can be reduced by preferring threads from the address space in which the processor is already active.
▫ The inherent parallelism of sending and receiving messages can be exploited for
improved call performance.
User-level Remote Procedure Call (URPC)
• URPC provides safe and efficient communication between address spaces
on the same machine without kernel mediation.
• Three components of interprocess communication isolated: processor
reallocation, thread management, and data transfer.
• Control transfer between address spaces handled by thread management
and processor reallocation.
• Compared to traditional “small kernel systems” in which the kernel handles
all three components, in URPC only processor reallocation involves the
kernel.
• The URPC implementation achieves a cross-address space procedure call latency of 93 microseconds, compared to 157 microseconds for LRPC (which uses kernel-managed communication).
Messages vs. Procedure Calls
• In existing systems, communication between applications is commonly achieved by sending messages over a narrow channel that supports just a few operations.
• While a powerful construct, message passing may require thinking in a paradigm not supported by the programming language in use.
▫ RPC better matches the programming paradigm of the language in use.
▫ Message passing may still serve as the underlying technique, but the programming approach is more familiar.
▫ In many programming languages, communication within an address space features synchronous procedure calls, data typing, and shared memory.
▫ RPC reconciles these with the untyped, asynchronous messages of cross-address space communication by using an abstraction that hides those characteristics from the user.
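To make this concrete, here is a minimal sketch (in C) of what a client stub might look like. The urpc_send/urpc_wait_reply calls and the message layout are hypothetical placeholders standing in for the URPC package interface, not the paper’s actual API:

/* Hypothetical stub for a typed call: int window_create(int width, int height).
 * The synchronous, typed call the programmer writes is turned into an untyped
 * message placed in shared memory; the reply is unmarshalled back into a
 * typed result. */

#include <stddef.h>
#include <stdint.h>

void urpc_send(int channel, const void *msg, size_t len);       /* assumed interface          */
void urpc_wait_reply(int channel, void *reply, size_t len);     /* blocks this user-level thread */

enum { OP_WINDOW_CREATE = 1 };

int window_create_stub(int channel, int width, int height)
{
    uint32_t msg[3] = { OP_WINDOW_CREATE, (uint32_t)width, (uint32_t)height };
    uint32_t reply;

    urpc_send(channel, msg, sizeof msg);             /* untyped bytes into the shared channel */
    urpc_wait_reply(channel, &reply, sizeof reply);  /* caller blocks; other threads may run  */
    return (int)reply;                               /* typed result restored for the caller  */
}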
Synchronization
• To the programmer, cross-address space procedure calls appear
synchronous (calling thread blocks waiting for return).
• At the thread management level and below, however, the call is
asynchronous.
• While blocked, another thread can be run. URPC favors threads from the address space in which the processor is already running.
• As soon as the reply arrives, the blocked thread can be scheduled on any processor assigned to its address space.
• All the involved scheduling operations can be handled by a user-level thread
management system, avoiding the need to reallocate any processors to a
new address space (if the callee address space has at least one processor
assigned to it).
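A rough sketch of the control flow described above, assuming FastThreads-style primitives; mark_blocked, mark_ready, and dispatch_next_ready_thread are illustrative names, not the real interface:

/* What "blocking" means at the user level: the calling thread is taken off
 * the ready queue and the processor switches to another thread in the same
 * address space.  No kernel call and no processor reallocation is needed as
 * long as this space has other work and the callee has a processor. */

typedef struct thread thread_t;

thread_t *current_thread(void);                /* assumed thread-package calls    */
void      mark_blocked(thread_t *t);           /* remove from ready queue         */
void      mark_ready(thread_t *t);             /* reply arrived: runnable again   */
void      dispatch_next_ready_thread(void);    /* cheap user-level context switch */

void urpc_wait_reply_sketch(void)
{
    thread_t *caller = current_thread();
    mark_blocked(caller);               /* the call looks synchronous to the caller */
    dispatch_next_ready_thread();       /* processor keeps working in this space;
                                           control returns here once we are resumed */
    /* Later, when the reply is noticed in the shared channel, some thread calls
     * mark_ready(caller); the caller may then resume on any processor assigned
     * to its address space. */
}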
User view/system components
• Two software packages are used by stub and application code:
FastThreads (user-level threads scheduled on top of middleweight
kernel threads) and the URPC package
Processor reallocation overhead
• There is significantly less overhead in switching a processor over to a thread
in the same address space (context switching) than reallocating to a
thread in a different address space (processor reallocation).
• Processor reallocation incurs high overhead:
▫ Requires changing the protected mapping registers that define the virtual address
space context.
▫ Scheduling costs to choose address space for reallocation.
▫ Substantial long-term costs due to poor cache and TLB performance from constant
locality switches (greater for processor reallocation than just context switching).
▫ A minimal-latency same-address space context switch takes about 15 microseconds on the C-VAX, while a cross-address space processor reallocation takes 55 microseconds (and that doesn’t count the long-term costs!).
Processor Reallocation Policy
• An optimistic reallocation policy is employed: threads from the
same address space are scheduled whenever possible. Assumes:
▫ The client has other work to do.
▫ The server will soon have a processor with which to service a request.
• The optimistic assumptions may not always hold (so URPC also allows a processor to be reallocated to the server), e.g. with:
▫ Single-threaded applications
▫ High-latency I/O operations
▫ Real-time applications (bounded call latency)
▫ Server may not have its own processor.
• Voluntary return of processors cannot be guaranteed.
▫ There is no way to enforce policies regarding the return of processors.
▫ If a client gives a processor up to the server, it may never get it back.
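A sketch of the decision this policy implies; all names (including the kernel donate call) are invented for illustration, and the paper’s actual mechanism and interface may differ:

/* Optimistic reallocation: keep the processor busy with threads in the
 * current address space; only ask the kernel to hand the processor to the
 * server when nothing else is runnable here and the server appears to have
 * no processor to pick up the pending call. */

int  ready_threads_in_this_space(void);      /* user-level scheduler state (assumed)    */
int  calls_pending(int channel);             /* outstanding calls on this channel       */
int  server_has_processor(int channel);      /* hint read from shared memory (assumed)  */
void kernel_processor_donate(int channel);   /* the one step that involves the kernel   */
void dispatch_next_ready_thread(void);

void idle_policy_sketch(int channel)
{
    if (ready_threads_in_this_space() > 0) {
        dispatch_next_ready_thread();        /* common case: stay in this address space   */
    } else if (calls_pending(channel) && !server_has_processor(channel)) {
        kernel_processor_donate(channel);    /* fall back: reallocate processor to server */
    }
    /* Per the caveat above: once donated, the processor's return cannot be forced. */
}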
Sample Execution
One client (an editor) and two servers
(window manager and file manager)
each have their own address space.
T1 and T2 are threads in the editor.
Two available processors.
Data transfer using shared memory
• Data flows between URPC packages in different address spaces over a
bidirectional shared memory queue with non-spinning test-and-set locks on
either end.
▫ Receiver should never have to delay (spin-wait) while sender holds the lock.
• The “abusability factor” is not increased by shared memory message channels.
▫ Denial of service, bogus results, and communication protocol violations are still possible.
▫ It is left to higher-level protocols to filter abuses before they reach the application layer.
• Communication safety is the responsibility of the stubs.
▫ Arguments are passed in shared memory buffers that are established during the binding phase.
• No kernel copying necessary.
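A toy version (in C11) of one direction of such a channel: a fixed-size queue in shared memory guarded by a test-and-set lock that is tried once and never spun on; if the lock is busy, the caller simply returns and retries later (for example after running other threads). The layout and names are illustrative only:

#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define SLOTS    16
#define MSG_SIZE 64

/* One direction of a message channel placed in memory mapped into both
 * address spaces.  atomic_flag (initialize with ATOMIC_FLAG_INIT) gives a
 * test-and-set lock; failure to acquire is reported instead of spin-waiting. */
typedef struct {
    atomic_flag lock;                       /* test-and-set lock        */
    unsigned    head, tail;                 /* head == tail means empty */
    char        slot[SLOTS][MSG_SIZE];
} channel_t;

bool channel_try_send(channel_t *ch, const void *msg, size_t len)
{
    if (atomic_flag_test_and_set(&ch->lock))
        return false;                       /* lock busy: do not spin, retry later */
    bool ok = ((ch->tail + 1) % SLOTS) != ch->head && len <= MSG_SIZE;
    if (ok) {
        memcpy(ch->slot[ch->tail], msg, len);   /* no kernel copy involved */
        ch->tail = (ch->tail + 1) % SLOTS;
    }
    atomic_flag_clear(&ch->lock);
    return ok;
}

bool channel_try_receive(channel_t *ch, void *msg)
{
    if (atomic_flag_test_and_set(&ch->lock))
        return false;
    bool ok = ch->head != ch->tail;
    if (ok) {
        memcpy(msg, ch->slot[ch->head], MSG_SIZE);
        ch->head = (ch->head + 1) % SLOTS;
    }
    atomic_flag_clear(&ch->lock);
    return ok;
}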
Cross-address space call and thread
management
• Strong correlation between communication functions and thread
management synchronization functions
▫ RPCs synchronous with respect to calling thread
▫ Each communication function has corresponding thread management
function (send <-> start, receive <-> stop)
• Thread classification:
▫ Heavyweight: kernel makes no distinction between thread and address
space
▫ Middleweight: kernel-managed threads but decoupled from address
spaces (can be multiple threads per address space)
▫ Lightweight: threads managed by user-level libraries
• Fine-grained parallel applications demand high-performance thread
management, which is only possible with user-level threads.
• Two-level scheduling: lightweight user-level threads are scheduled on top of
weightier kernel-managed threads.
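A sketch of this two-level structure, with hypothetical names: each kernel-managed (“middleweight”) thread behaves as a virtual processor that runs lightweight user-level threads and polls the URPC channels:

/* Two-level scheduling sketch.  The kernel schedules a few middleweight
 * threads onto physical processors; each of them runs this loop, multiplexing
 * many lightweight user-level threads.  All names are illustrative. */

typedef struct uthread uthread_t;

uthread_t *ready_queue_pop(void);            /* user-level scheduler state (assumed)  */
void       switch_to(uthread_t *t);          /* returns when t blocks or yields       */
void       poll_urpc_channels(void);         /* notice replies/requests, wake waiters */

void virtual_processor_loop(void)
{
    for (;;) {
        poll_urpc_channels();                /* communication handled at user level */
        uthread_t *t = ready_queue_pop();
        if (t != NULL)
            switch_to(t);                    /* cheap user-level context switch     */
        /* else: idle; could consider donating the processor (see the policy above) */
    }
}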
URPC Performance
• Processing times are close for client and server
• Nearly half of the time goes to thread management (dispatch)
URPC Performance
C: client processors, S: server processors, T: runnable threads in the client address space.
Latency is traded for throughput; going from T=1 to T=2 means a ~20% latency increase but a ~75% throughput increase.
And beyond…
• LRPC: improve RPC for cross-domain
communication performance.
• URPC: move inter-process communication to
user-level as much as possible.
▫ Threads managed at user-level but scheduled on
top of middleweight threads.
• Scheduler Activations (Feb 1992): provide better
abstraction (than kernel threads) for kernel
support of user-level threads.
