
CSCI319 Distributed Systems
Lecture 7: Distributed file systems
Dr. Z.Q. (George) Zhou

Introduction

Key goal of distributed systems?

Introduction

What is the most heavily loaded service in an intranet?

Sharing of stored info is perhaps the most important aspect of distributed resource sharing.
  E.g. how do Web servers share info?

Introduction

File systems
  Originally developed for centralized computer systems and desktop computers as an OS facility
Distributed file systems
  A file service enables programs to store and access remote files exactly as they do local ones, allowing users to access their files from any computer in an intranet.
  Advantages?

Concentration of persistent storage at a few servers
  Reduces the need for local disk storage
  More importantly: enables economies to be made in the management and archiving of the persistent data owned by an organization
  Makes other services easier to implement
    E.g. name service, user authentication service, print service
    E.g. Web servers store and access material from a local distributed file system
Storage systems and their properties (roughly)

Storage system                    Sharing  Persis-  Distributed     Consistency  Example
                                           tence    cache/replicas  maintenance
Main memory                       no       no       no              1            RAM
File system                       no       yes      no              1            UNIX file system
Distributed file system           yes      yes      yes             √            Sun NFS
Web                               yes      yes      yes             no           Web server
Distributed shared memory         yes      no       yes             √            Ivy (DSM, Ch. 18)
Remote objects (RMI/ORB)          yes      no       no              1            CORBA
Persistent object store (CORBA    yes      yes      no              1            CORBA Persistent
  & Persistent Java)                                                             Object Service
Peer-to-peer storage system       yes      yes      yes             2            OceanStore (Ch. 10)

Types of consistency:
1: strict one-copy. √: slightly weaker guarantees. 2: considerably weaker guarantees.

Note: distributed shared memory provides an emulation of a shared memory by replication of memory pages or segments at each host.

Consistency

Consistency: whether mechanisms exist to maintain consistency between multiple copies of data when updates occur.
  Caching: first applied to main memory and non-distributed file systems (strict, '1') – a program cannot observe any difference between cached copies and stored data after an update.
  Distributed systems: strict consistency is more difficult to achieve. Sun NFS caches copies of portions of files at client computers, and adopts specific consistency mechanisms to maintain an approximation to strict consistency. This is indicated by a tick (√).

Consistency

Web: uses caching extensively at client computers and proxy servers.
  Consistency is often maintained by explicit user actions
    E.g. update a web page and see what happens when browsing it
  OK for browsing, but not OK for cooperative apps such as a shared whiteboard.
Consistency

Persistent object store: CORBA and Persistent Java maintain single copies of persistent objects. The only consistency issue is between the persistent copy of an object on disk and the active copy in memory, which is not visible to remote clients.

Characteristics of (non-distributed) file systems

Responsibilities of file systems?
  Organization
  Storage
  Retrieval
  Naming
  Sharing
  Protection
of files

Characteristics of (non-distributed) file systems

Responsibilities of file systems
  Provide a programming interface: a file "abstraction"

Files contain both data and attributes
  Data: a sequence of data items
  Attributes: held as a single record (Fig. 8.3), e.g.:
    File length
    Creation timestamp
    Read timestamp
    Write timestamp
    Attribute timestamp
    Reference count
    Owner
    File type
    Access control list
  Some of these attributes (the shaded ones in Fig. 8.3) are not normally updated by user programs

Characteristics of (non-distributed) file systems

Metadata: all the extra persistent info stored by the file system for managing files, e.g. file attributes, directories etc.
Directory: supports naming of files
  A file, often of a special type, that provides a mapping from text names to internal file IDs
  May include names of other directories, leading to the familiar hierarchic file naming scheme and multi-part pathnames

Figure 8.2 (non-distributed) File system modules (layered)
  Directory module: relates file names to file IDs
  File module: relates file IDs to particular files
  Access control module: checks permission for operation requested
  File access module: reads or writes file data or attributes
  Block module: accesses and allocates disk blocks
  Device module: disk I/O and buffering
Each layer depends only on the layers below it.

Figure 8.2 (non-distributed) File system modules (layered)

A distributed file service requires all of the components shown in Figure 8.2, with additional components to deal with client-server communication and with the distributed naming and location of files.

Figure 8.4 UNIX file system operations (system calls implemented by the kernel)

filedes = open(name, mode)         Opens an existing file with the given name.
filedes = creat(name, mode)        Creates a new file with the given name.
                                   Both operations deliver a file descriptor referencing the open
                                   file. The mode is read, write or both.
status = close(filedes)            Closes the open file filedes.
count = read(filedes, buffer, n)   Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n)  Transfers n bytes to the file referenced by filedes from buffer.
                                   Both operations deliver the number of bytes actually transferred
                                   and advance the read-write pointer.
pos = lseek(filedes, offset,       Moves the read-write pointer to offset (relative or absolute,
  whence)                          depending on whence).
status = unlink(name)              Removes the file name from the directory structure. If the file
                                   has no other names, it is deleted.
status = link(name1, name2)        Adds a new name (name2) for a file (name1).
status = stat(name, buffer)        Gets the file attributes for file name into buffer.

Accessed through library procedures such as the C Standard I/O Library or Java file classes.

Figure 8.4 UNIX file system operations: are these operations idempotent? Is the server stateless?

Distributed file system requirements
  Separating implementation concerns
  Abstract model of file service

Distributed file system requirements

Many of the requirements and potential pitfalls in the design of distributed services were first observed in the early development of distributed file systems.
  Initially, they offered access transparency and location transparency
  Then performance, scalability, concurrency control, fault tolerance, and security requirements emerged and were met in subsequent phases of development.

Transparency
  Access transparency: Client programs should be unaware of the distribution of files. A single set of operations is provided for access to local and remote files. Programs written to operate on local files are able to access remote files without modification.

Distributed file system requirements

Transparency
  Location transparency: Files may be relocated without changing their pathnames, and user programs see the same name space wherever they are executed.
  Mobility transparency: Neither client programs nor system admin tables in client nodes need to be changed when files are moved. This allows file mobility: files may be moved, either by system admins or automatically.

Distributed file system requirements

Transparency
  Performance transparency: Client programs should continue to perform satisfactorily while the load on the service varies within a specified range.
  Scaling transparency: The service can be expanded by incremental growth to deal with a wide range of loads and network sizes.

Distributed file system requirements

Concurrent file updates
  Changes to a file by one client should not interfere with the operation of other clients simultaneously accessing or changing the same file
  This is a well-known issue of concurrency control
    E.g. file- or record-level locking

File replication
  Several copies at different locations.
  Benefits?

Distributed file system requirements

File replication
  Several copies at different locations. Benefits?
    Multiple servers can share the workload, enhancing scalability
    Enhances fault tolerance
  Few file services support replication fully, but most support the caching of files or portions of files locally, a limited form of replication.

Hardware and OS heterogeneity
  Service interfaces should be defined so that client and server can have different OSs and hardware.
  This is an important aspect of openness.

Distributed file system requirements

Fault tolerance
  The service continues when a client or server fails
  Three concepts: fault prevention, fault detection, fault tolerance

Consistency
  When files are replicated or cached at different sites, there is an inevitable delay in the propagation of modifications made at one site to all of the other sites that hold copies, and this may result in some deviation from one-copy semantics.

Distributed file system requirements

Security
  Virtually all file systems provide access control.
  In distributed file systems, there is a need to authenticate client requests so that access control at the server is based on the correct user ID, and to protect the contents of request and reply messages with digital signatures and (optionally) encryption of secret data.

Efficiency
  A distributed file system should provide a service that is comparable with, or better than, local file systems in performance and reliability. It must be convenient to administer, providing operations and tools that enable system admins to install and operate the system conveniently.

Software qualities: A software engineering perspective

Representative qualities
  Correctness, reliability, and robustness
  Performance
  Usability
  Verifiability
  Maintainability
  Reparability
  Evolvability
  Reusability
  Portability
  Understandability
  Interoperability
  Productivity
  Timeliness
  Visibility

SE principles
  Rigor and formality
  Separation of concerns
  Modularity
  Abstraction
  Anticipation of change
  Generality
  Incrementality

File service architecture

An abstract architectural model (underpins Sun NFS and the Andrew File System).
  Client computer: application programs, which use the client module
  Server computer: directory service and flat file service

File service architecture

Flat file service & directory service: provide a comprehensive set of operations (interfaces for use by client programs) for access to files.

Client module: provides a single programming interface with operations on files similar to conventional file systems.

File service architecture

The design is open: different client modules can be used to implement different programming interfaces, simulating the file operations of a variety of different OSs and optimizing performance for different hardware configurations.

Flat file service
  Implements operations on the contents of files
  Unique File Identifiers (UFIDs) are used to refer to files in all requests for flat file service operations
  The division of responsibilities between the file service and the directory service is based upon the use of UFIDs.
    UFID: a long sequence of bits; each file in the distributed system has a unique UFID.
    When the flat file service receives a request to create a file, it generates a new UFID for it and returns the UFID to the requester.

File service architecture

Directory service
  Provides a mapping between a file's text name and its UFID.
  Provides the functions needed to generate directories, to add new file names to directories and to obtain UFIDs from directories.
  It is a client of the flat file service.
  May hold references to other directories, giving a hierarchic naming scheme.

File service architecture

Client module
  Runs in each client computer
  Integrates and extends the operations of the flat file service and the directory service under a single API that is available to user-level programs in the client computer
  Caches recently used file blocks
  E.g. in a Unix host:
    Emulates the full set of Unix file operations
    Interprets Unix multi-part file names by requests to the directory service

Figure 8.6 Flat file service operations (a definition of the interface to a flat file service)

Read(FileId, i, n) -> Data      If 1 ≤ i ≤ Length(File): reads a sequence of up to n items
  — throws BadPosition          from a file starting at item i and returns it in Data.
Write(FileId, i, Data)          If 1 ≤ i ≤ Length(File)+1: writes a sequence of Data to a
  — throws BadPosition          file, starting at item i, extending the file if necessary.
Create() -> FileId              Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId)                  Removes the file from the file store.
GetAttributes(FileId) -> Attr   Returns the file attributes for the file.
SetAttributes(FileId, Attr)     Sets the file attributes (only those attributes that are not
                                shaded in Figure 8.3).

This is the RPC interface used by the client module; it is not normally used directly by user-level programs.

File service architecture

Compare with Unix: which interface can achieve:
  Repeatable (idempotent) operations?
  A stateless server?
    Stateless servers can be restarted after a failure and resume operation without the need for client or server to restore any state.

File service architecture

Compare with Unix
  Is the interface functionally equivalent to that of the Unix file system primitives?
  Yes: the interface and the Unix file system primitives are functionally equivalent
    But there is no open and close (files can be accessed immediately using the UFID; e.g. an NFS server does not keep files open on behalf of clients)
    Read/Write: each call specifies its own starting point; how does this differ from Unix?
      In Unix, reads and writes start at the current position of the read-write pointer, and the pointer is advanced by the number of bytes transferred after each read/write

An example of file operation in Unix

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp;
    const char *str = "1234567890";

    fp = fopen("sample.txt", "w");
    if (fp == NULL) {
        perror("failed to open sample.txt");
        return EXIT_FAILURE;
    }
    fwrite(str, 1, strlen(str), fp);   /* file: 1234567890, pointer at 10 */
    fseek(fp, -4, SEEK_CUR);           /* pointer moves back to offset 6  */
    fwrite("a", 1, strlen("a"), fp);   /* overwrites the '7'              */
    fclose(fp);
    return EXIT_SUCCESS;
}

Output? The file sample.txt ends up containing 123456a890.

File service architecture

Reason for the difference?
  Repeatable operations: except Create, all operations are idempotent, allowing at-least-once RPC semantics (clients may repeat calls to which they receive no reply).
    Repeated execution of Create produces a different new file for each call.

File service architecture

Reason for the difference?
  Stateless server: the interface allows the server to be stateless.

Unix file operations are neither idempotent nor stateless.
  A read-write pointer is generated whenever a file is opened
  If an operation is accidentally repeated, the automatic advance of the pointer results in access to a different portion of the file

File service architecture

Access control
  Difference between distributed and non-distributed (e.g. Unix) systems?
  Non-distributed example: In Unix, the user's access rights are checked against the access mode (read or write) requested in the open call. The user ID used in the access rights check is the result of the user's earlier authenticated login and cannot be tampered with. The resulting access rights are retained until the file is closed (no further checks are required when subsequent operations on the same file are requested).

File service architecture

Access control
  In a distributed system, access rights checks need to be performed at the server.
    A user ID needs to be passed with requests – it could be a forged ID!
    Further, if the results of an access rights check were retained at the server and used for future accesses, the server would no longer be stateless.
  Solution?

Figure 8.7 RPC interface to a directory service (operations on individual directories)

Lookup(Dir, Name) -> FileId        Locates the text name in the directory and returns the
  — throws NotFound                relevant UFID. If Name is not in the directory, throws an
                                   exception.
AddName(Dir, Name, FileId)         If Name is not in the directory, adds (Name, File) to the
  — throws NameDuplicate           directory and updates the file's attribute record.
                                   If Name is already in the directory: throws an exception.
UnName(Dir, Name)                  If Name is in the directory: the entry containing Name is
  — throws NotFound                removed from the directory.
                                   If Name is not in the directory: throws an exception.
GetNames(Dir, Pattern) -> NameSeq  Returns all the text names in the directory that match the
                                   regular expression Pattern.

Purpose: translate file text names to UFIDs
Method: maintains directory files containing the mappings
Each directory is stored as a conventional file with a UFID

Figure 8.7 RPC interface to a directory service (continued)

Dir: a UFID for the file containing the directory
So, the directory service is a client of the flat file service (each directory is stored as a conventional file with its own UFID).

File service architecture

Hierarchic file system
  E.g. Unix
  A number of directories in a tree structure
  Each directory holds the names of the files and other directories accessible from it.

File groups
  A collection of files located on a given server
  A server may hold several file groups
  Groups can be moved between servers
  A file cannot change group
  Unix and most OSs have a similar construct called a filesystem
    "file system": a software component providing access to files
    "filesystem" (e.g. the Unix mountable filesystem): the set of files held in a storage device or partition

File service architecture

File groups
  Originally introduced to support moving collections of files stored on removable media between computers
  In a distributed file service, they support the allocation of files to file servers
  A file group ID must be unique throughout a distributed system

Case study: Sun Network File System (NFS)

Introduced in 1985 by Sun
  Achieved success both technically and commercially
  Widely adopted in industry and academic environments
  Several distributed file services already existed in universities and research labs, but NFS was the first designed as a product
  See the commands nfsd, mountd, rpcbind

Case study: Sun Network File System (NFS)

Key interfaces were placed in the public domain (why? To encourage adoption as a standard)
  E.g. Sun Microsystems Inc. (1989). NFS: Network File System Protocol Specification. RFC 1094.
    See http://portal.acm.org/citation.cfm?id=RFC1094
  The protocol has since gone through many versions

Case study: Sun Network File System (NFS)

Each computer in an NFS network can be both client and server
  Files at every machine can be made accessible by other machines
  A common practice: some dedicated servers, while the others are workstations
Hardware and OS heterogeneity
  The design is OS independent: client and server implementations exist for almost all OSs, e.g. Unix, Linux, Mac, Windows
All implementations of NFS support the NFS protocol – a set of RPCs (for clients to perform operations on a remote file store)
  The NFS protocol is OS independent but was originally developed for use in networks of Unix systems
  We will describe the Unix implementation of the NFS protocol (version 3).

Figure 8.8 NFS architecture

The figure shows a client computer and a server computer. On the client, application programs issue UNIX system calls to the UNIX kernel, where a virtual file system module passes requests for local files to the UNIX file system (or another local file system) and requests for remote files to the NFS client module. On the server, the NFS server module sits above the server's virtual file system and local UNIX file system. The NFS client and server modules communicate using the NFS protocol.

Notes on the architecture:
  It follows the abstract model discussed previously.
  The NFS server module resides in the kernel on each NFS server.
  Requests referring to remote files are translated by the client module into NFS protocol operations and then passed to the NFS server module.
  The NFS client and server modules communicate using RPC. (Sun's RPC system was developed for use in NFS.)
  RPC can use either UDP or TCP; the NFS protocol is compatible with both.
  The RPC interface to the NFS server is open: any process can send requests to an NFS server. An optional security feature: signed user credentials, and encryption of data.
  The architecture provides access transparency: there is no distinction between local and remote files for user programs.
  Other distributed file systems that support Unix system calls may be present; if so, they could be integrated in the same way.
  The virtual file system module is added to the Unix kernel to distinguish between local and remote files, and to translate between the Unix-independent file IDs used by NFS and the internal file IDs used in Unix and other file systems; it passes each request to the appropriate local system module.

Case study: Sun Network File System (NFS)

NFS file IDs vs Unix file IDs
  Unix: the i-node number of a file is a number that identifies and locates the file within the file system where the file is stored
  Example:
    $ ls -i
    2065648 aaa.txt  2065649 bbb.file
    $

Case study: Sun Network File System (NFS)

NFS file IDs vs Unix file IDs
  In NFS, file IDs are called file handles
  A file handle is derived from the file's i-node number by adding two extra fields as follows:

  File handle: Filesystem ID | i-node number of file | i-node generation number

  Note 1: NFS adopts the Unix mountable filesystem as the unit of file grouping.
  Note 2: the i-node generation number is needed because conventional Unix reuses i-node numbers after a file is removed. The generation number is stored with each file and is incremented each time the i-node number is reused, e.g. in a Unix creat system call.

Case study: Sun Network File System (NFS)

File handle: Filesystem ID | i-node number of file | i-node generation number

Note 3: file handles are passed from server to client in the results of lookup, create and mkdir (see Fig. 8.9), and from client to server in the argument lists of all server operations.

Figure 8.9 NFS server RPC interface (simplified) – 1

lookup(dirfh, name) -> fh, attr        Returns file handle and attributes for the file name in the
                                       directory dirfh.
create(dirfh, name, attr) ->           Creates a new file name in directory dirfh with attributes
  newfh, attr                          attr and returns the new file handle and attributes.
remove(dirfh, name) -> status          Removes file name from directory dirfh.
getattr(fh) -> attr                    Returns file attributes of file fh. (Similar to the UNIX stat
                                       system call.)
setattr(fh, attr) -> attr              Sets the attributes (mode, user id, group id, size, access
                                       time and modify time of a file). Setting the size to 0
                                       truncates the file.
read(fh, offset, count) -> attr, data  Returns up to count bytes of data from a file starting at
                                       offset. Also returns the latest attributes of the file.
write(fh, offset, count, data) -> attr Writes count bytes of data to a file starting at offset.
                                       Returns the attributes of the file after the write has taken
                                       place.
rename(dirfh, name, todirfh, toname)   Changes the name of file name in directory dirfh to toname
  -> status                            in directory todirfh.
link(newdirfh, newname, dirfh, name)   Creates an entry newname in the directory newdirfh which
  -> status                            refers to the file name in the directory dirfh.

Continues on next slide ...
Cf. Figs 8.6 & 8.7 and the Unix counterparts, except readdir and statfs.

Figure 8.9 NFS server RPC interface (simplified) – 2

symlink(newdirfh, newname, string) -> status   Creates an entry newname in the
    directory newdirfh of type symbolic link with the value string. The server
    does not interpret the string but makes a symbolic link file to hold it.
readlink(fh) -> string   Returns the string that is associated with the
    symbolic link file identified by fh.
mkdir(dirfh, name, attr) -> newfh, attr   Creates a new directory name with
    attributes attr and returns the new file handle and attributes.
rmdir(dirfh, name) -> status   Removes the empty directory name from the
    parent directory dirfh. Fails if the directory is not empty.
readdir(dirfh, cookie, count) -> entries   Returns up to count bytes of
    directory entries from the directory dirfh. Each entry contains a file
    name, a file handle, and an opaque pointer to the next directory entry,
    called a cookie. The cookie is used in subsequent readdir calls to start
    reading from the following entry. If the value of cookie is 0, reads from
    the first entry in the directory.
statfs(fh) -> fsstats   Returns file system information (such as block size,
    number of free blocks and so on) for the file system containing a file fh.

Case study: Sun Network File System (NFS)

The virtual file system layer has one VFS structure for each mounted file
system and one v-node per open file
  A VFS structure relates a remote file system to the local directory on which
  it is mounted
  A v-node contains
    an indicator to show whether the file is local or remote
    if local, the v-node contains a reference to the index of the local file
    (an ______ in Unix); if remote, it contains the _______ of the remote file.
Case study: Sun Network File System (NFS)

The virtual file system layer has one VFS structure for each mounted file
system and one v-node per open file
  A VFS structure relates a remote file system to the local directory on which
  it is mounted
  A v-node contains
    an indicator to show whether the file is local or remote
    if local, the v-node contains a reference to the index of the local file
    (an i-node in Unix); if remote, it contains the file handle of the remote
    file.
Case study: Sun Network File System (NFS)

The NFS client module
  Supplies an interface for conventional apps
  Emulates the semantics of standard Unix file system primitives precisely and
  is integrated with the Unix kernel (rather than supplied as a library for
  loading into client processes)
  Benefits?
  • User programs can access files via Unix system calls without recompilation
    or reloading
  • A single client module serves all user-level processes, with a shared
    cache of recently used blocks
  • The encryption key used to authenticate user IDs passed to the server can
    be retained in the kernel, preventing impersonation by user-level clients.

Case study: Sun Network File System (NFS)

Access control and authentication
  The NFS server is stateless and does not keep files open on behalf of
  clients, so it must check the user ID against the file's access permissions
  on each request.

  Question: Why are these additional parameters not shown in Fig 8.9?
Case study: Sun Network File System (NFS)

Access control and authentication
  The NFS server is stateless and does not keep files open on behalf of
  clients, so it must check the user ID against the file's access permissions
  on each request.
  These credentials are supplied automatically by the RPC system.
  Security problem: a client can modify RPC calls to include other users' IDs.
Case study: Sun Network File System (NFS)

Access control and authentication
  The NFS server is stateless and does not keep files open on behalf of
  clients, so it must check the user ID against the file's access permissions
  on each request.
  Solution: encrypt the user's authentication info.

Case study: Sun Network File System (NFS)

Mount service
  On each server there is a file (e.g. /etc/exports) containing the names of
  the local filesystems that are available for remote mounting
    For each filesystem, it indicates which hosts are permitted to mount it
  Clients: use a modified version of the Unix mount command to request
  mounting of a remote filesystem, specifying the remote host name, the
  pathname of a remote directory and the local name with which it is to be
  mounted.
    The remote directory may be any sub-tree, enabling clients to mount any
    part of the remote filesystem.
Case study: Sun Network File System (NFS)

Mount service
  The "modified version of the Unix mount command" communicates with the mount
  service process on the remote host using a mount protocol.
    The location (IP address and port number) of the server and the file
    handle for the remote directory are passed on to the VFS layer and the
    NFS client.

[Figure: local and remote filesystems accessible on an NFS client. Server 1
exports the sub-tree /export/people (containing big, jon, bob, ...); Server 2
exports /nfs/users (containing jim, ann, jane, joe); on the client, /usr
contains students, x and staff, where /usr/students and /usr/staff are remote
mounts.]

Note: The file system mounted at /usr/students in the client is actually the
sub-tree located at /export/people in Server 1; the file system mounted at
/usr/staff in the client is actually the sub-tree located at /nfs/users in
Server 2.
Case study: Sun Network File System (NFS)

Hard-mounted
  When a user-level process accesses a file in a filesystem that is
  hard-mounted, the process is suspended until the request can be completed;
  if the remote host is unavailable, the NFS client module continues to retry
  until it is satisfied.
  Thus in the case of a server failure, user-level processes are suspended
  until the server restarts, and then they continue just as though there had
  been no failure.

Question
  After the timeout of an RPC call to access a file on a hard-mounted file
  system, the NFS client module does not return control to the user-level
  process that originated the call. Why?

Soft-mounted Soft-mounted
In the case of a server failure, NFS client module returns a In the case of a server failure, NFS client module returns a
failure indication to user-level processes after a small number failure indication to user-level processes after a small number
of retries. of retries.
So, well-written programs can detect the failure So, well-written programs can detect the failure
But many Unix utilities and apps do not test for file access
failure; hence causing unpredictable results.

Discussion: If you are to write an installation, will you use soft- or


hard-mounting ?

107 108

18
Case study: Sun Network File System (NFS)

Soft-mounted
  In the case of a server failure, the NFS client module returns a failure
  indication to user-level processes after a small number of retries.
    So, well-written programs can detect the failure
    But many Unix utilities and apps do not test for file access failure,
    hence causing unpredictable results.
  For this reason, many installations use hard mounting exclusively, with the
  consequence that programs are unable to recover gracefully when an NFS
  server is unavailable for a significant period.

Case study: Sun Network File System (NFS)

Path name translation
  Unix translates multi-part file pathnames to i-node references
  In NFS, can pathnames be translated at a server?
Case study: Sun Network File System (NFS)

Path name translation
  Unix translates multi-part file pathnames to i-node references
  In NFS, can pathnames be translated at a server?
  Note: directories holding different parts of a multi-part name may reside in
  filesystems at different servers. So pathnames are parsed, and their
  translation is performed in an iterative manner by the client.

Case study: Sun Network File System (NFS)

Automounter
  To mount a remote directory dynamically whenever an "empty" mount point is
  referenced by a client
  When there has been no reference for several minutes, unmount
  "autofs" in Solaris and Linux.
    man automount

Discussion: Does your server need to wait for other servers to start?
Case study: Sun Network File System (NFS)

Review caching in conventional OS
  Caching examples? Your experience?
  What data are cached, and when and why?

Review caching in conventional OS (e.g. Unix)
  Cache things that have been read from disk
  Read-ahead: anticipates read accesses and fetches the pages following those
  that have most recently been read
  Delayed-write: write to disk later (see sample program cache.c)
Case study: Sun Network File System (NFS)

Server caching in NFS
  write offers two options (not shown in Fig 8.9)
    Write-through caching: write to disk before sending a reply to the client.
    • The client can be sure that data are written to the file when the reply
      is received.
    Advantages and disadvantages?
    Data are stored only in the memory cache. They will be written to disk
    when a commit operation is received.
    • The client can be sure that data are written to the file when __________ ?
Case study: Sun Network File System (NFS)

Server caching in NFS
  write offers two options (not shown in Fig 8.9)
    Write-through caching: write to disk before sending a reply to the client.
    • The client can be sure that data are written to the file when the reply
      is received.
    Data are stored only in the memory cache. They will be written to disk
    when a commit operation is received.
    • The client can be sure that data are written to the file when the reply
      to commit is received.
    (Standard NFS clients use this mode.)

Case study: Sun Network File System (NFS)

Client caching
  Purpose: reduce the number of requests sent to servers
  Potential problems?

Client caching Value of t ?


Timestamp: validate cache before use Set to small or large? advantages and disadvantages?

Data item 1
Data item 2

… Each entry has:


Tc: time when cache entry was last validated
Tm: time when block was last modified at server

Validity condition at time T: Validity condition at time T:


(T-Tc<t) OR (Tmclient=Tmserver) (T-Tc<t) OR (Tmclient=Tmserver)

cache t : freshness interval t : freshness interval


119 120

20
Case study: Sun Network File System (NFS)

In Sun Solaris clients, t is set adaptively for individual files, in the range
3 to 30 seconds depending on the frequency of updates to the file; for
directories, the range is 30 to 60 seconds.

Which part of the validity condition requires access to the server?
  When the first half, (T - Tc < t), is true, is there any need to access the
  server?
  When the first half is false, the current value of Tmserver is obtained (by
  a getattr call to the server) and compared with the local value Tmclient.
    If they are the same, the entry is valid, and Tc is updated to the current
    time.
    If they differ, the client requests the relevant data from the server.

  Tc: time when the cache entry was last validated
  Tm: time when the block was last modified at the server
  Validity condition at time T:
    (T - Tc < t) OR (Tmclient = Tmserver)
  t: freshness interval
Case study: Sun Network File System (NFS)

Measures to reduce traffic to the server
  When a new value of Tmserver arrives at a client, apply it to all cache
  entries derived from the relevant file
  Carry (piggyback) file attribute information with the results of every file
  operation
  Adaptive algorithm for the value of t

  Validity condition at time T:
    (T - Tc < t) OR (Tmclient = Tmserver)
  t: freshness interval
Case study: Sun Network File System (NFS)

Consistency
  An approximation to one-copy semantics
  Meets the needs of the majority of applications
  The use of file sharing via NFS for communication or close coordination
  between processes on different computers cannot be recommended.
Case study: Sun Network File System (NFS)

Securing NFS with Kerberos
  Kerberos:
    a network authentication protocol, developed at MIT in the 1980s
    provides authentication & security facilities for the MIT campus network
      http://web.mit.edu/Kerberos
    is the default authentication service of Windows 2000
      (see "Windows 2000 Kerberos Authentication" in microsoft.com)

Case study: Sun Network File System (NFS)

Performance
  How to evaluate performance?
Case study: Sun Network File System (NFS)

Performance
  How to evaluate performance?
    Analysis
    Measurement
    Simulation
  Measurements are taken regularly by Sun and other NFS implementors
    www.spec.org
Case study: Sun Network File System (NFS)

NFS stateless server
  brings robustness and ease of implementation
  But: concurrent writes to the same file may not give the same results as on
  a single Unix system; and frequent getattr requests
  Enhancement?

Spritely NFS: an enhancement
  See V. Srinivasan and J.C. Mogul, "Spritely NFS: Experiments with
  Cache-Consistency Protocols", Proceedings of the Twelfth ACM Symposium on
  Operating Systems Principles, pp. 45–57, 1989.
Spritely NFS
  With the addition of open and close calls
  Client:
    open specifies a mode (read, write, or both) and includes counts of the
    local processes that currently have the file open for reading and for
    writing.
    close also sends the counts
Spritely NFS
  With the addition of open and close calls
  Server:
    If open specifies write mode, it will fail if there is another writer
    If open is in write mode, other "reading" clients receive a message to
    invalidate locally cached portions of the file
    If open is in read mode (and the file is currently open for writing)
      the server sends a message to the writing client, instructing it to
      stop caching (i.e. to use a strictly write-through mode)
      and it instructs all reading clients to stop caching (so that all local
      read calls result in a request to the server).
Spritely NFS
  Advantages
    One-copy semantics
    Some efficiency gains in handling cached writes
  Disadvantages
    Carries client-related state at the server
      Vulnerable to server crashes if the state is not saved.
      It implements a recovery protocol that interrogates clients.

What did we learn?
  It is possible to achieve one-copy semantics without substantial loss of
  performance, albeit at the cost of some extra implementation complexity and
  the need for a recovery mechanism to restore the state after a server crash.
Other enhancements

WebNFS
  E.g., https://yanfs.dev.java.net (in Java)
  Background: some Internet applications (e.g. Java applets) could benefit
  from direct access to NFS servers without many of the overheads associated
  with the emulation of Unix
Other enhancements

WebNFS
  Aim: to enable web browsers, Java programs and other apps to interact with
  an NFS server directly
  WebNFS server: at the well-known port number 2049
  To read a portion of a single file located on an NFS server that supports
  WebNFS requires the establishment of a TCP connection and two RPC calls
    Compare with a Web server / HTTP?
Other enhancements

WebNFS example:
  A weather service might publish a file on its NFS server containing a large
  database of frequently updated weather data with a URL such as:
    nfs://data.weather.gov/weatherdata/global.data
  An interactive WeatherMap client, which displays weather maps, could be
  constructed in Java or any other language that supports a WebNFS procedure
  library. The client reads only those portions of the global.data file that
  are needed to construct the particular maps requested by a user, whereas a
  similar application using HTTP would either have to transfer the entire
  database or require the support of a special-purpose server program to
  supply the data requested.

Other enhancements

NFS version 4 and beyond ...
  e.g. CITI: Projects: NFS Version 4 Open Source Reference Implementation
    http://www.citi.umich.edu/projects/nfsv4
Other enhancements

NFS version 4 and beyond ...
  e.g. see the multi-language demo in the OpenSolaris project
  NFSv4.1 being developed ...
    http://hub.opensolaris.org/bin/view/Project+nfsv41/
    http://hub.opensolaris.org/bin/view/Project+nfsv41/basics
    http://hub.opensolaris.org/bin/view/Project+nfsv41/chinese_basics
  NFSv4.1 pNFS OpenSolaris project:
    http://opensolaris.org/os/project/nfsv41

Group discussions and exercises
  To what extent does Sun NFS deviate from one-copy file update semantics?
  Construct a scenario in which two user-level processes sharing a file would
  operate correctly in a single UNIX host but would observe inconsistencies
  when running in different hosts.
Group discussions and exercises
  Sun NFS aims to support heterogeneous distributed systems by the provision
  of an operating system-independent file service. What are the key decisions
  that the implementer of an NFS server for an operating system other than
  UNIX would have to take? What constraints should an underlying filing
  system obey to be suitable for the implementation of NFS servers?

Group discussions and exercises
  What data must the NFS client module hold on behalf of each user-level
  process?
Group discussions and exercises
  How does the NFS Automounter help to improve the performance and
  scalability of NFS?

Group discussions and exercises
  How many lookup calls are needed to resolve a 5-part pathname (for example,
  /usr/users/jim/code/xyz.c) for a file that is stored on an NFS server? What
  is the reason for performing the translation step by step?
Compare with The Andrew File System
(AFS: developed at Carnegie Mellon U)

Q: Compare the update semantics of UNIX when accessing local files with those
of NFS and AFS. Under what circumstances might clients become aware of the
differences?

Ans:
  UNIX: strict one-copy update semantics;
  NFS: approximation to one-copy update semantics with a delay (~3 seconds)
  in achieving consistency;
  AFS: consistency is achieved only on close. Thus concurrent updates at
  different clients will result in lost updates – the last client to close
  the file wins.