Caching in Distributed File System

Ke Wang, CS614 – Advanced System, Apr 24, 2001


Page 1: Caching in Distributed File System

Ke Wang, CS614 – Advanced System

Apr 24, 2001

Page 2: Key requirements of a distributed system

- Scalability from small to large networks
- Fast and transparent access to a geographically distributed file system (DFS)
- Information protection
- Ease of administration
- Wide support from a variety of vendors

Page 3: Background

- DFS: a distributed implementation of a file system, in which multiple users share files and storage resources.
- The overall storage space managed by a DFS is composed of different, remotely located, smaller storage spaces.
- There is usually a correspondence between the constituent storage spaces and sets of files.

Page 4: DFS structure

- Service: a software entity that provides a particular type of function to clients
- Server: service software running on a single machine
- Client: a process that can invoke a service using a set of operations that forms its client interface

Page 5: Why caching?

A cache retains the most recently accessed disk blocks, so repeated accesses to a cached block can be handled without involving the disk.

Advantages:
- Reduces delays
- Reduces contention for the disk arm
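As an illustration of the idea above, here is a minimal Python sketch of a fixed-size block cache that uses least-recently-used (LRU) replacement. The slides do not name a replacement policy. LRU, the `BlockCache` class, and the `read_from_disk` callback are all illustrative assumptions. The only point of the sketch is that a hit is served without touching the disk.

```python
from collections import OrderedDict


class BlockCache:
    """Minimal LRU cache of disk blocks (illustrative sketch, not a real API).

    `read_from_disk` is a hypothetical callback standing in for a disk read;
    it is called only on a cache miss.
    """

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk
        self.blocks = OrderedDict()  # block number -> data, least recently used first
        self.disk_reads = 0

    def read(self, block_no):
        if block_no in self.blocks:
            # Hit: mark the block as most recently used; no disk access needed.
            self.blocks.move_to_end(block_no)
            return self.blocks[block_no]
        # Miss: fetch from disk, then evict the least recently used block if full.
        self.disk_reads += 1
        data = self.read_from_disk(block_no)
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return data


cache = BlockCache(capacity=2, read_from_disk=lambda n: f"block-{n}")
for n in [1, 2, 1, 1, 3, 1]:
    cache.read(n)
print(cache.disk_reads)  # 3
```

The sequence makes six block accesses, but only three of them reach the disk. The repeated accesses to block 1 are absorbed by the cache.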

Page 6: Caching in a DFS

Advantages:
- Reduces network traffic
- Reduces server contention

Problem:
- Cache consistency

Page 7: Stuff to consider

- Cache location (disk vs. memory)
- Cache placement (client vs. server)
- Cache structure (block vs. file)
- Stateful vs. stateless servers
- Cache update policies
- Consistency
- Client-driven vs. server-driven protocols

Page 8: Practical distributed file systems

- NFS: Sun's Network File System
- AFS: Andrew File System (CMU)
- Sprite FS: file system for the Sprite OS (UC Berkeley)

Page 9: Sun's Network File System (NFS)

Page 10: Sun's Network File System (NFS)

- Originally released in 1985
- Built on top of UDP, an unreliable datagram protocol (NFS now also runs over TCP)
- Client-server model

Page 11: Andrew File System (AFS)

- Developed at CMU since 1983
- Client-server model
- Key software: Vice and Venus
- Goal: high scalability (5,000-10,000 nodes)

Page 12: Andrew File System (AFS)

Page 13: Andrew File System (AFS)

- Vice is a multi-threaded server process, with each thread handling a single client request.
- Venus is the client process that runs on each workstation and forms the interface with Vice.
- Both are user-level processes.

Page 14: Prototype of AFS

- One server process per client
- Clients cache files
- The timestamp is verified with the server on every open
  -> a lot of interaction with the server -> heavy network traffic

Page 15: Improving AFS

To improve on the prototype:
- Reduce cache validity checks
- Reduce the number of server processes
- Reduce network traffic

Result: higher scalability!

Page 16: Sprite File System

- Designed for networked workstations with large physical memories (possibly diskless)
- Expected memory of 100-500 Mbytes
- Goal: high performance

Page 17: Caches in Sprite FS

Page 18: Caches in Sprite FS (cont.)

When a process makes a file access, the request is presented first to the cache (file traffic). If the cache cannot satisfy it, the request is passed either to the local disk, if the file is stored locally (disk traffic), or to the server where the file is stored (server traffic). Servers also maintain caches to reduce disk traffic.
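The following sketch traces the lookup path just described: client cache first, then the local disk or the server, with the server's own cache placed in front of its disk. The `Server` and `Client` classes, the dictionary "disks", and the traffic counters are hypothetical stand-ins chosen for illustration. They are not Sprite's kernel interfaces.

```python
class Server:
    """Hypothetical file server that keeps its own block cache in front of its disk."""

    def __init__(self, disk):
        self.disk = disk            # (file, block) -> data; stands in for the server disk
        self.cache = {}
        self.disk_traffic = 0

    def read(self, file, block):
        key = (file, block)
        if key not in self.cache:   # server cache miss: go to the server's disk
            self.disk_traffic += 1
            self.cache[key] = self.disk[key]
        return self.cache[key]


class Client:
    """Hypothetical client: cache first, then local disk or remote server."""

    def __init__(self, local_disk, server):
        self.cache = {}
        self.local_disk = local_disk  # (file, block) -> data for locally stored files
        self.server = server
        self.traffic = {"file": 0, "disk": 0, "server": 0}

    def read(self, file, block):
        key = (file, block)
        self.traffic["file"] += 1     # every access is presented to the cache first
        if key not in self.cache:
            if key in self.local_disk:
                self.traffic["disk"] += 1
                self.cache[key] = self.local_disk[key]
            else:
                self.traffic["server"] += 1
                self.cache[key] = self.server.read(file, block)
        return self.cache[key]


server = Server(disk={("/shared/a", 0): b"A0"})
c1 = Client(local_disk={("/tmp/x", 0): b"X0"}, server=server)
for f in ["/tmp/x", "/shared/a", "/shared/a", "/tmp/x"]:
    c1.read(f, 0)
print(c1.traffic)            # {'file': 4, 'disk': 1, 'server': 1}

c2 = Client(local_disk={}, server=server)
c2.read("/shared/a", 0)      # served from the server's cache
print(server.disk_traffic)   # 1
```

The second client's request reaches the server, but the server answers it from its own cache, so no additional server disk traffic occurs.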

Page 19: Caching in Sprite FS

Two unusual aspects:
- Guarantees a completely consistent view under:
  - concurrent write sharing
  - sequential write sharing
- Cache size varies dynamically

Page 20: Cache location: disk vs. main memory

Advantages of disk caches:
- More reliable
- Cached data are still there during recovery and don't need to be fetched again

Page 21: Cache location: disk vs. main memory (cont.)

Advantages of main-memory caches:
- Permit workstations to be diskless
- Faster access
- Server caches (used to speed up disk I/O) are always in main memory; using main-memory caches on the clients permits a single caching mechanism for servers and users

Page 22: Cache placement: client vs. server

- Client caches reduce network traffic
  - Read-only operations on unchanged files do not need to go over the network
- Server caches reduce server load
  - The cache is amortized across all clients (but needs to be bigger to be effective)

In practice, we need BOTH!

Page 23: Cache structure

- Block basis
  - Simple
  - Used by Sprite FS and NFS
- File basis
  - Reduces interaction with servers
  - Used by AFS
  - Cannot access files larger than the cache
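The two cache structures can be contrasted in a few lines. A block-basis cache fetches only the blocks that overlap a read. A file-basis cache transfers the whole file at once, so the file must fit in the cache. The 4096-byte block size and both function names below are assumptions made for illustration. The slides do not give a block size.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes; not specified in the slides


def blocks_needed(offset, length, block_size=BLOCK_SIZE):
    """Block basis: only blocks overlapping [offset, offset + length) are fetched."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def whole_file_fetch(file_size, cache_capacity):
    """File basis: the whole file is fetched in one transfer, so it must fit."""
    if file_size > cache_capacity:
        raise ValueError("file is larger than the cache and cannot be cached")
    return file_size  # bytes transferred from the server


print(blocks_needed(5000, 10000))           # [1, 2, 3]
print(whole_file_fetch(20_000, 1_000_000))  # 20000
```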

Page 24: Comparison

- NFS: client memory (disk), block basis
- AFS: client disk, file basis
- Sprite FS: client memory and server memory, block basis

Page 25: Stateful vs. stateless servers

- Stateful: servers hold information about their clients
- Stateless: servers maintain no state information about clients

Page 26: Stateful servers: mechanism

- A client opens a file.
- The server fetches information about the file from its disk, stores it in memory, and gives the client a connection id that is unique to that client and the open file.
- The id is used for subsequent accesses until the session ends.
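Here is a minimal sketch of the stateful mechanism above. The server keeps an in-memory table of open files keyed by connection id, and each read request carries only that id and a byte count. `StatefulServer`, `OpenFile`, and the dictionary "disk" are hypothetical names used for illustration only.

```python
import itertools
from dataclasses import dataclass


@dataclass
class OpenFile:
    client: str
    path: str
    data: bytes          # file information fetched from disk at open time
    position: int = 0    # current offset, remembered by the server


class StatefulServer:
    """Hypothetical stateful server: per-client state lives in server memory."""

    def __init__(self, disk):
        self.disk = disk                # path -> bytes; stands in for the server's disk
        self.open_files = {}            # connection id -> OpenFile
        self._next_id = itertools.count(1)

    def open(self, client, path):
        conn_id = next(self._next_id)   # unique to this client and open file
        self.open_files[conn_id] = OpenFile(client, path, self.disk[path])
        return conn_id

    def read(self, conn_id, nbytes):
        # The request carries only the id and a byte count.
        f = self.open_files[conn_id]
        chunk = f.data[f.position:f.position + nbytes]
        f.position += len(chunk)
        return chunk

    def close(self, conn_id):
        del self.open_files[conn_id]    # session ends; state is discarded


srv = StatefulServer({"/notes": b"hello world"})
fd = srv.open("client-1", "/notes")
print(srv.read(fd, 5), srv.read(fd, 6))   # b'hello' b' world'
srv.close(fd)
```

The `open_files` table is exactly the volatile state that a crash would wipe out. After a restart, an old `fd` would no longer mean anything to the server.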

Page 27: Stateful servers (cont.)

Advantages:
- Fewer disk accesses
- Read-ahead is possible
- RPCs are small, containing only an id
- A file may be cached entirely on the client and invalidated by the server if there is a conflicting write

Page 28: Stateful servers (cont.)

Disadvantages:
- The server loses all its volatile state in a crash
  - State must be restored through a dialog with clients, or operations that were underway when the crash occurred must be aborted
- The server needs to be aware of client failures

Page 29: Stateless servers

- Each request must be self-contained
- Each request identifies the file and the position in the file
- There is no need to establish and terminate a connection with open and close operations
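For contrast with the stateful case, here is a stateless variant. Every request names the file and the offset, and the client tracks its own position. The class names are hypothetical. The sketch shows why a server restart does not disturb the client: nothing the client depends on was held in server memory.

```python
class StatelessServer:
    """Hypothetical stateless server: every request names the file and the offset."""

    def __init__(self, disk):
        self.disk = disk  # path -> bytes; the server keeps nothing about clients

    def read(self, path, offset, nbytes):
        return self.disk[path][offset:offset + nbytes]


class StatelessClient:
    """The client, not the server, remembers the current position."""

    def __init__(self, server, path):
        self.server = server
        self.path = path
        self.offset = 0

    def read(self, nbytes):
        chunk = self.server.read(self.path, self.offset, nbytes)
        self.offset += len(chunk)
        return chunk


disk = {"/notes": b"hello world"}
client = StatelessClient(StatelessServer(disk), "/notes")
first = client.read(5)
client.server = StatelessServer(disk)   # simulate a server crash and restart
second = client.read(6)
print(first, second)                    # b'hello' b' world'
```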

Page 30: Stateless servers (cont.)

Advantages:
- A file server crash does not affect clients
- Simple

Disadvantages:
- Impossible to enforce consistency, since the server does not know which clients cache which files
- Each RPC must contain all the state, so RPCs are longer

Page 31: Stateful vs. stateless

- AFS and Sprite FS are stateful
  - Sprite FS servers keep track of which clients have which files open
  - AFS servers keep track of the contents of clients' caches
- NFS is stateless

Page 32: Cache update policies

- Write-through
- Delayed-write
- Write-on-close (a variation of delayed-write)

Page 33: Cache update policies (cont.)

Write-through: all writes are propagated to stable storage immediately.
- Reliable, but poor performance

Page 34: Cache update policies (cont.)

- Delayed-write: modifications are written to the cache and written through to the server later
- Write-on-close: modifications are written back to the server when the file is closed
  - Reduces intermediate read and write traffic while the file is open
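To compare the three update policies, the sketch below counts the write messages a client would send to the server for the same sequence of writes. The `CachedFile` class, the block-level granularity, and the explicit `flush()` call that stands in for a delayed-write timer are simplifying assumptions. None of them reflects the exact behavior of NFS, Sprite, or AFS.

```python
POLICIES = ("write-through", "delayed-write", "write-on-close")


class CachedFile:
    """Client-side cached file under one update policy (illustrative sketch).

    `server_writes` counts write messages that would be sent to the server.
    """

    def __init__(self, policy):
        if policy not in POLICIES:
            raise ValueError(policy)
        self.policy = policy
        self.cache = {}       # block number -> data
        self.dirty = set()    # blocks modified but not yet sent
        self.server = {}      # stands in for the server's copy
        self.server_writes = 0

    def write(self, block, data):
        self.cache[block] = data
        if self.policy == "write-through":
            self._send(block)           # propagate every write immediately
        else:
            self.dirty.add(block)       # defer; a later write to the block overwrites it

    def flush(self):
        """Send all dirty blocks (e.g. from a periodic timer under delayed-write)."""
        for block in sorted(self.dirty):
            self._send(block)
        self.dirty.clear()

    def close(self):
        if self.policy == "write-on-close":
            self.flush()

    def _send(self, block):
        self.server[block] = self.cache[block]
        self.server_writes += 1


def simulate(policy):
    f = CachedFile(policy)
    for block, data in [(0, "a"), (0, "b"), (1, "c"), (0, "d")]:
        f.write(block, data)
    f.close()
    return f.server_writes


print({p: simulate(p) for p in POLICIES})
# {'write-through': 4, 'delayed-write': 0, 'write-on-close': 2}
```

Under delayed-write, nothing has reached the server at close time. The data sits only in the client cache until the next flush, which is the reliability risk discussed on the next slide.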

Page 35: Cache update policies (cont.)

Pros of delayed-write/write-on-close:
- Many files have lifetimes of less than 30 seconds, so their data may never need to be written to the server
- Redundant writes are absorbed
- Many small writes can be batched into larger writes

Disadvantage:
- Poor reliability; unwritten data may be lost when a client crashes

Page 36: Caching in AFS

Key to Andrew's scalability:
- Clients cache entire files on local disk
- Write-on-close
- Server load and network traffic are reduced
- The client contacts the server only on open and close
- Cached files are retained across reboots
- Requires a local disk large enough to hold the cached files

Page 37: Cache update policy

- NFS and Sprite: delayed-write
  - Delay of 30 seconds
- AFS: write-on-close
  - Reduces traffic to the server dramatically
  - Contributes to AFS's good scalability
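Here is a sketch of a 30-second delayed-write cache driven by a simulated clock. The design assumes each dirty block remembers when it first became dirty, and a periodic `tick` writes back blocks at least 30 seconds old. The class name, the per-block timestamp, and the explicit `tick` calls are assumptions. Real NFS and Sprite clients differ in the details.

```python
DELAY = 30  # seconds; the write-back delay the slides cite for NFS and Sprite


class DelayedWriteCache:
    """Sketch: a dirty block is written back once it has been dirty for DELAY seconds."""

    def __init__(self):
        self.dirty = {}         # (file, block) -> (data, time the block first became dirty)
        self.server = {}        # stands in for the server's copy
        self.server_writes = 0

    def write(self, file, block, data, now):
        key = (file, block)
        # Keep the original dirty time so repeated writes cannot postpone write-back forever.
        since = self.dirty[key][1] if key in self.dirty else now
        self.dirty[key] = (data, since)

    def delete(self, file):
        # Blocks of a file deleted before they age out never reach the server.
        for key in [k for k in self.dirty if k[0] == file]:
            del self.dirty[key]

    def tick(self, now):
        """Called periodically (e.g. by a background daemon) to write back old blocks."""
        for key, (data, since) in list(self.dirty.items()):
            if now - since >= DELAY:
                self.server[key] = data
                self.server_writes += 1
                del self.dirty[key]


c = DelayedWriteCache()
c.write("tmp.o", 0, "x", now=0)           # short-lived temporary file
c.write("report.txt", 0, "v1", now=1)
c.write("report.txt", 0, "v2", now=5)     # redundant write absorbed in the cache
c.delete("tmp.o")                          # deleted well before 30 s
c.tick(now=31)
print(c.server_writes, c.server)          # 1 {('report.txt', 0): 'v2'}
```

Four client operations produce a single server write. The short-lived file costs nothing, and the overwritten version never leaves the client.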

Page 38: Consistency

- Is the locally cached copy of the data consistent with the master copy?
- Is there a danger of "stale" data?
- Is concurrent write sharing permitted?

Page 39: Sprite: complete consistency

Concurrent write sharing:
- A file is open on multiple clients
- At least one client is writing it

Handling:
- The server detects this situation
- Dirty data must be written back to the server
- Open cached copies are invalidated
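The server-side check for concurrent write sharing can be sketched as follows. Our understanding of the Sprite design is that it then disables client caching for the file until the sharing ends, and this sketch models that behavior. The class name, the "r"/"w" mode strings, and the message texts are hypothetical placeholders, not Sprite's actual protocol.

```python
class SpriteServer:
    """Sketch of server-side open tracking for concurrent write sharing.

    Modes are "r" or "w"; message strings are hypothetical placeholders.
    """

    def __init__(self):
        self.opens = {}           # path -> {client: mode} for clients that have it open
        self.uncacheable = set()  # paths currently write-shared
        self.messages = []        # (client, message) notifications the server sends

    def open(self, client, path, mode):
        """Record the open; return True if the client may cache the file."""
        users = self.opens.setdefault(path, {})
        users[client] = mode
        shared = len(users) > 1 and "w" in users.values()
        if shared and path not in self.uncacheable:
            self.uncacheable.add(path)
            for other, other_mode in users.items():
                if other == client:
                    continue
                if other_mode == "w":
                    self.messages.append((other, "write back dirty blocks"))
                self.messages.append((other, "invalidate cache and stop caching"))
        return path not in self.uncacheable

    def close(self, client, path):
        users = self.opens.get(path, {})
        users.pop(client, None)
        if not users:
            self.uncacheable.discard(path)  # no longer shared: caching allowed again


s = SpriteServer()
print(s.open("A", "/log", "w"))   # True: only one client has the file open
print(s.open("B", "/log", "r"))   # False: concurrent write sharing detected
print(s.messages)
```

Two clients that only read the same file can both keep caching it. Only the presence of a writer triggers the check.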

Page 40: Sprite: complete consistency (cont.)

Sequential write sharing: a file is modified, closed, and then opened by other clients.
- Out-of-date blocks in a cache: detected by comparing version numbers with the server
- Current data in another client's cache: the server keeps track of the last writer
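Here is a minimal sketch of both mechanisms for sequential write sharing. The server bumps a per-file version number on every open for writing and remembers the last writer. A client discards cached blocks whose version does not match. The assumption that the version is bumped at open time, the class names, and the `(before, after)` return value are illustrative choices, not Sprite's actual RPC interface. Closes are not modeled.

```python
class VersionServer:
    """Sketch of sequential write sharing support (hypothetical names)."""

    def __init__(self):
        self.version = {}      # path -> version, bumped on every open for writing
        self.last_writer = {}  # path -> client that may still hold delayed-write data
        self.recalls = []      # (client, path) write-back requests sent

    def open(self, client, path, writing):
        """Return (version before this open, version after this open)."""
        writer = self.last_writer.get(path)
        if writer is not None and writer != client:
            # Current data may still sit in another client's cache: ask for it back.
            self.recalls.append((writer, path))
            del self.last_writer[path]
        before = self.version.get(path, 0)
        after = before + 1 if writing else before
        self.version[path] = after
        if writing:
            self.last_writer[path] = client
        return before, after


class VersionClient:
    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.cached_version = {}   # path -> version of the blocks in this cache
        self.discards = 0

    def open(self, path, writing=False):
        before, after = self.server.open(self.name, path, writing)
        cached = self.cached_version.get(path)
        if cached is not None and cached != before:
            self.discards += 1     # out-of-date blocks: flush them from the cache
        self.cached_version[path] = after


vsrv = VersionServer()
a, b = VersionClient("A", vsrv), VersionClient("B", vsrv)
a.open("/f", writing=True)    # A writes version 1
b.open("/f")                  # server recalls A's dirty data; B caches version 1
b.open("/f", writing=True)    # B's copy is current; B writes version 2
a.open("/f")                  # A's version-1 blocks are stale and discarded
print(vsrv.recalls, a.discards, b.discards)
```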

Page 41: AFS: session semantics

Session semantics in AFS:
- Writes to an open file are invisible to others
- Once the file is closed, changes are visible to new opens anywhere
- Other file operations are visible immediately
- Only sequential consistency is guaranteed
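Session semantics follow from whole-file caching combined with write-on-close. The sketch below shows this with hypothetical `WholeFileServer` and `SessionClient` classes. To keep the visibility rules in focus, it leaves out callbacks and cache reuse across sessions, so each open simply fetches the current master copy.

```python
class WholeFileServer:
    def __init__(self, files):
        self.files = dict(files)   # path -> contents (master copies)


class SessionClient:
    """Sketch of whole-file caching with write-on-close (session semantics).

    Callbacks and cache reuse across sessions are deliberately left out.
    """

    def __init__(self, server):
        self.server = server
        self.sessions = {}   # path -> [local copy, modified?]

    def open(self, path):
        # Fetch the entire file at open time.
        self.sessions[path] = [self.server.files.get(path, ""), False]

    def read(self, path):
        return self.sessions[path][0]

    def write(self, path, data):
        # Changes go to the local copy only and are invisible to other clients.
        self.sessions[path] = [data, True]

    def close(self, path):
        data, modified = self.sessions.pop(path)
        if modified:
            self.server.files[path] = data   # write-on-close makes the change visible


server = WholeFileServer({"/doc": "v1"})
a, b = SessionClient(server), SessionClient(server)
a.open("/doc")
b.open("/doc")
a.write("/doc", "v2")
a.close("/doc")
print(b.read("/doc"))   # v1: b's session started before a closed
b.close("/doc")
b.open("/doc")
print(b.read("/doc"))   # v2: a new open sees the closed changes
```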

Page 42: Consistency

- Sprite guarantees complete consistency
- AFS uses session semantics
- NFS does not guarantee consistency
- NFS is stateless: all operations involve contacting the server, so if the server is unreachable, reads and writes cannot proceed

Page 43: Client-driven vs. server-driven

Client-driven approach:
- The client initiates a validity check
- The server checks whether the local data are consistent with the master copy

Server-driven approach:
- The server records which files each client caches
- When the server detects an inconsistency, it must react
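A client-driven check can be sketched as validation on every open. The client asks the server for the file's modification time and refetches only if its cached copy is stale. The `get_mtime` and `fetch` calls are hypothetical RPC names chosen for illustration. The request counter shows how much traffic the checks alone generate.

```python
class TimestampServer:
    """Hypothetical server: each file has contents and a modification time."""

    def __init__(self, files):
        self.files = dict(files)   # path -> (contents, mtime)
        self.requests = 0

    def get_mtime(self, path):
        self.requests += 1
        return self.files[path][1]

    def fetch(self, path):
        self.requests += 1
        return self.files[path]


class ValidatingClient:
    """Client-driven consistency: validate the cached copy on every open."""

    def __init__(self, server):
        self.server = server
        self.cache = {}   # path -> (contents, mtime)

    def open(self, path):
        mtime = self.server.get_mtime(path)        # client-initiated validity check
        cached = self.cache.get(path)
        if cached is None or cached[1] != mtime:   # missing or stale: refetch
            self.cache[path] = self.server.fetch(path)
        return self.cache[path][0]


server = TimestampServer({"/x": ("v1", 100)})
client = ValidatingClient(server)
for _ in range(3):
    client.open("/x")
print(server.requests)    # 4: one fetch plus a check on every open
server.files["/x"] = ("v2", 200)
print(client.open("/x"), server.requests)   # v2 6
```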

Page 44: AFS: server-driven

Callbacks (key to scalability):
- A cached file is valid as long as the client holds a callback on it
- The server notifies the client before the file is modified
- After a client reboots, all its cached entries are suspect
- Callbacks reduce cache-validation requests to the server
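The callback idea can be sketched as follows. The server remembers which clients cache each file and breaks their callbacks when another client stores a new version. While a client still holds its callback, opening the file costs no server traffic. All class and method names here are hypothetical. This is a simplified model of the mechanism on the slide, not the AFS RPC interface.

```python
class CallbackServer:
    """Sketch of AFS-style callbacks: the server remembers who caches each file."""

    def __init__(self, files):
        self.files = dict(files)    # path -> contents
        self.callbacks = {}         # path -> set of clients holding a callback

    def fetch(self, client, path):
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, client, path, data):
        # Break callbacks held by other clients before the new version takes effect.
        for other in self.callbacks.get(path, set()) - {client}:
            other.break_callback(path)
        self.callbacks[path] = {client}
        self.files[path] = data


class CallbackClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}       # path -> contents
        self.valid = set()    # paths for which this client holds a callback
        self.server_calls = 0

    def open(self, path):
        if path not in self.valid:            # no callback: must contact the server
            self.server_calls += 1
            self.cache[path] = self.server.fetch(self, path)
            self.valid.add(path)
        return self.cache[path]               # callback held: no server traffic

    def write_and_close(self, path, data):
        self.cache[path] = data
        self.valid.add(path)
        self.server_calls += 1
        self.server.store(self, path, data)

    def break_callback(self, path):
        self.valid.discard(path)

    def reboot(self):
        self.valid.clear()                    # every cached entry is now suspect


server = CallbackServer({"/x": "v1"})
a, b = CallbackClient(server), CallbackClient(server)
for _ in range(3):
    a.open("/x")
print(a.server_calls)                  # 1
b.write_and_close("/x", "v2")          # server breaks a's callback
print(a.open("/x"), a.server_calls)    # v2 2
```

Compared with the timestamp check on every open, the server now does the bookkeeping. In return, unchanged files are opened repeatedly without any network traffic.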

Page 45: Client-driven vs. server-driven

- AFS is server-driven (callbacks)
  - Contributes to AFS's scalability
  - Whole-file caching and session semantics also help
- NFS and Sprite are client-driven
  - Increased load on the network and server

Page 46: AFS: effect on scalability

Page 47: Sprite: dynamic cache size

- Make the client cache as large as possible
- Virtual memory and the file system negotiate for physical memory
  - They compare the ages of their oldest pages
- Two problems:
  - Double caching
  - Multiblock pages
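The negotiation between virtual memory and the file cache can be sketched as two LRU page pools. When a page is needed, it is taken from whichever pool holds the older least-recently-used page. The slides say only that the two modules compare the ages of their oldest pages. The `PagePool` class, the explicit timestamps, and breaking ties in favor of evicting a VM page are assumptions. The sketch does not address double caching or multiblock pages.

```python
from collections import OrderedDict


class PagePool:
    """A pool of physical pages kept in LRU order (oldest first)."""

    def __init__(self, name):
        self.name = name
        self.last_used = OrderedDict()   # page id -> time of last access

    def touch(self, page, now):
        self.last_used[page] = now
        self.last_used.move_to_end(page)

    def oldest_age(self, now):
        if not self.last_used:
            return None
        oldest = next(iter(self.last_used))
        return now - self.last_used[oldest]

    def evict_oldest(self):
        page, _ = self.last_used.popitem(last=False)
        return page


def reclaim_page(vm, fs, now):
    """Take a page from whichever pool holds the older least-recently-used page."""
    vm_age, fs_age = vm.oldest_age(now), fs.oldest_age(now)
    if vm_age is None and fs_age is None:
        raise RuntimeError("no pages to reclaim")
    if fs_age is None or (vm_age is not None and vm_age >= fs_age):
        return vm.name, vm.evict_oldest()
    return fs.name, fs.evict_oldest()


vm, fs = PagePool("vm"), PagePool("fs")
vm.touch("v1", 0)
vm.touch("v2", 50)
fs.touch("f1", 20)
fs.touch("f2", 60)
print(reclaim_page(vm, fs, now=100))   # ('vm', 'v1'): age 100 beats 80
print(reclaim_page(vm, fs, now=100))   # ('fs', 'f1'): age 80 beats 50
```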

Page 48: Why not use callbacks in Sprite?

Page 49: Why not use callbacks in Sprite?

- The estimated improvement is small
- Reason:
  - Andrew is a user-level process
  - Sprite is a kernel-level implementation

Page 50: Comparison

Page 51: Performance – running time

Page 52: Performance – running time

On the Andrew benchmark, the Sprite system is the fastest, owing to:
- Kernel-to-kernel RPC
- Delayed write
- Kernel implementation (AFS is user-level)

Page 53: Performance – CPU utilization

Page 54: Performance – CPU utilization

On the Andrew benchmark, the Andrew system showed the greatest scalability, owing to:
- File-based cache
- Server-driven consistency
- Use of callbacks

Page 55: Nomadic caching

New issues:
- What if the client becomes disconnected?
- What if it is only weakly connected (e.g., by modem)?

These situations violate a key property: transparency!

Page 56: Nomadic caching (cont.)

- Cache misses may impede progress
- Local updates are invisible remotely
- Update conflicts
- Updates are vulnerable to loss and damage

Coda file system