A Multiple Server File System - USENIX...File System Framework SOLARIS Internal, Core Kernel...

John Wong, CTO
Twin Peaks Software Inc.

Mirror File System
A Multiple Server File System

Multiple Server File System

• Conventional file system – EXT3/UFS and NFS
– Manages files on a single server and its storage devices

• Multiple server file system
– Manages files on multiple servers and their storage devices

Problems

• A single resource is a single point of failure

• Redundancy provides a safety net

– Disk level => RAID

– Storage level => Storage Replication

– TCP/IP level => SNDR

– File System level => CFS, MFS

– System level => Clustering system

– Application level => Database

Why MFS?

• Many advantages over existing technologies

Local File System

[Figure: Application 1 and Application 2 (user space) → EXT3 (kernel space) → disk driver → data]

EXT3 manages files on the local server’s storage devices

Network File System

[Figure: Client: applications → NFS (client mount). Server: NFSD and local applications → EXT3/UFS → data]

NFS manages files on a remote server’s storage devices

[Figure: Client: an application uses either EXT3 or NFS (client mount). Data B exists on both the client (EXT3/UFS) and the server (NFSD → EXT3/UFS); rsync or tar is used to copy it between them]

Applications can only use one or the other, not both.

EXT3 + NFS?

• Combine these two file systems to manage files on both the local and remote servers’ storage devices
-- at the same time
-- in real time

MFS = EXT3 + NFS

[Figure: Active MFS server: applications (user space) → MFS (kernel space) → EXT3/UFS (local data) and NFS → passive MFS server: applications → EXT3/UFS → data]

Building Block Approach

• MFS is a kernel loadable module
- loaded on top of EXT3/UFS and NFS

• Standard VFS interface

• Provides complete transparency
- to users and applications
- to underlying file systems

Q & A

[Figure: two servers, each running applications on MFS; one holds Data A, the other Data B]

Advantages

• Building block approach
-- Builds upon existing EXT3, NFS, NTFS, CIFS infrastructures

• No metadata is replicated
-- The superblock, cylinder groups, and file allocation map are not replicated

• Every file write operation is checked by the file system
-- File consistency, integrity

• Live file replication, not raw data replication
-- The primary and backup copies are both live files

Advantages

• Interoperability
-- The two nodes can be different systems
-- The storage systems can be different

• Small granularity
-- Directory level, not the entire file system

• One-to-many or many-to-one replication

Advantages

• Fast replication
-- Replication in the kernel file system module

• Immediate failover
-- No fsck or mount operation needed

• Geographically dispersed clustering
-- The two nodes can be separated by hundreds of miles

• Easy to deploy and manage
-- Only one copy of MFS, running on the primary server, is needed for replication

Why MFS?

• Better Data Protection

• Better Disaster Recovery

• Better RAS (reliability, availability, serviceability)

• Better Scalability

• Better Performance

• Better Resource Utilization

File System Framework

Source: Solaris Internals: Core Kernel Architecture, Jim Mauro and Richard McDougall, Prentice Hall

[Figure: User applications → system call interface, split into file operation system calls (read(), write(), open(), close(), mkdir(), rmdir(), link(), ioctl(), creat(), lseek()), file system operation calls (mount(), umount(), statfs(), sync()) and other system calls → vnode interfaces and VFS interfaces → UFS (1), UFS (2), NFS (1), NFS (2), VxFS, HSFS, QFS, PCFS → data on disk, network and optical drive]

MFS Framework

[Figure: User applications → system call interface (file operation system calls, file system operation calls, other system calls) → vnode and VFS interfaces → MFS → vnode/VFS interface → UFS (1), UFS (2), NFS (1), NFS (2), VxFS, HSFS, QFS, PCFS → data on disk, network and optical drive]

Transparency

• Transparent to users and applications
- No recompilation or relinking needed

• Transparent to existing file structures
- Same pathname access

• Transparent to underlying file systems
- UFS, NFS

Mount Mechanism

• Conventional mount
- One directory, one file system

• MFS mount
- One directory, two or more file systems

Mount Mechanism

# mount -F mfs host:/ndir1/ndir2 /udir1/udir2

- First mount NFS on a UFS directory
- Then mount MFS on top of UFS and NFS
- The existing UFS tree /udir1/udir2 becomes the local copy of MFS
- The newly mounted host:/ndir1/ndir2 becomes the remote copy of MFS
- Same mount options as NFS, except the ‘-o hard’ option

MFS mfsck Command

# /usr/lib/fs/mfs/mfsck mfs_dir

- After the MFS mount succeeds, the local copy may not be identical to the remote copy.
- Use mfsck (the MFS fsck) to synchronize them.
- mfs_dir can be any directory under the MFS mount point.
- Multiple mfsck commands can be invoked at the same time.
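The kind of synchronization mfsck performs can be illustrated with a small user-space sketch. It is not the mfsck implementation: it assumes both copies are visible as ordinary directory trees (the hypothetical `local_root` and `remote_root`), treats the local copy as authoritative, and copies every file that is missing or different on the remote side. `sync_tree` and `demo` are illustrative names.

```python
from __future__ import annotations

import os
import shutil
import tempfile


def _same_contents(a: str, b: str) -> bool:
    # Whole-file comparison keeps the sketch short; a real tool would stream.
    with open(a, "rb") as fa, open(b, "rb") as fb:
        return fa.read() == fb.read()


def sync_tree(local_root: str, remote_root: str) -> list[str]:
    """Copy every file under local_root that is missing or different under remote_root.

    Assumption for this sketch: the local copy is authoritative.
    Returns the relative paths that were copied, sorted.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        rel_dir = os.path.relpath(dirpath, local_root)
        target_dir = os.path.join(remote_root, rel_dir)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if not os.path.isfile(dst) or not _same_contents(src, dst):
                shutil.copy2(src, dst)  # copy data and timestamps
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)


def _write(path: str, data: bytes) -> None:
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)


def demo() -> tuple[list[str], list[str]]:
    """Build two trees that disagree, then sync them twice."""
    with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as remote:
        _write(os.path.join(local, "a.txt"), b"new")
        _write(os.path.join(local, "docs", "b.txt"), b"same")
        _write(os.path.join(local, "docs", "c.txt"), b"only local")
        _write(os.path.join(remote, "a.txt"), b"old")          # stale on the remote
        _write(os.path.join(remote, "docs", "b.txt"), b"same")  # already identical
        first = sync_tree(local, remote)
        second = sync_tree(local, remote)  # nothing left to copy
        return first, second
```

Because the local copy wins every comparison, running the sketch a second time finds nothing to copy. Files that exist only on the remote side are left alone, which is one of several policy choices a real tool has to make.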

READ/WRITE Vnode Operation

• MFS receives all VFS/vnode operations

• Read-related operations (read, getattr, …) only need to go to the local copy (UFS)

• Write-related operations (write, setattr, …) go to both the local (UFS) and remote (NFS) copies simultaneously, using threads
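The read/write split can be modeled in user space as below. This is a hedged sketch, not MFS kernel code: `MirroredFile` is a hypothetical class in which two ordinary files stand in for the UFS and NFS copies, reads come from the local copy only, and each write is issued to both copies from separate worker threads and waited on before returning.

```python
from __future__ import annotations

import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def _pwrite_all(path: str, offset: int, data: bytes) -> None:
    """Write all of `data` at `offset` in `path`, creating the file if needed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        while view:  # pwrite may write fewer bytes than requested
            n = os.pwrite(fd, view, offset)
            view, offset = view[n:], offset + n
    finally:
        os.close(fd)


class MirroredFile:
    """Hypothetical model: local_path plays the UFS copy, remote_path the NFS copy."""

    def __init__(self, local_path: str, remote_path: str) -> None:
        self.local_path = local_path
        self.remote_path = remote_path
        self._pool = ThreadPoolExecutor(max_workers=2)

    def read(self, offset: int, size: int) -> bytes:
        # Read-related operations are served by the local copy alone.
        with open(self.local_path, "rb") as f:
            f.seek(offset)
            return f.read(size)

    def write(self, offset: int, data: bytes) -> None:
        # Write-related operations go to both copies concurrently, one thread each.
        futures = [
            self._pool.submit(_pwrite_all, path, offset, data)
            for path in (self.local_path, self.remote_path)
        ]
        for future in futures:
            future.result()  # wait for both; re-raises an error from either copy

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def demo() -> tuple[bytes, bytes, bytes]:
    with tempfile.TemporaryDirectory() as d:
        local, remote = os.path.join(d, "local"), os.path.join(d, "remote")
        mf = MirroredFile(local, remote)
        try:
            mf.write(0, b"hello world")
            mf.write(6, b"MFS!!")
            seen = mf.read(0, 64)
        finally:
            mf.close()
        with open(local, "rb") as f1, open(remote, "rb") as f2:
            return seen, f1.read(), f2.read()
```

Waiting on both futures keeps each write synchronous from the caller's point of view. How a failure on only one copy is handled is a separate concern (the msync mechanism), which this sketch does not model.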

Mirroring Granularity

• Directory level
- Mirror any UFS directory instead of the entire UFS file system
- Directory A mirrored to Server A
- Directory B mirrored to Server B

• Block-level update
- Only changed blocks are mirrored
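Block-level update means a write only has to ship the blocks it touched. The sketch below assumes a fixed block size (8192 bytes, a hypothetical value) and that the remote copy receives whole blocks; `blocks_touched` and `blocks_to_mirror` are illustrative helpers, not MFS functions.

```python
from __future__ import annotations

BLOCK_SIZE = 8192  # hypothetical block size, for illustration only


def blocks_touched(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Indices of the fixed-size blocks covered by a write of `length` bytes at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def blocks_to_mirror(local_image: bytes, offset: int, length: int,
                     block_size: int = BLOCK_SIZE) -> list[tuple[int, bytes]]:
    """(block index, block contents) pairs to send to the remote copy after a write."""
    return [
        (i, local_image[i * block_size:(i + 1) * block_size])
        for i in blocks_touched(offset, length, block_size)
    ]
```

For example, a 100-byte write at offset 8150 straddles a block boundary, so `blocks_touched(8150, 100)` yields blocks 0 and 1 and the rest of the file is not sent.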

MFS msync Command

# /usr/lib/fs/mfs/msync mfs_root_dir

- A daemon that synchronizes an MFS pair after a remote MFS partner fails.
- Upon a write failure, MFS:
  - logs the name of the file for which the write operation failed
  - starts a heartbeat thread to verify that the remote MFS server is back online
- Once the remote MFS server is back online, msync uses the log to sync the missing files to the remote server.
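The failure-handling sequence can be sketched as a log of file names plus a heartbeat thread. Everything here is an assumption for illustration: `ResyncLog` is hypothetical, `is_remote_up` stands in for the heartbeat probe, `copy_to_remote` stands in for resending one whole file, and the polling interval is arbitrary.

```python
from __future__ import annotations

import threading
import time
from typing import Callable, Optional


class ResyncLog:
    """Hypothetical model of the write-failure / heartbeat / resync flow."""

    def __init__(self, is_remote_up: Callable[[], bool],
                 copy_to_remote: Callable[[str], None], interval: float = 1.0) -> None:
        self._is_remote_up = is_remote_up
        self._copy_to_remote = copy_to_remote
        self._interval = interval
        self._lock = threading.Lock()
        self._pending: dict[str, None] = {}  # insertion-ordered set of file names
        self._heartbeat: Optional[threading.Thread] = None

    def record_failure(self, path: str) -> None:
        """Log the file whose remote write failed; start a heartbeat if none is running."""
        with self._lock:
            self._pending[path] = None
            if self._heartbeat is None:
                self._heartbeat = threading.Thread(target=self._run, daemon=True)
                self._heartbeat.start()

    def _run(self) -> None:
        while True:
            # Heartbeat: keep probing until the remote partner answers.
            while not self._is_remote_up():
                time.sleep(self._interval)
            with self._lock:
                if not self._pending:
                    self._heartbeat = None  # log drained; next failure starts a new thread
                    return
                path = next(iter(self._pending))
                del self._pending[path]
            try:
                self._copy_to_remote(path)  # resend the whole file named in the log
            except OSError:
                with self._lock:
                    self._pending[path] = None  # partner dropped again; keep it logged
                time.sleep(self._interval)

    def wait(self, timeout: Optional[float] = None) -> None:
        """Block until the current heartbeat thread (if any) has drained the log."""
        thread = self._heartbeat
        if thread is not None:
            thread.join(timeout)


def demo() -> list[str]:
    up = [False]
    copied: list[str] = []
    log = ResyncLog(lambda: up[0], copied.append, interval=0.01)
    for path in ("/mfs/a", "/mfs/b", "/mfs/a"):  # "/mfs/a" is logged only once
        log.record_failure(path)
    time.sleep(0.05)
    assert not copied  # partner still down: nothing resent yet
    up[0] = True       # partner comes back online
    log.wait(timeout=5)
    return copied
```

Logging names rather than data keeps the log small; the trade-off is that each logged file is resent in full once the partner returns.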

Active/Active Configuration

[Figure: two active MFS servers; on each, applications run on MFS over local UFS (Data A on one server, Data B on the other), and each MFS also reaches its partner over NFS]

MFS Locking Mechanism

MFS uses UFS and NFS file record locks.

Locking is required for the active-active configuration.

Locking makes write-related vnode operations atomic.

Locking is enabled by default.

Locking is not necessary in the active-passive configuration.
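A write guarded by record locks on both copies might look like the sketch below. It uses POSIX record locks through Python's `fcntl.lockf` as a stand-in for the UFS/NFS record locks MFS uses; `locked_range` and `mirrored_write` are hypothetical names, and two local files play the two copies. POSIX record locks belong to the process, so this single-process demo shows the call pattern rather than contention between writers.

```python
from __future__ import annotations

import fcntl
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, Sequence


@contextmanager
def locked_range(fds: Sequence[int], offset: int, length: int) -> Iterator[None]:
    """Hold an exclusive POSIX record lock on the same byte range of every fd."""
    taken: list[int] = []
    try:
        for fd in fds:  # acquire in the order given
            fcntl.lockf(fd, fcntl.LOCK_EX, length, offset, os.SEEK_SET)
            taken.append(fd)
        yield
    finally:
        for fd in reversed(taken):  # release in reverse order
            fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)


def mirrored_write(local_fd: int, remote_fd: int, offset: int, data: bytes) -> None:
    """Update both copies while the written range is locked in both."""
    if not data:
        return  # a zero-length lockf range would mean "to end of file"
    with locked_range([local_fd, remote_fd], offset, len(data)):
        os.pwrite(local_fd, data, offset)
        os.pwrite(remote_fd, data, offset)


def demo() -> tuple[bytes, bytes]:
    with tempfile.TemporaryDirectory() as d:
        fds = [os.open(os.path.join(d, name), os.O_RDWR | os.O_CREAT, 0o644)
               for name in ("local", "remote")]
        try:
            mirrored_write(fds[0], fds[1], 0, b"record-1")
            mirrored_write(fds[0], fds[1], 7, b"2")
            return os.pread(fds[0], 64, 0), os.pread(fds[1], 64, 0)
        finally:
            for fd in fds:
                os.close(fd)
```

In an active-active pair, both servers would need to agree on a single lock order (for example, always lock the copy on one designated server first); otherwise each could hold one lock while waiting for the other.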

Real-Time and Scheduled

• Real-time
-- Replicate files in real time

• Scheduled
-- Log the file path, offset and size
-- Replicate only the changed portion of a file
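Scheduled replication can be sketched as a log of (path, offset, size) records that is replayed later. `ChangeLog` and `ChangeRecord` are hypothetical names; the sketch assumes both copies started out identical, reads the current local bytes at replay time (so later writes to the same range win), and ignores truncation and deletion, which a real implementation would also have to log.

```python
from __future__ import annotations

import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    path: str    # path relative to the mirrored directory
    offset: int  # byte offset of the write
    size: int    # number of bytes written


class ChangeLog:
    """Hypothetical scheduled-mode log: record each write, replay it later."""

    def __init__(self) -> None:
        self.records: list[ChangeRecord] = []

    def note_write(self, path: str, offset: int, size: int) -> None:
        self.records.append(ChangeRecord(path, offset, size))

    def replay(self, local_root: str, remote_root: str) -> int:
        """Copy only the logged byte ranges from local to remote; return bytes sent."""
        sent = 0
        for rec in self.records:
            with open(os.path.join(local_root, rec.path), "rb") as f:
                f.seek(rec.offset)
                chunk = f.read(rec.size)  # current local data for that range
            dst = os.path.join(remote_root, rec.path)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            fd = os.open(dst, os.O_WRONLY | os.O_CREAT, 0o644)
            try:
                os.pwrite(fd, chunk, rec.offset)
            finally:
                os.close(fd)
            sent += len(chunk)
        self.records.clear()
        return sent


def demo() -> tuple[bytes, int]:
    with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as remote:
        for root in (local, remote):  # both copies start out identical
            with open(os.path.join(root, "data.bin"), "wb") as f:
                f.write(b"a" * 10)
        log = ChangeLog()
        with open(os.path.join(local, "data.bin"), "r+b") as f:  # a local write...
            f.seek(3)
            f.write(b"XYZ")
        log.note_write("data.bin", 3, 3)  # ...recorded as path, offset, size
        sent = log.replay(local, remote)  # later, at the scheduled time
        with open(os.path.join(remote, "data.bin"), "rb") as f:
            return f.read(), sent
```

Only the 3 changed bytes cross to the remote copy, rather than the whole 10-byte file.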

Applications

• Online file backup

• Server file backup (active-passive)

• Server/NAS clustering (active-active)

MFS = NTFS + CIFS

[Figure: Windows desktop/laptop: applications → MFS → NTFS (local data) and CIFS → remote server: applications → NTFS → data]

Online File Backup (real-time or scheduled)

[Figure: user desktops/laptops, each running MFS on a folder, replicate over a LAN or WAN to a folder on an ISP server running MFS]

Server Replication

[Figure: primary server (email application) and secondary server, each running Mirror File System and linked by a heartbeat; mirroring paths: /home and /var/spool/mail]

Enterprise Clusters

[Figure: several application servers, each running Mirror File System, mirroring to a central Mirror File System server]