Copyright 2009 Peter Baer Galvin - All Rights Reserved
Solaris 10 Administration Topics Workshop3 - File Systems
By Peter Baer Galvin
For Usenix
Last Revision April 2009
Saturday, May 2, 2009
About the Speaker
Peter Baer Galvin - 781 273 4100
www.cptech.com
My Blog: www.galvin.info
Bio
Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a leading systems integrator and VAR, and was the Systems Manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines. He was contributing editor of the Solaris Corner for SysAdmin Magazine, wrote Pete's Wicked World, the security column for SunWorld magazine, and Pete’s Super Systems, the systems administration column there. He is now Sun columnist for the Usenix ;login: magazine. Peter is co-author of the Operating System Concepts and Applied Operating System Concepts textbooks. As a consultant and trainer, Mr. Galvin has taught tutorials in security and system administration and given talks at many conferences and institutions.
Objectives
Cover a wide variety of topics in Solaris 10
Useful for experienced system administrators
Save time
Avoid (my) mistakes
Learn about new stuff
Answer your questions about old stuff
Won't read the man pages to you
Workshop for hands-on experience and to reinforce concepts
Note – Security covered in separate tutorial
More Objectives
What makes novice vs. advanced administrator?
Bytes as well as bits, tactics and strategy
Knows how to avoid trouble
How to get out of it once in it
How to not make it worse
Has reasoned philosophy
Has methodology
Prerequisites
Recommend at least a couple of years of Solaris experience
Or at least a few years of other Unix experience
Best is a few years of admin experience, mostly on Solaris
About the Tutorial
Every SysAdmin has a different knowledge set
A lot to cover, but notes should make good reference
So some covered quickly, some in detail
Setting base of knowledge
Please ask questions
But let’s take off-topic off-line
Solaris BOF
Fair Warning
Sites vary
Circumstances vary
Admin knowledge varies
My goals
Provide information useful for each of you at your sites
Provide opportunity for you to learn from each other
Why Listen to Me
20 Years of Sun experience
Seen much as a consultant
Hopefully, you've used:
My Usenix ;login: column
The Solaris Corner @ www.samag.com
The Solaris Security FAQ
SunWorld “Pete's Wicked World”
SunWorld “Pete's Super Systems”
Unix Secure Programming FAQ (out of date)
Operating System Concepts (The Dino Book), now 8th ed
Applied Operating System Concepts
Slide Ownership
As indicated per slide, some slides copyright Sun Microsystems
Feel free to share all the slides - as long as you don’t charge for them or teach from them for fee
Overview
Lay of the Land
Schedule
Times and Breaks
Coverage
Solaris 10+, with some Solaris 9 where needed
Selected topics that are new, different, confusing, underused, overused, etc
Outline
Overview
Objectives
Choosing the most appropriate file system(s)
UFS / SDS
Veritas FS / VM (not in detail)
ZFS
Polling Time
Solaris releases in use?
Plans to upgrade?
Other OSes in use?
Use of Solaris rising or falling?
SPARC and x86
OpenSolaris?
Your Objectives?
Lab Preparation
Have device capable of telnet on the USENIX network
Or have a buddy
Learn your “magic number”
Telnet to 131.106.62.100+“magic number”
User “root”, password “lisa”
It’s all very secure
Lab Preparation
Or...
Use virtualbox
Use your own system
Use a remote machine you have legit access to
Choosing the Most Appropriate File Systems
Many file systems, many not optional (tmpfs et al)
Where you have choice, how to choose?
Consider
Solaris version being used
< S10 means no ZFS
ISV support
For each ISV make sure desired FS is supported
Apps, backups, clustering
Priorities
Now weigh priorities of performance, reliability, experience, features, risk / reward
Consider...
Pros and cons of mixing file systems
Root file system
Not much value in using vxfs / vxvm here unless used elsewhere
Interoperability (need to detach from one type of system and attach to another?)
Cost
Supportability & support model
Non-production vs. production use
The Crux of Performance
Root Disk Mirroring
Topics
Root disk mirroring
ZFS
Root Disk Mirroring
Complicated because
Must be bootable
Want it protected from disk failure
And want the protection to work
Can increase or decrease upgrade complexity
Veritas
Live upgrade
Manual Mirroring
Vxvm encapsulation can cause lack of availability
Vxvm needs a rootdg disk
Any automatic mirroring can propagate errors
Consider
Use disksuite (Solaris Volume Manager) to mirror boot disk
Use 3rd disk as rootdg, 3rd disksuite metadb, manual mirror copy
Or use 10MB rootdg on 2 boot disks in disksuite to do the mirroring
Best of all worlds – details in column at www.samag.com/solaris
Manual Mirroring
Sometimes want more than no mirroring, less than real mirroring
Thus "manual mirroring"
Nightly cron job to copy partitions elsewhere
Can be used to duplicate root disk, if installboot used
Combination of newfs, mount, ufsdump | ufsrestore
Quite effective, useful, and cheap
Easy recovery from corrupt root image, malicious error, sysadmin error
Has saved at least one client
But disk failure can require manual intervention
Complete script can be found at www.samag.com/solaris
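The newfs / mount / ufsdump | ufsrestore / installboot combination can be sketched as a dry run. This is not the script from the column; it only prints the commands such a nightly job might run. The device names and the /mnt/mirror mount point are hypothetical, and the boot block path shown is the SPARC UFS one.

```shell
#!/bin/sh
# Dry-run sketch: print the commands a nightly "manual mirror" job might run.
# Device names and the /mnt/mirror mount point are hypothetical examples.

# plan_slice_copy SRC_SLICE DST_SLICE MOUNTPOINT
# Prints (does not execute) the steps to copy one UFS slice to another.
plan_slice_copy() (
    src=$1
    dst=$2
    mnt=$3
    echo "echo y | newfs /dev/rdsk/$dst"
    echo "mount /dev/dsk/$dst $mnt"
    echo "ufsdump 0f - /dev/rdsk/$src | (cd $mnt && ufsrestore rf -)"
    echo "umount $mnt"
)

# plan_bootblock DST_SLICE
# Prints the SPARC UFS boot block step for the copied root slice.
plan_bootblock() (
    echo "installboot /usr/platform/\`uname -i\`/lib/fs/ufs/bootblk /dev/rdsk/$1"
)

plan_slice_copy c0t0d0s0 c0t2d0s0 /mnt/mirror
plan_bootblock c0t2d0s0
```

Review the printed commands before running any of them by hand: newfs destroys whatever is on the target slice.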
Best Practice – Root Disk
Have 4 disks for root!
1st is primary boot device
2nd is disksuite mirror of first
3rd is manual mirror of 1st
4th is manual mirror, kept on a shelf!
Put nothing but system files on these disks (/, /var, /opt, /usr, swap)
Aside: Disk Performance
Which is faster?
73GB drive, 10000 RPM, 3Gb/sec
300GB drive, 10000 RPM, 3Gb/sec
UFS / SDS
UFS Overview
Standard Pre-Solaris 10 file system
Many years old, updated continuously
But still showing its age
No integrated volume manager, instead use SDS (disk suite)
Very fast, but feature poor
For example snapshots exist but only useful for backups
Painful to manage, change, repair
Features
64-bit pointers
16TB file systems (on 64-bit Solaris)
1TB maximum file size
metadata logging (by default) increases performance and keeps file systems (usually) consistent after a crash
Lots of ISV and internal command (dump) support
Only bootable Solaris file system (until S10 10/08)
Dynamic multipathing, but via separate “traffic manager” facility
Issues
Sometimes there is still corruption
Need to run fsck
Sometimes it fails
Many limits
Many features lacking (compared to ZFS)
Lots of manual administration tasks
format to slice up a disk
newfs to format the file system, fsck to check it
mount and /etc/vfstab to mount a file system
share commands, plus svcadm commands, to NFS export
Plus separate volume management
Volume Management
Separate set of commands (meta*) to manage volumes (RAID et al)
For example, to mirror the root file system
Have 2 disks with identical partitioning
Have 2 small partitions per disk for meta-data (here slices 5 and 6)
newfs the file systems
Create meta-data state databases (at least 3, for quorum)
# metadb -a /dev/dsk/c0t0d0s5
# metadb -a /dev/dsk/c0t0d0s6
# metadb -a /dev/dsk/c0t1d0s5
# metadb -a /dev/dsk/c0t1d0s6
Volume Management (cont)
Initialize submirrors (components of mirrors) and mirror the partitions - here we do /, swap, and /var
# metainit -f d10 1 1 c0t0d0s0
# metainit -f d20 1 1 c0t1d0s0
# metainit d0 -m d10
Make the new / bootable
# metaroot d0
# metainit -f d11 1 1 c0t0d0s1
# metainit -f d21 1 1 c0t1d0s1
# metainit d1 -m d11
# metainit -f d14 1 1 c0t0d0s4
# metainit -f d24 1 1 c0t1d0s4
# metainit d4 -m d14
# metainit -f d17 1 1 c0t0d0s7
# metainit -f d27 1 1 c0t1d0s7
# metainit d7 -m d17
Volume Management (cont)
Update /etc/vfstab to reflect new meta devices
/dev/md/dsk/d1 - - swap - no -
/dev/md/dsk/d4 /dev/md/rdsk/d4 /var ufs 1 yes -
/dev/md/dsk/d7 /dev/md/rdsk/d7 /export ufs 1 yes -
Finally attach the submirror to each device to be mirrored
# metattach d0 d20
# metattach d1 d21
# metattach d4 d24
# metattach d7 d27
Now the root disk is mirrored, and commands such as Solaris upgrade, live upgrade, and boot understand that
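The per-slice pattern above (d1N and d2N submirrors, dN mirror, then metattach) is regular enough to generate. A minimal dry-run sketch, assuming the same naming convention as the slides; the function name is hypothetical and the script only prints commands. It leaves out metadb, metaroot, and the /etc/vfstab edits, which still have to be done as shown above.

```shell
#!/bin/sh
# Dry-run sketch: print the SVM commands to mirror one slice, following the
# d1N / d2N / dN naming used in the slides. Nothing is executed.

# plan_mirror SLICE_NUMBER PRIMARY_DISK SECONDARY_DISK
plan_mirror() (
    n=$1
    primary=$2
    secondary=$3
    echo "metainit -f d1$n 1 1 ${primary}s$n"
    echo "metainit -f d2$n 1 1 ${secondary}s$n"
    echo "metainit d$n -m d1$n"
    # The slides attach the second submirror last, after /etc/vfstab is updated.
    echo "metattach d$n d2$n"
)

for slice in 0 1 4 7; do
    plan_mirror "$slice" c0t0d0 c0t1d0
done
```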
Veritas VM / FS
Overview
A popular, commercial addition to Solaris
64-bit
Integrated volume management (vxfs + vxvm)
Mirrored root disk via “encapsulation”
Good ISV support
Good extended features such as snapshots, replication
Shrink and grow file systems
Extent based (for better and worse), journaled, clusterable
Cross-platform
Features
Very large limits
Dynamic multipathing included
Hot spares to automatically replace failed disks
Dirty region logging (DRL) volume transaction logs for fast recovery from crash
But still can require consistency check
Issues
$$$
Adds supportability complexities (who do you call)
Complicates OS upgrades (unencapsulate first)
Fairly complex to manage
Comparison of performance vs. ZFS at http://www.sun.com/software/whitepapers/solaris10/zfs_veritas.pdf
ZFS
Looks to be the “next great thing”
Shipped officially in S10U2 (the 06/06 release)
From scratch file system
Includes volume management, file system, reliability, scalability, performance, snapshots, clones, replication
128-bit file system, almost everything is “infinite”
Checksumming throughout
Simple, endian independent, export/importable…
Still using traffic manager for multipathing
(some following slides are from ZFS talk by Jeff Bonwick and Bill Moore – ZFS team leads at Sun)
Trouble with Existing Filesystems
No defense against silent data corruption
Any defect in disk, controller, cable, driver, or firmware can corrupt data silently; like running a server without ECC memory
Brutal to manage
Labels, partitions, volumes, provisioning, grow/shrink, /etc/vfstab...
Lots of limits: filesystem/volume size, file size, number of files, files per directory, number of snapshots, ...
Not portable between platforms (e.g. x86 to/from SPARC)
Dog slow
Linear-time create, fat locks, fixed block size, naïve prefetch, slow random writes, dirty region logging
Design Principles
Pooled storage
Completely eliminates the antique notion of volumes
Does for storage what VM did for memory
End-to-end data integrity
Historically considered “too expensive”
Turns out, no it isn't
And the alternative is unacceptable
Transactional operation
Keeps things always consistent on disk
Removes almost all constraints on I/O order
Allows us to get huge performance wins
Why “volumes” Exist
In the beginning, each filesystem managed a single disk
Customers wanted more space, bandwidth, reliability
Rewrite filesystems to handle many disks: hard
Insert a little shim (“volume”) to cobble disks together: easy
An industry grew up around the FS/volume model
Filesystems, volume managers sold as separate products
Inherent problems in FS/volume interface can't be fixed
Traditional Volumes
[Figure: FS on Volume (stripe); FS on Volume (mirror)]
ZFS Pools
Abstraction: malloc/free
No partitions to manage
Grow/shrink automatically
All bandwidth always available
All storage in the pool is shared
ZFS Pooled Storage
[Figure: several FS sharing a Storage Pool (RAIDZ); several FS sharing a Storage Pool (Mirror)]
ZFS Data Integrity Model
Everything is copy-on-write
Never overwrite live data
On-disk state always valid – no “windows of vulnerability”
No need for fsck(1M)
Everything is transactional
Related changes succeed or fail as a whole
No need for journaling
Everything is checksummed
No silent data corruption
No panics due to silently corrupted metadata
Terms
Pool - set of disks in one or more RAID formats (e.g. mirrored stripe)
No “/”
File system - mountable container of files
Data set - file system, block device, snapshot, volume or clone within a pool
Named via pool/path[@snapshot]
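To make the pool/path[@snapshot] naming concrete, here is a small sketch that splits a dataset name into its parts with plain shell parameter expansion. The function name is hypothetical and no ZFS commands are involved; it only illustrates the naming rule.

```shell
#!/bin/sh
# Sketch: split a ZFS dataset name of the form pool/path[@snapshot]
# using POSIX parameter expansion.

# parse_dataset NAME -> prints "pool=<pool> path=<path> snapshot=<snap>"
parse_dataset() (
    name=$1
    snap=""
    case $name in
        *@*) snap=${name#*@}; name=${name%%@*} ;;
    esac
    # The pool is everything before the first "/"; note there is no leading "/".
    pool=${name%%/*}
    path=""
    case $name in
        */*) path=${name#*/} ;;
    esac
    echo "pool=$pool path=$path snapshot=$snap"
)

parse_dataset bigp/big@5-nov
parse_dataset zpbg/home
```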
Terms (cont)
ZIL - ZFS intent log
On-disk duplicate of in-memory log of changes to make to data sets
Write goes to memory, ZIL, is acknowledged, then goes to disk
ARC - in-memory read cache
L2ARC - level 2 ARC - on flash memory
What ZFS doesn’t do
Can’t remove individual devices from pools
Rather, replace the device, or 3-way mirror including the device and then remove the device
Can’t shrink a pool (yet)
Can add individual devices, but not optimum (yet)
If adding disk to RAIDZ or RAIDZ2, then end up with RAIDZ(2)+ 1 concatenated device
Instead add full RAID elements to a pool
Add a mirror pair or RAIDZ(2) set
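One way to enforce the "add full RAID elements" advice is a wrapper that refuses a lone disk and otherwise prints a zpool add -n command (-n shows the resulting layout without changing the pool). This is a sketch only; the function name, pool, and disk names are hypothetical.

```shell
#!/bin/sh
# Sketch: only plan a "zpool add" when a whole mirror or RAIDZ set is given.
# Prints a dry-run (-n) command; nothing is executed.

# plan_pool_add POOL VDEV_TYPE DISK...
plan_pool_add() (
    pool=$1
    kind=$2
    shift 2
    case $kind in
        mirror|raidz) min=2 ;;
        raidz2)       min=3 ;;
        *) echo "refusing: '$kind' is not a mirror or RAIDZ set" >&2; exit 1 ;;
    esac
    if [ "$#" -lt "$min" ]; then
        echo "refusing: a $kind set needs at least $min disks" >&2
        exit 1
    fi
    echo "zpool add -n $pool $kind $*"
)

plan_pool_add zpbg mirror c7t0d0 c7t1d0
```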
zpool
# zpool
missing command
usage: zpool command args ...
where 'command' is one of the following:
        create [-fn] [-o property=value] ...
            [-O file-system-property=value] ...
            [-m mountpoint] [-R root] <pool> <vdev> ...
        destroy [-f] <pool>
        add [-fn] <pool> <vdev> ...
        remove <pool> <device> ...
        list [-H] [-o property[,...]] [pool] ...
        iostat [-v] [pool] ... [interval [count]]
        status [-vx] [pool] ...
        online <pool> <device> ...
        offline [-t] <pool> <device> ...
        clear <pool> [device]
zpool (cont)
        attach [-f] <pool> <device> <new-device>
        detach <pool> <device>
        replace [-f] <pool> <device> [new-device]
        scrub [-s] <pool> ...
        import [-d dir] [-D]
        import [-o mntopts] [-o property=value] ...
            [-d dir | -c cachefile] [-D] [-f] [-R root] -a
        import [-o mntopts] [-o property=value] ...
            [-d dir | -c cachefile] [-D] [-f] [-R root] <pool | id> [newpool]
        export [-f] <pool> ...
        upgrade
        upgrade -v
        upgrade [-V version] <-a | pool ...>
        history [-il] [<pool>] ...
        get <"all" | property[,...]> <pool> ...
        set <property=value> <pool>
zpool (cont)
# zpool create ezfs raidz c2t0d0 c3t0d0 c4t0d0 c5t0d0
# zpool status -v
  pool: ezfs
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        ezfs        ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
errors: No known data errors
zpool (cont)
  pool: zfs
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0d0s7  ONLINE       0     0     0
            c0d1s7  ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
errors: No known data errors
zpool (cont)
(/)# zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
bigp         630G   392G      2      4  41.3K   496K
  raidz      630G   392G      2      4  41.3K   496K
    c0d0s6      -      -      0      2  8.14K   166K
    c0d1s6      -      -      0      2  7.77K   166K
    c1d0s6      -      -      0      2  24.1K   166K
    c1d1s6      -      -      0      2  22.2K   166K
----------  -----  -----  -----  -----  -----  -----
zpool (cont)
# zpool status -v
  pool: rpool
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0d0s0  ONLINE       0     0     0
            c0d1s0  ONLINE       0     0     0
errors: No known data errors
  pool: zpbg
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        zpbg        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
errors: No known data errors
zpool (cont)
# zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       6.72G   225G      0      1  9.09K  11.6K
  mirror    6.72G   225G      0      1  9.09K  11.6K
    c0d0s0      -      -      0      0  5.01K  11.7K
    c0d1s0      -      -      0      0  5.09K  11.7K
----------  -----  -----  -----  -----  -----  -----
zpbg        3.72T   833G      0      0  32.0K  1.24K
  raidz1    3.72T   833G      0      0  32.0K  1.24K
    c4t0d0      -      -      0      0  9.58K    331
    c4t1d0      -      -      0      0  10.3K    331
    c5t0d0      -      -      0      0  10.4K    331
    c5t1d0      -      -      0      0  10.3K    331
    c6t0d0      -      -      0      0  9.54K    331
----------  -----  -----  -----  -----  -----  -----
zpool (cont)
Note that for import and export, a pool is the delineator
You can’t import or export a file system because it’s an integral part of a pool
Might cause you to use smaller pools than you otherwise would
zfs
# zfs
missing command
usage: zfs command args ...
where 'command' is one of the following:
create [-p] [-o property=value] ... <filesystem>
create [-ps] [-b blocksize] [-o property=value] ... -V <size> <volume>
destroy [-rRf] <filesystem|volume|snapshot>
snapshot [-r] [-o property=value] ... <filesystem@snapname|volume@snapname>
rollback [-rRf] <snapshot>
clone [-p] [-o property=value] ... <snapshot> <filesystem|volume>
promote <clone-filesystem>
rename <filesystem|volume|snapshot> <filesystem|volume|snapshot>
rename -p <filesystem|volume> <filesystem|volume>
rename -r <snapshot> <snapshot>
zfs (cont)
list [-rH] [-o property[,...]] [-t type[,...]] [-s property] ...
[-S property] ... [filesystem|volume|snapshot] ...
set <property=value> <filesystem|volume|snapshot> ...
get [-rHp] [-o field[,...]] [-s source[,...]]
<"all" | property[,...]> [filesystem|volume|snapshot] ...
inherit [-r] <property> <filesystem|volume|snapshot> ...
upgrade [-v]
upgrade [-r] [-V version] <-a | filesystem ...>
mount
mount [-vO] [-o opts] <-a | filesystem>
unmount [-f] <-a | filesystem|mountpoint>
share <-a | filesystem>
unshare [-f] <-a | filesystem|mountpoint>
zfs (cont)
send [-R] [-[iI] snapshot] <snapshot>
receive [-vnF] <filesystem|volume|snapshot>
receive [-vnF] -d <filesystem>
allow [-ldug] <"everyone"|user|group>[,...] <perm|@setname>[,...]
<filesystem|volume>
allow [-ld] -e <perm|@setname>[,...] <filesystem|volume>
allow -c <perm|@setname>[,...] <filesystem|volume>
allow -s @setname <perm|@setname>[,...] <filesystem|volume>
unallow [-rldug] <"everyone"|user|group>[,...]
[<perm|@setname>[,...]] <filesystem|volume>
unallow [-rld] -e [<perm|@setname>[,...]] <filesystem|volume>
unallow [-r] -c [<perm|@setname>[,...]] <filesystem|volume>
unallow [-r] -s @setname [<perm|@setname>[,...]] <filesystem|volume>
Each dataset is of the form: pool/[dataset/]*dataset[@name]
For the property list, run: zfs set|get
For the delegated permission list, run: zfs allow|unallow
zfs (cont)
# zfs get
missing property argument
usage:
get [-rHp] [-o field[,...]] [-s source[,...]]
<"all" | property[,...]> [filesystem|volume|snapshot] ...
The following properties are supported:
PROPERTY EDIT INHERIT VALUES
available NO NO <size>
compressratio NO NO <1.00x or higher if compressed>
creation NO NO <date>
mounted NO NO yes | no
origin NO NO <snapshot>
referenced NO NO <size>
type NO NO filesystem | volume | snapshot
used NO NO <size>
aclinherit YES YES discard | noallow | restricted | passthrough
aclmode YES YES discard | groupmask | passthrough
atime YES YES on | off
zfs (cont)
canmount YES NO on | off | noauto
casesensitivity NO YES sensitive | insensitive | mixed
checksum YES YES on | off | fletcher2 | fletcher4 | sha256
compression YES YES on | off | lzjb | gzip | gzip-[1-9]
copies YES YES 1 | 2 | 3
devices YES YES on | off
exec YES YES on | off
mountpoint YES YES <path> | legacy | none
nbmand YES YES on | off
normalization NO YES none | formC | formD | formKC | formKD
primarycache YES YES all | none | metadata
quota YES NO <size> | none
readonly YES YES on | off
recordsize YES YES 512 to 128k, power of 2
refquota YES NO <size> | none
refreservation YES NO <size> | none
reservation YES NO <size> | none
zfs (cont)
secondarycache YES YES all | none | metadata
setuid YES YES on | off
shareiscsi YES YES on | off | type=<type>
sharenfs YES YES on | off | share(1M) options
sharesmb YES YES on | off | sharemgr(1M) options
snapdir YES YES hidden | visible
utf8only NO YES on | off
version YES NO 1 | 2 | 3 | current
volblocksize NO YES 512 to 128k, power of 2
volsize YES NO <size>
vscan YES YES on | off
xattr YES YES on | off
zoned YES YES on | off
Sizes are specified in bytes with standard units such as K, M, G, etc.
User-defined properties can be specified by using a name containing a colon (:).
zfs (cont)
(/)# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
bigp       630G   384G      -  /zfs/bigp
bigp/big   630G   384G   630G  /zfs/bigp/big
(root@sparky)-(7/pts)-(06:35:11/05/05)-(/)# zfs snapshot bigp/big@5-nov
(root@sparky)-(8/pts)-(06:35:11/05/05)-(/)# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
bigp             630G   384G      -  /zfs/bigp
bigp/big         630G   384G   630G  /zfs/bigp/big
bigp/big@5-nov      0      -   630G  /zfs/bigp/big@5-nov
# zfs send bigp/big@5-nov | ssh host zfs receive poolB/received/big@5-nov
# zfs send -i 5-nov bigp/big@6-nov | ssh host \
    zfs receive poolB/received/big
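The full-then-incremental pattern above can be wrapped in a small helper. A dry-run sketch, assuming the same send/receive forms as the slide; the function name is hypothetical, and it prints the pipeline rather than running it.

```shell
#!/bin/sh
# Sketch: build the send/receive pipeline for a full or incremental copy.
# Prints the command only; host, pool, and snapshot names are examples.

# plan_replication DATASET NEW_SNAP TARGET_HOST TARGET_DATASET [PREV_SNAP]
plan_replication() (
    ds=$1
    new=$2
    host=$3
    target=$4
    prev=${5:-}
    if [ -n "$prev" ]; then
        # Incremental: send only the changes between PREV_SNAP and NEW_SNAP.
        echo "zfs send -i $prev $ds@$new | ssh $host zfs receive $target"
    else
        # Full: send the whole snapshot and name it on the receiving side.
        echo "zfs send $ds@$new | ssh $host zfs receive $target@$new"
    fi
)

plan_replication bigp/big 5-nov host poolB/received/big
plan_replication bigp/big 6-nov host poolB/received/big 5-nov
```

The incremental form only works if the previous snapshot still exists on both sides, so do not destroy it until the next one has been received.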
zfs (cont)
# zpool history
History for 'zpbg':
2006-04-03.11:47:44 zpool create -f zpbg raidz c5t0d0 c10t0d0 c11t0d0 c12t0d0 c13t0d0
2006-04-03.18:19:48 zfs receive zpbg/imp
2006-04-03.18:41:39 zfs receive zpbg/home
2006-04-03.19:04:22 zfs receive zpbg/photos
2006-04-03.19:37:56 zfs set mountpoint=/export/home zpbg/home
2006-04-03.19:44:22 zfs receive zpbg/mail
2006-04-03.20:12:34 zfs set mountpoint=/var/mail zpbg/mail
2006-04-03.20:14:32 zfs receive zpbg/mqueue
2006-04-03.20:15:01 zfs set mountpoint=/var/spool/mqueue zpbg/mqueue
# zfs create -V 2g tank/volumes/v2
# zfs set shareiscsi=on tank/volumes/v2
# iscsitadm list target
Target: tank/volumes/v2
    iSCSI Name: iqn.1986-03.com.sun:02:984fe301-c412-ccc1-cc80-cf9a72aa062a
    Connections: 0
zpool history -l
Shows user name, host name, and zone of command
# zpool history -l users
History for 'users':
2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0 [user root on corona:global]
2008-07-10.09:43:13 zfs create users/marks [user root on corona:global]
2008-07-10.09:43:44 zfs destroy users/marks [user root on corona:global]
2008-07-10.09:43:48 zfs create users/home [user root on corona:global]
2008-07-10.09:43:56 zfs create users/home/markm [user root on corona:global]
2008-07-10.09:44:02 zfs create users/home/marks [user root on corona:global]
zpool history -i
Shows zfs internal activities - useful for debugging
# zpool history -i users
History for 'users':
2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0
2008-07-10.09:43:13 [internal create txg:6] dataset = 21
2008-07-10.09:43:13 zfs create users/marks
2008-07-10.09:43:48 [internal create txg:12] dataset = 27
2008-07-10.09:43:48 zfs create users/home
2008-07-10.09:43:55 [internal create txg:14] dataset = 33
ZFS Delegate Admin
Use zfs allow and zfs unallow to grant and remove permissions
Use the pool's “delegation” property to control whether delegation is enabled
Then delegate
# zfs allow cindys create,destroy,mount,snapshot tank/cindys
# zfs allow tank/cindys
-------------------------------------------------------------
Local+Descendent permissions on (tank/cindys)
        user cindys create,destroy,mount,snapshot
-------------------------------------------------------------
# zfs unallow cindys tank/cindys
# zfs allow tank/cindys
ZFS - Odds and Ends
zfs get all will display all set attributes of all ZFS file systems
Recursive snapshots (via -r) as of S10 8/07
zfs clone makes a RW copy of a snapshot
zfs promote sets the root of the file system to be the specified clone
You can undo a zpool destroy with zpool import -D
As of S10 8/07 ZFS is integrated with FMA
As of S10 11/06 ZFS supports double-parity RAID (RAIDZ2)
ZFS “GUI”
Did you know that Solaris has an admin GUI?
Webconsole enabled by default
Turn off via svcadm if not used
By default (on Nevada B64 at least) ZFS is the only on-by-default feature
ZFS Automatic Snapshots
In Nevada 100 (LSARC 2008/571) - will be in OpenSolaris 2008.11
SMF service and GNOME app
Can take automatic scheduled snapshots
By default all zfs file systems, at boot, then every 15 minutes, every hour, every day, etc
Auto delete of oldest snapshots if user-defined amount of space is not available
Can perform incremental or full backups via those snapshots
Nautilus integration allows user to browse and restore files graphically
ZFS Automatic Snapshots (cont)
One SMF service per time frequency:
frequent snapshots every 15 mins, keeping 4 snapshots
hourly snapshots every hour, keeping 24 snapshots
daily snapshots every day, keeping 31 snapshots
weekly snapshots every week, keeping 7 snapshots
monthly snapshots every month, keeping 12 snapshots
Details here: http://src.opensolaris.org/source/xref/jds/zfs-snapshot/README.zfs-auto-snapshot.txt
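The "keep N snapshots" rule amounts to dropping everything but the newest N. The sketch below shows that selection logic only; it is not the service's own code. It assumes snapshot names arrive oldest first, one per line, and the function name and sample names are hypothetical.

```shell
#!/bin/sh
# Sketch: given snapshot names oldest-first on stdin, print the ones that
# fall outside the newest KEEP entries (the candidates for zfs destroy).

# snapshots_to_destroy KEEP
snapshots_to_destroy() (
    keep=$1
    names=$(cat)
    [ -n "$names" ] || exit 0
    total=$(printf '%s\n' "$names" | wc -l | tr -d ' ')
    excess=$((total - keep))
    if [ "$excess" -gt 0 ]; then
        printf '%s\n' "$names" | head -n "$excess"
    fi
)

# Six "frequent" snapshots with a keep count of 4: the two oldest are listed.
printf '%s\n' tank/timf@f1 tank/timf@f2 tank/timf@f3 \
              tank/timf@f4 tank/timf@f5 tank/timf@f6 |
    snapshots_to_destroy 4
```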
ZFS Automatic Snapshots (cont)
Service properties provide more details
zfs/fs-name The name of the filesystem. If the special filesystem name "//" is used, then the system snapshots only filesystems with the zfs user property "com.sun:auto-snapshot:<label>" set to true, so to take frequent snapshots of tank/timf, run the following zfs command:
# zfs set com.sun:auto-snapshot:frequent=true tank/timf
The "snap-children" property is ignored when using this fs-name value. Instead, the system automatically determines when it's able to take recursive, vs. non-recursive snapshots of the system, based on the values of the ZFS user properties.
zfs/interval [ hours | days | months | none]
When set to none, we don't take automatic snapshots, but leave an SMF instance available for users to manually fire the method script whenever they want - useful for snapshotting on system events.
zfs/keep How many snapshots to retain - eg. setting this to "4" would keep only the four most recent snapshots. When each new snapshot is taken, the oldest is destroyed. If a snapshot has been cloned, the service will drop to maintenance mode when attempting to destroy that snapshot. Setting to "all" keeps all snapshots.
zfs/period How often you want to take snapshots, in intervals set according to "zfs/interval" (eg. every 10 days)
ZFS Automatic Snapshots (cont)
zfs/snapshot-children "true" if you would like to recursively take snapshots of all child filesystems of the specified fs-name. This value is ignored when setting zfs/fs-name='//'
zfs/backup [ full | incremental | none ]
zfs/backup-save-cmd The command string used to save the backup stream.
zfs/backup-lock You shouldn't need to change this - but it should be set to "unlocked" by default. We use it to indicate when a backup is running.
zfs/label A label that can be used to differentiate this set of snapshots from others, not required. If multiple schedules are running on the same machine, using distinct labels for each schedule is needed - otherwise one schedule could remove snapshots taken by another schedule according to its snapshot-retention policy. (see "zfs/keep")
zfs/verbose Set to false by default, setting to true makes the service produce more output about what it's doing.
zfs/avoidscrub Set to false by default, this determines whether we should avoid taking snapshots on any pools that have a scrub or resilver in progress. More info in the bugid:
6343667 need itinerary so interrupted scrub/resilver doesn't have to start over
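The zfs/keep behaviour described above (retain the N most recent snapshots and destroy the oldest as new ones arrive) can be sketched in plain shell. This is an illustration only, not the service's method script: `snapshots_to_destroy` is a hypothetical helper and the snapshot names are invented. It expects snapshot names on stdin, one per line, oldest first.

```shell
# snapshots_to_destroy KEEP
# Reads snapshot names (oldest first) on stdin and prints those that fall
# outside the KEEP most recent. KEEP=all prints nothing.
snapshots_to_destroy() {
    keep=$1
    list=$(cat)    # consume stdin even when nothing will be pruned
    if [ "$keep" = "all" ]; then
        return 0
    fi
    total=$(printf '%s\n' "$list" | grep -c . || true)
    excess=$(( total - keep ))
    [ "$excess" -gt 0 ] || return 0
    printf '%s\n' "$list" | head -n "$excess"
}

# Example: five invented snapshot names with keep=4.
printf '%s\n' \
    tank/timf@frequent-01 tank/timf@frequent-02 tank/timf@frequent-03 \
    tank/timf@frequent-04 tank/timf@frequent-05 |
snapshots_to_destroy 4
```

With five snapshots and keep set to 4, only the oldest name is printed. A real script would pass each printed name to zfs destroy, which is the step that fails when the snapshot has been cloned.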
ZFS Automatic Snapshot (cont)
http://blogs.sun.com/erwann/resource/menu-location.png
ZFS Automatic Snapshot (cont)
If life-preserver icon enabled in file browser, then backup of directory is available
Press to bring up nav bar
ZFS Automatic Snapshot (cont)
Drag slider into past to show previous version of files in the directory
Then right-click on a file and select “Restore to Desktop” if you want it back
More features coming
Press to bring up nav bar
ZFS Status
Netbackup, Legato support ZFS for backup / restore
VCS supports ZFS as file system of clustered services
Most vendors don’t care which file system app runs on
Performance as good as other file systems
Feature set better
ZFS Futures
Support by ISVs
Backup / restore
Some don’t get metadata (yet)
Use zfs send to emit file containing filesystem
Clustering (see Lustre)
Performance still a work in progress
Being ported to BSD, Mac OS Leopard
Check out the ZFS FAQ at http://www.opensolaris.org/os/community/zfs/faq/
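The "use zfs send to emit file" workaround above amounts to serializing a snapshot to a stream and redirecting it to a file; an incremental stream carries only the changes between two snapshots. The sketch below only builds and prints the command lines rather than running them. The `send_cmd` helper, the dataset tank/home, the snapshot names, and the /backup paths are all hypothetical.

```shell
# send_cmd DATASET SNAP [PREVIOUS_SNAP]
# Prints the zfs send command line for a full stream, or for an
# incremental stream when PREVIOUS_SNAP is given.
send_cmd() {
    if [ -n "${3:-}" ]; then
        echo "zfs send -i $1@$3 $1@$2"
    else
        echo "zfs send $1@$2"
    fi
}

# Dry run: show the commands, with the stream redirected to a file.
echo "$(send_cmd tank/home monday) > /backup/home-full.zfs"
echo "$(send_cmd tank/home tuesday monday) > /backup/home-incr.zfs"
```

A saved stream is restored by feeding the file to zfs receive on the target system.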
ZFS Performance
From http://www.opensolaris.org/jive/thread.jspa?messageID=14997
billm
On Thu, Nov 17, 2005 at 05:21:36AM -0800, Jim Lin wrote:
> Does ZFS reorganize (ie. defrag) the files over time?
Not yet.
> If it doesn't, it might not perform well in "write-little read-much"
> scenarios (where read performance is much more important than write
> performance).
As always, the correct answer is "it depends". Let's take a look at several cases:
- Random reads: No matter if the data was written randomly or sequentially, random reads are random for any filesystem, regardless of their layout policy. Not much you can do to optimize these, except have the best I/O scheduler possible.
ZFS Performance (cont)
- Sequential writes, sequential reads: With ZFS, sequential writes lead to sequential layout on disk. So sequential reads will perform quite well in this case.
- Random writes, sequential reads: This is the most interesting case. With random writes, ZFS turns them into sequential writes, which go *really* fast. With sequential reads, you know which order the reads are going to be coming in, so you can kick off a bunch of prefetch reads. Again, with a good I/O scheduler (which ZFS just happens to have), you can turn this into good read performance, if not entirely as good as totally sequential.
Believe me, we've thought about this a lot. There is a lot we can do to improve performance, and we're just getting started.
ZFS Performance (cont)
For DBs and other direct-disk-access-wanting applications
There is no direct I/O in ZFS
But can get very good performance by matching I/O size of the app (e.g. Oracle uses 8K) with recordsize of zfs file system
Set this at filesystem create time (a later change affects only newly written files)
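Matching recordsize to the application's I/O size can be sketched as a check followed by the create command. The check assumes the range documented for the recordsize property in this era, a power of two from 512 bytes to 128K. The `valid_recordsize` helper and the dataset name mypool/oradata are hypothetical, and the command is printed rather than run.

```shell
# valid_recordsize BYTES
# Succeeds if BYTES is a power of two between 512 and 131072 (128K).
valid_recordsize() {
    n=$1
    [ "$n" -ge 512 ] && [ "$n" -le 131072 ] || return 1
    while [ "$n" -gt 1 ]; do
        [ $(( n % 2 )) -eq 0 ] || return 1
        n=$(( n / 2 ))
    done
    return 0
}

# Print (not run) the create command for a database doing 8K I/O.
io_size=8192
if valid_recordsize "$io_size"; then
    echo "zfs create -o recordsize=$io_size mypool/oradata"
fi
```

Setting the property in the create command means every file written to the new filesystem uses the matching record size from the start.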
ZFS Performance (cont)
The ZIL can be a bottleneck on NFS servers
NFS does sync writes
Put the ZIL on another disk, or on SSD
ZFS aggressively uses memory for caching
Low priority user, but can cause temporary conflicts with other users
Use arcstat to monitor memory use
http://www.solarisinternals.com/wiki/index.php/Arcstat
ZFS Backup Tool
Zetaback is a thin-agent based ZFS backup tool
Runs from a central host
Scans clients for new ZFS filesystems
Manages varying desired backup intervals (per host) for
full backups
incremental backups
Maintain varying retention policies (per host)
Summarize existing backups
Restore any host:fs backup at any point in time to any target host
https://labs.omniti.com/trac/zetaba
zfs upgrade
On-disk format of ZFS changes over time
Forward-upgradeable, but not backward compatible
Watch out when attaching and detaching zpools
Also, “zfs send” streams from upgraded filesystems are not readable by older zfs versions
# zfs upgrade
This system is currently running ZFS filesystem version 2.
The following filesystems are out of date, and can be upgraded. After being
upgraded, these filesystems (and any 'zfs send' streams generated from
subsequent snapshots) will no longer be accessible by older software versions.
VER  FILESYSTEM
---  ------------
 1   datab
 1   datab/users
 1   datab/users/area51
Automatic Snapshots and Backups
Unsupported services, may become supported
http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_10
http://blogs.sun.com/timf/entry/zfs_automatic_for_the_people
ZFS - Smashing!
http://www.youtube.com/watch?v=CN6iDzesEs0&fmt=18
Storage Odds and Ends
iostat -y shows performance info on multipathed devices
raidctl is RAID configuration tool for multiple RAID controllers
fsstat file-system based stat command
# fsstat -F
  new  name  name  attr  attr lookup rddir  read  read write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0     0     0      0     0     0     0     0     0 ufs
    0     0     0 26.0K     0  52.0K   354 4.71K 1.56M     0     0 proc
    0     0     0     0     0      0     0     0     0     0     0 nfs
53.2K 1.02K 24.0K 8.99M 48.6K  4.26M  161K 44.8M 11.8G 23.1M 6.58G zfs
    0     0     0 2.94K     0      0     0     0     0     0     0 lofs
7.26K 2.84K 4.30K 31.5K    83  35.4K     6 40.5K 41.3M 45.6K 39.2M tmpfs
    0     0     0   410     0      0     0    33 11.0K     0     0 mntfs
    0     0     0     0     0      0     0     0     0     0     0 nfs3
    0     0     0     0     0      0     0     0     0     0     0 nfs4
    0     0     0     0     0      0     0     0     0     0     0 autofs
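Because fsstat prints scaled values (K, M, G), ranking file systems by activity means converting the suffixes first. The sketch below does that for the read bytes column of output like the sample above. The `read_bytes_by_fs` helper is hypothetical, and treating the suffixes as powers of 1024 is an assumption about the units. On Solaris, substitute nawk or /usr/xpg4/bin/awk, since the old /usr/bin/awk lacks user-defined functions.

```shell
# read_bytes_by_fs
# Reads `fsstat -F` output on stdin and prints "<read bytes> <fstype>",
# busiest reader first.
read_bytes_by_fs() {
    awk '
        # Convert values such as 161K or 11.8G to a plain number
        # (assumes suffixes are powers of 1024).
        function to_number(v,    unit, mult) {
            unit = substr(v, length(v), 1)
            mult = 1
            if (unit == "K") mult = 1024
            else if (unit == "M") mult = 1024 * 1024
            else if (unit == "G") mult = 1024 * 1024 * 1024
            else if (unit == "T") mult = 1024 * 1024 * 1024 * 1024
            if (mult > 1) v = substr(v, 1, length(v) - 1)
            return v * mult
        }
        # Data rows have 11 counters plus the file system type;
        # the two header rows have only 11 fields and are skipped.
        NF == 12 { printf "%.0f %s\n", to_number($9), $12 }
    ' | sort -rn
}

# Example using three rows from the sample output above.
read_bytes_by_fs <<EOF
0 0 0 26.0K 0 52.0K 354 4.71K 1.56M 0 0 proc
53.2K 1.02K 24.0K 8.99M 48.6K 4.26M 161K 44.8M 11.8G 23.1M 6.58G zfs
7.26K 2.84K 4.30K 31.5K 83 35.4K 6 40.5K 41.3M 45.6K 39.2M tmpfs
EOF
```

On a live system the same function could be fed directly with fsstat -F piped into it.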
Build an OpenSolaris Storage Server in 10 Minutes
http://developers.sun.com/openstorage/articles/opensolaris_storage_server.html
Example 1: ZFS Filesystem
Objectives:
Understand the purpose of the ZFS filesystem.
Configure a ZFS pool and filesystem.
Requirements:
A server (SPARC or x64 based) running the OpenSolaris OS.
Configuration details from the running server.
Step 1: Identify your Disks.
Identify the storage available for adding to the ZFS pool using the format(1) command. Your output will vary from that shown here:
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t2d0
/pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@2,0
1. c0t3d0
/pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@3,0
Specify disk (enter its number): ^D
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 2: Add your disks to your ZFS pool.
# zpool create -f mypool c0t3d0s0
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
mypool 10G 94K 10.0G 0% ONLINE -
Step 3: Create a filesystem in your pool.
# zfs create mypool/myfs
# df -h /mypool/myfs
Filesystem size used avail capacity Mounted on
mypool/myfs 9.8G 18K 9.8G 1% /mypool/myfs
Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 2: Network File System (NFS)
Objectives:
Understand the purpose of the NFS filesystem.
Create an NFS shared filesystem on a server and mount it on a client.
Requirements:
Two servers (SPARC or x64 based) - one from the previous example - running the OpenSolaris OS.
Configuration details from the running systems.
Step 1: Create the NFS shared filesystem on the server.
Switch on the NFS service on the server:
# svcs nfs/server
STATE STIME FMRI
disabled 6:49:39 svc:/network/nfs/server:default
# svcadm enable nfs/server
Share the ZFS filesystem over NFS:
# zfs set sharenfs=on mypool/myfs
# dfshares
RESOURCE SERVER ACCESS TRANSPORT
x4100:/mypool/myfs x4100 - -
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 2: Switch on the NFS service on the client.
This is similar to the procedure for the server:
# svcs nfs/client
STATE STIME FMRI
disabled 6:47:03 svc:/network/nfs/client:default
# svcadm enable nfs/client
Mount the shared filesystem on the client:
# mkdir /mountpoint
# mount -F nfs x4100:/mypool/myfs /mountpoint
# df -h /mountpoint
Filesystem size used avail capacity Mounted on
x4100:/mypool/myfs 9.8G 18K 9.8G 1% /mountpoint
Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 3: Common Internet File System (CIFS)
Objectives:
Understand the purpose of the CIFS filesystem.
Configure a CIFS share on one machine (from the previous example) and make it available on the other machine.
Requirements:
Two servers (SPARC or x64 based) running the OpenSolaris OS.
Configuration details provided here.
Step 1: Create a ZFS filesystem for CIFS.
# zfs create -o casesensitivity=mixed mypool/myfs2
# df -h /mypool/myfs2
Filesystem size used avail capacity Mounted on
mypool/myfs2 9.8G 18K 9.8G 1% /mypool/myfs2
Step 2: Switch on the SMB Server service on the server.
# svcs smb/server
STATE STIME FMRI
disabled 6:49:39 svc:/network/smb/server:default
# svcadm enable smb/server
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 3: Share the filesystem using CIFS.
# zfs set sharesmb=on mypool/myfs2
Verify using the following command:
# zfs get sharesmb mypool/myfs2
NAME PROPERTY VALUE SOURCE
mypool/myfs2 sharesmb on local
Step 4: Verify the CIFS naming.
Because we have not explicitly named the share, we can examine the default name assigned to it using the following command:
# sharemgr show -vp
default nfs=()
zfs
zfs/mypool/myfs nfs=()
/mypool/myfs
zfs/mypool/myfs2 smb=()
mypool_myfs2=/mypool/myfs2
Both the NFS share (/mypool/myfs) and the CIFS share (mypool_myfs2) are shown.
Step 5: Edit the file /etc/pam.conf to support creation of an encrypted version of the user's password for CIFS.
Add the following line to the end of the file:
other password required pam_smb_passwd.so.1 nowarn
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 6: Change the password using the passwd command.
# passwd username
New Password:
Re-enter new Password:
passwd: password successfully changed for root
Now repeat Steps 5 and 6 on the Solaris client.
Step 7: Enable CIFS client services on the client node.
# svcs smb/client
STATE STIME FMRI
disabled 6:47:03 svc:/network/smb/client:default
# svcadm enable smb/client
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 8: Make a mount point on the client and mount the CIFS resource from the server.
Mount the resource across the network and check it using the following command sequence:
# mkdir /mountpoint2
# mount -F smbfs //root@x4100/mypool_myfs2 /mountpoint2
Password: *******
# df -h /mountpoint2
Filesystem size used avail capacity Mounted on
//root@x4100/mypool_myfs2 9.8G 18K 9.8G 1% /mountpoint2
# df -n
/ : ufs
/mountpoint : nfs
/mountpoint2 : smbfs
Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 4: Comstar Fibre Channel Target
Objectives:
Understand the purpose of the Comstar Fibre Channel target.
Configure an FC target and initiator on two servers.
Requirements:
Two servers (SPARC or x64 based) running the OpenSolaris OS.
Configuration details provided here.
Step 1: Start the SCSI Target Mode Framework and verify it.
Use the following commands to start up and check the service on the host that provides the target:
# svcs stmf
STATE STIME FMRI
disabled 19:15:25 svc:/system/device/stmf:default
# svcadm enable stmf
# stmfadm list-state
Operational Status: online
Config Status : initialized
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 2: Ensure that the framework can see the ports.
Use the following command to ensure that the target mode framework can see the HBA ports:
# stmfadm list-target -v
Target: wwn.210000E08B909221
Operational Status: Online
Provider Name : qlt
Alias : qlt0,0
Sessions : 4
Initiator: wwn.210100E08B272AB5
Alias: ute198:qlc1
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210100E08B296A60
Alias: ute198:qlc3
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B072AB5
Alias: ute198:qlc0
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B096A60
Alias: ute198:qlc2
Logged in since: Thu Mar 27 16:38:30 2008
Build an OpenSolaris Storage Server in 10 Minutes - cont
Target: wwn.210100E08BB09221
Operational Status: Online
Provider Name : qlt
Alias : qlt1,0
Sessions : 4
Initiator: wwn.210100E08B272AB5
Alias: ute198:qlc1
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210100E08B296A60
Alias: ute198:qlc3
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B072AB5
Alias: ute198:qlc0
Logged in since: Thu Mar 27 16:38:30 2008
Initiator: wwn.210000E08B096A60
Alias: ute198:qlc2
Logged in since: Thu Mar 27 16:38:30 2008
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 3: Create a device to use as storage for the target.
Use ZFS to create a volume (zvol) for use as the storage behind the target:
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
mypool 68G 94K 68.0G 0% ONLINE -
# zfs create -V 5gb mypool/myvol
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
mypool 5.00G 61.9G 18K /mypool
mypool/myvol 5G 66.9G 16K -
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 4: Register the zvol with the framework.
The zvol becomes the SCSI logical unit (disk) behind the target:
# sbdadm create-lu /dev/zvol/rdsk/mypool/myvol
Created the following LU:
GUID DATA SIZE SOURCE
6000ae4093000000000047f3a1930007 5368643584 /dev/zvol/rdsk/mypool/myvol
Confirm its existence as follows:
# stmfadm list-lu -v
LU Name: 6000AE4093000000000047F3A1930007
Operational Status: Online
Provider Name : sbd
Alias : /dev/zvol/rdsk/mypool/myvol
View Entry Count : 0
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 5: Find the initiator HBA ports to which to map the LUs.
Discover HBA ports on the initiator host using the following command:
# fcinfo hba-port
HBA Port WWN: 25000003ba0ad303
Port Mode: Initiator
Port ID: 1
OS Device Name: /dev/cfg/c5
Manufacturer: QLogic Corp.
Model: 2200
Firmware Version: 2.1.145
FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver:
Type: L-port
State: online
Supported Speeds: 1Gb
Current Speed: 1Gb
Node WWN: 24000003ba0ad303
. . .
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 6: Create a host group and add the world-wide numbers (WWNs) of the initiator host HBA ports to it.
Name the group mygroup:
# stmfadm create-hg mygroup
# stmfadm list-hg
Host Group: mygroup
Add the WWNs of the ports to the group:
# stmfadm add-hg-member -g mygroup wwn.210000E08B096A60 \
wwn.210100E08B296A60 \
wwn.210100E08B272AB5 \
wwn.210000E08B072AB5
Now check that everything is in order:
# stmfadm list-hg-member -v -g mygroup
With the host group created, you're now ready to export the logical unit. This is accomplished by adding a view entry to the logical unit using this host group, as shown in the following command:
# stmfadm add-view -h mygroup 6000AE4093000000000047F3A1930007
Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 7: Check the visibility of the targets on the initiator host.
First, force the devices on the initiator host to be rescanned with a simple script:
#!/bin/ksh
fcinfo hba-port |grep "^HBA" |awk '{print $4}'|while read ln
do
fcinfo remote-port -p $ln -s >/dev/null 2>&1
done
The disk exported over FC should then appear in the format list:
# format
Searching for disks...done
c6t6000AE4093000000000047F3A1930007d0: configured with capacity of 5.00GB
Build an OpenSolaris Storage Server in 10 Minutes - cont
...
partition> p
Current partition table (default):
Total disk cylinders available: 20477 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 511 128.00MB (512/0/0) 262144
1 swap wu 512 - 1023 128.00MB (512/0/0) 262144
2 backup wu 0 - 20476 5.00GB (20477/0/0) 10484224
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 usr wm 1024 - 20476 4.75GB (19453/0/0) 9959936
7 unassigned wm 0 0 (0/0/0) 0
partition>
ZFS Root
Solaris 10 10/08 (aka S10U6) supports installation with ZFS as the root file system (as does OpenSolaris)
Note that you can’t as of U6 flash archive a ZFS root system(!)
Can upgrade by using liveupgrade (LU) to mirror to second disk (ZFS pool) and upgrading there, then booting there
lucreate to copy the primary BE to create an alternate BE
# zpool create mpool mirror c1t0d0s0 c1t1d0s0
# lucreate -c c1t2d0s0 -n zfsBE -p mpool
The default file systems are created in the specified pool and the non-shared file systems are then copied into the root pool
Run luupgrade to upgrade the alternate BE (optional)
Run luactivate on the newly upgraded alternate BE so that when the system is rebooted, it will be the new primary BE
# luactivate zfsBE
Life is good
Once on ZFS as root, life is good
Mirror the root disk with 1 command (if not mirrored):
# zpool attach rpool c1t0d0s0 c1t1d0s0
Note that you have to manually do an installboot on the mirrored disk
Now consider all the ZFS features, used on the boot disk
Snapshot before patch, upgrade, any change
Undo change via 1 command
Replicate to another system for backup, DR
. . .
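The manual boot block step mentioned above differs by platform: SPARC uses installboot with the ZFS bootblk, x86 uses installgrub. A small sketch that prints the appropriate command for the newly attached disk. The `bootblock_cmd` helper is hypothetical; the architecture argument is meant to be the output of uname -p on the target system, and c1t1d0s0 is the second disk from the zpool attach example.

```shell
# bootblock_cmd ARCH DISK_SLICE
# Prints the command that installs boot blocks on the new mirror half.
# ARCH is the output of `uname -p` on the target system (sparc or i386).
bootblock_cmd() {
    case "$1" in
        sparc)
            echo "installboot -F zfs /usr/platform/\`uname -i\`/lib/fs/zfs/bootblk /dev/rdsk/$2"
            ;;
        i386)
            echo "installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/$2"
            ;;
        *)
            echo "unknown architecture: $1" >&2
            return 1
            ;;
    esac
}

# Show both variants for the disk attached above.
bootblock_cmd sparc c1t1d0s0
bootblock_cmd i386 c1t1d0s0
```

Without this step the pool is mirrored, but the second disk cannot boot the system if the first one fails.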
ZFS Labs
What pools are available in your zone?
What are their states?
What is their performance like?
What ZFS file systems?
Create a new file system
Create a file there
Take a snapshot of that file system
Delete the file
Revert to the file system state as of the snapshot
How do you see the contents of a snapshot?
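One possible pass through the lab, written as a dry run: each command is printed, and is executed only when DRY_RUN=0 is set. The `run` and `lab_steps` helpers, the filesystem name labfs, the file name, and the snapshot name are invented for illustration, and the paths assume the filesystem is mounted at its default location under /<pool>.

```shell
# run CMD...
# Prints the command; executes it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# lab_steps POOL
# Walks the lab questions in order against the given pool.
lab_steps() {
    fs=$1/labfs
    run zpool list                      # which pools are available
    run zpool status                    # their states
    run zpool iostat 5 3                # their performance, 3 samples 5s apart
    run zfs list                        # which ZFS file systems exist
    run zfs create "$fs"
    run touch "/$fs/testfile"
    run zfs snapshot "$fs@before"
    run rm "/$fs/testfile"
    run zfs rollback "$fs@before"       # revert to the snapshot state
    run ls "/$fs/.zfs/snapshot/before"  # browse the snapshot contents
}

lab_steps mypool
```

The .zfs directory is hidden from ls by default but can still be entered by name.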
ZFS Final Thought
Eric Schrock's Weblog - Thursday Nov 17, 2005
UFS/SVM vs. ZFS: Code Complexity
A lot of comparisons have been done, and will continue to be done, between ZFS and other filesystems. People tend to focus on performance, features, and CLI tools as they are easier to compare. I thought I'd take a moment to look at differences in the code complexity between UFS and ZFS.
It is well known within the kernel group that UFS is about as brittle as code can get. 20 years of ongoing development, with feature after feature being bolted on tends to result in a rather complicated system. Even the smallest changes can have wide ranging effects, resulting in a huge amount of testing and inevitable panics and escalations. And while SVM is considerably newer, it is a huge beast with its own set of problems.
Since ZFS is both a volume manager and a filesystem, we can use this script written by Jeff to count the lines of source code in each component. Not a true measure of complexity, but a reasonable approximation to be sure. Running it on the latest version of the gate yields:
UFS: kernel= 46806 user= 40147 total= 86953
SVM: kernel= 75917 user=161984 total=237901
TOTAL: kernel=122723 user=202131 total=324854
ZFS: kernel= 50239 user= 21073 total= 71312
The numbers are rather astounding. Having written most of the ZFS CLI, I found the most horrifying number to be the 162,000 lines of userland code to support SVM. This is more than twice the size of all the ZFS code (kernel and user) put together! And in the end, ZFS is about 1/5th the size of UFS and SVM. I wonder what those ZFS numbers will look like in 20 years...
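The two ratios quoted in the post can be checked directly from the line counts in the table. A short sketch of that arithmetic; the `ratio` helper is made up for illustration:

```shell
# Line counts quoted in the table above.
svm_user=161984
zfs_total=71312
ufs_svm_total=324854

# ratio A B
# Prints A divided by B to two decimal places.
ratio() {
    echo "$1 $2" | awk '{ printf "%.2f\n", $1 / $2 }'
}

echo "SVM userland vs. all of ZFS: $(ratio "$svm_user" "$zfs_total")x"
echo "ZFS as a fraction of UFS+SVM: $(ratio "$zfs_total" "$ufs_svm_total")"
```

The results are roughly 2.27x and 0.22, consistent with "more than twice" and "about 1/5th".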
Where to Learn More
Community: http://www.opensolaris.org/os/community/zfs
Wikipedia: http://en.wikipedia.org/wiki/ZFS
ZFS blogs: http://blogs.sun.com/main/tags/zfs
ZFS ports
Apple Mac: http://developer.apple.com/adcnews
FreeBSD: http://wiki.freebsd.org/ZFS
Linux/FUSE: http://zfs-on-fuse.blogspot.com
As an appliance: http://www.nexenta.com
Beginner’s Guide to ZFS: http://www.sun.com/bigadmin/features/articles/zfs_overview.jsp
Sun Storage 7x10
Speaking of Futures
The future of Sun storage?
Announced 11/10/2008
Most Scalable Storage System Design
• Hybrid Flash Storage Pools
> Data is intelligently placed in DRAM, Flash or Disk
> Transparently managed as one storage pool
> Optimizes $/GB and $/IOP performance
• Enterprise Grade Flash
> 3-5 year lifetime
[Diagram: Read/L2ARC SSDs, Write/ZIL SSDs, HDD Pool (SATA)]
Latency Comparison
[Figure: logarithmic latency scale from 1 ns to 1 s, positioning CPU, DRAM, FLASH/SSD, HDD, and TAPE]
Bridging the DRAM to HDD Gap
ZFS Hybrid Pool Example
Hybrid Storage Pool (DRAM + Read SSD + Write SSD + 5x 4200 RPM SATA)
Traditional Storage Pool (DRAM + 7x 10K RPM 2.5”)
[Bar chart comparing the two pools on Read IOPs, Write IOPs, Cost, Storage Power (Watts), and Raw Capacity (TB); bar labels in the source: 3.2x, 11%, 4%, 4.9x, 2x]
Based on Actual Benchmark Results
Full Complement of Storage Software
Included with the system at no additional cost
Data Protocols
• NFS v3 and v4
• CIFS
• iSCSI
• HTTP
• WebDAV
• FTP
• NDMP v4
• FC Target (Roadmap)
• InfiniBand (Roadmap)
• SNMP
Data Services
• Write Flash Acceleration
• Read Flash Acceleration
• RAID-Z DP (6)
• Mirroring
• Striping
• Active-active Clustering
• Remote Replication
• Antivirus Quarantine
• Snapshots (r/o, r/w, unlimited)
• Compression
Data Management
• DTrace Analytics
• Self-healing system and data
• Simple out-of-the-box setup
• Secure Browser UI and CLI
• Advanced Networking
• NIS, LDAP, and AD
• Users, Roles
• Dashboard
• Alerts
• Phone Home
• Scripting
• Upgrade
Providing Unprecedented Storage Analytics
• Automatic real-time visualization of application and storage related workloads
• Simple yet sophisticated instrumentation provides real-time comprehensive analysis
• Supports multiple simultaneous application and workload analysis in real-time
• Analysis can be saved, exported and replayed for further analysis.
• Built on DTrace instrumentation
> NFSv3, NFSv4, CIFS, iSCSI
> ZFS and the Solaris I/O path
> CPU and Memory Utilization
> Networking (TCP, UDP, IP)
ANSWERING KEY QUESTIONS
“What is CPU and Memory Utilization?”
“How much storage is being utilized?”
“How is disk performing? How many Ops/Sec?”
“What Services are active?”
“Which applications/users are causing performance issues?”
Data Services
ZFS - Continued
• ZFS Usable Space
> Market Leading Usable Space
[Chart: usable space by pool layout. Labels, in source order: Double Parity RAID; Double Parity RAID, Wide Stripes; Mirrored; Single Parity RAID; Striped. Values, in source order: 72%, 83%, 42%, 60%, 90%]
Sun Storage 7000 Unified Storage Systems
Price, Performance, Capacity and Availability
[Chart: models positioned by Capacity against Price / Performance]
7110: 16x 2.5” SAS disks, 2.3TB; Standard Storage Pool, SSD is not used
7210: 48x 3.5” SATA II disks, up to 46TB total storage; Hybrid Storage Pool with write-optimized SSD
7410: 288x 3.5” SATA II disks, up to 287TB* total storage; Hybrid Storage Pool with read- and write-optimized SSD
7410 Cluster: 288x 3.5” SATA II disks, up to 287TB* total storage; Hybrid Storage Pool with read- and write-optimized SSD
*Up to 575TB soon after release