Solaris 10 - dps.uibk.ac.attf/lehre/ss07/bs/vorlesungen/Solaris Vortrag...Sun Microsystems GesmbH...


Sun Microsystems GesmbH Wienerbergstrasse 3/VII A- 1101 Wien

Solaris 10

DI Gerald Hartl, Account Manager for Education and Research

1

Agenda

• Short Solaris 10 Overview

• Introduction to Solaris Internals

• Memory

• File System

• Q&A

2

Short Solaris 10 Overview

3

• 1982 - Sun Microsystems Inc.

• 1984 - SunOS 1.0, FFS from 4.2 BSD

Solaris 10

4

Solaris 10 Innovations Overview

• Highest availability with Predictive Self Healing

• Maximum security based on Trusted Solaris

• Optimal monitoring with DTrace

• Secure and effective consolidation with Solaris Containers

• Extreme performance

... and over 600 projects

5

Solaris 10: Same Ideas about Consolidation

[Figure: Container 1 (Web-Server), Container 2 (App-Server) and Container 3 (Database) on one chip with Core #1 to Core #8, an L2 cache, memory, PCI-E I/O and a 134GB/s interconnect]

6

Container and Ultra/OpenSPARC T1: Blade Shelf on a Chip

• Network consolidation on chip
> Higher performance (chip bandwidth)

• Containers can be assigned to cores
> Optimize resource utilization

• Sandbox for applications

[Figure: Container 1 (Web-Server), Container 2 (App-Server) and Container 3 (Database) on one chip with Core #1 to Core #8, an L2 cache, memory, PCI-E I/O and a 134GB/s interconnect]

7

OS Virtualisation Trends

Hardware Consolidation

• More OS instances
> More administration required

• Strong separation

• Higher costs (HW or license)

OS Consolidation

• Only one OS instance
> Simple administration

• Less separation (HW)

• More flexibility

Technologies, ordered from stronger separation to more flexibility:

• Hardware Partitions: Dynamic System Domains

• Virtual Machines: VMware

• OS Virtualisation: Solaris Container (Zones + SRM)

• Resource Management: Solaris Resource Manager (SRM)

8

ZFS: The Ultimate Filesystem

• Extreme reliability
> No data without checksums
> Self-healing data store

• Simple administration
> Single line instead of scripts
> Includes Volume Manager

• Highest capacity
> 128-bit filesystem

• High performance

• Add-on modules available
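The "no data without checksums" and "self-healing" points can be illustrated with a toy model. The sketch below is not ZFS code: the class name `MirroredBlock`, the two-copy layout and the use of SHA-256 are assumptions made for illustration only. It shows the general idea of verifying every read against a checksum stored apart from the data and repairing a bad copy from a good one.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is a stand-in here; the slide does not say which checksum is used.
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Hypothetical toy model: one logical block kept on two mirrored disks."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)  # stored separately from the data

    def read(self) -> bytes:
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) == self.expected:
                # Found a good copy: overwrite every copy that fails verification.
                for j in range(len(self.copies)):
                    if j != i and checksum(bytes(self.copies[j])) != self.expected:
                        self.copies[j] = bytearray(copy)
                return bytes(copy)
        raise IOError("all copies fail checksum verification")


block = MirroredBlock(b"home directory data")
block.copies[0][0] ^= 0xFF          # simulate silent corruption on the first disk
data = block.read()                 # served from the second copy, first copy repaired
print(data)
print(block.copies[0] == block.copies[1])
```

The read succeeds even though one disk returned bad data, and the damaged copy is rewritten as a side effect of the read.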

9

The ZFS Idea

• Volume Manager and Filesystem
> Reduce complexity
> Simple administration
> Increase resource utilization

• Innovative architecture
> No filesystem check required
> Mirroring, Snapshot, RAID-Z, compression, ...

• Available for testing:

[Figure: a server with four ZFS file systems (ZFS 1 to ZFS 4) on top of one ZFS Storage Pool built from the disks c0t0d0, c0t0d1 and c0t2d0]

10

Example Filesystem

[Figure: the file systems /home/ann, /home/bob and /home/sue on a RAID1 mirror of the disks c0t0d0 and c0t1d0]

11

In the Past

# format
... (long interactive session omitted)

# metadb -a -f disk1:slice0 disk2:slice0

# metainit d10 1 1 disk1:slice1
d10: Concat/Stripe is setup
# metainit d11 1 1 disk2:slice1
d11: Concat/Stripe is setup
# metainit d20 -m d10
d20: Mirror is setup
# metattach d20 d11
d20: submirror d11 is attached

# metainit d12 1 1 disk1:slice2
d12: Concat/Stripe is setup
# metainit d13 1 1 disk2:slice2
d13: Concat/Stripe is setup
# metainit d21 -m d12
d21: Mirror is setup
# metattach d21 d13
d21: submirror d13 is attached

# metainit d14 1 1 disk1:slice3
d14: Concat/Stripe is setup
# metainit d15 1 1 disk2:slice3
d15: Concat/Stripe is setup
# metainit d22 -m d14
d22: Mirror is setup
# metattach d22 d15
d22: submirror d15 is attached

# newfs /dev/md/rdsk/d20
newfs: construct a new file system /dev/md/rdsk/d20: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d20 /export/home/ann
# vi /etc/vfstab ... while in 'vi', type this exactly:
/dev/md/dsk/d20 /dev/md/rdsk/d20 /export/home/ann ufs 2 yes -

# newfs /dev/md/rdsk/d21
newfs: construct a new file system /dev/md/rdsk/d21: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d21 /export/home/bob
# vi /etc/vfstab ... while in 'vi', type this exactly:
/dev/md/dsk/d21 /dev/md/rdsk/d21 /export/home/bob ufs 2 yes -

# newfs /dev/md/rdsk/d22
newfs: construct a new file system /dev/md/rdsk/d22: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d22 /export/home/sue
# vi /etc/vfstab ... while in 'vi', type this exactly:
/dev/md/dsk/d22 /dev/md/rdsk/d22 /export/home/sue ufs 2 yes -

# format
... (long interactive session omitted)
# metattach d12 disk3:slice1
d12: component is attached
# metattach d13 disk4:slice1
d13: component is attached
# metattach d21
# growfs -M /export/home/bob /dev/md/rdsk/d21
/dev/md/rdsk/d21:
... (many pages of 'superblock backup' output omitted)

12

With ZFS

• Create a storage pool named “home”
# zpool create home mirror c0t0d0 c0t1d0

• Create a file system for each of “ann”, “bob” and “sue”
# zfs create home/ann
# zfs create home/bob
# zfs create home/sue

• Add new disks to the pool
# zpool add home mirror c1t0d0 c1t1d0
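Because each step is one command, the whole setup is easy to generate for any list of users. The following is a minimal sketch; the helper name `zfs_setup_commands` is hypothetical. It only builds and prints the command lines shown on this slide and does not execute anything.

```python
def zfs_setup_commands(pool, disks, users):
    """Hypothetical helper: build the command lines from the slide as strings."""
    commands = ["zpool create {} mirror {}".format(pool, " ".join(disks))]
    commands += ["zfs create {}/{}".format(pool, user) for user in users]
    return commands


commands = zfs_setup_commands("home", ["c0t0d0", "c0t1d0"], ["ann", "bob", "sue"])
for command in commands:
    print("# " + command)   # printed for review only, nothing is run
```

Four short commands replace the long sequence of metadb, metainit, metattach, newfs, mount and vfstab edits needed in the past.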

13

http://www.opensolaris.org/os/

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src

14

Introduction to Solaris Internals

15

Solaris Kernel Architecture

Source: Solaris Internals, page 15

16

Global Thread Priorities

Source: Solaris Internals, page 18

Lightweight Process (LWP): the kernel-visible execution context for a user thread

17

Global Thread Priorities

Source: Solaris Internals, page 22

18

Solaris Resource Management

Source: Solaris Internals, page 35

19

Zones in Solaris

Source: Solaris Internals, page 36

20

Components of a Process

Source: Solaris Internals, page 44

21

Process Structures

Source: Solaris Internals, page 55

22

Thread States

Source: Solaris Internals, page 157

23

Processor Abstractions

Source: Solaris Internals, page 162

24

Processor Abstractions

• CPU partitions

• Processor sets

• Resource pools

• Locality groups (lgroups, MPO): Solaris 9, Memory Placement Optimization

Source: Solaris Internals, page 162

25

Memory

26

Virtual to Physical Memory Management

Source: Solaris Internals, page 449

27

Solaris Virtual Memory Layers

Source: Solaris Internals, page 445

28

Virtual Address Spaces

• Executable text: the binary, mapped read-only with execute permissions

• Executable data: mapped read/write/private

• Heap space: memory allocated by malloc()

• Process stack: anonymous memory, mapped read/write

Source: Solaris Internals, page 457
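The mappings listed above can be pictured as a sorted list of segments, each with a base address, a size and permissions. The sketch below is a toy model, not kernel code: the class names, addresses and sizes are made up for illustration. It shows how an address is resolved to a segment and how a missing mapping or a permission mismatch would surface as a fault.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Segment:
    base: int
    size: int
    perms: str   # subset of "rwx"
    name: str


class AddressSpace:
    """Hypothetical model of a process address space as sorted segments."""

    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda s: s.base)
        self.bases = [s.base for s in self.segments]

    def find_segment(self, addr):
        # Last segment whose base is <= addr, if addr falls inside it.
        i = bisect.bisect_right(self.bases, addr) - 1
        if i >= 0:
            seg = self.segments[i]
            if addr < seg.base + seg.size:
                return seg
        return None

    def access(self, addr, mode):
        seg = self.find_segment(addr)
        if seg is None:
            return "fault: no mapping"
        if mode not in seg.perms:
            return "fault: protection"
        return "ok: " + seg.name


# Addresses and sizes are illustrative values only.
space = AddressSpace([
    Segment(0x00010000, 0x8000, "rx", "text"),
    Segment(0x00020000, 0x2000, "rw", "data"),
    Segment(0x00022000, 0x10000, "rw", "heap"),
    Segment(0xFFBF0000, 0x10000, "rw", "stack"),
])
print(space.access(0x00010004, "x"))   # ok: text
print(space.access(0x00010004, "w"))   # fault: protection
print(space.access(0x00500000, "r"))   # fault: no mapping
```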

29

Virtual Address Spaces

Source: Solaris Internals, page 457

30

Address Space Layout - UltraSPARC

Source: Solaris Internals, page 459

31

Address Space Layout - x86 & x64

Source: Solaris Internals, page 459

32

The Stack

Solaris Version | Maximum Heap Size | Notes
Solaris x86 32bit mode | 2GBytes by default | Boot option: kernel base can be moved to allow larger process address space
Solaris x64 64bit mode | 16EBytes | Virtually unlimited
SPARC 64bit mode | 16TBytes on UltraSPARC I/II; 16EBytes | Virtually unlimited

Source: Solaris Internals, page 462
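The sizes in this table are all powers of two. As a quick check, the sketch below prints 2^31, 2^44 and 2^64 bytes in the units used on the slide. Relating these exponents to address widths is an interpretation added here, not something the slide states.

```python
# Binary units, largest first, named as on the slide.
UNITS = [("EBytes", 2 ** 60), ("TBytes", 2 ** 40), ("GBytes", 2 ** 30)]


def human(size_in_bytes):
    """Format a byte count using the largest unit that fits."""
    for name, unit in UNITS:
        if size_in_bytes >= unit:
            return "{}{}".format(size_in_bytes // unit, name)
    return "{}Bytes".format(size_in_bytes)


for bits in (31, 44, 64):
    print(bits, "bits ->", human(2 ** bits))
# 31 bits -> 2GBytes
# 44 bits -> 16TBytes
# 64 bits -> 16EBytes
```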

33

Memory Mapped Files

Source: Solaris Internals, page 463
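A small user-level illustration of a memory mapped file, using Python's standard `mmap` module rather than anything Solaris-specific. The temporary file and its contents are made up for the example. A write through the mapping becomes visible through an ordinary read() on the same file.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello solaris")
    # Length 0 maps the whole file; the default mapping is shared and writable.
    with mmap.mmap(fd, 0) as mapping:
        first = mapping[:5]          # read through memory
        mapping[0:5] = b"HELLO"      # write through memory, no write() call
        mapping.flush()
    os.lseek(fd, 0, os.SEEK_SET)
    contents = os.read(fd, 100)      # the file now holds the modified bytes
finally:
    os.close(fd)
    os.remove(path)

print(first)      # b'hello'
print(contents)   # b'HELLO solaris'
```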

34

Tracing the VM System

sol10# ./vm.d <pid>
sol10# more vm.d

:::BEGIN
{
    start = timestamp;
}

syscall:::
/$target == pid/
{
    trace((timestamp - start) / 1000);
}

::add_physmem:,
::sptcreate:,
...
::sptdestroy:,
::va_to_pfn:
/$target == pid/
{
    trace((timestamp - start) / 1000);
}

Source: Solaris Internals, page 466

35

Tracing the VM System

0 => munmap                          3194
0   -> as_unmap                      3199
0     -> as_findseg                  3206
0     <- as_findseg                  3209
0     -> segvn_unmap                 3211
0       -> segvn_lockop              3217
0       <- segvn_lockop              3219
0       -> hat_unload_callback       3221
0         -> page_get_pagesize       3236
0         <- page_get_pagesize       3237
0         -> hat_page_setattr        3239
0         <- hat_page_setattr        3240
0         -> free_vp_pages           3247
0           -> page_share_cnt        3252
0             -> hat_page_getshare   3255
0             <- hat_page_getshare   3256
0           <- page_share_cnt        3258
0         <- free_vp_pages           3259
0       <- hat_unload_callback       3261
0       -> seg_free                  3263
0         -> as_removeseg            3265
0         <- as_removeseg            3270
0         -> segvn_free              3272...

Source: Solaris Internals, page 466
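Output like this can be post-processed to see how long each call took. The sketch below assumes that every record consists of a CPU id, an entry or return arrow, a function name and the traced value, and that the value is in microseconds (the script divides a nanosecond timestamp by 1000). The sample records are a few entry and return pairs taken from the trace, and the function name `durations` is hypothetical.

```python
TRACE = """\
0 -> as_findseg 3206
0 <- as_findseg 3209
0 -> segvn_lockop 3217
0 <- segvn_lockop 3219
0 -> page_get_pagesize 3236
0 <- page_get_pagesize 3237
"""


def durations(trace):
    """Pair each entry record with its return record and subtract the values."""
    entered = {}
    elapsed = {}
    for line in trace.splitlines():
        cpu, arrow, func, value = line.split()
        if arrow in ("->", "=>"):
            entered[func] = int(value)
        elif arrow in ("<-", "<="):
            elapsed[func] = int(value) - entered.pop(func)
    return elapsed


elapsed = durations(TRACE)
print(elapsed)   # {'as_findseg': 3, 'segvn_lockop': 2, 'page_get_pagesize': 1}
```

The sketch keeps one open entry per function name, so it does not handle recursive calls.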

36

The Address Space

Source: Solaris Internals, page 467

37

Page Faults in Address Spaces

Source: Solaris Internals, page 473

38

Segment Drivers

Source: Solaris Internals, page 476

39

The vnode Segment seg_vn

Source: Solaris Internals, page 481

• Executable text

• Executable data

• Heap and stack (anonymous memory)

• Shared libraries

• Mapped files

40

The vnode Segment seg_vn

Source: Solaris Internals, page 481

41

Anonymous Memory

Source: Solaris Internals, page 485

42

Virtual Memory Watchpoints

Source: Solaris Internals, page 492

43

File System

44

File System Framework

Source: Solaris Internals, page 657

45

Process Level File Abstractions

Source: Solaris Internals, page 658

46

Virtual File System (vfs) Interface

Source: Solaris Internals, page 675

47

The mount Method

Source: Solaris Internals, page 681

48

The Mounted vfs List

Source: Solaris Internals, page 684

49

The vnode

Source: Solaris Internals, page 685

50

The Life Cycle of a vnode

Source: Solaris Internals, page 696

51

File System I/O

Source: Solaris Internals, page 707

52

read() and write() System Calls

Source: Solaris Internals, page 709
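From the process side, the read() and write() system calls operate on a file descriptor and a byte buffer. A minimal sketch using Python's thin wrappers in the `os` module; the temporary file and its contents are illustrative only.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    written = os.write(fd, b"Solaris 10\n")   # write(): returns bytes written
    os.lseek(fd, 0, os.SEEK_SET)              # move the file offset back to the start
    data = os.read(fd, 4096)                  # read(): returns at most 4096 bytes
finally:
    os.close(fd)
    os.remove(path)

print(written)   # 11
print(data)      # b'Solaris 10\n'
```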

53

vop_read() segmap Interaction

Source: Solaris Internals, page 710

54

The Directory Name Lookup Cache

Source: Solaris Internals, page 726
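The idea of a name lookup cache can be sketched as a map from (parent directory, component name) to a vnode, so that repeated path lookups avoid reading the directory again. The following is a toy model under stated assumptions: the class `NameCache`, the string stand-ins for vnodes and the least-recently-used eviction are illustrative choices, not the Solaris implementation.

```python
from collections import OrderedDict


class NameCache:
    """Hypothetical toy cache: (parent, name) -> vnode, with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, parent, name):
        key = (parent, name)
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        return None                          # caller would fall back to the file system

    def enter(self, parent, name, vnode):
        key = (parent, name)
        self.entries[key] = vnode
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry


cache = NameCache(capacity=2)
cache.enter("/home", "ann", "vnode-ann")
cache.enter("/home", "bob", "vnode-bob")
cache.lookup("/home", "ann")                 # hit, "ann" becomes most recent
cache.enter("/home", "sue", "vnode-sue")     # cache is full, "bob" is evicted
print(cache.lookup("/home", "bob"))          # None
print(cache.hits, cache.misses)              # 1 1
```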

55

Sun Microsystems GesmbH Wienerbergstrasse 3/VII A- 1101 Wien

Solaris 10 Q&A

DI Gerald Hartl, Account Manager for Education and Research

56