What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests...

32
What's new in Linux 2.6? Dr. Ulrich Weigand Linux for zSeries Development, IBM Lab Böblingen [email protected]

Transcript of What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests...

Page 1: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

What's new in Linux 2.6?

Dr. Ulrich WeigandLinux for zSeries Development, IBM Lab Bö[email protected]

Page 2: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

TimelineNew features overviewScalability enhancementsThreading model and futexesDevice model and device configuration

Agenda

Page 3: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Linux 2.6

TimelineJanuary 1999: Linux 2.2.0 releasedMay 1999: Start of 2.3.x development (2.2.8)January 2001: Linux 2.4.0 releasedNovember 2001: Start of 2.5.x development (2.4.15)October 2002: Feature freeze for 2.6February 2003: Current version 2.5.61Estimated release of Linux 2.6.0: mid-2003

Page 4: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Source: Guillaume Boissierehttp://www.kernelnewbies.org/status/

Linux 2.6: Overvie

Page 5: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

New featuresPlatform and device supportFile systems and volume managersNetwork protocolsEliminate system limits

Performance and scalability enhancementsSchedulerMemory managementBlock I/O layerSMP scalability

Backports to 2.4 kernels

Linux 2.6: Overview

Page 6: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

New architecturesPowerPC 64-bit (ppc64)AMD 64-bit (x86_64)ucLinux (MMU-less processors: v850, m68knommu)User Mode Linux

New devicesNew input device / frame buffer layersALSA (Advanced Linux Sound Architecture)Video for Linux v2New IDE layer, Serial ATA support

Platform and device support

Page 7: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Support for new file systemsIBM JFSSGI XFSNFS v4Andrew File System (AFS)ReiserFS v4 (planned)

Other enhancementsDevice mapper infrastructure (LVM2, EVMS)Extended Attribute / Access Control List (ACL) supportLarge directory support for ext2/ext3Zero-copy NFS

File systems

Page 8: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Networking enhancementsSCTP (Stream Control Transmission Protocol)TCP segmentation offloadIPsec support and CryptoAPIImproved IPv6 supportBluetooth support

Networking

Page 9: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Removal of hard limitsNumber of processes/threads: 64k -> 2GBlock device limit: 2TB -> 16TB / 8EBNumber of groups per process: 32 -> unlimited (planned)Major/minor numbers: 256 -> 4k/1M (planned)

SMP scalabilityReduce use of Big Kernel LockEliminate global locks (I/O request, IRQ, task list)Per-CPU data structures

Scalability

Page 10: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

O(1) scheduler

Authors: Ingo Molnar et al.Design of old scheduler

Global run-queue holds all runnable processesReschedule scans full run-queue to find next process to runTime-slice recalculation after all slices have been consumed

ProblemsReschedule slow when run-queue is longRecalculation loop slow, trashes cacheSMP scalability issues

Page 11: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

O(1) scheduler (cont.)

Design goals for new schedulerO(1) algorithms: wakeup, schedule, timer interruptScale to large number of processes/threadsPerfect SMP scalabilityProcessor affinity (incl. NUMA/SMT support)Keep good interactive performanceKeep good performance with few runnable processesKeep features: priorities, RT scheduling, CPU binding

ImplementationActive/expired per-CPU priority arrays as run-queueLoad balancing between CPUs done by migration threads

Page 12: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Process AUser

Process AKernel Handler Scheduler Process B

User

SVC IRQ Wakeup

Process AKernel

SVC Exit

Process AUser Handler Scheduler Process B

User

Latency

Resched

IRQ Wakeup Resched

Authors: Robert Love, Andrew Morton, et al.Latency problem

Kernel preemption / low latency

Page 13: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Kernel preemption / low latency (cont.)

Proposed solutionsKernel code yields voluntarily ('low latency patches')Kernel code get preempted involuntarilyCurrently implemented: both

Preemption blockersInterrupts (hard and soft)Kernel SMP critical section (spinlock, per-CPU data etc.)Scheduler (and other core routines)

Design issuesAvoid large-scale code changesAvoid throughput vs. latency trade-off

Page 14: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Memory management

Authors: Rik van Riel, Andrew Morton, et al.Reverse mapping problem

Physical Page

VM A VM B VM C

Reverse Mapping

Page 15: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Memory management (cont.)

Advantages of reverse mappingsEasy to unmap page from all address spacesPage replacement scans based on physical pagesLess CPU spent inside memory managerLess fragile behaviour under extreme load

Challenges with reverse mappingsOverhead to set up rmap structuresOut of memory while allocating rmap?

Page 16: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

I/O scalability

Authors: Jens Axboe, Andrew Morton, et al.Block I/O layer

Manages all access to block devicesQueues/merges/remaps block read/write requestsImplements 'buffer cache'

Problems in 2.4Shortcomings of 'buffer head' data structureLarge I/O, vectored I/O, raw I/O, async I/O inefficientGlobal I/O request lock contentionBounce buffer bottleneck on high-memory systems

Page 17: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

I/O scalability (cont.)

New BIO data structureEfficiently unifies all types of I/O requestsChallenges

Rewrite much of the block I/O layerAdapt all low-level drivers and remappers (MD, LVM)Avoid deadlocks in out-of-memory situations

Other enhancementsEliminate global I/O request lockImproved I/O schedulerMerged buffer cache with page cache

Page 18: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Asynchronous I/O

Authors: Ben LaHaise et al.Asynchronous I/O

I/O requests executed while application continues to runCompletion of I/O signalled to applicationGoal: higher throughput, esp. for data bases etc.

ImplementationKernel provides async. I/O API (io_submit, io_getevents,...)Synchronous kernel-internal interfaces switched to async.Goal: everything in-kernel should be asynchronousUser space implements POSIX AIO on top

Page 19: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Networking scalability: epoll

Authors: Davide Libenzi et al.Idle connection problem

Typical server load: many connections, few activeEvent notification API (select, poll) performance degraded

epoll: New notification mechanismAPI: epoll_create / epoll_ctl / epoll_waitIdle connections do not affect performanceBetter performance, more robust than RT signals

Page 20: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Threading model

Authors: Ulrich Drepper, Ingo MolnarProblems with LinuxThreads

POSIX non-compliance One PID per processPOSIX signal handlingInter-process synchronization primitives

Limited number of threadsPerformance issuesManager thread / heavy-weight library

Page 21: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Threading model (cont.)

Design of new threading model1-on-1 modelNo manager threadLight-weight user space wrapper libraryO(1) scheduler for large number of threadsKernel/toolchain support for thread-local storageKernel awareness of 'thread groups'Kernel support for fast thread start-up/exitIn-kernel POSIX signal handlingSynchronization primitives via 'futex'

Page 22: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Threading model (cont.)

Thread-local storage supportNew compiler feature (C/C++ language extension)

extern __thread int errno;Compiler/Toolchain/Library support

Thread pointer via access register(s)TLS relocations in assembler/linkerTLS support in dynamic linker and glibcTLS access models to optimize performance

Kernel supportCLONE_TLS flag to clone()

Page 23: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Authors: Rusty Russell et al.Design goals

Intra-process and inter-process synchronizationImplement all POSIX synchronization primitivesAllow blocking and non-blocking waitAllow multiple strategies (wake-one vs. wake-all etc.)No administrative overhead (setup/cleanup etc.)No system calls in the uncontended caseNo unnecessary context switchesNo limits (e.g. number of futexes)

Fast user space synchronization (Futex)

Page 24: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Futex (cont.)

ImplementationUser space atomic operations on shared memory word'futex' system call to handle contention cases

Futex system callsys_futex (addr_t addr, int op, int val, struct timespec *timeout)FUTEX_WAIT: If the lock word at 'addr' still contains 'val', sleep until a futex wakeup on 'addr' is performed or timeout.FUTEX_WAKE: Wake up to 'val' processes sleeping on the futex 'addr'. Return number of processes actually woken.FUTEX_FD: Return file descriptor usable for asynchronous wait on futex 'addr'. Optionally set up SIGIO signal 'val'.

Page 25: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Device model

Authors: Patrick Mochel et al.Design goals

Represent physical device treeEnables power-save suspend/resume operationsSimplifies device reference counting and locking

Enable dynamic device attach/detachAutomatically probe for devices, manual online/offline overridesInterface with /sbin/hotplug user mode helper

Unified user interfaceNew file system: sysfsMultiple subsystems provide 'views' into device tree

Page 26: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Devices SubsystemPhysical device interconnection tree

Bus SubsystemTop-level view of all device drivers by bus typeLinks to connected devices

Block SubsystemTop-level view of all block devices and partitionsLinks to underlying devices

Net SubsystemTop-level view of all network devicesLinks to underlying devices

Device model (cont.)

Page 27: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Bus and device types on zSeriesChannel Subsystem Bus / I/O Subchannel Devices

Identifier: Subchannel NumberAttributes: Channel Paths, PIM/PAM/POM

CCW Device Bus / CCW DevicesIdentifier: Device NumberAttributes: Control Unit Type, Device Type, Online Status

CCW Device Group BusesGroup device: Multiple CCW devices used as a unitRequired for QETH, LCS, and CTC devicesObsoletes 2.4 Channel Device Configuration layerIdentifier: First device in groupAttributes: Shared Online Status

Device model (cont.)

Page 28: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

User interface via sysfs: devices

/sys /devices /sys System Bus /channel_pathNN Channel Path /css0 Channel Subsystem Bus /0:NNNN Subchannel /0:NNNN CCW Device /qeth QETH Group Bus /0:NNNN CCW Device Group /bus /block /net

Device model (cont.)

Page 29: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

User interface via sysfs: device drivers

/sys /bus /css/drivers /io_subchannel Subchannel Driver /0:NNNN Links to /devices /css/devices *All* devices /0:NNNN /ccw/drivers /dasd-eckd DASD Driver /0:NNNN Links to /devices /ccwgroup/drivers /qeth QETH Driver /group Group creation /0:NNNN Links to /devices

Device model (cont.)

Page 30: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

User interface via sysfs: block devices

/sys /block /dasda DASD block device /device Link to /devices /dev Major/minor number ... /dasda1 1st partition /dev Major/minor /dasda2 2nd partition /dev Major/minor

Device model (cont.)

Page 31: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Example: Install new QETH device

# Create QETH CCW group deviceecho 0:5c00,0:5c01,0:5c02 \ > /sys/bus/ccwgroup/qeth/group

# Set up portname parameterecho portname:OSAPORT \ > /sys/bus/ccwgroup/qeth/0:5c00/parameters

# Set device onlineecho 1 \ > /sys/bus/ccwgroup/qeth/0:5c00/online

Device model (cont.)

Page 32: What's new in Linux 2.6? · New BIO data structure Efficiently unifies all types of I/O requests Challenges Rewrite much of the block I/O layer Adapt all low-level drivers and remappers

Resources

Linux for zSeries developerWorks pagehttp://www.software.ibm.com/developerworks/opensource/linux390/index.html

Linux for zSeries technical contact [email protected]

Linux for zSeries mailing list at Marist Collegehttp://www.marist.edu/htbin/wlvindex?LINUX-VM