FAST-OS BOF SC 04
http://www.cs.unm.edu/~fastos
Follow link to subscribe to the mail list
![Page 2: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/2.jpg)
Projects
• Colony: Terry Jones, LLNL
• Config Framework: Ron Brightwell, SNL
• DAiSES: Pat Teller, UTEP
• K42: Paul Hargrove, LBNL
• MOLAR: Stephen Scott, ORNL
• Peta-Scale SSI: Scott Studham, ORNL
• Right-Weight Kernels: Ron Minnich, LANL
• Scalable FT: Jarek Nieplocha, PNNL
• SmartApps: L. Rauchwerger, Texas A&M
• ZeptoOS: Pete Beckman, ANL
![Page 3: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/3.jpg)
www.HPC-Colony.org
Services & Interfaces For Very Large Linux Clusters
Terry Jones, LLNL, Coordinating PI
Laxmikant Kale, UIUC, PI
Jose Moreira, IBM, PI
Celso Mendes, UIUC
Derek Lieber, IBM
![Page 4: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/4.jpg)
Overview
Title: Services and Interfaces to Support Systems with Very Large Numbers of Processors
Collaborators:
• Lawrence Livermore National Laboratory
• University of Illinois at Urbana-Champaign
• International Business Machines
Topics:
• Parallel Resource Instrumentation Framework
• Scalable Load Balancing
• OS mechanisms for Migration
• Processor Virtualization for Fault Tolerance
• Single system management space
• Parallel Awareness and Coordinated Scheduling of Services
• Linux OS for cellular architecture
Colony
![Page 5: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/5.jpg)
Motivation
• Parallel resource management
Strategies for scheduling and load balancing must be improved. Difficulties in achieving a balanced partitioning and dynamically scheduling workloads can limit scaling for complex problems on large machines.
• Global system management
System management is inadequate. Parallel jobs require common operating system services, such as process scheduling, event notification, and job management to scale to large machines.
Colony
![Page 6: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/6.jpg)
Goals
• Develop infrastructure and strategies for automated parallel resource management
– Today, application programmers must explicitly manage these resources. We address scaling issues and porting issues by delegating resource management tasks to a sophisticated parallel OS.
– “Managing Resources” includes balancing CPU time, network utilization, and memory usage across the entire machine.
• Develop a set of services to enhance the OS to improve its ability to support systems with very large numbers of processors
– We will improve operating system awareness of the requirements of parallel applications.
– We will enhance operating system support for parallel execution by providing coordinated scheduling and improved management services for very large machines.
Colony
![Page 7: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/7.jpg)
Approach
• Top Down
– Our work will start from an existing full-featured OS and remove excess baggage with a “top down” approach.
• Processor virtualization
– One of our core techniques: the programmer divides the computation into a large number of entities, which are mapped to the available processors by an intelligent runtime system.
• Leverage Advantages of Full Featured OS & Single System Image
– Applications on these extreme-scale systems will benefit from extensive services and interfaces; managing these complex systems will require an improved “logical view”
• Utilize Blue Gene
– Suitable platform for ideas intended for very large numbers of processors
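The processor-virtualization bullet can be illustrated with a small sketch. It is not Colony's runtime: the helper names `map_entities` and `proc_loads` and the greedy "heaviest entity first" heuristic are assumptions chosen for brevity, standing in for whatever the intelligent runtime system actually does.

```python
import heapq

def map_entities(loads, num_procs):
    """Greedily assign work entities to processors (hypothetical helper).

    loads: estimated cost of each entity; num_procs: available processors.
    Returns one list of entity indices per processor.
    """
    # Min-heap of (current load, processor id): least-loaded processor on top.
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    placement = [[] for _ in range(num_procs)]
    # Place the heaviest entities first so lighter ones can fill the gaps.
    for idx in sorted(range(len(loads)), key=lambda i: -loads[i]):
        load, proc = heapq.heappop(heap)
        placement[proc].append(idx)
        heapq.heappush(heap, (load + loads[idx], proc))
    return placement

def proc_loads(loads, placement):
    """Total estimated load that ended up on each processor."""
    return [sum(loads[i] for i in entities) for entities in placement]
```

Because there are many more entities than processors, the runtime has room to even out the load; rerunning the mapping with updated cost estimates is one simple way to model dynamic rebalancing.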
Colony
![Page 8: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/8.jpg)
Configurable OS Framework
• Sandia, lead
– Ron Brightwell, PI
– Rolf Riesen
• Caltech
– Thomas Sterling, PI
• UNM
– Barney Maccabe, PI
– Patrick Bridges
![Page 9: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/9.jpg)
Issues
• Novel architectures
– Lots of execution environments
• Programming models
– MPI, UPC, separating processing from location
• Shared services
– File systems, shared WAN
• Usage model
– Dedicated, space shared, time shared
![Page 10: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/10.jpg)
Approach
• Build application specific OS
– Architecture, programming model, shared resources, usage model
• Develop a collection of Micro services
– Compose and distribute
• Compose services
– Services may adapt
• Kinds of services
– Memory allocation, signal delivery, message receipt and handler activation
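One way to picture composing micro services is as dependency resolution. The sketch below is only an illustration: the service names are taken from the slide, but the dependency edges between them and the `compose` helper are assumptions, not part of the framework described here.

```python
from graphlib import TopologicalSorter

# Hypothetical micro services and the services each one depends on.
SERVICES = {
    "memory_allocation": [],
    "signal_delivery": ["memory_allocation"],
    "message_receipt": ["memory_allocation"],
    "handler_activation": ["message_receipt", "signal_delivery"],
}

def compose(requested, catalog=SERVICES):
    """Return the requested services plus dependencies, in a valid start order."""
    needed, stack = set(), list(requested)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(catalog[name])
    # Map each needed service to its prerequisites and sort topologically.
    graph = {name: catalog[name] for name in needed}
    return list(TopologicalSorter(graph).static_order())
```

An application that only needs signal delivery gets a two-service OS image; one that needs handler activation pulls in all four.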
![Page 11: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/11.jpg)
The Picture
![Page 12: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/12.jpg)
Challenges
• How to reason about combinations
• Dependencies among services
• Efficiency
– Overhead associated with transfers between micro services
• How many operating systems will we really need?
![Page 13: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/13.jpg)
Generalized → Customized resource management
Fixed → Dynamically Adaptable OS/runtime services
Enhanced Performance
Goals: Dynamic Adaptability in Support of Extreme Scale
![Page 14: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/14.jpg)
Determining
• What to adapt
• When to adapt
• How to adapt
• How to measure effects of adaptation
Challenges: Dynamic Adaptability in Support of Extreme Scale
![Page 15: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/15.jpg)
• Develop mechanisms to dynamically sense, analyze, and adjust common performance metrics, fluctuating workload situations, and overall system environment conditions
• Demonstrate, via Linux prototypes and experiments, dynamic self-tuning/provisioning in HPC environments
• Develop a methodology for general-purpose OS adaptation
Deliverables: Dynamic Adaptability in Support of Extreme Scale
![Page 16: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/16.jpg)
Methodology
identify adaptation targets
potential adaptation targets
characterize workload resource usage patterns
(re)determine adaptation intervals
define/adapt heuristics to trigger adaptation
generate/adapt monitoring, triggering and adaptation code, and attach it to OS
KernInst
monitor application execution, triggering adaptation as necessary
off line
off line/run time
Dynamic Adaptability in Support of Extreme Scale
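The monitor, trigger, adapt cycle in the methodology can be sketched as a replay over monitored samples. Everything specific here is an assumption for illustration: the function name `adapt_cache_size`, the choice of a cache size as the adaptation target, and the threshold heuristic. It does not use KernInst.

```python
def adapt_cache_size(hit_rates, size=64, low=0.80, high=0.95, step=2,
                     min_size=16, max_size=1024):
    """Replay monitored hit rates and adapt a hypothetical cache size.

    Each sample covers one adaptation interval. A hit rate below `low`
    triggers growth; one above `high` triggers shrinking to free memory.
    Returns the cache size in effect after each interval.
    """
    history = []
    for rate in hit_rates:
        if rate < low:
            size = min(size * step, max_size)    # adapt: grow the cache
        elif rate > high:
            size = max(size // step, min_size)   # adapt: give memory back
        history.append(size)
    return history
```

For example, `adapt_cache_size([0.5, 0.7, 0.9, 0.99])` grows the cache twice, holds it steady, then shrinks it once.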
![Page 17: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/17.jpg)
KernInst: dynamic instrumentation of the kernel
Diagram labels: Instrumentation Tool; Client; KernInst API; KernInst Device; Linux Kernel; KernInst Daemon; IBM pSeries eServer 690
• KernInst and Kperfmon provide the capability to perform dynamic monitoring and adaptation of commodity operating systems.
• University of Wisconsin’s KernInst and Kperfmon make the problem of run-time monitoring and adaptation more tractable.
Dynamic Adaptability in Support of Extreme Scale
![Page 18: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/18.jpg)
Customization of
• process scheduling parameters and algorithms, e.g., scheduling policy for different job types (prototype in process)
• file system cache size and management
• disk cache management
• size of OS buffers and tables
• I/O, e.g., checkpoint/restart
• memory allocation and management parameters and algorithms
Example Adaptations: Dynamic Adaptability in Support of Extreme Scale
![Page 19: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/19.jpg)
University of Texas at El Paso, Department of Computer Science: Patricia J. Teller
University of Wisconsin-Madison, Computer Sciences Department: Barton P. Miller
International Business Machines, Inc., Linux Technology Center: Bill Buros
U.S. Department of Energy, Office of Science: Fred Johnson
Partners: Dynamic Adaptability in Support of Extreme Scale
![Page 20: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/20.jpg)
High End Computing with K42
Computational Research Division
Paul H. Hargrove and Katherine Yelick, Lawrence Berkeley National Lab
Angela Demke Brown and Michael Stumm, University of Toronto
Patrick Bridges, University of New Mexico
Orran Krieger and Dilma Da Silva, IBM
![Page 21: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/21.jpg)
Project Motivation
• The HECRTF and FastOS reports enumerate unmet needs in the area of Operating Systems for HEC, including
– Availability of Research Frameworks
– Support for Architectural Innovation
– Performance Visibility
– Ease of Use
– Adaptability to Application Requirements
• This project uses the K42 Operating System to address these five needs
K42
![Page 22: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/22.jpg)
K42 Background
• K42 is a research OS from IBM
– API/ABI compatibility with Linux
– Designed for large 64-bit SMPs
– Extensible object-oriented design
• Features per resource-instance objects
• Can change implementation/policy for individual instances at runtime
– Extensive performance-monitoring
– Many traditional OS functions are performed in user-space libraries
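The idea of per-resource-instance objects whose implementation can be changed at runtime can be sketched as follows. This is an analogy in plain Python, not K42 code: the class names and the prefetch policies are hypothetical, and a real kernel must also handle requests that are in flight while an object is being replaced.

```python
class FileCachePolicy:
    """Base class for per-file caching policies (hypothetical interface)."""
    def pages_to_prefetch(self, offset):
        raise NotImplementedError

class SequentialPolicy(FileCachePolicy):
    def pages_to_prefetch(self, offset):
        return [offset + 1, offset + 2]   # read ahead

class RandomAccessPolicy(FileCachePolicy):
    def pages_to_prefetch(self, offset):
        return []                         # read-ahead would be wasted

class OpenFile:
    """Each resource instance owns its own policy object."""
    def __init__(self, name, policy):
        self.name, self.policy = name, policy

    def swap_policy(self, new_policy):
        # Replace the implementation for this one instance, at run time.
        self.policy = new_policy

    def read(self, offset):
        return self.policy.pages_to_prefetch(offset)
```

Swapping the policy on one open file leaves every other open file untouched, which is the property the slide highlights.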
K42
![Page 23: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/23.jpg)
What Work Remains? (1 of 2)
• Availability of Research Frameworks & Support for Architectural Innovation
– K42 is already a research platform, used by IBM for their PERCS project (DARPA HPCS) to support architectural innovation
– Work remains to expand K42 from SMPs to clusters
• Performance Visibility
– Existing facilities are quite extensive
– Work remains to use runtime replacement of object implementations to monitor single objects for fine-grained control
K42
![Page 24: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/24.jpg)
What Work Remains? (2 of 2)
• Ease of Use
– Work remains to make K42 widely available, and to bring HEC user environments to K42 (e.g. MPI, batch systems, etc.)
• Adaptability to Application Requirements
– Runtime replacement of object implementations provides extreme customizability
– Work remains to provide implementations appropriate to HEC, and to perform automatic dynamic adaptation
K42
![Page 25: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/25.jpg)
MOLAR: Modular Linux and Adaptive Runtime Support for High-end Computing Operating and Runtime Systems
Coordinating Principal Investigator
Stephen L. Scott, ORNL
Principal Investigators
J. Vetter, D.E. Bernholdt, C. Engelmann – ORNL
C. Leangsuksun – Louisiana Tech University
P. Sadayappan – Ohio State University
F. Mueller – North Carolina State University
Collaborators
A.B. Maccabe – University of New Mexico
C. Nuss, D. Mason – Cray Inc.
OAK RIDGE NATIONAL LABORATORY
U.S. DEPARTMENT OF ENERGY
![Page 26: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/26.jpg)
MOLAR research goals
• Create a modular and configurable Linux system that allows customized changes based on the requirements of the applications, runtime systems, and cluster management software.
• Build runtime systems that leverage the OS modularity and configurability to improve efficiency, reliability, scalability, and ease of use, and to support both legacy and promising programming models.
• Advance computer reliability, availability and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues.
• Explore the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions.
MOLAR
![Page 27: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/27.jpg)
High-end Computing OS Research Map
MOLAR: Modular Linux and Adaptive Runtime support
Diagram labels: HEC Linux OS (modular, custom, light-weight); RAS; High availability; Monitoring; Root cause analysis; Kernel design; Performance Observation; Communications, IO; Testbeds; Provided; Extend/adapt runtime/OS
PROBLEM:
• Current OSs and runtime systems (OS/R) are unable to meet the various requirements to run large applications efficiently on future ultra-scale computers.
GOALS:
• Development of a modular and configurable Linux framework.
• Runtime systems to provide a seamless coordination between system levels.
• Monitoring and adaptation of the operating system, runtime, and applications.
• Reliability, availability, and serviceability (RAS)
• Efficient system management tools.
IMPACT:
• Enhanced support and better understanding of extremely scalable architectures.
• Proof-of-concept implementation open to community researchers.
MOLAR
![Page 28: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/28.jpg)
MOLAR crosscut capability deployed for RAS
• Monitoring Core Daemon
– service monitor
– resource monitor
– hardware health monitor
• Head nodes: active / hot standby
• Services: active / hot standby
• Modular Linux systems deployment & development
MOLAR
![Page 29: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/29.jpg)
MOLAR Federated System Management (fSM)
• fSM emphasizes simplicity
– self-build
– self-configuration
– self-healing
– simplified operation
• Expand MOLAR support:
– Investigate specialized architectures
– Investigate other environments & OSs
• Head nodes: active / active
• Services: active / active
MOLAR
![Page 30: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/30.jpg)
Peta-Scale Single-System Image
A framework for a single-system image Linux environment for 100,000+ processors and multiple architectures
Coordinating Investigator
R. Scott Studham, ORNL
Principal Investigators
Alan Cox, Rice University
Bruce Walker, HP
Investigators
Peter Druschel, Rice University
Scott Rixner, Rice University
Collaborators
Peter Braam, CFS
Steve Reinhardt, SGI
Stephen Wheat, Intel
![Page 31: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/31.jpg)
Project Key Objectives
• OpenSSI to 10,000 nodes
• Integration of OpenSSI with nodes with high processor counts
• The scalability of a shared root filesystem to 10,000 nodes
• Scalable booting and monitoring mechanisms
• Research enhancements to OpenSSI’s P2P communications
• The use of very large page sizes (superpages) for large address spaces
• Determine the proper interconnect balance as it impacts the operating system (OS)
• Establish system-wide tools and process management for a 100,000 processor environment
• OS noise (services that interrupt computation) effects
• Integrating a job scheduler with the OS
• Preemptive task migration.
Peta-Scale SSI
![Page 32: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/32.jpg)
Reduce OS-Noise and increase cluster scalability via efficient compute nodes
Service Nodes
• single install
• local boot (for HA)
• single IP (LVS)
• connection load balancing (LVS)
• single root with HA (Lustre)
• single file system namespace (Lustre)
• single IPC namespace
• single process space and process load leveling
• application HA
• strong/strict membership
Compute Nodes
• single install
• network or local boot
• not part of single IP and no connection load balance
• single root with caching (Lustre)
• single file system namespace (Lustre)
• no single IPC namespace (optional)
• single process space but no process load leveling
• no HA participation
• scalable (relaxed) membership
• inter-node communication channels on demand only
Diagram labels: LVS; DLM; Lustre client; ICS; Install and sysadmin; Boot and Init; Application monitoring and restart; MPI; HA Resource Mgmt and Job Scheduling; Process load leveling; IPC; Devices; Cluster Filesystem; CFS; Remote File Block; Vproc; CLMS; Lustre client; ICS; Boot; MPI; CLMS Lite; Remote File Block; Vproc
Peta-Scale SSI
![Page 33: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/33.jpg)
Researching the intersection of SSI and large kernels to get to 100,000+ processors
Chart labels: Single Linux Kernel (1 CPU to 2048 CPUs); Software SSI Clusters (1 Node to 10,000 Nodes); Stock Linux Kernel; Typical SSI
• Continue SGI’s work on single kernel scalability
• Continue OpenSSI’s work on SSI scalability
• Test the intersection of large kernels with software OpenSSI to establish the sweet spot for 100,000 processor Linux environments
1) Establish scalability baselines
2) Enhance scalability of both approaches
3) Understand intersection of both methods
Peta-Scale SSI
![Page 34: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/34.jpg)
Right-Weight Kernels
The right kernel, in the right place, at the right time
![Page 35: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/35.jpg)
OS effect on Parallel Applications
• Simple problem: if all processors save one arrive at a join, then all wait for the laggard [Mraz SC ’94]
– Mraz resolved the problem for AIX, interestingly, with purely local scheduling decisions (i.e., no global scheduler)
– Sandia resolved it by getting rid of the OS entirely (i.e., creation of the “Light-Weight Kernel”)
• AIX has more capability than many apps need
• LWK has less capability than many apps want
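The laggard effect is easy to reproduce in a toy model. The sketch below is an assumption-laden illustration, not a measurement: `barrier_slowdown` is a hypothetical helper, each step costs one time unit of compute, and OS interference is modeled as an independent random delay per processor per step.

```python
import random

def barrier_slowdown(num_procs, steps, noise_prob, noise_cost, seed=0):
    """Ratio of observed to ideal run time for a bulk-synchronous loop.

    With probability `noise_prob` a processor is delayed by `noise_cost`
    in a given step (a stand-in for an OS interruption). All processors
    wait at the join, so each step costs the maximum over processors.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        slowest = max(
            1.0 + (noise_cost if rng.random() < noise_prob else 0.0)
            for _ in range(num_procs)
        )
        total += slowest
    return total / steps
```

With a 1% chance of interruption per step, a 4-processor run is rarely delayed, while a 1024-processor run almost always has at least one laggard, so nearly every step pays the full delay.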
RWK
![Page 36: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/36.jpg)
Hence Right-Weight Kernels
• Customize the kernel to the app
• We’re looking at two different approaches
• Customized, Modular Linux
– Based on 2.6
– With some scheduling enhancements
• “COTS” Secure LWK
– Based, after some searching, on Plan 9
– With some performance enhancements
RWK
![Page 37: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/37.jpg)
Balancing Capability and Overhead
• We need to balance the capabilities that a full OS gives the user with the overhead of providing such services
• For a given app, we want to be as close to the “optimal” balance as possible
• But how do we measure what that is?
Diagram: a spectrum running from “AIX, Tru64, Solaris, Linux, etc.” to “No OS”, labeled “increasing per node capability” and “decreasing OS impact on app”, with RWK marked at several points along it.
RWK
![Page 38: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/38.jpg)
Measuring what is “good”
• OS activity is periodic, thus we need to use techniques such as time series analysis to evaluate the measured data
– Use this data to figure out what is “good” and “bad”
• Caveat: you must practice good sampling hygiene [Sottile & Minnich, Cluster ’04]
– Must follow rules of statistical sampling
– Measuring work per unit of time leads to statistically sound data
– Measuring time per unit of work leads to meaningless data
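A minimal sketch of "work per unit of time" sampling follows. The function name `work_per_quantum` and the 1 ms default window are assumptions; the point is only that every sample covers the same length of time, so the samples are evenly spaced and can be treated as a time series.

```python
import time

def work_per_quantum(num_samples, quantum_s=0.001):
    """Count loop iterations completed in each fixed-length time window."""
    samples = []
    for _ in range(num_samples):
        count = 0
        deadline = time.perf_counter() + quantum_s
        while time.perf_counter() < deadline:
            count += 1            # one unit of (trivial) work
        samples.append(count)
    return samples
```

A window in which the count drops is a window in which something else used the CPU. Measuring the time taken by a fixed amount of work instead would yield samples at irregular intervals, which is what the slide warns against.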
RWK
![Page 39: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/39.jpg)
Conclusions
• Use sound statistical measurement techniques to figure out what is “good”
• Configure compute nodes on a per app basis (Right-Weight Kernel)
• Rinse and repeat!
• Collaborators
– Sung-Eun Choi, Matt Sottile, Erik Hendriks (LANL)
– Eric Grosse, Jim McKie, Vic Zandy (Bell Labs)
RWK
![Page 40: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/40.jpg)
SFT: Scalable Fault Tolerant Runtime and Operating Systems
Pacific Northwest National Laboratory
Los Alamos National Laboratory
University of Illinois
Quadrics
![Page 41: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/41.jpg)
Team
• Jarek Nieplocha, PNNL
• Fabrizio Petrini and Kei Davis (LANL)
• Josep Torrellas and Yuanyuan Zhou (UIUC)
• David Addison (Quadrics)
• Industrial Partner: Stephen Wheat (Intel)
SFT
![Page 42: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/42.jpg)
Motivation
• With the massive number of components comprising the forthcoming petascale computer systems, hardware failures will be routinely encountered during execution of large-scale applications.
• Application Driver
– The multidisciplinary, multiresolution, and multiscale nature of scientific problems drives the demand for high end systems
– Applications place increasingly differing demands on the system resources: disk, network, memory, and CPU.
• Therefore, it will not be cost-effective or practical to rely on a single fault tolerance approach for all applications.
SFT
![Page 43: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/43.jpg)
Goals
• Develop scalable and practical techniques for addressing fault tolerance at the Operating System and Runtime levels
– Design based on requirements of DoE applications
– Minimal impact on application performance
SFT
![Page 44: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/44.jpg)
Petaflop Architecture
Diagram: tightly coupled nodes, each with processors and memories, connected by an interconnection network; globally addressable but non-coherent between nodes.
SFT
![Page 45: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/45.jpg)
Scope
• We will investigate, develop, and evaluate a comprehensive range of techniques for fault tolerance.
– System level incremental checkpointing approach
• based on Buffered CoScheduling
• temporal and spatial hybrid checkpointing
• in-memory checkpointing and efficient handling of I/O
– Fault awareness in communication libraries
• while exploiting high performance network communication
• MPI, ARMCI
• scalability
– Feasibility analysis of incremental checkpointing
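The core idea of incremental checkpointing, saving only what changed since the previous checkpoint, can be sketched as below. This is an illustration under stated assumptions: `incremental_checkpoint` is a hypothetical helper, and it detects changes by hashing fixed-size blocks, whereas a system-level implementation would more likely track dirty pages through the memory-management hardware.

```python
import hashlib

def incremental_checkpoint(memory, block_size, prev_digests):
    """Return (changed_blocks, digests) for a bytes-like memory image.

    changed_blocks maps block index -> block contents for every block
    whose digest differs from the one recorded at the last checkpoint.
    """
    changed, digests = {}, {}
    for start in range(0, len(memory), block_size):
        index = start // block_size
        block = bytes(memory[start:start + block_size])
        digest = hashlib.sha256(block).digest()
        digests[index] = digest
        if prev_digests.get(index) != digest:
            changed[index] = block      # only this block needs saving
    return changed, digests
```

The first call, with an empty `prev_digests`, produces a full checkpoint; later calls save only the blocks written in between, which is where the reduction in checkpoint volume comes from.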
SFT
![Page 46: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/46.jpg)
Buffered CoScheduling
SFT
![Page 47: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/47.jpg)
SmartApps: Middleware for Adaptive Applications on Reconfigurable Platforms
Lawrence Rauchwerger
http://parasol.tamu.edu/~rwerger/
Parasol Lab, Dept of Computer Science, Texas A&M
![Page 48: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/48.jpg)
Today: System Centric Computing
• Compilers are conservative
• OS offers generic services
• Architecture is generic
No Global Optimization
• No matching between Application/OS/HW
• intractable for the general case
WHAT’s MISSING ?
Classic avenues to performance:
• Parallel Algorithms
• Static Compiler Optimization
• OS support
• Good Architecture
Diagram (System-Centric Computing): layers Application, Compiler, HW, OS; boxes Compiler (static); Application (algorithm); System (OS & Arch); Execution; Development, Analysis & Optimization; Input Data
SmartApps
![Page 49: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/49.jpg)
Our Approach: SmartApps
Application Centric Computing
Diagram (Application-Centric Computing): layers Application, Compiler, HW, OS; boxes Compiler (static) + run-time techniques; Application (algorithm); Run-time System: Execution, Analysis & Optimization; Development, Analysis & Optimization; Input Data; Architecture (reconfigurable); OS (modular); Compiler (run-time)
SmartApp: Compiler + OS + Architecture + Data + Feedback
Application Control
Instance-specific optimization
SmartApps
![Page 50: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/50.jpg)
SmartApps Architecture
Diagram labels: development stage; advanced stages; STAPL Application; Static STAPL Compiler augmented with runtime techniques; Compiled code + runtime hooks; Toolbox; Predictor & Optimizer; Predictor & Evaluator; Configurer; DataBase; Get Runtime Information (sample input, system information, etc.); Compute Optimal Application and RTS + OS Configuration; Execute Application; Continuously monitor performance and adapt as necessary; Small adaptation (tuning); Large adaptation (failure, phase change); Runtime tuning (w/o recompile); Recompute Application and/or Reconfigure RTS + OS; Adaptive Software; Adaptive RTS + OS; Smart Application
SmartApps
![Page 51: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/51.jpg)
SmartApps written in STAPL
• STAPL (Standard Template Adaptive Parallel Library):
– Collection of generic parallel algorithms, distributed containers & run-time system (RTS)
– Inter-operable with Sequential Programs
– Extensible, Composable by end-user
– Shared Object View: No explicit communication
– Distributed Objects: no replication/coherence
– High Productivity Environment
SmartApps
![Page 52: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/52.jpg)
The STAPL Programming Environment
RTS + Communication Library (ARMI)
OpenMP/MPI/pthreads/native
pAlgorithms pContainers
User Code
pRange
Interface to OS (K42)
SmartApps
![Page 53: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/53.jpg)
SmartApps to RTS to OS
Specialized Services from Generic OS Services
– OS offers one size fits all services.
– IBM K42 offers customizable services
– We want customized services BUT… we do not want to write them
Interface between SmartApps (RTS) & OS (K42)
• Vertical integration of Scheduling/Memory Management
SmartApps
![Page 54: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/54.jpg)
Collaborative Effort:
• STAPL (Amato/Rauchwerger)
• STAPL Compiler (Rauchwerger/Stroustrup/Quinlan)
• RTS – K42 Interface & Optimizations (Krieger/Rauchwerger)
• Applications (Amato/Adams/others)
• Validation on DOE extreme HW: BlueGene (Moreira), possibly PERCS (Krieger/Sarkar)
Texas A&M (Parasol, NE) + IBM + LLNL
SmartApps
![Page 55: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/55.jpg)
ZeptoOS
Studying Petascale Operating Systems with Linux
Argonne National Laboratory
Pete Beckman
Bill Gropp
Rusty Lusk
Susan Coghlan
Suravee Suthikulpanit
University of Oregon
Al Malony
Sameer Shende
![Page 56: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/56.jpg)
Observations:
• Extremely large systems run an “OS Suite”
– BG/L and Red Storm both have at least 4 different operating system flavors
• Functional Decomposition trend lends itself toward a customized, optimized point-solution OS
• Hierarchical Organization requires software to manage topology, call forwarding, and collective operations
ZeptoOs
![Page 57: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/57.jpg)
ZeptoOS
• Investigating 4 key areas:
  – Linux as an ultra-lightweight kernel
    • Memory mgmt, scheduling efficiency, network
  – Collective OS calls
    • Explicit collective behavior may be key (DLLs?)
  – OS Performance monitoring for hierarchical systems
  – Fault tolerance
![Page 58: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/58.jpg)
Linux as a Lightweight Kernel
What does an OS steal from a selfish CPU application?
• Purpose: Micro benchmark measuring CPU cycles provided to benchmark application
• Helps understand “MPI-reduce problem” and gang scheduling issues
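The idea behind such a micro benchmark can be sketched as a loop that does nothing but read a clock and records every unusually long gap between consecutive reads as time taken away from the application. This is only an illustrative sketch, not the ZeptoOS benchmark itself: the function names, the 50 µs threshold, and the use of Python's `time.perf_counter_ns` (rather than a hardware cycle counter read from C) are all assumptions made here.

```python
import time


def record_detours(duration_s=0.05, threshold_ns=50_000):
    """Spin for duration_s and record every gap between consecutive
    clock reads that exceeds threshold_ns (hypothetical default).

    Returns (detours, total_ns), where detours is a list of
    (offset_ns, gap_ns) pairs measured from the start of the run.
    """
    detours = []
    start = time.perf_counter_ns()
    deadline = start + int(duration_s * 1e9)
    prev = start
    while prev < deadline:
        now = time.perf_counter_ns()
        gap = now - prev
        if gap > threshold_ns:
            # The loop was not running during this gap: the OS (or
            # something else on the node) took the CPU away.
            detours.append((prev - start, gap))
        prev = now
    return detours, prev - start


def fraction_delivered(detours, total_ns):
    """Share of wall-clock time not lost to recorded detours."""
    lost = sum(gap for _, gap in detours)
    return 1.0 - lost / total_ns if total_ns else 1.0


if __name__ == "__main__":
    detours, total_ns = record_detours()
    print(f"{len(detours)} detours, "
          f"{fraction_delivered(detours, total_ns):.4%} of time delivered")
```

Run on every node at once, the timestamps of the detours indicate whether interruptions line up across nodes, which is what matters for tightly synchronized collectives and for gang scheduling.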
![Page 59: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/59.jpg)
Collective OS Calls
• Collective message-passing calls have been very efficiently implemented on many architectures
• Collective I/O calls permit scalable, efficient (non-Posix) file I/O
• Collective OS calls, such as dynamically loading libraries, may provide scalable OS functionality
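One way to picture a collective library load is that a single process reads the library from the file system and the image is then forwarded along a broadcast tree, instead of every process opening the same file. The simulation below is a sketch under that assumption; the slides do not specify a mechanism, and `binomial_broadcast_schedule`, `collective_load`, and `read_library` are hypothetical names used only for illustration.

```python
def binomial_broadcast_schedule(n_ranks, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver)
    pairs for a binomial-tree broadcast starting at root."""
    rounds = []
    have = 1  # how many ranks (counted from root) already hold the data
    while have < n_ranks:
        pairs = []
        for rel in range(have):
            dst = rel + have
            if dst < n_ranks:
                pairs.append(((rel + root) % n_ranks, (dst + root) % n_ranks))
        rounds.append(pairs)
        have *= 2  # every holder forwards once per round
    return rounds


def collective_load(n_ranks, read_library):
    """Simulate a collective load: rank 0 reads once, then the image is
    forwarded rank to rank. Returns a dict rank -> library image."""
    images = {0: read_library()}  # the only file-system access
    for pairs in binomial_broadcast_schedule(n_ranks):
        for src, dst in pairs:
            images[dst] = images[src]
    return images


if __name__ == "__main__":
    images = collective_load(8, lambda: b"library image bytes")
    print(len(images), "ranks loaded;",
          len(binomial_broadcast_schedule(8)), "forwarding rounds")
```

With this scheme the file system sees one read regardless of the number of ranks, and the number of forwarding rounds grows logarithmically with the rank count.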
![Page 60: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/60.jpg)
Scalable OS Performance Monitoring (U of Oregon)
• TAU provides a framework for scalable performance analysis
• Integration of TAU into hierarchical systems, such as BG/L, will allow us to explore:
  – Instrumentation of light-weight kernels
    • Call forwarding, memory, etc.
  – Intermediate, parallel aggregation of performance data at I/O nodes
  – Integration of data from the OS Suite
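Intermediate aggregation at I/O nodes can be illustrated with a two-level reduction: each I/O node summarizes the samples of the compute nodes it serves, and only those small summaries travel further up. The sketch below does not use TAU or any of its APIs; the function names, the fixed compute-to-I/O ratio, and the (count, total, max) summary are assumptions chosen for illustration.

```python
from collections import defaultdict


def aggregate_at_io_nodes(samples, compute_per_io):
    """samples maps compute rank -> {metric name: value}.
    Returns io node -> {metric name: (count, total, max)}, assuming each
    I/O node serves compute_per_io consecutive compute ranks."""
    partial = defaultdict(dict)
    for rank, metrics in samples.items():
        io_node = rank // compute_per_io
        for name, value in metrics.items():
            count, total, peak = partial[io_node].get(
                name, (0, 0.0, float("-inf")))
            partial[io_node][name] = (count + 1, total + value,
                                      max(peak, value))
    return dict(partial)


def merge_summaries(summaries):
    """Combine per-I/O-node summaries into one system-wide summary."""
    merged = {}
    for per_node in summaries.values():
        for name, (count, total, peak) in per_node.items():
            c, t, p = merged.get(name, (0, 0.0, float("-inf")))
            merged[name] = (c + count, t + total, max(p, peak))
    return merged


if __name__ == "__main__":
    samples = {rank: {"forwarded_calls": rank} for rank in range(8)}
    per_io = aggregate_at_io_nodes(samples, compute_per_io=4)
    print(per_io)
    print(merge_summaries(per_io))
```

Because count, sum, and max are associative, the per-node summaries can be computed in parallel and merged in any order, so the data volume reaching the top of the hierarchy scales with the number of I/O nodes rather than the number of compute nodes.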
![Page 61: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/61.jpg)
Exploring Faults: Faulty Towers
• Modify Linux so we can selectively and predictably break things
• Run user code, middleware, etc at ultra scale, with faults
• Explore metrics for codes with good “survivability”
Memory, Kernel, MPI/Net, Disk, Middleware
It’s not a bug, it’s a feature!
Dial-a-Disaster
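"Selectively and predictably" suggests two properties: faults are dialed in per subsystem, and a given setting reproduces the same fault sequence on every run. The user-space sketch below shows both with a seeded random generator. It is not the Faulty Towers implementation, which modifies the Linux kernel; the `FaultInjector` class, the per-subsystem rates, and the survivability metric (fraction of steps that complete without a fault) are hypothetical.

```python
import random


class FaultInjector:
    """Seeded, per-subsystem fault source (illustrative only)."""

    def __init__(self, rates, seed):
        self.rates = dict(rates)        # subsystem -> fault probability
        self.rng = random.Random(seed)  # same seed -> same fault sequence
        self.log = []                   # (step, subsystem) of each fault

    def maybe_fail(self, subsystem, step):
        """Return True if a fault is injected into subsystem at this step."""
        if self.rng.random() < self.rates.get(subsystem, 0.0):
            self.log.append((step, subsystem))
            return True
        return False


def run_workload(injector, steps, subsystems):
    """Run steps iterations (steps > 0); a step counts as completed only
    if no subsystem faults. Returns the fraction of completed steps."""
    completed = 0
    for step in range(steps):
        # Poll every subsystem so the random stream is consumed the same
        # way on every run, whether or not an earlier subsystem faulted.
        faults = [injector.maybe_fail(s, step) for s in subsystems]
        if not any(faults):
            completed += 1
    return completed / steps


if __name__ == "__main__":
    subsystems = ["memory", "kernel", "mpi/net", "disk", "middleware"]
    dial = {"memory": 0.01, "disk": 0.05}  # hypothetical settings
    injector = FaultInjector(dial, seed=2004)
    print("survivability:", run_workload(injector, 1000, subsystems))
    print("faults injected:", len(injector.log))
```

Replaying a run with the same seed and rates yields the same fault log, which makes it possible to compare how different codes survive an identical disaster.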
![Page 62: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/62.jpg)
Simple Counts
• OSes (4): Linux (6.5), K42 (2), Custom (1), Plan 9 (0.5)
• Labs (7): ANL, LANL, ORNL, LBNL, LLNL, PNNL, SNL
• Universities: Caltech, Louisiana Tech, NCSU, Rice, Ohio State, Texas A&M, Toronto, UIUC, UTEP, UNM, U of Chicago, U of Oregon, U of Wisconsin
• Industry: Bell Labs, Cray, HP, IBM, Intel, CFS (Lustre), Quadrics, SGI
![Page 63: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/63.jpg)
Apple Pie
• Open source
• Partnerships: Labs, universities, and industry
• Scope: basic research, applied research, development, prototypes, testbed systems, and deployment
• Structure: “don’t choose a winner too early”
  – Current or near-term problems -- commonly used, open-source OSes (e.g., Linux or FreeBSD)
  – Prototyping work in K42 and Plan 9
  – At least one wacko project (explore novel ideas that don’t fit into an existing framework)
![Page 64: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/64.jpg)
A bit more interesting
• Virtualization
  – Colony
• Adaptability
  – DAiSES, K42, MOLAR, SmartApps
  – Config, RWK
• Usage model & system mgmt (OS Suites)
  – Colony, Config, MOLAR, Peta-scale SSI, Zepto
• Metrics & Measurement
  – HPC Challenge (http://icl.cs.utk.edu/hpcc/)
  – DAiSES, K42, MOLAR, RWK, Zepto
• Fault handling
  – Colony, MOLAR, Scalable FT, Zepto
![Page 65: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/65.jpg)
continued
• Managing the memory hierarchy
• Security
• Common API
  – K42, Linux
• Single System Image
  – Peta-scale SSI
• Collective Runtime
  – Zepto
• I/O
  – Peta-scale SSI
• OS Noise
  – Colony, Peta-scale SSI, RWK, Zepto
![Page 66: FAST-OS BOF SC 04 fastos Follow link to subscribe to the mail list.](https://reader035.fdocuments.in/reader035/viewer/2022062423/56649e4e5503460f94b447f4/html5/thumbnails/66.jpg)
Application Driven
• Meet the application developers
  – OS presentations
  – Apps people panic -- what are you doing to my machine?
  – OS people tell ‘em what we heard
  – Apps people tell us what we didn’t understand