20121115 open stack_ch_user_group_v1.2

60
OpenStack and CERN Tim Bell [email protected] @noggin143 First OpenStack Switzerland User Group Meeting 15 th November 2012

description

OpenStack User Group Switzerland meeting in Zurich

Transcript of 20121115 open stack_ch_user_group_v1.2

Page 1: 20121115 open stack_ch_user_group_v1.2

OpenStack and CERN

Tim [email protected]

@noggin143

First OpenStack Switzerland User Group Meeting15th November 2012

Page 2: 20121115 open stack_ch_user_group_v1.2

What is OpenStack ?

• OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface

OpenStack Swiss User Group Tim Bell, CERN 2

Page 3: 20121115 open stack_ch_user_group_v1.2

Principles

• Open Source – Apache 2.0 license, NO ‘enterprise’ version

• Open Design – Open Design Summit, anyone is able to define core architecture

• Open Development – Anyone can involve development process via Launchpad & Github

• Open Community – OpenStack Foundation in 2012, Now 190+ companies, 3000+

developers, 6000+ members

OpenStack Swiss User Group Tim Bell, CERN 3

Page 4: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 4

Page 5: 20121115 open stack_ch_user_group_v1.2

OpenStack Cloud Components

• Nova – Compute Layer• Swift – Object Store• Quantum – Networking• Horizon – Dashboard• Cinder – Block Storage• Keystone – Identity• Glance – Image management

• Each component has an API and is pluggable• Other non-core projects interact with these components

OpenStack Swiss User Group Tim Bell, CERN 5

Page 6: 20121115 open stack_ch_user_group_v1.2

OpenStack Folsom

OpenStack Swiss User Group Tim Bell, CERN 6

Page 7: 20121115 open stack_ch_user_group_v1.2

October Summit in San Diego

• 1,400 people attended with 320 talks and up to 8 parallel streams

– Double the spring 2012 summit in San Francisco

• International involvement is increasing– Asia especially China– Europe

• Companies and Open Source community generally working well

– Transparency and shared goals

OpenStack Swiss User Group Tim Bell, CERN 7

Page 8: 20121115 open stack_ch_user_group_v1.2

Talent Competition

OpenStack Swiss User Group Tim Bell, CERN 8

Page 9: 20121115 open stack_ch_user_group_v1.2

What’s new from the summit ?

OpenStack Swiss User Group Tim Bell, CERN 9

Component Folsom Grizzly (plan for Q2/2014)

Nova (Compute) Host AggregatesHyper-V/PowerVM support

CellsBare metal support

Cinder (Block Storage) Now in coreNFS as a block device

Multiple backendsQoS for backendsResize

Glance Replicated imagesNew client

Incremental images

Keystone PKI supportLDAP support for Active Directory

DomainsUser Groups

Horizon Support for volumes Realtime notificationsCross project signalling

Swift Versioned objectsSSD acceleration

Multi-range get

Quantum Now in corePluggable backends

Security groupsHigh Availability

Page 10: 20121115 open stack_ch_user_group_v1.2

Upcoming Projects• Ceilometer

– Metering and billing

• Heat– Cloud orchestration

• Red Dwarf– Database as a Service – being refactored to be on top of Nova

• Equilibrium– Load Balancing as a Service – 3 proposals being merged to 1

• DNS as a Service– May become part of Quantum

• High Availability Nova Controller– Cisco, Hastexo proposals

OpenStack Swiss User Group Tim Bell, CERN 10

Page 11: 20121115 open stack_ch_user_group_v1.2

OpenStack Foundation• The OpenStack Foundation is an independent body providing

shared resources to help achieve the OpenStack Mission by Protecting, Empowering, and Promoting OpenStack software and the community around it, including users, developers and the entire ecosystem.

• Provides– Promotion of the OpenStack brand– Event management such as Summits, User Groups, …– Legal coverage of trademark, contributions– Continuous integration and Testing infrastructure– Developer tools to ease contribution

OpenStack Swiss User Group Tim Bell, CERN 11

Page 12: 20121115 open stack_ch_user_group_v1.2

OpenStack Board Structure• ”Platinum Members” are companies which make a significant

strategic commitment to OpenStack in funding and resources. Platinum Members each appoint a representative to the Board of Directors

• ”Gold Members” are companies which provide funding and resources, but at a lower level than Platinum Members. Associate Members as a class elect representatives to the Board of Directors.

• "Individual Members" who participate on their own or as part of their paid employment. It’s free to join as an Individual Member and Individual Members have the right to run for, and vote for, a number of leadership positions.

OpenStack Swiss User Group Tim Bell, CERN 12

Page 13: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 13

Page 14: 20121115 open stack_ch_user_group_v1.2

Individual Members

OpenStack Swiss User Group Tim Bell, CERN 14

• Elected by the membership of the foundation• Rob Hirschfeld (DELL)• Monty Taylor (HP)• Hui Cheng (SINA)• Yujie Du (99 Cloud)• Troy Toman (Rackspace)• Tim Bell (CERN)• Tristan Goode (Aptira)• Jesse Andreas (Nebula)

• No company can have more than 2 directors• Rules for individual member voting are currently under

discussion with the community to find best ways of representing all the members

Page 15: 20121115 open stack_ch_user_group_v1.2

Foundation Governance

OpenStack Swiss User Group Tim Bell, CERN 15

Page 16: 20121115 open stack_ch_user_group_v1.2

What is CERN ?

OpenStack Swiss User Group Tim Bell, CERN 16

• Conseil Européen pour la Recherche Nucléaire – aka European Laboratory for Particle Physics

• Between Geneva and the Jura mountains, straddling the Swiss-French border

• Founded in 1954 with an international treaty

• Our business is fundamental physics , what is the universe made of and how does it work

Page 17: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 17

Answering fundamental questions…• How to explain particles have mass?

We have theories and accumulating experimental evidence.. Getting close…

• What is 96% of the universe made of ?We can only see 4% of its estimated mass!

• Why isn’t there anti-matterin the universe?

Nature should be symmetric…

• What was the state of matter justafter the « Big Bang » ?

Travelling back to the earliest instants ofthe universe would help…

Page 18: 20121115 open stack_ch_user_group_v1.2

Tim Bell, CERN 18

The Large Hadron Collider

OpenStack Swiss User Group

Page 19: 20121115 open stack_ch_user_group_v1.2

19

The Large Hadron Collider (LHC) tunnel

OpenStack Swiss User Group Tim Bell, CERN

Page 20: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 20

Page 21: 20121115 open stack_ch_user_group_v1.2

Accumulating events in 2009-2011

OpenStack Swiss User Group Tim Bell, CERN 21

Page 22: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 22

Page 23: 20121115 open stack_ch_user_group_v1.2

Heavy Ion Collisions

OpenStack Swiss User Group Tim Bell, CERN 23

Page 24: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 24

Page 25: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 25

Tier-1 (11 centres):•Permanent storage•Re-processing•Analysis

Tier-0 (CERN):•Data recording•Initial data reconstruction•Data distribution

Tier-2 (~200 centres):• Simulation• End-user analysis

• Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid• In a normal day, the grid provides 100,000 CPU days executing over 2 million jobs

Page 26: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 26

Page 27: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 27

45,000 tapes holding 73PB of physics data

Page 28: 20121115 open stack_ch_user_group_v1.2

New data centre to expand capacity

OpenStack Swiss User Group Tim Bell, CERN 28

• Data centre in Geneva at the limit of electrical capacity at 3.5MW

• New centre chosen in Budapest, Hungary

• Additional 2.7MW of usable power

• Hands off facility• Deploying from 2013

with 200Gbit/s network to CERN

Page 29: 20121115 open stack_ch_user_group_v1.2

Time to change strategy• Rationale

– Need to manage twice the servers as today– No increase in staff numbers– Tools becoming increasingly brittle and will not scale as-is

• Approach– CERN is no longer a special case for compute– Adopt an open source tool chain model– Our engineers rapidly iterate

• Evaluate solutions in the problem domain• Identify functional gaps and challenge them• Select first choice but be prepared to change in future

– Contribute new function back to the community

OpenStack Swiss User Group Tim Bell, CERN 29

Page 30: 20121115 open stack_ch_user_group_v1.2

Building Blocks

OpenStack Swiss User Group Tim Bell, CERN 30

Bamboo

Koji, Mock

AIMS/PXEForeman

Yum repoPulp

Puppet-DB

mcollective, yum

JIRA

Lemon /Hadoop

git

OpenStack Nova

Hardware database

Puppet

Active Directory /LDAP

Page 31: 20121115 open stack_ch_user_group_v1.2

Prepare the move to the clouds• Improve operational efficiency

– Machine ordering, reception and testing– Hardware interventions with long running programs– Multiple operating system demand

• Improve resource efficiency– Exploit idle resources, especially waiting for disk and tape I/O– Highly variable load such as interactive or build machines

• Enable cloud architectures– Gradual migration to cloud interfaces and workflows

• Improve responsiveness– Self-Service with coffee break response time

OpenStack Swiss User Group Tim Bell, CERN 31

Page 32: 20121115 open stack_ch_user_group_v1.2

Service Model

OpenStack Swiss User Group Tim Bell, CERN 32

• Pets are given names like pussinboots.cern.ch

• They are unique, lovingly hand raised and cared for

• When they get ill, you nurse them back to health

• Cattle are given numbers like vm0042.cern.ch

• They are almost identical to other cattle• When they get ill, you get another one

• Future application architectures should use Cattle but Pets with strong configuration management are viable and still needed

Page 33: 20121115 open stack_ch_user_group_v1.2

Current Status of OpenStack at CERN

• Focus on the Cattle to start with• Working on an Essex code base from the EPEL repository

– Excellent experience with the Fedora cloud-sig team– Cloud-init for contextualisation, oz for images with RHEL/Fedora

• Components– Current focus is on Nova with KVM and Hyper-V– Keystone running with Active Directory and Glance for Linux and

Windows images

• Pre-production facility with around 200 Hypervisors, with 2000 VMs integrated with CERN infrastructure, Puppet deployed and used for simulation of magnet placement using LHC@Home and batch physics programs

OpenStack Swiss User Group Tim Bell, CERN 33

Page 34: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 34

Page 35: 20121115 open stack_ch_user_group_v1.2

Next Steps• Deploy into production at the start of 2013 with Folsom running the Grid

software on top of OpenStack IaaS• Support multi-site operations with 2nd data centre in Hungary• Exploit new functionality

– Ceilometer for metering– Bare metal for non-virtualised use cases such as high I/O servers– X.509 user certificate authentication– Load balancing as a service

• Support the pets with live migration and external block storage such as Gluster

Ramping to 15K hypervisors with 100K to 300K VMs by 2015 OpenStack Swiss User Group Tim Bell, CERN 35

Page 36: 20121115 open stack_ch_user_group_v1.2

Conclusions• Production at CERN in next few months on Folsom

– Our emphasis will shift to focus on stability– Work together with others on scaling improvements

• Community is key to shared success– Foundation gives a structure for collaboration– Our problems are often resolved before we raise them– Packaging teams are producing reliable builds promptly

• CERN contributes and benefits– Thanks to everyone for their efforts and enthusiasm– Not just code but documentation, tests, blogs, …

OpenStack Swiss User Group Tim Bell, CERN 36

Page 37: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 37

Page 38: 20121115 open stack_ch_user_group_v1.2

References

OpenStack Swiss User Group Tim Bell, CERN 38

CERN http://public.web.cern.ch/public/Scientific Linux http://www.scientificlinux.org/Worldwide LHC Computing Grid http://lcg.web.cern.ch/lcg/

http://rtm.hep.ph.ic.ac.uk/Jobs http://cern.ch/jobsDetailed Report on Agile Infrastructure http://cern.ch/go/N8wpTalk on OpenStack at San Diego Summit http://www.openstack.org/summit/san-di

ego-2012/openstack-summit-sessions/presentation/accelerating-science-with-openstack

Page 39: 20121115 open stack_ch_user_group_v1.2

Backup Slides

OpenStack Swiss User Group Tim Bell, CERN 39

Page 40: 20121115 open stack_ch_user_group_v1.2

Community collaboration on an international scale

Tim Bell, CERN 40OpenStack Swiss User Group

Page 41: 20121115 open stack_ch_user_group_v1.2

Our Challenges - Data storage

OpenStack Swiss User Group Tim Bell, CERN 41

• >20 years retention• 6GB/s average• 25GB/s peaks• 30PB/year to record

Page 42: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 42

• Data Centre by Numbers– Hardware installation & retirement

• ~7,000 hardware movements/year; ~1,800 disk failures/year

Xeon 51502%

Xeon 516010%

Xeon E5335

7%Xeon

E534514%

Xeon E5405

6%

Xeon E541016%

Xeon L5420

8%

Xeon L552033%

Xeon 3GHz4%

Fujitsu3%

Hitachi23% HP

0% Maxtor

0% Seagate15%

Western Digital

59%

Other0%

High Speed Routers(640 Mbps → 2.4 Tbps) 24

Ethernet Switches 350

10 Gbps ports 2,000

Switching Capacity 4.8 Tbps

1 Gbps ports 16,939

10 Gbps ports 558

Racks 828

Servers 11,728

Processors 15,694

Cores 64,238

HEPSpec06 482,507

Disks 64,109

Raw disk capacity (TiB) 63,289

Memory modules 56,014

Memory capacity (TiB) 158

RAID controllers 3,749

Tape Drives 160

Tape Cartridges 45,000

Tape slots 56,000

Tape Capacity (TiB) 73,000

IT Power Consumption 2,456 KW

Total Power Consumption 3,890 KW

Page 43: 20121115 open stack_ch_user_group_v1.2

Public Procurement Purchase ModelStep Time (Days) Elapsed (Days)

User expresses requirement 0

Market Survey prepared 15 15

Market Survey for possible vendors 30 45

Specifications prepared 15 60

Vendor responses 30 90

Test systems evaluated 30 120

Offers adjudicated 10 130

Finance committee 30 160

Hardware delivered 90 250

Burn in and acceptance 30 days typical 380 worst case

280

Total 280+ Days

OpenStack Swiss User Group Tim Bell, CERN 43

Page 44: 20121115 open stack_ch_user_group_v1.2

Training and Support

• Buy the book rather than guru mentoring• Follow the mailing lists to learn• Newcomers are rapidly productive (and often know more than us)• Community and Enterprise support means we’re not on our own

OpenStack Swiss User Group Tim Bell, CERN 44

Page 45: 20121115 open stack_ch_user_group_v1.2

Staff Motivation

• Skills valuable outside of CERN when an engineer’s contracts end

OpenStack Swiss User Group Tim Bell, CERN 45

Page 46: 20121115 open stack_ch_user_group_v1.2

When communities combine…• OpenStack’s many components and options make

configuration complex out of the box• Puppet forge module from PuppetLabs does our configuration• The Foreman adds OpenStack provisioning for user kiosk to a

configured machine in 15 minutes

OpenStack Swiss User Group Tim Bell, CERN 46

Page 47: 20121115 open stack_ch_user_group_v1.2

Foreman to manage Puppetized VM

OpenStack Swiss User Group Tim Bell, CERN 47

Page 48: 20121115 open stack_ch_user_group_v1.2

Supporting the Pets with OpenStack

• Network– Interfacing with legacy site DNS and IP management– Ensuring Kerberos identity before VM start

• Puppet– Ease use of configuration management tools with our users– Exploit mcollective for orchestration/delegation

• External Block Storage– Currently using nova-volume with Gluster backing store

• Live migration to maximise availability– KVM live migration using Gluster– KVM and Hyper-V block migration

OpenStack Swiss User Group Tim Bell, CERN 48

Page 49: 20121115 open stack_ch_user_group_v1.2

Active Directory Integration• CERN’s Active Directory

– Unified identity management across the site– 44,000 users– 29,000 groups– 200 arrivals/departures per month

• Full integration with Active Directory via LDAP– Uses the OpenLDAP backend with some particular configuration

settings– Aim for minimal changes to Active Directory– 7 patches submitted around hard coded values and additional filtering

• Now in use in our pre-production instance– Map project roles (admins, members) to groups– Documentation in the OpenStack wiki

OpenStack Swiss User Group Tim Bell, CERN 49

Page 50: 20121115 open stack_ch_user_group_v1.2

Welcome Back Hyper-V!

• We currently use Hyper-V/System Centre for our server consolidation activities

– But need to scale to 100x current installation size

• Choice of hypervisors should be tactical– Performance– Compatibility/Support with integration components– Image migration from legacy environments

• CERN is working closely with the Hyper-V OpenStack team– Puppet to configure hypervisors on Windows– Most functions work well but further work on Console, Ceilometer, …

OpenStack Swiss User Group Tim Bell, CERN 50

Page 51: 20121115 open stack_ch_user_group_v1.2

What are we missing (or haven’t found yet) ?

• Best practice for– Monitoring and KPIs as part of core functionality– Guest disaster recovery– Migration between versions of OpenStack

• Roles within multi-user projects– VM owner allowed to manage their own resources (start/stop/delete)– Project admins allowed to manage all resources– Other members should not have high rights over other members VMs

• Global quota management for non-elastic private cloud– Manage resource prioritisation and allocation centrally– Capacity management / utilisation for planning

OpenStack Swiss User Group Tim Bell, CERN 51

Page 52: 20121115 open stack_ch_user_group_v1.2

Opportunistic Clouds in online experiment farms

• The CERN experiments have farms of 1000s of Linux servers close to the detectors to filter the 1PByte/s down to 6GByte/s to be recorded to tape

• When the accelerator is not running, these machines are currently idle

– Accelerator has regular maintenance slots of several days– Long Shutdown due from March 2013-November 2014

• One of the experiments are deploying OpenStack on their farm– Simulation (low I/O, high CPU)– Analysis (high I/O, high CPU, high network)

OpenStack Swiss User Group Tim Bell, CERN 52

Page 53: 20121115 open stack_ch_user_group_v1.2

Federated European Clouds

• Two significant European projects around Federated Clouds– European Grid Initiative Federated Cloud as a federation of grid sites providing

IaaS– HELiX Nebula European Union funded project to create a scientific cloud based

on commercial providers

OpenStack Swiss User Group Tim Bell, CERN 53

EGI Federated Cloud Sites

CESGA CESNET INFN SARA

Cyfronet FZ Jülich SZTAKI IPHC

GRIF GRNET KTH Oxford

GWDG IGI TCD IN2P3

STFC

Page 54: 20121115 open stack_ch_user_group_v1.2

Federated Cloud Commonalities

• Basic building blocks– Each site gives an IaaS endpoint with an API and common security

policy• OCCI? CDMI ? Libcloud ? Jclouds ?

– Image stores available across the sites – Federated identity management based on X.509 certificates– Consolidation of accounting information to validate pledges and usage

• Multiple cloud technologies– OpenStack– OpenNebula– Proprietary

OpenStack Swiss User Group Tim Bell, CERN 54

Page 55: 20121115 open stack_ch_user_group_v1.2

OpenStack Swiss User Group Tim Bell, CERN 55

Page 56: 20121115 open stack_ch_user_group_v1.2

CERN’s tools

• The world’s most powerful accelerator: LHC– A 27 km long tunnel filled with high-tech instruments– Equipped with thousands of superconducting magnets– Accelerates particles to energies never before obtained– Produces particle collisions creating microscopic “big bangs”

• Very large sophisticated detectors– Four experiments each the size of a cathedral– Hundred million measurement channels each– Data acquisition systems treating Petabytes per second

• Top level computing to distribute and analyse the data– A Computing Grid linking ~200 computer centres around the globe– Sufficient computing power and storage to handle 25 Petabytes per

year, making them available to thousands of physicists for analysis

OpenStack Swiss User Group Tim Bell, CERN 56

Page 57: 20121115 open stack_ch_user_group_v1.2

Our Infrastructure

• Hardware is generally based on commodity, white-box servers– Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF– Compute nodes typically dual processor, 2GB per core– Bulk storage on 24x2TB disk storage-in-a-box with a RAID card

• Vast majority of servers run Scientific Linux, developed by Fermilab and CERN, based on Redhat Enterprise

– Focus is on stability in view of the number of centres on the WLCG

OpenStack Swiss User Group Tim Bell, CERN 57

Page 58: 20121115 open stack_ch_user_group_v1.2

New architecture data flows

OpenStack Swiss User Group Tim Bell, CERN 58

Page 59: 20121115 open stack_ch_user_group_v1.2

Virtualisation on SCVMM/Hyper-V

OpenStack Swiss User Group Tim Bell, CERN 59Mar-

10

Apr-10

May-10

Jun-10Jul-1

0

Aug-10

Sep-10

Oct-10

Nov-10

Dec-10

Jan-11

Feb-11

Mar-11

Apr-11

May-11

Jun-11Jul-1

1

Aug-11

Sep-11

Oct-11

Nov-11

Dec-11

Jan-12

Feb-12

Mar-12

Apr-12

May-12

Jun-12Jul-1

2

Aug-12

Sep-12

Oct-12

0

500

1000

1500

2000

2500

3000

3500

WindowsLinux

Page 60: 20121115 open stack_ch_user_group_v1.2

Scaling up with Puppet and OpenStack

• Use LHC@Home based on BOINC for simulating magnetics guiding particles around the LHC

• Naturally, there is a puppet module puppet-boinc• 1000 VMs spun up to stress test the hypervisors with Puppet,

Foreman and OpenStack

OpenStack Swiss User Group Tim Bell, CERN 60