© 2009 IBM Corporation

zLinux Measurements and Architecture

Clea Zolotow

Senior Technical Staff Member

IBM IT Delivery (borrowed extensively from the ECM z/Linux Team)

April 2009

© 2009 IBM Corporation

IBM Virtualization – Enterprise Data Center Journey

Agenda

• IBM Commitment/Announcement Highlights

―IBM Transformation and “Big Green”

―Internal Infrastructure Challenge, Approach and Benefits

• IBM Virtualization Update

―Virtualization Progress

―Business Case Approach and Client View of Savings

―Application and Workload Selection

―Successful Techniques and Lessons Learned

© 2009 IBM Corporation

Project ‘Big Green’

IBM to reallocate $1 billion each year

To accelerate “green” technologies and services

To offer a roadmap for clients to address the IT energy crisis while leveraging IBM hardware, software, services, research, and financing teams

To create a global “green” team of almost 1,000 energy efficiency specialists from across IBM

Re-affirming a long standing IBM commitment

Energy conservation efforts from 1990 – 2005 have resulted in a 40% reduction in CO2 emissions and a quarter billion dollars of energy savings

Annually invest $100M in infrastructure to support remanufacturing and recycling best practices

Double compute capacity with no increase in consumption or impact by 2010

IBM will consolidate and virtualize thousands of servers onto approximately 30 IBM System z™ mainframes

Substantial savings expected in multiple dimensions: energy, software and system support costs

The consolidated environment will use 80% less energy and 85% less floor space

This transformation is enabled by the System z sophisticated virtualization capability

Major proof point for Project Big Green

© 2009 IBM Corporation

IT Organizations are Challenged by Operational Issues

Challenges

Rising costs of systems and networking operations

Explosion in volume of data and information

Difficulty in deploying new applications and services

Security of your assets & your clients’ information

Landslide of compliance requirements

Systems and applications need to be available

Rising energy costs & rising energy demand

Power & thermal issues inhibit operations

Environmental compliance & governance mandates

Costs & Service Delivery

Business Resiliency

& Security

EnergyRequirements

© 2009 IBM Corporation

Data Center Efficiencies Achieved

Consolidation of infrastructure, applications

Enterprise architecture optimization

Global resource deployment

IBM Metrics 1997 Today

CIOs 128 1

Host data centers 155 7

Web hosting centers 80 5

Network 31 1

Applications 15,000 4,700


IBM’s Globally Integrated Enterprise Data Centers

IBM Strategic Delivery Model

Global Resources

Strategic IGA Location

Strategic Web Location for IGA

Ethernet & Power9 Networks

Next Level of Infrastructure Challenge

Floor space challenges in key facilities

Underutilized assets in outdated Web infrastructure

Continued infrastructure cost pressure

Increase % IT spending to transformation initiatives

© 2009 IBM Corporation

Stages of Adoption: IBM Journey

Physical consolidation of data centers, networks and applications

Simple like-for-like server and storage virtualization

Service tools, energy facilities mgmt

Drives IT efficiency

Significant progress toward highly virtualized environment to enable pooled System z, Power Systems, System x and storage

Green production and advanced data center facilities

Shared service delivery model

Rapid deployment of new infrastructure and services

IBM Research “Cloud”
Business-driven service management pilots

Globally Integrated Enterprise
Highly responsive and business goal driven

© 2009 IBM Corporation

Enterprise Business Value – Expectations

Business case

Energy savings

Quality service

Early modeling identified significant potential for savings through zLinux virtualization

Performed TCO virtualization assessment on IBM portfolio as a cross-IBM effort

• System z, SW Migration Services, STG Lab Services, IBM Academy, ITO Migration Factory

Identified substantial savings opportunity

• Energy

• Floor space

Annual energy usage to be reduced by 80%

Total floor space to be reduced by 85%

• 11,045 square feet for distributed solution

• 1,643 square feet for System z solution

Leverages maturity of System z stack products – high availability, resiliency

Reduces complexity and increases stability, centralizes service mgmt

Potential for faster provisioning speed (months → days)

Dynamic allocation of compute power

Provides world-class security

• Labor

• Software

Comparison of Annual Energy Usage for Workloads

                 Distributed Solution               System z Solution
                 Kilowatt hours (K)   Cost* ($K)    Kilowatt hours (K)   Cost* ($K)
Power            24,000               $2,400        4,796                $479
Cooling**        14,400               $1,440        2,877                $287
Total Energy     38,400               $3,840        7,673                $767

* Electrical cost calculated at a rate of $0.10 per kWh
** Cooling is 60% of power cost
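To make the footnote arithmetic explicit, here is a minimal sketch (not from the deck) that reproduces the table: cooling is modeled as 60% of the power figure, cost uses the flat $0.10/kWh rate, and with energy expressed in thousands of kWh, multiplying by the rate yields cost directly in $K.

```python
# Minimal sketch of the table's arithmetic; figures and rate come from the slide.
RATE_PER_KWH = 0.10       # dollars per kWh
COOLING_FACTOR = 0.60     # cooling energy as a fraction of power energy

def annual_energy(power_kwh_k: float) -> tuple[float, float]:
    """Return (total energy in K kWh, total cost in $K) for a given power draw."""
    cooling_kwh_k = power_kwh_k * COOLING_FACTOR
    total_kwh_k = power_kwh_k + cooling_kwh_k
    # K kWh x $/kWh gives $K directly, since the factors of 1,000 cancel.
    return total_kwh_k, total_kwh_k * RATE_PER_KWH

print(annual_energy(24_000))  # distributed: ~(38400.0, 3840.0)
print(annual_energy(4_796))   # System z:    ~(7673.6, 767.4)
```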

© 2009 IBM Corporation

IBM Virtualization – Enterprise Data Center Journey

Agenda

• IBM Commitment/Announcement Highlights

―IBM Transformation and “Big Green”

―Internal Infrastructure Challenge, Approach and Benefits

• IBM Virtualization Update

―Virtualization Progress

―Business Case Approach and Client View of Savings

―Application and Workload Selection

―Successful Techniques and Lessons Learned

© 2009 IBM Corporation

IBM System z Linux Virtualization Progress

Established phased approach for quick wins

Migrated initial servers from early ‘wave’ teams

• Thousands of servers inventoried

• Multiple successful migrations delivering benefits as expected

• Decommission pipeline of hundreds of servers for reuse or removal

Comprehensive project plan and management system in place

• Integrated business priorities with transformational objectives

• ‘Work in progress’ approach to maximize server migrations

• Pipeline, process, technical, finance and communications support

Developed internal business case (RACE*)

• Created detailed cash flow and labor analysis, migration expense, iterated

Technical solution, education plan and operational plan developed

• Built upon IBM prior consolidation/simplification efforts, utilizing IBM offerings and capabilities

Highest level of support from IBM senior executive team

Phase 1

Phase 2

Phase 3-n

“Quick wins”

Highest Savings First

Finish Virtualization

*Formerly zRace

© 2009 IBM Corporation

IBM System z Linux Virtualization Progress

IBM implementing New Enterprise Data Center through achievements in

• Server and storage virtualization

• Energy efficiency and resiliency improvements

Benefits are on track with expectations

• Migration management key

• Business case is compelling

• Using System z10 technology, the number of machines could be cut by about half, with greater savings in energy, floor space, software and support costs

Lessons Learned, including:

• Enterprise strategy and sponsorship needed to drive business case and execution

• Compelling business imperative accelerates execution and drives support

• Enterprise view of migration managed by waves drives experience; savings for investment

IBM experience is driving Time to Value initiatives, integrated into IBM capabilities

• Dramatic reduction in labor through new processes supporting workload migrations

• Fall in/out analysis, working with business units, to close gaps in workload pipeline

• Piloting new testing strategy, processes & tools to automate

Cumulative 5-Year Cost Comparison (chart: distributed cumulative cost vs. z9 cumulative cost over years 1–5)

© 2009 IBM Corporation

Virtualization Benefits are Significant; Migration Management is Key

Clients are able to leverage IBM experience and capabilities to accelerate value

Expected Benefits of Virtualization

Substantial savings in multiple dimensions: energy, software and system support costs

80% less energy, 85% less floor space for consolidated environment

Improved inventory hygiene, including application to server mapping

Dramatically faster provisioning

Improved security and resiliency

Higher quality through reduced complexity, increased stability and availability

Large Scale Migration Challenges Exist

Decision-making: Integrating Enterprise and Business Unit view

Mindset/Culture related to distributed and mainframe worlds

Workload selection - multidimensional nature of selection process

Dated inventory records that are not centrally maintained

Detailed data required for internal business case

Project and program complexity – integrating multiple priorities

© 2009 IBM Corporation

Business Case Leveraged RACE Tool, Iterative Approach

Utilized RACE commercial modeling tool

• Foundation for internal business case, constructed specific environmental variables


Engaged SME’s within IBM

• Provided business case assumptions (i.e. depreciation/maintenance), modified as appropriate

Iterative Process

• Continuously engaged with core SME’s to ensure most current information

Project Metrics

• Weekly report of migrated servers and their disposition status (reuse or disposal using GARS*) and Energy Certificate status

- Working to incorporate actuals into the Business Case such that we can refresh our assumptions

Created financial plan for “known universe”

• Identified relevant sample (5-10%) of most likely servers to be migrated and gathered financial profile information for each

*IBM Global Asset Recovery Services

© 2009 IBM Corporation

Integration

• Integrated Functionality vs. Functionality to be implemented (possibly with 3rd party tools)

• Balanced System

• Integration of / into Standards

Further Availability Aspects

• Planned outages

• Unplanned outages

• Automated Take Over

• Uninterrupted Take Over (especially for DB)

• Workload Management across physical borders

• Business continuity

• Availability effects for other applications / projects

• End User Service

• End User Productivity

• Virtualization

Skills and Resources

• Personnel Education

• Availability of Resources

TCO: A Range of IT Cost Factors – Often Not Considered

Availability

• High availability

• Hours of operation

Backup / Restore / Site Recovery

• Backup

• Disaster Scenario

• Restore

• Effort for Complete Site Recovery

• SAN effort

Infrastructure Cost

• Space

• Power

• Network Infrastructure

• Storage Infrastructure

• Initial Hardware Costs

• Software Costs

• Maintenance Costs

Additional development/implementation

• Investment for one platform – reproduction for others

Controlling and Accounting

• Analyzing the systems

• Cost

Operations Effort

• Monitoring, Operating

• Problem Determination

• Server Management Tools

• Integrated Server Management – Enterprise Wide

Security

• Authentication / Authorization

• User Administration

• Data Security

• Server and OS Security

• RACF vs. other solutions

Deployment and Support

• System Programming

―Keeping consistent OS and SW Level

―Database Effort

• Middleware

―SW Maintenance

―SW Distribution (across firewall)

• Application

―Technology Upgrade

―System Release change without interrupts

Operating Concept

• Development of an operating procedure

• Feasibility of the developed procedure

• Automation

Resource Utilization and Performance

• Mixed Workload / Batch

• Resource Sharing―shared nothing vs. shared everything

• Parallel Sysplex vs. Other Concepts

• Response Time

• Performance Management

• Peak handling / scalability

Routinely Assessed Cost Factors

© 2009 IBM Corporation

Client View of TCO Comparison for Similar Distributed Workload vs. System z Linux results in Potential 60-75% Gross Costs Savings / 5 yrs

* HW Acquisition compares server/disk refresh of distributed environment to the cost of acquiring new mainframes/storage

Potential Savings: Categories as a % of Gross Savings – Software 45%, Labor 19%, HW Acquisition* 17%, HW Maintenance 11%, Facilities 6%, Connectivity 2%

Dramatic Simplification

Unit                            Distributed   System z Linux   % Reduction
Software Licenses               26,700        1,800            93%
Ports                           31,300        960              97%
Cables                          19,500        700              96%
Physical Network Connections    15,700        7,000            55%

Operating Cost: Distributed vs. Mainframe (chart: relative cost of the distributed solution vs. the System z Linux solution, broken out by facilities, HW acquisition, HW maintenance, software, connectivity and labor)

Results will vary based on several factors including the number of servers and workload types

© 2009 IBM Corporation

By formally decommissioning servers, IBM is able to demonstrate energy savings and receive energy efficiency credits (EECs)

Energy Efficiency Certificates Deliver Savings

Diagnose – Build – Cool – Virtualize – Manage & Measure – Green Data Center

Client requirements: lower energy costs and achieve the business benefit of energy efficiency; demonstrate energy efficiency commitment

Solution: virtualized workloads onto the System z platform and reduced energy consumption

Hundreds of servers in pipeline to be redeployed, sent to GARS* and/or energy efficiency certificates issued

IBM applied for EECs for eligible decommissioned servers to receive Energy Efficiency Credits

GARS for asset reuse, recycling and/or reclamation

Benefits: quantifiable energy reductions, tradable certificates

Demonstrated commitment to energy efficiency

IBM Launches World's First Corporate-Led Energy Efficiency Certificate Program

In Conjunction With Neuwing Energy, Program Will Provide Clients Documentation and Third-Party Verification of the Energy Saving Results of Their Projects.

November 2, 2007 Press Release

EECs quantify, measure, verify, certify and monetize data center energy efficiency projects

What is an Energy Efficiency Credit?

A Neuwing EEC (Energy Efficiency Credit) is a measured and verified Megawatt Hour (MWh) of energy savings, i.e., energy efficiency

*IBM Global Asset Recovery Services
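Since the slide defines one EEC as a measured and verified MWh of savings, here is a hedged sketch of the conversion, using the annual figures from the energy comparison table earlier in the deck purely as illustration, not as a statement of how many certificates were actually issued.

```python
# Illustrative only: converts the annual energy delta from the earlier table
# into an EEC count, using the slide's definition of one EEC per verified MWh.
distributed_total_kwh = 38_400_000   # 38,400K kWh per year (distributed solution)
system_z_total_kwh    = 7_673_000    # 7,673K kWh per year (System z solution)

savings_kwh = distributed_total_kwh - system_z_total_kwh
savings_mwh = savings_kwh / 1000     # 1 MWh = 1,000 kWh
eecs = int(savings_mwh)              # one EEC per measured & verified MWh saved

print(f"{savings_mwh:,.0f} MWh saved per year -> up to {eecs:,} EECs")
```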

© 2009 IBM Corporation

IBM is Using a ‘Work in Process' Approach to Manage the Migration

Management Approach and Reporting

Process approach borrowed from factory line management

Metrics for each process and sub-process

Quality measured with process fallout – tracked by cause

Daily status calls for issue resolution

Weekly status reporting for CIO and management team

© 2009 IBM Corporation

Decommission Process Overview

Server Ready – server available as a result of virtualization efforts

Check for technical viability and asset value to determine if h/w is a redeployment candidate (redeployed or not redeployed)

If not redeployed: request completed to coordinate shipping and update property control; complete Machine List Database and ship to GARS*

Apply to Neuwing for energy efficiency certificates

Tracking tool is updated to reflect disposition of the assets in the project

Capture savings in business plan and business case

*IBM Global Asset Recovery Services

© 2009 IBM Corporation

Enterprise Approach to Workload Migration – Migration Candidates Sourcing

Application View: Business Unit Partnership

Environment View: Managed ‘Offerings’, Development, Intranet, BU Environments

Location View: Boulder, Poughkeepsie, Portsmouth, Raleigh, Rochester, Southbury

Technology View: Domino, Email, Static Web, DB2, Linux on x86

© 2009 IBM Corporation

Each Workload is Evaluated for Suitability Based on Technical Attributes

Priority Workloads for Consolidation:

WebSphere® applications

Domino® Applications

Selected tools: Tivoli®, WebSphere® and internally developed

WebSphere MQ

DB2® Universal Database™

Technical Attributes

Software compatibility or platform dependency

Workload characteristics

Workloads with software available on a unique system only

Unique requirements

Already easily virtualized on AIX®/UNIX®

Already virtualized on Intel®

Not fully virtualized / optimized

Grey area (not clearly one platform or another)

Other components of app on mainframe

High and/or transactional I/O

Proximity to other data on mainframe

Test in support of prod on AIX®/UNIX®

Small counts of isolated images

Default is chosen when no other criteria apply

Decision criteria/thresholds can be tailored by the client

© 2009 IBM Corporation

Blue Harmony is an enterprise transformation program that will enable IBM’s Globally Integrated Enterprise (GIE) vision

Blue Harmony will enable the GIE by …

• Integrating core processes that are now business unit and region-specific, such as Quote to Cash, horizontally across the enterprise on a common systems platform – SAP

• More effectively integrating the global processes that are already in place

―For example, Blue Harmony will integrate financial reporting with business reporting, providing real time access to consolidated data

The significance of the Blue Harmony name

• The name reflects how we are harmonizing process, technology and people across the IBM enterprise

© 2009 IBM Corporation

Blue Harmony: Main Hardware Components

(Diagram: Portsmouth, U.K. and Poughkeepsie, N.Y. sites linked by OC-48 circuits with DCX-4S channel extenders; Cisco 3750E-48T subnet switches, CD 6504, ATT WAN 1200 GSR routers; System x servers; tape; a pair of 2027-140 FICON directors; two z10s; DS8300A and DS8300B with Metro Mirror and z/OS Global Mirror; 2109-M48 and 2498-M48 SAN directors; two p595s)

© 2009 IBM Corporation

Blue Harmony Mainframe Configuration

Hardware – z10 – 56-way – 7,674 MIPS on general CPUs

• 11 General Purpose CPUs (CPs) for z/OS

• 6 Internal Coupling Facilities (ICFs) for sysplex sharing

• 6 Integrated Information Processor (zIIPs) for DB2

• 17 Integrated Facility for Linux (IFLs)

Connectivity

• 10 OSA 1000 Base T (4 ports)

• 10 OSA 10GbE (2 ports)

• 2 ESCON (16 port)

• 7 ESCON Channel Ports

• 2 FICON Express4 (SX, 4 ports)

• 36 FICON Express4 (4KM, LX, 4 ports)

• 8 ICB4 (data rate of 2GBps)

© 2009 IBM Corporation

Blue Harmony US Implementation

(Diagram: two z10s – Blue Harmony Mainframe 1, serial number A40D0, and Blue Harmony Mainframe 2, serial number 8D5FC – forming SYSPLEX 1. Each hosts z/OS SAP LPARs (NAB A/B/C/D and NAB F/G/H/I), z/OS DB2 LPARs, an ICF, a GDPS RESTORE LPAR, and a z/VM LPAR with six z/Linux guests; the ICFs are linked by 8 ICB-4 connections and 2 ICs each, with TCP/IP over HiperSockets (QDIO) and OSA, and BlueZone access.)

© 2009 IBM Corporation

What is z/Linux at IBM

What is z/Linux

• Provisioned underneath a hypervisor

• Provisioned barebones (not supported within my current architecture)

Currently supported are:

z/ECM: The Enterprise Computing Model (ECM) is an IBM initiative to virtualize 25 percent of our own distributed IT infrastructure onto System z mainframes. ECM is also an IBM showcase for Project Big Green and the New Enterprise Data Center.

Blue Harmony: IBM's major transformation program to radically simplify the design and operation of three Globally Integrated Support Processes - Finance, Opportunity To Order, and Order To Cash

© 2009 IBM Corporation

What is z/Linux the OS

z/VM is an operating system that runs on the zSeries mainframes. It functions both as a full operating system and as a hypervisor, much like VMware.

z/VM builds on more than three decades of virtualization experience. z/VM’s CP (Control Program) is a hypervisor that has been built with isolation in mind and extends this to in-memory network virtualization.

System z based workloads, and as such Linux on z based services, are well suited to lightly utilized services, which exploit the fast context-switching capabilities provided by the hardware, the z/VM hypervisor and the many hardware-assist functions.

z/VM allows the virtualization of memory, CPU cycles, storage, and consoles, and can even emulate hardware that is not physically present, for example a virtual zAAP processor to speed up Java workloads.

The I/O subsystem provides many advantages for high-volume I/O workloads, and for Linux on z, SCSI devices can be assigned as well as the mainframe-typical CKD devices.

From ACS, “Mainframe on the Cloud”

© 2009 IBM Corporation

IBM’s Enterprise Architecture for ECM: Security Zones

The Possible Architecture Path for a Single LPAR with Multiple Security Zones and Multiple Clients represents a possible solution for hosting multiple security zones and multiple customers (MSZ) on the same LPAR.

This would be used for supporting small and medium business and Clients who deploy multi-network-zone tiered architectures where firewall separation of zones happens externally.

This architecture is currently in place today supporting IBM internal businesses.

© 2009 IBM Corporation

Enterprise Architecture

VLzH Operational Model – Conceptual

(Diagram: a Z System with multiple LPARs reached from the Internet and the internal network through proxy load balancers, with firewalls, a VPN and VLANs separating Security Zones 1–3, SANs, and network virtualization. A tools network hosts SRM, GCMS, TLM, eESM, Tivoli TEC & TMR, IIP, Tivoli gateways with backups, build servers, hardware asset management, security management, dynamic resource provisioning, other configuration management and a provisioning server, used by the operations team and system administrators.)

Single LPAR on z: possible design path for using a single LPAR with multiple security zones or multiple Clients. This involves network design and security approval.

© 2009 IBM Corporation

How to manage and access the Linux guests

The four major components for managing and accessing the Linux guests are:

Hardware Management Console (HMC)

Integrated Communications Console (ICC)

VSWITCH

z/VM Management

The z/VM management for the LPARs can share the same physical ports while the VSWITCHes have dedicated ports.

http://news.cnet.com/8301-19413_3-10237274-240.html

© 2009 IBM Corporation

How to manage and access the Linux guests: HMC

Hardware Management Console (HMC)

The HMC is the lowest level of management connectivity to a physical zSeries processor. Through the HMC, hardware components are mapped and assigned to various parts of the box, and the bounds of the different logical partitions inside the box are defined. Operations uses the HMC to implement the configuration of the box and to IPL LPARs.

The HMC is physically connected to the processor by two Service Element ports.

These HMCs provide the interface to configure the processor. An administrator can physically access the HMC to configure the box or remotely connect to the HMC via an internal web service.

(Diagram: the Hardware Management Console (Linux or OS/2) sits on a private network connected to two Service Elements, separate from the rest of the network.)

© 2009 IBM Corporation

How to manage and access the Linux guests: ICC

Integrated Communications Console (ICC)

Operator management of the zSeries processor can also be performed through the Integrated Communications Controller (ICC). The two ICC_OSA ports are connected to two different LAN segments for redundancy in case there are network issues accessing one of the ports. Each port has an IP address that is routed in the network so that AF/Remote Server connectivity can be established. OSA-ICCs are shared between LPARs.

The AF/Remote Server communicates to the ICC_OSA via TN3270 traffic. Administrators utilize a software client to connect to the AF Remote Server which then downloads a list of LPARs the administrator is able to control. When the administrator opens a connection to the LPAR, the connection is proxied through the AF Remote Server before being passed to the processor.

The client on the administrator’s workstation communicates to the AF Remote Server over port 1035 (default). From there, the AF Remote Server communicates with the ICC_OSA over port 1024 (default).
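As a hedged illustration of verifying that management path, the sketch below checks plain TCP reachability of each hop; the host names are hypothetical placeholders, and only the default ports 1035 and 1024 come from the slide.

```python
# Minimal reachability check for the chain described above:
# workstation -> AF Remote Server (default 1035) -> ICC_OSA (default 1024).
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run from the administrator's workstation (first hop only, since the
# AF Remote Server proxies the second hop to the ICC_OSA itself).
print("AF Remote Server reachable:", can_reach("af-remote.example.com", 1035))

# Run from the AF Remote Server for the second hop:
# print("ICC_OSA reachable:", can_reach("icc-osa.example.com", 1024))
```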

(Diagram: access to the ICC is copper only; the two ICC_OSA ports carry TN3270 traffic to an AF Remote Server / SA IOM (Windows Server) and a backup AF Remote Server / SA IOM.)

© 2009 IBM Corporation

How to manage and access the Linux guests: VSWITCH

The four major components for managing and accessing the Linux guests are:

VSWITCH

The z/VM management for the LPARs can share the same physical ports while the VSWITCHes have dedicated ports. External connectivity to a network from inside an LPAR can be established via a Layer 2, VLAN-aware VSWITCH.

The OSA ports plug directly into a Layer 2 switched distribution box, so the VSWITCH serves as the access switch in a multi-tiered architecture. The deployments in question, however, vary in how the VLANs are deployed to the given LPARs.

(Diagram: several zSeries processors, each with z/VM LPARs containing a VM controller, a VSWITCH and Linux guests on ports 1, 2, 50, 51 and 52. Each VSWITCH connects through primary and backup OSA ports to a Layer 2 switched distribution (SD) block and service core, with HSRP active on the distribution switches.)

© 2009 IBM Corporation

How to manage and access the Linux guests: z/VM TCPIP

The four major components for managing and accessing the Linux guests are:

z/VM Management TCPIP

Once an LPAR is created on the processor, with a TCP/IP stack for CP/CMS access, an administrator can log on to a controller session to handle the creation and management of Linux guests inside the LPAR. The controller only manages inside the LPAR and is separated from the production traffic of the Linux guests. The TCP/IP stack will have a VIPA IP address with two OSA interfaces for redundancy.

The first option shows OSA pairs that are used for shared management only, across all z/VM instances that exist on a single processor. It is defined by sharing the OSA across all LPARs, separate from production traffic. Access to these management OSAs requires connectivity to the network segments where administrators sit.

(Diagram: three z/VM LPARs – Groups 1, 2 and 3 – on a zSeries processor, each with a VM controller, a VSWITCH and Linux guests on ports 1, 2, 50, 51 and 52; a shared management OSA pair on port 49 serves all LPARs, separate from the production OSAs.)

© 2009 IBM Corporation

Performance and Capacity Tooling

To the right, you can see the overlap of capacity planning and performance.

Note that both teams rely on the VMACCT data. Although Omegamon has an agent, it is not providing good response time in our environment.

SRM data utilizes the standard SUSE10 monitors, whose figures are not valid here (the CPU time is incorrect due to virtualization). However, midrange people prefer the interface.

Work is ongoing to load the VMPTK hypervisor data into SRM to correct the CPU utilization data.

VMPTK (the VM Performance Toolkit) does provide several Linux-level reports – the lxmemory and lxcpu logs – that show memory and CPU usage at the Linux level.
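As a hedged sketch of the correction idea only (not the actual SRM/VMPTK integration): guest-side monitors report CPU% against virtual-CPU time, so one way to rebase the number is on the CPU seconds the hypervisor actually dispatched to the guest, as the Performance Toolkit reports them. The field names and figures below are hypothetical.

```python
# Hedged illustration: rebase a guest's CPU utilization on hypervisor-side
# accounting instead of in-guest counters, which overstate busy time under z/VM.
def corrected_cpu_utilization(hypervisor_cpu_seconds: float,
                              interval_seconds: float,
                              virtual_cpus: int) -> float:
    """Utilization as a fraction of the guest's virtual-CPU capacity over the interval."""
    capacity_seconds = interval_seconds * virtual_cpus
    return hypervisor_cpu_seconds / capacity_seconds

# Example: over a 15-minute interval, a 2-vCPU guest was dispatched for 540 CPU-seconds.
print(f"{corrected_cpu_utilization(540, 900, 2):.0%}")   # 30%
```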

(Diagram: a z9 EC S38 with 512 GB hosting six SLES10 64-bit Linux guests running WAS and DB2.
Performance tooling by layer: Omegamon XE, SAR, vmstat, ps, nmon, ITM, PTK, ELAPSE, plus other/application-specific tools.
Capacity tooling: zRACE 1080, z/VM Capacity Planner, SCPON (internal), SRM (example), Smoothstart, software requirements, plus other/application-specific tools.
Reduce agents on each guest – each agent consumes about 2% of MIPS.)

© 2009 IBM Corporation

Performance and Capacity: Separation of duties

Note the different responsibilities here between Capacity, z/VM, Performance, and z/Linux.

The interesting thing here was that z/VM support retained the existing z/VM skills, while z/Linux support moved to people with previous midrange UNIX skills.

(Diagram: the same z9 EC S38 / 512 GB with LINUX1, LINUX2 and ZOS1 LPARs and six SLES10 64-bit guests running WAS and DB2. Responsibilities by layer: applications – application developers; WAS/DB2 – middleware competency; Linux guests – zLinux performance; z/VM – z/VM performance; the machine overall – capacity planning.)

© 2009 IBM Corporation

Tooling for Alerting

z/VM LPAR – Native Console and Hardware Alerting: HMF, WWAO. Native Console alerts can be filtered and acted upon by HMF and ultimately focal-pointed to another z/OS system via PSM with WWAO.

z/VM LPAR – CPU: PTK and Omegamon XE. Performance Toolkit and/or Omegamon XE provide the ability to proactively monitor CPU utilization.

Linux Guest Middleware – DB2 / MQ Series: ITM for Databases*, Omegamon XE for Messaging (ITM 6.1)

“ITM for xxxx” are monitoring extensions which are used either for standard middleware such as Lotus, DB2, MQ and Oracle, or for industry-wide applications such as Notes, Siebel and SAP. A monitoring extension will contain resource monitors checking for something specific (e.g., a DB2 table space becoming full) and/or log checks (e.g., the MQ Series error log).

Linux Guest OS: IBM Tivoli Monitoring Base OS Agent (disk, CPU, OS processes)*, logfile adapters / log monitors for OS logs

This is primarily handled through some form of OS Agent monitoring. The key advantage is that no matter how many occurrences of a problem you get, only one alert is generated until the problem is resolved. A logfile adapter can also be utilized as an ad hoc monitoring aid where no existing agent can satisfy the requirement.
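A minimal sketch of that alerting behavior – one alert per problem until it is resolved – assuming a generic monitor loop rather than the ITM OS Agent or logfile adapter itself:

```python
# Illustration of deduplicated alerting: raise once per problem, clear on recovery.
class DedupAlerter:
    def __init__(self):
        self.open_problems = set()   # problem keys that have already raised an alert

    def observe(self, problem_key: str, in_error: bool) -> None:
        if in_error and problem_key not in self.open_problems:
            self.open_problems.add(problem_key)
            print(f"ALERT raised: {problem_key}")       # forward to the event console
        elif not in_error and problem_key in self.open_problems:
            self.open_problems.remove(problem_key)
            print(f"ALERT cleared: {problem_key}")      # problem resolved

alerter = DedupAlerter()
for sample in (True, True, True, False):                # repeated occurrences, then recovery
    alerter.observe("filesystem /var over 90% full", in_error=sample)
```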

© 2009 IBM Corporation

ITM6 Tooling Integration

(Diagram: the Omegamon / ITM 6 systems-management infrastructure at IBM locations (IBM intranet / SNI network) monitors middleware and applications on customer systems at customer locations – mainframe z/VM LPARs and Linux systems (SuSE, Redhat, etc.). Agents send situation data and receive agent configuration, policy and situations; intranet and Internet users, system admins and operations view portal data and reports; T/EC feeds problem management and service management; a z/OS focal point receives z/VM event data via focal pointing; the TDW provides long-term reporting (DB, SQL, Oracle, WebSphere, Exchange); further integration is planned.)

© 2009 IBM Corporation

Tooling: Tivoli

• The Tivoli Enterprise Monitoring Server (TEMS) can be seen as the central repository for all monitoring data and resources. In particular, it stores the definitions for conditions that indicate a problem with a particular resource; these are known as situations (a minimal sketch follows the bullets below). It also controls the security for the monitoring solution. Each enterprise monitoring solution must contain one Hub TEMS and can include multiple Remote TEMS. Unlike previous Tivoli Monitoring products, where the lower levels of the architecture (Endpoint or Gateway) interface directly with event management, it is the Hub TEMS that controls forwarding of events to the T/EC component.

• The Tivoli Enterprise Portal Server (TEPS) functions as a graphical access server that displays near-time and real-time status indications on the health of resources in the enterprise. It also serves as a repository for all user data, such as the user IDs and user access control for the monitoring data, meaning what data each user will be able to access and how it is displayed. It connects to the Hub TEMS and can be accessed by the TEP clients, which are either desktop or browser clients. Data is only available in the TEPS for a maximum of 24 hours, and the component should not be considered a historical reporting capability. The TEPS database is also used to store internal data used by the TEPS (TEP user IDs, queries, and navigator views).

• Tivoli Enterprise Console Server provides a centralized location for the management of events. The server creates an entry in the event database for each incoming event and then evaluates these events against a set of rules to determine if the event server should automatically perform any predefined tasks or modify the event. If human intervention is required, the event server notifies the appropriate operator. The operator performs the required tasks and then notifies the event server when the condition that caused the event has been resolved. In addition, there is the option of enabling a form of automated trouble ticketing to a desired Problem Management system.

• Tivoli Enterprise Monitoring Agents (TEMA) are the data collectors for the ITM monitoring solution. They are installed to gather data from one or multiple systems that are critical to the client business.

• Data that is collected by the agents is sent to the central Tivoli Enterprise Monitoring Server. Agents are type-specific (e.g., UNIX, Linux, SAP, DB2, Omegamon XE on z/VM and Linux). The Omegamon XE on z/VM and Linux agent will be an integral part of this solution and helps address performance bottlenecks on z/VM resources. The three main areas where monitoring can be beneficial are:

• Monitoring processor usage at the system level

• Monitoring paging and spooling activity

• Monitoring resources and workloads across LPARs
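A minimal sketch of the “situation” concept from the first bullet above – a named threshold condition over a resource metric. This only illustrates the idea and is not the TEMS implementation or its configuration format; the situation names, metrics and thresholds are hypothetical.

```python
# Illustration of situations: named conditions that indicate a problem with a resource.
from dataclasses import dataclass

@dataclass
class Situation:
    name: str
    metric: str
    threshold: float

    def evaluate(self, sample: dict[str, float]) -> bool:
        """True when the sampled metric meets or exceeds the threshold."""
        return sample.get(self.metric, 0.0) >= self.threshold

situations = [
    Situation("DB2_Tablespace_Full", "tablespace_pct_used", 90.0),
    Situation("Linux_CPU_Busy", "cpu_pct_busy", 95.0),
]
sample = {"tablespace_pct_used": 93.5, "cpu_pct_busy": 40.0}

for s in situations:
    if s.evaluate(sample):
        print(f"situation raised: {s.name}")   # a hub TEMS would forward this to T/EC
```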

© 2009 IBM Corporation

Monitoring Diagram for Existing ECM Farm

(Diagram: ECMz Monitoring Architectural Overview – Version 2.0, 26/11/07, Richard Short; Version 3.0, March 10, 2008, Gene Capano. Existing z/OS mainframes and the existing WWAO z/OS mainframe focal point, existing z/VM mainframes and the new ECMz mainframes, and existing customer servers feed Linux guest alerting, z/VM alerting and performance monitoring data into the existing Tivoli Enterprise Console (TEC), the Tivoli Enterprise Monitoring Server (TEMS), the Tivoli Enterprise Portal Server (TEPS) and Tivoli Enterprise Monitoring Agents (TEMA). Existing auto ticketing feeds the problem ticketing system and existing problem management; mainframe and distributed operator workstations provide alerting and command/response.)

© 2009 IBM Corporation

Tivoli Monitoring Flow

(Diagram: agents with local caches report to a remote TEMS and to the hub TEMS, which keeps TEMS data in a built-in operational database; the TEPS and its TEPS DB serve TEP browser or desktop clients, and historical data flows to the warehouse.)

Key:

TEP – Tivoli Enterprise Portal
TEPS – Tivoli Enterprise Portal Server
TDW – Tivoli Data Warehouse
TEMS – Tivoli Enterprise Monitoring Server
TEMA – Tivoli Enterprise Monitoring Agent

© 2009 IBM Corporation

z/VM and z/Linux Monitoring Flow

Here is the monitoring flow from the VMPTK into the previous Omegamon XE solution (now replaced by the lxmemory and lxcpu reports from VMPTK).

Our data ends up in a DB2 DB on z/OS.

Not much has really changed in 20 years!

(Diagram: CP control blocks, application data and VM events feed the *MONITOR System Service into the MONDCSS segment. The MONWRITE utility writes raw MONWRITE history files; the Performance Toolkit serves 3270 sessions and browsers over HTTP on the TCP/IP network; DCSS/SEGOUT data from the Linux guests and VMRM feed the VMIRA agent, which reports to the TEMS for OMEGAMON XE.)

© 2009 IBM Corporation

How Tivoli interacts with the z/Linux Guests

• This is a logical diagram, not a physical one.

• Note that all data flows from the z/VM host.

• Internal monitors are supported by SRM, but not by Tivoli.

(Diagram: on each z/VM system, CP, the PTK with extensions and a CMD guest collect data from the Linux guests via DCSS; the VMIRA and Linux IRAs feed the TEMS in Tivoli Management Services, which serves the TEPS/TEP and the TDW. z/VM command-and-response flows and data/synchronization flows are shown.)

© 2009 IBM Corporation

Autoprovisioning using IBM Director

(Diagram: an IBM Director administrator console, with remote access via Remotely Anywhere, reaches the IBM Director management server (Linux SuSE with CIM server and SBLIM client) and a TPM server (TPM for Software, Fusion) serving the global delivery centers. On z/VM, the management access point (MAPLAN, MAPSERVE, TCPIP, VSMSERV, PORTMAP, DIRMAINT, VMRMSVM) exposes the z/VM Center console, server and MAP agent extensions through VSWITCHes and OSAs to Linux RH, Linux S10 and Linux SuSE guests; a KickStart server supplies OS images and software packages. The cloning solution (Canada, IGA + commercial) and the breeder solution (EMEA, IGA + commercial) are tactical approaches providing a subset of the strategic solution.)

© 2009 IBM Corporation

Autoprovisioning: The way it really works (cloning)

© 2009 IBM Corporation

Autoprovisioning: Why

• z/VM Center manifests itself inside the z/VM hypervisor as a zLinux service machine, making available the CIM instrumentation for z/VM (CIM stands for Common Information Model; the implementation is known as Pegasus) and creating the API by means of the z/VM MAP function. As seen above, both IBM Director Virtualization Manager and IBM Tivoli Provisioning Manager exploit the same API and CIM, as both products have been designed to be platform (operating system) agnostic.

• The flexibility gained by separating the GUI from the automation and from the hardware line it runs on also makes it inflexible for zLinux deployments. So far it is not clear how much it can be tweaked to fulfill the above-mentioned target-architecture requirements. Testing needs to be performed, with a focus on exploiting exits rather than changing the product.

© 2009 IBM Corporation

Backup Methodology.

The zLinux operating system (OS) image and the filesystems and data essential to the zLinux OS will be maintained on FICON/ESCON-attached disk. OS data and images are not backed up using the same technology as application data. The primary technology deployed for backup of application data on zLinux guests is IBM Tivoli Storage Manager (TSM), using Automated Tape Libraries (ATLs) or newer Virtual Tape Libraries (VTSs) to store backup data.

© 2009 IBM Corporation

Backup and Recovery Model

(Diagram: Storage Services for ECM/z – Architecture Overview Diagram, MSmith, IBM GTS Storage Competency, 2007-11-30. ECM System z servers and UNIX/AIX servers sit in Security Zone 4, Security Zone 3 and Security Zone 2 LANs (the DMZ), separated by network firewalls from Security Zone 1 (the Internet) at the datacentre perimeter. TSM backup servers in each zone back up application data over FCP to LTO drives in ATLs and 359x drives in VTSs; DS8300 disk is attached via FICON and FCP over the FC-SAN. Application users reach ECM/z through the BZ and RZ; ECM/z BZ SAs and MSS YZ SAs & SAN designers administer the environment.)

© 2009 IBM Corporation

Backup

© 2009 IBM Corporation

Defining cloud computing.

Cloud computing is a new consumption and delivery model inspired by consumer Internet services. Cloud computing exhibits the following 5 key characteristics:

• On-demand self-service
• Ubiquitous network access
• Location independent resource pooling
• Rapid elasticity
• Pay per use

© 2009 IBM Corporation

What does it really mean?

On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed without requiring human interaction with each service's provider.

Ubiquitous network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).

Location independent resource pooling. The provider's computing resources are pooled to serve all consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. The customer generally has no control or knowledge over the exact location of the provided resources. Examples of resources include storage, processing, memory, network bandwidth, and virtual machines.

Rapid elasticity. Capabilities can be rapidly and elastically provisioned to quickly scale up and rapidly released to quickly scale down. To the consumer, the capabilities available for rent often appear to be infinite and can be purchased in any quantity at any time.

Pay per use. Capabilities are charged using a metered, fee-for-service, or advertising based billing model to promote optimization of resource use. Examples are measuring the storage, bandwidth, and computing resources consumed and charging for the number of active user accounts per month. Clouds within an organization accrue cost between business units and may or may not use actual currency.

http://news.cnet.com/8301-19413_3-10237274-240.html

© 2009 IBM Corporation

Pay no attention to the man (person) behind the curtain!

© 2009 IBM Corporation

How does this apply to Z/Linux?

Provide on demand service levels.

Fast service provisioning.

Capacity Planning

SO must prepare to react to IBM’s marketing strategies (Cloud and Virtualization)

Cost:

Datacenter

Hardware

Management

Hypervisor Management

Author: Gerhard Widmayer

© 2009 IBM Corporation

How does this apply to Z/Linux? (Service)

Provide on demand service levels. On demand service levels are, for example, “zero planned outage”, “scale-up” and “scale-out”, or “pay per usage”. These service levels can only be provided on a strictly virtualized infrastructure of server resource pools. SMB customers cannot afford the hardware price for a cluster which they would utilize only at a small percentage.

Author: Gerhard Widmayer

© 2009 IBM Corporation

How does this apply to Z/Linux? (Provisioning using the free pool)

Fast service provisioning

Server ordering and infrastructure adjustments make provisioning a week-long activity at best. This is also because SO must create a new design for each service request rather than establishing a service inside a predefined infrastructure, unless there is an enterprise architecture such as ODCS. Competitors are delivering basic on demand server services within 24 hours of accepting the request.

With a virtualized free pool there will always be some resource available as headroom in an appropriate resource pool (the selected target location), provided it is built to be multi-tenant enabled. Any service establishment is just a set of commands away – true for servers and storage, but also for the real network. The server resource pool only needs to be increased when thresholds are reached.

Patent 7685283 shows how to size for the free pool, “METHOD FOR MODELING ON DEMAND FREE POOL OF RESOURCES”
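A hedged sketch of the free-pool idea described above: satisfy requests from headroom and grow the pool only when a threshold is crossed. This is a generic illustration with made-up numbers, not the sizing method of patent 7685283.

```python
# Generic free-pool illustration: headroom serves requests immediately,
# and a low-watermark threshold triggers pool growth.
class FreePool:
    def __init__(self, free_units: int, low_watermark: int, growth_step: int):
        self.free_units = free_units          # capacity "units" held as headroom
        self.low_watermark = low_watermark    # threshold that triggers pool growth
        self.growth_step = growth_step        # units added per procurement cycle

    def provision(self, units: int) -> bool:
        """Satisfy a request from headroom; flag replenishment at the threshold."""
        if units > self.free_units:
            return False                      # request must wait for the next growth cycle
        self.free_units -= units
        if self.free_units < self.low_watermark:
            print(f"Threshold reached ({self.free_units} units left): order {self.growth_step} more")
        return True

pool = FreePool(free_units=20, low_watermark=5, growth_step=10)
for request in (4, 6, 7):
    print("provisioned immediately:", pool.provision(request))
```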

Author: Gerhard Widmayer, with adds by Clea

© 2009 IBM Corporation

How does this apply to Z/Linux? (Capacity Planning)

Capacity Planning –

Behind the curtain, define servers by actual consumption rather than by peak workload plus uplift.

Today’s distributed servers face the problem of hardware underutilization. Normal utilization values seen are 8% average in the distributed world.

Currently, the customer pays for what was ordered regardless of usage. At the ODCS, there was an LPAR charge and a utilization charge.

In order to establish “on demand” pricing there is a need to put multiple services onto a single box, exploiting virtualization, like the ODCS. With compute power increasing year by year, the “boxes” have become so large that having a single network zone or a single SMB customer utilize a whole box is very unlikely.

The ability to do coadunation is the ability to put each virtualized LPAR in its mathematically correct spot.
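To contrast the two sizing approaches mentioned above, here is a hedged sketch with hypothetical numbers: “peak workload plus uplift” versus sizing on a high percentile of actual consumption.

```python
# Hypothetical comparison of sizing by peak-plus-uplift vs. observed consumption.
def size_by_peak_plus_uplift(samples, uplift=0.30):
    """Classic approach: size to the observed peak plus a safety uplift."""
    return max(samples) * (1 + uplift)

def size_by_consumption(samples, percentile=0.90):
    """Consumption approach: size to a high percentile of observed utilization."""
    ordered = sorted(samples)
    index = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[index]

# Hourly CPU consumption of one workload, in MIPS-equivalents (made up).
samples = [30, 35, 40, 45, 38, 36, 90, 42, 41, 37, 33, 31]

print("peak + 30% uplift :", size_by_peak_plus_uplift(samples))   # 117.0
print("90th percentile   :", size_by_consumption(samples))        # 45
```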

Author: Gerhard Widmayer, with adds by Clea

© 2009 IBM Corporation

Coadunation Example

Mixing workloads shares headroom, but you pay in response time at low utilization... workload management shifts peaks based on business priorities to use “white space”, but the response time of lower-priority work is traded off...

© 2009 IBM Corporation

How does this apply to Z/Linux? (Moolah)

Datacenter Cost

With Green IT there is a drive to move towards larger boxes for network, storage and servers in order to optimize the capacity per watt. For optimization of environmental savings (space, power, cooling), these large boxes need to run at a high utilization level, which requires a very flexible setup with many isolated network zones on them.

Migrating pSeries hosts into z/Linux is proving to be quite cost-effective. The ECM consolidated environment will use 80% less energy and 85% less floor space.

Specific energy savings follows.

Hardware Cost

HW acquisition and maintenance costs go down moving from a distributed environment to a zSeries environment. Further, SPOF failover is cheaper in the mainframe world as well.

Potential Savings: Categories as a % of Gross Savings – Software 45%, Labor 19%, HW Acquisition* 17%, HW Maintenance 11%, Facilities 6%, Connectivity 2%

Author: Gerhard Widmayer, with adds by Clea

© 2009 IBM Corporation

Ubiquitous network access: Based on the vSWITCH

Author: Gerhard Widmayer, with adds by Clea

© 2009 IBM Corporation

VSWITCHES and OSAs

From a security risk mitigation perspective, a distinct VSWITCH with dedicated physical OSA ports and real switches is worth considering.

However, the basic design should verify that the VSWITCH and VLAN isolation, if bundled together with customer and application traffic, is secure. In such a case the (S)CF firewall becomes a PF (public firewall), and the 3/4-NIC design would be preserved.

Author: Gerhard Widmayer, with adds by Clea

© 2009 IBM Corporation

VSwitch, Continued

A VSWITCH is a z/VM-established virtualization of a real switch. In this deployment guidance the VSWITCH replaces the real access-layer switch of an Ethernet hierarchy. For each of the three (or four) NICs in the virtual server NIC design, a distinct VSWITCH is built. Each customer receives its own VLAN number, and each network zone also receives its own VLAN number. This VLAN assignment holds for all three or four NICs, so in effect a triple or quad of VLAN numbers is always assigned to each single network zone.

All VLANs of a type, like all Extended Premises, are combined in a single VSWITCH. Because the design is for SMB environments, this means that multiple customers and/or multiple network zones share a VSWITCH. The VSWITCH function must therefore provide the VLAN isolation and must disallow any VLAN cross-over interaction.

Remember that the basic design keeps the security control for allowed interaction flows in the firewalls outside the System z box, where it exists today.
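A hedged sketch of the VLAN bookkeeping just described: one VSWITCH per NIC role and, for each customer and network zone, a distinct VLAN on every NIC, i.e. a triple or quad of VLAN numbers per zone. The role names and VLAN numbers are hypothetical; the isolation itself is enforced by the VSWITCH, not by code like this.

```python
# Illustration of per-(customer, zone) VLAN assignment across 3 or 4 NIC roles.
from itertools import count

NIC_ROLES = ["management", "backup", "production", "extended_premises"]  # 3 or 4 NICs
_next_vlan = count(100)                                                   # arbitrary starting VLAN id

assignments: dict[tuple[str, str], dict[str, int]] = {}

def assign_vlans(customer: str, zone: str) -> dict[str, int]:
    """Give each (customer, network zone) its own VLAN number on every NIC role."""
    key = (customer, zone)
    if key not in assignments:
        assignments[key] = {role: next(_next_vlan) for role in NIC_ROLES}
    return assignments[key]

print(assign_vlans("CustomerA", "zone1"))   # quad of VLANs for CustomerA / zone1
print(assign_vlans("CustomerB", "zone1"))   # different VLANs, even for the same zone name
```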

Author: Gerhard Widmayer, with adds by Clea

© 2009 IBM Corporation

How does this apply to Z/OS?

This page intentionally left blank.

© 2009 IBM Corporation

Questions?