(ARC305) How J&J Manages AWS At Scale For Enterprise Workloads


Transcript of (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Page 1: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Keith Blizard, Bob Tordella

October 2015

Self-service Cloud Services

How J&J Is Managing AWS at Scale

for Enterprise Workloads

ARC305

Page 2: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

What to Expect from the Session

- Reviewing Enterprise Challenges & Incorporating Cloud Capabilities

- Provide approach for enabling Enterprise Controls

- Example Architecture & Implementations

- Example Patterns (HPC & Workspaces)

- Lessons Learned

Page 3: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

J&J is a Global Health Care Leader

More than 270 Operating Companies in 60 Countries, with 126,000 employees

Selling Products in more than 175 Countries

The world’s sixth-largest consumer health, pharmaceuticals, and biologics company

The world’s largest medical devices and diagnostics business

Page 4: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Big Company, Big Challenges

Thousands of Systems

Complex IT Ops

Limited Financial Impact

Cloud Patterns & Acceleration

Automated IT Cost Transparency

Current State of Enterprise IT

Cloud Strategy Offers Agility

Page 5: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Transformation to a Flexible Hybrid Cloud Strategy

Virtual Private Cloud (VPCx)

Provides complete infrastructure platform through Amazon Web Services and integrated with J&J processes and policies

AWS Regions: N. America Region, Europe Region, AP Region

On-Premise Cloud (OPCx)

Provides a highly flexible reference architecture (built on VMware stack) to deliver 'on-demand' VMs inside our Enterprise Data Centers or Co-location facilities in each region

Data Centers: N. America DC, Europe DC, AP DC

Compliance, Data Protection, Operation Transparency, Speed + Agility

Page 6: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Virtual Private Cloud (VPCx) Vision

Empower the business by providing an integrated, scalable, secure self-service cloud IT platform that enables agility, enforces policy, and accelerates best practices

Enable Agility

• Self Service

• Rapid Provisioning

• Capacity Mgmt.

• Full stack Availability

Ensure Policy

• AD Integration

• J&J AMIs

• Enterprise Logging

• Backup & Retention

• Firewall & Security Rules

Accelerate Best Practice

• Monitoring & Alerts

• VM Scheduling

• Encryption

• Software Config. Mgmt.

Page 7: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Enterprise Control without the Bottleneck

Preventative Controls

Detective Controls

Core principles for security,

compliance & management

Enforce Least Privilege Approach

Log Everything

J&J Identity & Group

Management

J&J Network Extension

Enforce our Images

Account Isolation

Page 8: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Xbot / Management Architecture

[Architecture diagram. Labels shown: xbot; xbot Admin; VPCx (Help, Assurance, Monitor); VPCx DB; AD; Console; Billing; AWS Console; AWS Services; Project Owners; VPCx Administrators. Application accounts shown: Big Data Account, Workspaces Account, HPC Account.]

• Centralized policy enforcement through xbot

• Each Application Account is completely isolated from the others

• Controls are executed through both Assurance and Enforcement tests, run every 10 minutes

• Tickets are created when values drift from the allowable values
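The assurance tests and drift tickets described on this slide can be sketched as a comparison of observed settings against allowable values. This is a minimal illustration, not xbot's actual code: the `Ticket` class, the function names, and the setting names (`cloudtrail_enabled`, `region`) are all hypothetical, and the observed values are hard-coded rather than read from AWS.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """A drift report for one setting in one account (hypothetical shape)."""
    account: str
    setting: str
    allowed: list
    actual: object


def assurance_check(account, observed, allowable):
    """Return one ticket per setting whose observed value is not allowable."""
    tickets = []
    for setting, allowed_values in allowable.items():
        actual = observed.get(setting)
        if actual not in allowed_values:
            tickets.append(Ticket(account, setting, sorted(allowed_values), actual))
    return tickets


def run_cycle(accounts, allowable):
    """One pass over every account; a scheduler would call this every 10 minutes."""
    tickets = []
    for account, observed in accounts.items():
        tickets.extend(assurance_check(account, observed, allowable))
    return tickets


# Hypothetical allowable values and observed account state.
ALLOWABLE = {
    "cloudtrail_enabled": {True},
    "region": {"us-east-1", "eu-west-1"},
}
ACCOUNTS = {
    "hpc": {"cloudtrail_enabled": True, "region": "us-east-1"},
    "bigdata": {"cloudtrail_enabled": False, "region": "us-east-1"},
}

tickets = run_cycle(ACCOUNTS, ALLOWABLE)
for t in tickets:
    print(f"{t.account}: {t.setting}={t.actual!r}, allowed {t.allowed}")
```

Because each account is checked independently, one account's drift never affects the evaluation of another, which matches the isolation principle stated above.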

Page 9: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Enterprise Control - Queue Management & Automation

Work Queue

Work Items

API Execution @ Each Account: List, Info, Delete, Update, Setup, Admin, Login

Metadata: Project Details, Allowable Cloud Objects, Chargeback, Acceptable Values

Ex: HPC Account

Ticket System
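One way to picture the work queue on this slide is a loop that pulls work items and dispatches each to a per-account action. The sketch below assumes an in-process `queue.Queue` and stub handlers; the item shape, the `METADATA` table, and the handler functions are invented for illustration, and the slide does not say which queueing technology xbot uses.

```python
import queue

# Action names taken from the slide; everything else here is hypothetical.
ALLOWED_ACTIONS = {"list", "info", "delete", "update", "setup", "admin", "login"}


def execute(item, handlers):
    """Run one work item against its target account."""
    action = item["action"]
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return handlers[action](item["account"])


def drain(work_queue, handlers):
    """Process work items until the queue is empty and collect the results."""
    results = []
    while True:
        try:
            item = work_queue.get_nowait()
        except queue.Empty:
            return results
        results.append(execute(item, handlers))


# Stand-in for the project metadata store (values are made up).
METADATA = {"hpc": {"chargeback": "CC-0000", "allowable_objects": ["ec2", "s3"]}}

handlers = {
    "list": lambda account: sorted(METADATA),
    "info": lambda account: METADATA[account],
}

work_queue = queue.Queue()
work_queue.put({"account": "hpc", "action": "list"})
work_queue.put({"account": "hpc", "action": "info"})
results = drain(work_queue, handlers)
```

Validating the action name before dispatch keeps the set of API operations per account closed, in the spirit of the least-privilege principle from the earlier slide.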

Page 10: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

import binascii

# Fetch the image records registered for this project and region
image_objs = project.get_ec2_images(project_info['Id'], region, image_ids=image_id)

images = []

for img in image_objs:

    # Decode the stored image record (quoted-printable encoding)
    unserialized_obj = binascii.a2b_qp(img['image'])

    images.append(img)

# Record the identifying tags of instance i
instance_info[key][i.id]['Name'] = i.tags.get('Name', '')

instance_info[key][i.id]['Env'] = i.tags.get('Environment', '')

instance_info[key][i.id]['Hostname'] = i.tags.get('Hostname', '')

instance_info[key][i.id]['ImageId'] = i.tags.get('ami-id', '')

# Raise a support ticket when the image is not an allowable value
if instance_info[key][i.id]['ImageId'] != allowable_value:

    error.name = 'instance-value-error'

    error.value = instance_info

    create_support_ticket(error.name)

Sample Control – Only Allowing Approved Images

Page 11: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Amazon DynamoDB – Project Metadata

Page 12: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Amazon DynamoDB – Project Level Exceptions

Page 13: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

CLI – Automation – Member Info

User Level Information

And access list

Page 14: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

CLI – Automation – Project Info

Project Lists including

account-code and

friendly name

Page 15: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

CLI – Automation – Project Info

Project Metadata

Project Level Service

Listing

Page 16: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

CLI – Automation – Adding Services

Adding New Service

for this Project

Page 17: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

CLI – Automation – Project Info

New Service Added with

corresponding IAM

roles, policies

Page 18: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

AWS Account & Infrastructure Layer Control

[Diagram: the Xbot Account (Xbot Administration) and the Payer Account (Consolidated Billing) sit above the application accounts, App AWS Account (001), App AWS Account (002), and so on. Each App AWS Account contains the same layers: Core, Project, Services, Users, Alarms, HPC.]

Scalable to 1000s of accounts

Page 19: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Core

Page 20: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Project

Page 21: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Services

Page 22: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Users

Page 23: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Alarms

Page 24: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

HPC

Page 25: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Operating System & Database Layer Control

Xbot Account

App AWS Account (001)

Operating System: EC2

Database: RDS, Amazon Redshift

Page 26: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Managing Amazon Redshift Controls

Encrypt Sensitive Data

Work Queue

Work Items

Account Metadata: Ex: HPC Account

Ticket System

Checks 100s of accounts every 10 min for new instance; enforces policy

AD Security Group Sync

xbot

KMS

Page 27: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Sample Control ― Managing Redshift

audit policy requires:

# rotate_master_passwords=1hour

# apply_cw_metrics=95%CPUutil>60mins;85%DiskUsed>60mins;HealthStatus<1=10mins

# require_ssl=True

# enable_user_activity_logging=True; bucket_name=RegionalS3LogBucket

# backup_retention_period=35days

# modify_cluster(master_user_password=newpassword)

# publicly_accessible=False

# add_tags='Environment';'Production'

# rotate_user_passwords=90days

# sync_users=(conn.rscluster)

## add users, set groups, revoke public schema

## drop users, move schema ownership
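The audit policy above can be read as data that is compared against a cluster's current state. The following is a minimal sketch under stated assumptions: the cluster is represented as a plain dictionary whose keys (`PubliclyAccessible`, `AutomatedSnapshotRetentionPeriod`) are modeled on Redshift's cluster description, the parameter-group values are passed in a second dictionary, and `plan_remediation` is a hypothetical helper, not part of xbot or boto. No AWS calls are made.

```python
# Policy values taken from the slide; the key names are illustrative.
POLICY = {
    "publicly_accessible": False,
    "backup_retention_days": 35,
    "require_ssl": True,
    "enable_user_activity_logging": True,
}


def plan_remediation(cluster, params, policy=POLICY):
    """Return the settings that must change for the cluster to meet policy."""
    changes = {}
    if cluster["PubliclyAccessible"] != policy["publicly_accessible"]:
        changes["PubliclyAccessible"] = policy["publicly_accessible"]
    if cluster["AutomatedSnapshotRetentionPeriod"] != policy["backup_retention_days"]:
        changes["AutomatedSnapshotRetentionPeriod"] = policy["backup_retention_days"]
    # These two live in the cluster's parameter group rather than on the cluster.
    for name in ("require_ssl", "enable_user_activity_logging"):
        if params.get(name) != policy[name]:
            changes[name] = policy[name]
    return changes


# A freshly created, non-compliant cluster (values are made up).
cluster = {"PubliclyAccessible": True, "AutomatedSnapshotRetentionPeriod": 1}
params = {"require_ssl": False, "enable_user_activity_logging": True}

changes = plan_remediation(cluster, params)
print(changes)
```

Returning a plan instead of applying changes directly keeps detection and enforcement separate, so the same check can either open a ticket or drive an automatic fix.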

Page 28: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

User Federates into Account

User creates Cluster

Page 29: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Cluster Created

Within 10 minutes, xbot takes over Master User

Master User Password is reset by xbot every hour
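An hourly reset needs a fresh, unpredictable password on every run. A minimal sketch of that step follows, using Python's `secrets` module. The function name and the 32-character default are hypothetical, and the character rules enforced here (letters and digits only, with at least one lowercase letter, one uppercase letter, and one digit) are an assumption that should be checked against the current Redshift password requirements. The slide does not show how xbot generates or stores the password.

```python
import secrets
import string

# Letters and digits only, to stay clear of characters a database may reject.
ALPHABET = string.ascii_letters + string.digits


def new_master_password(length=32):
    """Return a random password with at least one lower, upper, and digit."""
    if length < 8:
        raise ValueError("password must be at least 8 characters")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
        ):
            return candidate


password = new_master_password()
```

The generated value would then be handed to the cluster-modification call listed in the audit policy (`modify_cluster(master_user_password=newpassword)`).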

Page 30: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Master User takes over, abstracts

itself by syncing with AD Security

Groups tied to that AWS Account

Page 31: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Begins to build a Profile / Group

Grants various permissions to group

and associates DBAs

Page 32: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Revokes Access to Public Schema to

ensure least privilege

Page 33: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Xbot detects new Cluster;

applies CloudWatch Alarms

Page 34: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Alarms

Page 35: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Xbot enables logging & sets

the maximum backup retention

Page 36: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Xbot updates Parameter Group

for SSL & User Activity Logging

Xbot resets the

parameter group

within 10 minutes to

enforce policy

Page 37: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads
Page 38: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Xbot notifies users of

the changes to their

environment

Page 39: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Enterprise Log Management

[Diagram: log sources Elastic Load Balancing, S3, Amazon Redshift, Data Pipeline, EMR, CloudFront, CloudTrail, Config, EC2, and RDS deliver logs to a Regional S3 Logging Bucket.]

No API Action to send DB user Activity Logs to S3

Queries logs out of DB

Temp Location for Log Movement

Copies to S3 Bucket

Rotates logs every week

Page 40: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads
Page 41: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

AWS Components Orchestrated

Categories shown: Compute, Storage & Content Delivery, Database, Networking, Management Tools, Enterprise Applications

Services shown: EC2, Elastic Load Balancing, Auto Scaling, S3, EBS, Amazon Glacier, CloudFront, RDS, Amazon Redshift, DynamoDB, ElastiCache, Amazon Kinesis, Data Pipeline, EMR, VPC, Direct Connect, CloudFormation, CloudWatch, CloudTrail, Trusted Advisor, Config, IAM, SES, SNS, CloudSearch, SQS, SWF, WorkSpaces, WorkDocs, Directory Service

Python (boto)

Page 42: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Common Architecture Pattern for Big Data or HPC

Connected VPC: us-east-1 (10.X.X.X/25)

us-east-1a: 10.X.X.0/27

us-east-1b: 10.X.X.32/27

Win/Lin EC2

VPC Peering

Disconnected VPC for EMR: us-east-1 (10.X.X.X/19)

IGW

us-east-1a: 10.X.0.X/21

us-east-1b: 10.X.7.X/21

us-east-1c: 10.X.15.X/20

Amazon S3

DynamoDB

Burst High Performance Computing (HPC) workloads in Private Address Space in same Account

Take advantage of multiple subnets / AZs for Spot Instance Pricing

Common Use Cases

• Statistical Analysis on large data sets; e.g. Genomic Sequencing

• Transformations of large complex data sets for Advanced Analytics (Sales & Supply Chain)

• Machine Learning engines on unstructured or non-relatable data

Large volumes of Structured & Unstructured Data

Direct Connect

VGW

On-Premise Internal Data Sources

Admins

OIA
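The disconnected VPC on this slide carves a /19 into two /21 subnets and one /20, one per Availability Zone. The sketch below shows one aligned way to produce that shape with Python's `ipaddress` module. The base address `10.0.0.0/19` is invented because the slide masks its octets, so the resulting addresses are illustrative and are not J&J's actual ranges.

```python
import ipaddress

# Hypothetical base range; the slide shows only 10.X.X.X/19.
vpc = ipaddress.ip_network("10.0.0.0/19")

# Split the /19 into two /20 halves, then split the first half into two /21s.
lower_half, upper_half = vpc.subnets(new_prefix=20)
first_21, second_21 = lower_half.subnets(new_prefix=21)

layout = {
    "us-east-1a": first_21,
    "us-east-1b": second_21,
    "us-east-1c": upper_half,
}

for az, subnet in layout.items():
    print(az, subnet, subnet.num_addresses)
```

Spreading the address space over three Availability Zones gives a burst cluster more Spot capacity pools to draw from, which is the pricing advantage the slide points to.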

Page 43: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

[Architecture diagram. Labels shown: J&J DCs; JJNET; MFA; SCCM Site & DP; J&J Resources; J&J Facility; Zero Client; ELB; Teradici Connection Manager. Accounts shown: Workspaces Account, Infra Comp Account, Core Infra Account, Zero Client Account.]

Workspaces Architecture Patterns

Comments

• Global implementation across NA, EMEA and AP

• Infrastructure components living within AWS for scale,

performance and management

• J&J Network extended into AWS

Page 44: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Tradeoff / Lessons Learned

- A DevOps approach to cloud is strongly recommended. Focus on velocity of new capabilities & operational improvements

- Security Engagement and Partnership is critical

- Identify, Design and remain Diligent with your Cloud Principles

- Early evaluation of CMP: the focus has been too much on IaaS & Provisioning only

- Partnership with 3rd Parties is crucial (Log Management, Web Application Firewall, Utilization & Spend)

- Training of Enterprise IT Users is critical

Page 45: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Key Takeaways

- Lean into PaaS services

- Enable agility of the cloud to your end users through self-service

- Automate your enterprise controls

- Unleash power of the cloud for small to large patterns

Page 46: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Thank you!

Contact Details:

Keith Blizard – [email protected]

Bob Tordella - [email protected]

Page 47: (ARC305) How J&J Manages AWS At Scale For Enterprise Workloads

Remember to complete

your evaluations!