CTD Taking Advantage of Cloud Elasticity and Flexibility...

Post on 04-Jul-2020


© Cloudera, Inc. All rights reserved.

Taking Advantage of Cloud Elasticity and Flexibility

Fred Koopmans, Sr. Director of Product Management

Public cloud adoption is surging

Cloudera customers are leading the way

Hadoop was born for the cloud

• Speed
• Convenience
• Scale
• Self-Service
• TCO

But cloud comes with its own set of challenges

• Performance
• Bill Shock
• Application Portability
• Security
• Data Governance
• Data Sovereignty
• Hybrid Cloud
• Lock-in

A stepwise approach

• Lift and shift the platform
• Optimize each application individually
• Reconstruct an Enterprise Data Hub

Lift and shift the platform

Openness is even more important in the cloud

Open Environment
Run the same platform in different clouds or on bare metal, so customers can move as needed without migration or retraining

Open Ecosystem
450+ certified ISVs assure backward compatibility across releases, so customers can leverage their pre-existing investments

Open Source
Avoid vendor lock-in, and leverage components supported by the committers who drive the community roadmap

In on-prem environments, many applications typically share a single, multi-tenant cluster

HDFS

The cloud creates more & smaller clusters, specialized for each application

S3, Azure Data Lake, Google Storage*

Where to store the data?

Object Storage generally best choice
• Performance often good enough
• Generally cheaper per TB than DAS
• Scales independently from compute

Not a drop-in replacement for HDFS
• Different data consistency models
• Different directory structure support

Not all Object Stores created equal
• Different access control models
• Different maturity levels

Not yet universally supported by CDH
• Mostly finished for S3
• Just getting started for ADLS
• Not yet started for GCS

Object Storage support is rapidly reaching maturity

Separation from HDFS
• S3A connector
• ADLS connector

Filling the gaps
• Performance
• Consistency
• Renames

Cloudera Functional Equivalence
• Security
• Governance
• Backup & Recovery

Cross-Cluster Sharing
• Permissions
• Catalogue
• Lineage

Support as of C5.11

Component       S3    ADLS
MapReduce       Y     Y
Hive            Y     Y
Hive on Spark   Y     -
Spark           Y     Y
HBase           -
Impala          Y     -
Hue             Y     -
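The support matrix above can also be read as a simple lookup. The sketch below is illustrative only: the dictionary restates the slide's table as of C5.11, while the names `SUPPORT_C5_11` and `runs_on` are hypothetical and not part of any Cloudera tool. The HBase row shows a single "-" in the source, so both of its cells are treated as unknown here.

```python
from typing import Dict, Iterable, Optional

# Support matrix as shown on the slide (C5.11). True = "Y", False = "-".
# None marks a cell the slide does not make clear (assumption: unknown).
SUPPORT_C5_11: Dict[str, Dict[str, Optional[bool]]] = {
    "MapReduce":     {"S3": True, "ADLS": True},
    "Hive":          {"S3": True, "ADLS": True},
    "Hive on Spark": {"S3": True, "ADLS": False},
    "Spark":         {"S3": True, "ADLS": True},
    "HBase":         {"S3": None, "ADLS": None},
    "Impala":        {"S3": True, "ADLS": False},
    "Hue":           {"S3": True, "ADLS": False},
}


def runs_on(store: str, components: Iterable[str]) -> bool:
    """Return True only if every component is explicitly marked "Y" for the store.

    Unknown components, unknown stores, and unclear cells all count as
    "not supported", which is the conservative reading.
    """
    return all(SUPPORT_C5_11.get(c, {}).get(store) is True for c in components)


if __name__ == "__main__":
    print(runs_on("S3", ["Hive", "Spark", "Impala"]))
    print(runs_on("ADLS", ["Hive", "Impala"]))
```

A check like this could gate a cluster template: an application that needs Impala would be pointed at S3 rather than ADLS under the C5.11 matrix.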

How to provision and manage cloud infrastructure cost-effectively?

Provisioning requirements
• Spin clusters up & down quickly
• Grow & shrink clusters dynamically
• Select right instance types for each service
• Leverage demand-based pricing whenever possible

Management requirements
• Fully automated and parallelized installation and configuration
• Manage all aspects of cluster security automatically
• Retain diagnostic and log information after cluster is gone
• Support transient and long-lived clusters

Cloudera Director automates cluster lifecycle management

Easy
• Single pane of glass for all cloud infrastructure
• Create templates to run applications in a pre-optimized manner

Flexible
• Multi-cloud: AWS, Azure, GCP
• Hourly pricing with auto billing & metering
• Spot instance/block support

Enterprise-grade
• Integration across Cloudera Enterprise
• Management of CDH deployments at scale
• Deeply integrated with Cloudera Manager

Cloudera Manager automates cluster operations

Easy administration
• Spot instance resiliency
• Automated security credential handling

Transient cluster operations
• Optimized cluster provisioning
• Automatic collection of diagnostics and logs

Long-lived cluster operations
• Downtime-less upgrade, patch, restart, and reconfiguration
• Monitoring, alerting, health checking, reporting, etc.

Object Store

Optimize each application independently

Really, four discrete applications on one unified platform

Data Engineering
Modern data processing (ETL) at scale

Analytic Database
Explore, analyze, and understand all your data

Operational Database
Data-driven applications to deliver real-time insights

Data Science
Exploratory data science and machine learning for the enterprise

Multi-Storage, Multi-Environment

Needs of each application can vary greatly

Data Science & Engineering
Access Patterns
• Batch
• Can be transient or persistent
Performance Needs
• Relatively insensitive to latency and data locality
Security
• Security often not required for many use cases

Operational Database
Access Patterns
• Real-time
• Typically persistent
Performance Needs
• Typically quite sensitive to latency and data locality
Security
• Fine-grained security often required

Analytic Database
Access Patterns
• Batch or interactive
• Can be transient or persistent
Performance Needs
• Relatively insensitive to latency and data locality
Security
• Fine-grained security often required

Data Science & Engineering in the cloud
Three architectural patterns to optimize price, convenience, performance

Transient Batch (most flexible)
Spin up clusters as needed
● On-demand/spot instances
● Usage-based pricing
● Sized for workload
● Cluster per tenant/user

Persistent Batch (most control)
Persistent cluster(s) for frequent ETL
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group

Persistent Batch on HDFS (fastest)
Top performance for frequent ETL
● Reserved instances
● Node-based pricing
● Grow/shrink
● Shared across tenant groups

Diagram labels: Batch Cluster, Persistent Cluster, HDFS, Object Storage, Default Choice
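The three Data Science & Engineering patterns can be summarized as a small decision rule. The sketch below is a hedged illustration, not Cloudera guidance: the two input flags (`frequent_etl`, `needs_top_performance`) and the names `BatchWorkload`, `PATTERNS`, and `choose_batch_pattern` are assumptions introduced here, and only the per-pattern attributes are taken from the slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BatchWorkload:
    # Hypothetical inputs; the slide does not define thresholds for either.
    frequent_etl: bool           # runs often enough to keep a cluster up
    needs_top_performance: bool  # speed matters more than flexibility


# Attributes copied from the slide for each pattern.
PATTERNS: Dict[str, Dict[str, str]] = {
    "Transient Batch": {
        "instances": "on-demand/spot",
        "pricing": "usage-based",
        "tenancy": "cluster per tenant/user",
    },
    "Persistent Batch": {
        "instances": "reserved",
        "pricing": "node-based",
        "tenancy": "cluster per tenant group",
    },
    "Persistent Batch on HDFS": {
        "instances": "reserved",
        "pricing": "node-based",
        "tenancy": "shared across tenant groups",
    },
}


def choose_batch_pattern(workload: BatchWorkload) -> str:
    """Pick one of the three slide patterns from two coarse workload flags."""
    if not workload.frequent_etl:
        # Infrequent jobs: spin clusters up as needed (most flexible).
        return "Transient Batch"
    if workload.needs_top_performance:
        # Frequent ETL where speed dominates (fastest).
        return "Persistent Batch on HDFS"
    # Frequent ETL without the top-performance requirement (most control).
    return "Persistent Batch"


if __name__ == "__main__":
    name = choose_batch_pattern(BatchWorkload(frequent_etl=True, needs_top_performance=False))
    print(name, PATTERNS[name])
```

In practice the choice would also weigh cost and tenancy; the two flags are only a coarse stand-in for that judgment.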

Analytic DB in the cloud
Presents new set of choices

BI/Analytics: New Insights, New Revenue
Explore and analyze all data, wherever it lives
▪ Long-running clusters
▪ Object storage or local storage
▪ Lift-and-shift deployment

ETL: Reduce Operating Costs
Only pay for what you need, when you need it
▪ Transient clusters
▪ Object storage centric
▪ Cloud-native deployment

Refer to Data Science & Engineering guidelines

BI/Analytics in the cloud
Three architectural patterns to optimize price, convenience, performance

Transient BI (infrequent usage)
Spin up clusters when needed
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user

Persistent BI (regular usage)
Persistent clusters for BI anytime
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group

Persistent BI with Local Storage (fastest)
Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared local data

Diagram labels: Transient Cluster, Persistent Cluster, Object Storage, HDFS and/or Kudu, Default Choice

Operational DB in the cloud
Not as well suited for cloud, but targeted benefits are possible

Cost Goals
• Low-cost backup and disaster recovery
• Development and testing environments easy to deploy and decommission

Convenience Goals
• Elastic growth for tightly provisioned workloads makes expansion easy, and enables a lower-cost steady state
• Fast and easy provisioning of additional clusters helps projects move quickly

Reconstruct an Enterprise Data Hub

Many problems are a combination of SQL & predictive, batch & online

Traditional Architecture (diagram)
• Data sources: Financial Ledger, P&L; Risks (Market, Counterparty, Ratings); Payments, Collections, Charges; Portfolio, Contracts; Unstructured
• Stages: Ingest, Process, Load, ETL, ELT, Serve, Archive
• Systems: Applications, Operational Data Stores, Enterprise Data Warehouse, BI System, HPC Grid, Storage #1, Storage #2
• Uses: Modeling, Reporting

Reimagining the Enterprise Data Hub in the cloud

Common: Operations, Governance, Security, Schema, Catalog

Diagram labels: Common Operations, Common Governance, Common Security, Object Store, Developer Workbench, SQL Workbench, Partner Ecosystem

Thank you

Fred Koopmans