Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier

Post on 16-Apr-2017


Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier

Sanjai Marimadaiah

Mainframe

CA Technologies, Product Management, Office of the CTO, Big Data Management

MFX01E

@SanjaiM1 #CAWorld

Michael Harer @MikeHarer, Hiren Mandalia @hiren0210

© 2015 CA. ALL RIGHTS RESERVED. @CAWORLD #CAWORLD

Abstract

Big Data environments are now business-critical for any organization. Learn the basics of Big Data and some of the emerging technologies targeting the Big Data space.

Sanjai Marimadaiah

Michael Harer

Hiren Mandalia

CA Technologies, Product Management, Office of the CTO, Big Data Management

Agenda

1. What Is Big Data?
2. Big Data Use Cases
3. Hadoop Basics
4. NoSQL Basics
5. Cassandra Basics
6. MongoDB Basics

How do I deliver a flawless experience every time an application touches the mainframe?

In the application economy it’s all about your customers. You need to think about your mainframe reframed.

Connect mobile-to-mainframe applications

Create mainframe infrastructure flexibility for the future

Unleash the power of data on the mainframe

What is Big Data?

Data sets whose volume, velocity, variety and complexity exceed the ability of commonly used software tools to capture, process, store, manage, and analyze them.

Information sources: mobile, transactional data, search, texts, CRM/SCM/ERP, images, email, social media, IT ops, audio, video

Dimensions of Big Data: velocity, volume, variety, complexity

Evolution of Data Management Solutions
Relational databases are not suited for Big Data

[Timeline figure, 1960 to 2010: hierarchical data models (IBM IMS), then relational data models (Sybase, Informix, Oracle, IBM), then document data models (Google, Hadoop). The emphasis moves from structured data to unstructured data.]

State of Database Workloads
Big Data workloads enable broader OLAP workloads

Database (RDBMS): Online Transaction Processing
Data Warehouse: Online Analytical Processing
Big Data: Big Data workloads

Better analytics for higher-value transactions
Collect historical transactional data for analytics
Adding more complete data enhances analytics
Enhanced insights from operational workloads & information access applications

Multimedia
Web logs
Social data
Sensor data: images
RFID
Text data: emails

What is Driving Big Data Solutions
Cost efficiency and a standardized platform are fostering innovation

Scale-Out Architecture (100’s of inexpensive servers)
• Protects investment: just add more servers to expand capacity
• Lower cost of infrastructure: less expensive commodity servers (x86 based)

Open-Source Software (Hadoop, Cassandra)
• Standardization leads to innovation: a common programming interface is enabling innovation up the software stack
• Lower software cost: open source software is lowering software cost

Adoption of Big Data Solutions

2X INCREASE in number of organizations that have deployed/implemented data-driven projects since 2014

Key Trends
• Greater priority on structured data initiatives
• Top vendor criteria
  - Integration with existing infrastructure
  - Security
  - Ease of use
• Necessary skill sets: Business Analysts, Data Architects, Data Analysts & Data Visualizers

40% of organizations are still planning to implement data projects

30% of organizations are still planning to implement data projects

Source: 2015 CA Sponsored Research: Vanson Bourne Global Big Data User Survey

Overall Big Data Market

§ The Big Data market was $27.36B in 2014, up from $19.6B in 2013.
§ 89% of business leaders believe Big Data will revolutionize business operations the same way the Internet did.
§ 83% have pursued Big Data projects in order to seize a competitive edge.

Wikibon projects the Big Data market will top $84B in 2026, attaining a 17% compound annual growth rate (CAGR) for the forecast period 2011 to 2026.

Source: 2015 Wikibon Big Data Market Forecast

Database for Big Data
Overall Big Data database market is projected to grow at 33% CAGR until 2017

[Chart] Source: © Wikibon Big Data Model 2011-2017, Big Data Market Database Projection, 2011-2017 ($US billions)

• Big Data database market will grow at approx. 60% from 2011-2017 (6-year)
• Market for NoSQL databases was $0.2B in 2012, growing to $1.6B in 2017.
• Technology progression in data-in-DRAM memory and data-in-flash memory will improve the scalability of SQL databases.
• Applications are easier to program and require lower maintenance if SQL is used; NoSQL has greater scalability and lower technology costs for very large big-data applications.

Source: 2015 Wikibon Big Data Model 2011-2017
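
The growth figures quoted on the market slides can be sanity-checked with the standard compound-growth formula. The sketch below is only an illustration: the function names `cagr` and `project` are assumptions made for this example, and the inputs are the dollar figures stated on the slides.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years


# Overall market, 2013 -> 2014 ($19.6B -> $27.36B): one year of growth.
overall_growth = cagr(19.6, 27.36, 1)

# NoSQL segment, 2012 -> 2017 ($0.2B -> $1.6B): five years of growth.
nosql_cagr = cagr(0.2, 1.6, 5)

print(f"Overall market 2013-2014: {overall_growth:.1%}")
print(f"NoSQL 2012-2017 CAGR:     {nosql_cagr:.1%}")
```

With the slide's numbers this works out to roughly 39.6% year-over-year growth for the overall market and roughly 51.6% CAGR for the NoSQL segment.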

Vendor Landscape – Broader Participants
BIG DATA MARKET SEGMENT

HARDWARE
Servers (chips): HP, Dell, Intel
Storage: EMC/Dell, NetApp, Fusion-io
Networking: Cisco, Arista Networks, Infineta Systems

SOFTWARE
Hadoop: Hortonworks, Cloudera, MapR, Hadapt, EMC Greenplum
NoSQL: Cassandra, MongoDB, Couchbase, DataStax, 10gen
*NGDW: HP Vertica, EMC Greenplum, Teradata Aster, IBM Netezza, SAP
Analytics & BI: Digital Reasoning, Revolution Analytics, Jaspersoft, Dataeet, Pentaho
Management Solutions: CA Big Data Control Center, Informatica, VMware, IBM BigInsights, HP HAVEn, Zettaset, BlueData EPIC, Syncsort, StackIQ, BMC Control-M

SERVICES
Cloud Services: Amazon, Google, MapR, IBM, Microsoft
Technical Services: Hortonworks, Cloudera, Cloudwick, EMC, IBM
Professional Services: Think Big Analytics, IBM, EMC, Accenture, Deloitte

*NGDW = Next Generation Data Warehouse

Core infrastructure: Hadoop, Cassandra, MongoDB, Amazon Big Data, MapR, Elasticsearch

Big Data Use Case Studies

Media & Entertainment Use Case

PROBLEM
§ A company’s streaming business has expanded from thousands of members watching occasionally to millions of members watching over two billion hours every month.
§ A collection of events describing what is being viewed must be gathered. Given that viewing is what members spend most of their time doing, what’s needed is a robust and scalable architecture to manage and process this.
§ Certain things will break the architecture that processes billions of viewing-related events per day.

SOLUTION
§ Focus on the minimum viable set of use cases
§ Availability over consistency - our primary use cases can tolerate eventually consistent data, so design from the start favoring availability rather than strong consistency in the face of failures.

POTENTIAL BENEFITS
§ By focusing on the minimum viable set of use cases, rather than building a generic all-encompassing solution, we have been able to build a simple architecture that scales.
§ The company’s viewing data architecture is designed for a variety of use cases, ranging from user experiences to data analytics. The following are three key use cases, all of which affect the user experience:

Health Care Use Case

PROBLEM
§ Relapses in cardiac patients
§ “One size fits all” treatment
§ Medicare readmission penalties
§ Sensitive patient data on z Systems VSAM files
§ No efficient way to offload

SOLUTION
§ Identify risk factors by analyzing patient data*
§ Factors used to predict likely outcomes

POTENTIAL BENEFITS
§ Reduction in readmissions
§ Savings in no penalty fees
§ No manual intervention
§ No increase in staffing

*System z VSAM database requires special skills to access without vStorm Connect Data Streaming for Big Data

Retail Use Case

PROBLEM
§ Streams of user data not correlated
§ e.g. store purchases, website usage pattern, card usage, historical customer data
§ Historical customer data System z VSAM & DB2 based – no efficient, secure offload

SOLUTION
§ HDFS securely populated with historical customer data, card usage, store purchases, website logs
§ Splunk scores customers based on the various data streams
§ High-scoring customers offered coupons, special deals on website

POTENTIAL BENEFITS
§ Increase in online sales in the middle of retail slowdown
§ Improved conversion rate of website browsing customers (shopping cart to sales)
§ Elimination of data silos – since analytics now cover all data, no more reliance on multiple reports/formats

Hadoop Basics

What is Hadoop?

Hadoop is… open-source software designed to be highly scalable, fault tolerant and highly distributed

Key elements:
1. Distributed processing of Big Data (e.g. MapReduce)
2. Distributed storage (Hadoop Distributed File System, or HDFS)

Hadoop 1.0 stack:
HDFS (distributed reliable storage)
MapReduce (resource management & data processing)

Hadoop 2.0 stack:
HDFS (distributed reliable storage)
YARN (resource management)
MapReduce (distributed programming)
Spark (in memory)
HBase (NoSQL store)
Hive (query)
Pig (scripting)
Oozie (workflow)

Numbered callouts mark the components covered on the following slides: (1) MapReduce, (2) HDFS, (3) YARN, (4) HBase, (5) Hive.

MapReduce – Core Hadoop
A distributed computing framework

§ Hadoop’s MapReduce framework involves two phases:
  1. Map phase: distributes the data set among multiple servers and operates on the data locally.
  2. Reduce phase: recombines the partial results.
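
The two phases can be illustrated with a single-process word count. This is a minimal sketch of the programming model only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are assumptions made for this example, and each input string stands in for a block of data that a real cluster would process on the node where it is stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Runs on one partition of the input: emit (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as happens between the two phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine the partial results for one key."""
    return key, sum(values)


def run_job(chunks: List[str]) -> Dict[str, int]:
    # Map every chunk independently, then group by key and reduce.
    intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


chunks = ["big data big opportunity", "big data frontier"]
counts = run_job(chunks)
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, the map work can be spread across as many servers as there are chunks; only the grouped partial results need to be brought together for the reduce step.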

MapReduce – Core Hadoop
A distributed computing framework

• JobTracker - one of the core Hadoop services; it manages the jobs and the resources in the cluster (the TaskTrackers). The JobTracker tries to schedule each “map” task as close as possible to the data being processed.
• TaskTracker – deployed on the data nodes; responsible for running the map and reduce tasks as instructed by the JobTracker.

Processes large jobs in parallel across many nodes and combines the results.

[Diagram: a JobTracker on the master node dispatches Job-1 through Job-5 to TaskTrackers on five slave nodes (the data nodes), which hold blocks 2-4-5, 1-2-5, 1-3-4, 2-3-5 and 1-3-4.]

Hadoop Distributed File System (HDFS)
Self-healing, high-bandwidth clustered storage

• NameNode - one of the core Hadoop services; it maintains the namespace, knows where data is, and manages blocks on the DataNodes.
• DataNode - servers that actually store the data on their local disks.
• Secondary NameNode - performs a periodic checkpoint of the primary NameNode's metadata, which can help recover the namespace after a failure (it is a checkpoint helper rather than a hot standby).

HDFS breaks incoming files into blocks and stores them redundantly across the cluster.

[Diagram: primary and secondary NameNodes on the master node, linked by a periodic checkpoint; five slave nodes (DataNodes with TaskTrackers) hold blocks 2-4-5, 1-2-5, 1-3-4, 2-3-5 and 1-3-4, so each of the five blocks is stored on three nodes.]
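
Splitting a file into blocks and storing each block on several nodes can be sketched in a few lines. This is a toy model under stated assumptions: the helper names, the node names, the 256-byte block size and the round-robin placement are all made up for illustration (real HDFS blocks are far larger and placement is rack-aware). The resulting `block_map` plays the role of the metadata the NameNode keeps.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes (simple round-robin).

    Round-robin is an assumption of this sketch, chosen only for brevity.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


data = b"x" * 1000                       # stand-in for an incoming file
blocks = split_into_blocks(data, 256)    # toy block size for the example
nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
block_map = place_replicas(len(blocks), nodes, replication=3)

for block_id, holders in block_map.items():
    print(block_id, holders)
```

With three copies of every block on different nodes, losing any single node still leaves two copies of each block it held, which is what allows the cluster to re-replicate and "self-heal".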

YARN
Yet Another Resource Negotiator

YARN is…
§ Resource management
§ Next-generation MapReduce (MRv2)
§ Splits the JobTracker into:
  – Resource Manager
  – Scheduling / Monitoring

What does YARN do?
§ Provides a cluster-level resource manager for improved resource management & scaling
§ Forms the new system for managing applications in a distributed manner
§ Provides slots for jobs other than Map/Reduce
§ Improves resource utilization

Resource management moves into YARN

HBase
A non-relational (NoSQL) database that runs on top of HDFS

What is it?
§ A Hadoop open source (Java) NoSQL database
§ Provides real-time read/write access to large data sets
§ Distributed with automatic failover

Why use it?
§ Provides a natural data storage mechanism for all kinds of data (especially unstructured)
§ For random, real-time access to data in Hadoop
§ When the project goal is to host very large tables, i.e. billions of rows and millions of columns
§ Combines data sources that use a wide variety of different structures and schemas
§ Great for: storing semi-structured data like log data

[Figure: logical view of customer contact information in HBase]
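
A logical view of this kind can be approximated as nested maps: row key, then column family, then column qualifier. The sketch below is a plain-Python model, not the HBase client API; the helper names and the sample row keys, families and values are hypothetical, and per-cell versioning by timestamp is left out for brevity.

```python
from collections import defaultdict
from typing import Dict, Optional

# table[row_key][column_family][qualifier] = value
Table = Dict[str, Dict[str, Dict[str, str]]]


def new_table() -> Table:
    return defaultdict(lambda: defaultdict(dict))


def put(table: Table, row_key: str, family: str, qualifier: str, value: str) -> None:
    """Write one cell; rows and columns are created on first use."""
    table[row_key][family][qualifier] = value


def get(table: Table, row_key: str, family: str, qualifier: str) -> Optional[str]:
    """Read one cell, returning None when the cell was never written."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


customers = new_table()
put(customers, "cust-0001", "name", "first", "Ada")
put(customers, "cust-0001", "contact", "email", "ada@example.com")
put(customers, "cust-0002", "name", "first", "Grace")  # no contact columns at all

print(get(customers, "cust-0001", "contact", "email"))
print(get(customers, "cust-0002", "contact", "email"))
```

Rows need not share the same columns, which is why this layout suits sparse, semi-structured data: a missing cell simply does not exist rather than occupying a NULL slot.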

Hive
A data warehouse infrastructure built on Hadoop

What is it?
§ A query engine wrapper built on MapReduce
§ Treated as a data warehouse tool for the Hadoop ecosystem
§ Primarily for users with SQL skills
§ Provides HiveQL (similar to SQL)
§ Stores data in HDFS

Why use it?
§ Data analysis and reporting purposes
§ Hides Hadoop complexity from end users
§ Can be used within an ELT function – i.e. to convert Structured Query Language statements into MapReduce jobs that run on a Hadoop cluster
§ Good for: batch processing tasks (logs, text mining, document indexing, customer BI)
§ Not good for: online transaction processing, real-time queries.

Cross-Industry Use Case – Apache Hadoop ELT

PROBLEM
§ Traditional data warehousing resources are EXPENSIVE (e.g. transactional mainframe systems)
§ Need to reduce costs associated with storage, CPU capacity and 3rd-party ETL tools
§ Current systems cannot scale (i.e. process
§ Lack efficient tools
§ Tools typically only handle structured data (RDBMS), but Big Data insight is derived from all types of data (structured, unstructured, semi-structured)

SOLUTION
§ Apache Hadoop tools to:
  1. Perform ETL functions
  2. Handle all of the specific data types
  3. Shift away from traditional ETL to ELT (extract, load, and transform). This shift is mainly driven by big data, which follows the “store first, analyze later” model that is becoming the new standard.

BENEFITS
§ Compared to traditional transactional systems, Hadoop provides fast, low-cost processing
§ New value can be derived from the ability to handle structured and non-structured data
§ Greater flexibility & choice: e.g. the Transform function can use MapReduce, Hive, Pig, R, shell scripts, Java, etc.
§ Vast support model: open source developer community

[Diagram, traditional ETL process: Web, CRM and ERP sources are extracted, transformed and loaded into a data warehouse (DWH) that feeds data mining, reporting and OLAP analysis.]

[Diagram, Hadoop ELT: structured sources (Web, CRM, ERP) and unstructured sources (social media, sensor logs) are extracted and loaded with Flume and Sqoop into HDFS (Hadoop Distributed File System), transformed with Pig, MapReduce and Hive, and feed data mining, reporting and analytics.]

Commercial Distributions of Hadoop
Pure open source – Open core – Compatible

[Logos: Cloudera, Hortonworks, MapR, Apache Hadoop (HDFS, Oozie)]

The Evolving Hadoop Ecosystem

Components: Description

Mahout, R: Data mining / machine learning tools used against Hadoop data to detect patterns and trends
Pig: Scripting language for analyzing large data sets. Compiles to MapReduce jobs
MapReduce, YARN: Programming model for processing large data sets. YARN performs overall resource management
Oozie: A workflow scheduler tool to manage Hadoop MapReduce jobs
Sqoop, Hive, Drill: Enable SQL for Hadoop data. Sqoop - data transfer between Hadoop and structured data stores. Hive - data warehouse for Hadoop. Drill - open source, low-latency SQL query engine for Hadoop and NoSQL.
ZooKeeper: Coordination of configuration data, naming and synchronization of Hadoop projects
Bigtop: Packaging services for Hadoop projects to ease testing and deployment
HBase: A non-relational, distributed database that runs on top of HDFS
Thrift / Avro: Schema-based data serialization system using RPC calls
Solr, Nutch, Elasticsearch: Indexing and search tools for data stored in HDFS for Hadoop
Kafka / Flume: Collect, aggregate, and move streaming data from multiple sources into Hadoop
Spark: App dev tool for Hadoop apps combining batch, streaming, and interactive analytics
Ambari, Chukwa: Monitoring & management of Hadoop clusters and nodes

NoSQL Basics

NoSQL Databases Overview

§ Far better at handling semi-structured and unstructured data
§ Database consistency is compromised for availability and ease of partitioning
§ Supports object-oriented programming that is easy to use and flexible
§ Efficient, scale-out architecture instead of expensive, monolithic architecture

NoSQL Types

Type: Database examples

Column data model: HBase, Cassandra, Accumulo
Document data model: MongoDB
Key-value data model: OpenTSDB, Redis
Graph data model: Neo4j, ArangoDB

Cassandra Basics

Cassandra – History

Google: BigTable, 2006
Amazon: Dynamo, 2007
Facebook: Open source, 2008
DataStax: Cassandra DSE – Dec 2011

Cassandra is Ideal For…

§ Massive, linear scaling
§ Extremely heavy writes
§ High availability

CERN, Barracuda, Cisco, BlueMountain, Comcast, Netflix, SoundCloud

Cassandra – Data Model

Benefits of the Cassandra data model:
§ Easily add new columns without downtime
§ Schema-free / schemaless database
§ Compression permits rapid columnar operations (MIN, MAX, SUM, etc.)

[Figure: a column family (similar to an RDBMS table) shown alongside the same column family in JSON format]

Cassandra Architecture

§ All nodes are the same
§ Data is partitioned among all nodes in the cluster
§ Each node communicates with other nodes using the Gossip protocol
§ A commit log is used on each node to capture write activity for data durability

Client
Storage: Cassandra File System
Processing: Cassandra Query Language (CQL)
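
One common way to partition data among identical nodes is a hash ring: each key is hashed to a token, and the nodes that follow that token on the ring hold its replicas. The sketch below illustrates that idea under stated assumptions; the `Ring` class, the node names and the use of MD5 are hypothetical choices for this example (MD5 only because it ships with Python), and each node owns a single token here to keep the code short.

```python
import bisect
import hashlib
from typing import List, Tuple


def token_for(key: str) -> int:
    """Hash a partition key (or node name) to a position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, node_names: List[str]) -> None:
        # One token per node, sorted so the ring can be searched by bisection.
        self._ring: List[Tuple[int, str]] = sorted(
            (token_for(name), name) for name in node_names
        )
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, key: str, replication_factor: int) -> List[str]:
        """Walk clockwise from the key's token, collecting distinct nodes."""
        start = bisect.bisect_left(self._tokens, token_for(key))
        count = min(replication_factor, len(self._ring))
        return [
            self._ring[(start + i) % len(self._ring)][1] for i in range(count)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("member:42", replication_factor=3)
print(owners)
```

Any node can compute the same answer from the same ring, so no master is needed to locate data, which fits the "all nodes are the same" design described on the slide.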

Cassandra – Key Features

§ No single point of failure
§ Multi-data center and zone support
§ Pure peer-to-peer cluster setup
§ Allows for “tunable consistency”
§ Cassandra Query Language (CQL)
§ Cassandra File System (CFS)
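
"Tunable consistency" is usually reasoned about with a simple overlap rule: with N replicas, a read that waits for R replicas is guaranteed to see a write acknowledged by W replicas whenever R + W > N. The sketch below only encodes that arithmetic; the function names are assumptions for this example, not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor


rf = 3
# Write to one replica, read from one: fastest, but a read may miss the write.
print(reads_see_latest_write(rf, write_acks=1, read_acks=1))
# Quorum writes and quorum reads: the two sets always share a replica.
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))
```

Choosing smaller values for R and W favors latency and availability; choosing values that satisfy the overlap rule favors consistency. The trade-off can be made per operation rather than once for the whole database.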

Cassandra at Netflix

Use cases:
§ What titles have I watched?
§ What titles are recommended for me?
§ Where did I leave off last?
§ What else is being watched?
§ Measure member engagement
§ Inform product & content decisions

Solution:
§ Capture all ‘view’ events in scalable Cassandra clusters

Challenges:
§ Ability to scale to a billion write events/day
§ Provide a responsive title browsing experience

Source: techblog.netflix.com

MongoDB Basics

2007: Founded (10gen)
2009: MongoDB 1.0, open-sourced (10gen)
2012: MongoDB 2.0 (10gen)
2013: MongoDB Inc. (MongoDB)
2015: MongoDB 3.0 (MongoDB)

MongoDB is Ideal For…

§ RDBMS replacement for web applications
§ Semi-structured content management
§ Real-time analytics and high-speed logging
§ Caching and high scalability

Web 2.0, Media, SaaS, Gaming

Health Care, Finance, Telecom, Government

Not so great for – high transactional databases

Disney, Eventbrite, Intuit, IGN, Craigslist

MongoDB – Data Model

RDBMS vs. document-oriented

Benefits of a document-oriented DBMS:
• Database schema is optional
• Flexible in dealing with change and optional values

Present:
{"streetnum": "123", "streetname": "Main St.", "unit": "456", "City": "Mountain View", "State": "California", "zip": "65432"}

Future:
{"streetnum": "123", "streetname": "Main St.", "unit": "456", "City": "Mountain View", "State": "California", "County": "Santa Clara", "zip": "65432"}
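
The practical effect of an optional schema is that old and new document shapes can sit side by side with no migration step. The sketch below models a collection as a plain Python list of dicts, which is an assumption of this example (no database driver is involved), and `find` is a hypothetical helper that mimics matching on field values.

```python
from typing import Any, Dict, List

# A collection modeled as a list of documents (dicts).
addresses: List[Dict[str, Any]] = []

present = {
    "streetnum": "123", "streetname": "Main St.", "unit": "456",
    "City": "Mountain View", "State": "California", "zip": "65432",
}
future = dict(present, County="Santa Clara")  # same shape plus one new field

addresses.append(present)
addresses.append(future)


def find(collection: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return documents whose fields match every criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(len(find(addresses, City="Mountain View")))   # both documents match
print(len(find(addresses, County="Santa Clara")))   # only the newer one
```

Documents written before the `County` field existed are still returned by queries on the older fields, and they are simply skipped by queries on the new one.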

MongoDB – Sharding

Sharded Production Cluster Setup

Image source: mongodb.org

§ Shards store the data. To provide high availability and data consistency, in a production sharded cluster each shard is a replica set.
§ Replica set: a cluster of MongoDB servers that implements master-slave replication and automated failover.
§ Query routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards.
§ Config servers store the cluster’s metadata. This data contains a mapping of the cluster’s data set to the shards.
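
The division of labor between config servers and query routers can be sketched as a lookup table plus a routing function. This is an illustrative model only: the `ChunkMap` class, the `route` function, the shard names and the integer split points are hypothetical, and range-based partitioning on a numeric key is assumed for simplicity.

```python
import bisect
from typing import Dict, List


class ChunkMap:
    """Stand-in for the config servers' metadata: key ranges -> shard names."""

    def __init__(self, split_points: List[int], shards: List[str]) -> None:
        # Sorted split points [1000, 2000] define three ranges:
        # below 1000, 1000 up to 1999, and 2000 and above.
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard per key range")
        self._split_points = split_points
        self._shards = shards

    def shard_for(self, shard_key: int) -> str:
        return self._shards[bisect.bisect_right(self._split_points, shard_key)]


def route(chunk_map: ChunkMap, keys: List[int]) -> Dict[str, List[int]]:
    """Group operations by the shard that owns each key, as a router would."""
    plan: Dict[str, List[int]] = {}
    for key in keys:
        plan.setdefault(chunk_map.shard_for(key), []).append(key)
    return plan


chunk_map = ChunkMap([1000, 2000], ["shardA", "shardB", "shardC"])
plan = route(chunk_map, [5, 1000, 1500, 2500])
print(plan)
```

A query that names the shard key touches only the shards in its plan; a query without the shard key would have to be sent to every shard, which is why the choice of shard key matters.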

MongoDB – Key Features

§ Scalable, high-performance, open-source, document-oriented database
§ Built for speed
§ Rich document format allows for easy readability
§ Full index support for high performance
§ Replication and failover for high availability
§ Auto-sharding for easy scalability
§ Map/Reduce for aggregation

MongoDB at Craigslist

Use cases:
§ Create new posts
§ Browse all my posts
§ Allow for post classification
§ Search relevant posts

Solution:
§ Migrate from MySQL to MongoDB

Challenges:
§ Archive billions of records in multiple formats
§ Query/report on archives at runtime
§ Need continuous availability mandated for regulatory compliance
§ Support 700 sites in 70 different countries

Craigslist Environment
• 5 billion documents
• Avg size: 2 KB
• 3 replica sets / 3 servers each
• 2 data centers
• Sharding key – posting ID

Closing

CA Big Data Control Center – Vision
End-to-end management of Big Data environments for the application economy

Bring efficiency to root-cause analysis at all levels of the Big Data solution stack

Simplify management by abstracting the complexities of underlying Big Data technologies

Holistically meet the needs of DevOps by managing the lifecycle of applications, data and services

Personas (primary and secondary): LOB/Biz Analysts; App Dev./Data Sci.; Data Eng./Data Admin; IT Ops/IT Mgmt.; Big Data/Sys Admin

[Diagram labels: Application, Data Services, Data Sources, Big Data Technologies, IT Solutions, CA Big Data Control Center]

Manage Big Data With A Unified View

Job monitoring
Heterogeneous system management
Intelligent alert management
Resource reporting
Cluster/job/node management

Unified View – Details

Recommended Sessions

SESSION # | TITLE | DATE/TIME

MFT05S | Big Iron + Big Data = BIG DEAL! Unlock The Power of Your Mainframe Data | 1/18/2015 at 2:00 pm, Location: Mainframe Theater
MFX15S | Predicting When Your Applications Will Go Off the Rails! Managing DB2 Application Performance using Analytics | 1/18/2015 at 4:30 pm, Location: Breakers I
MFT15T | New Mainframe IT Analytics: Actionable Insight into Root Cause Analysis of Performance Issues | 1/18/2015 at 3:45 pm, Location: Mainframe Area Tech Talk
MFX06S | CA's Strategy and Vision for Mainframe Data Management and Analytics | 1/18/2015 at 1:00 pm, Location: Breakers I
MFT01S | The Big Data, Big Picture: Can You See It? | 11/19/2015 at 3:45 pm, Location: Mainframe Theater

Must See Demos

See the Future of Big Data Management
CA Big Data Control Center
App Economy Area, Station: APPECN001

Unleash the Power of Mainframe Data
vStorm Connect Data Streaming for Big Data
Mainframe Area, Station: MNFSE001

Maximize Your Mainframe Database Value
CA IDMS / CA Datacom
Mainframe Area, Station: MNFSE002

Performance Analytics for DB2
DB2 Analytics
Mainframe Area, Station: MNFSE004

Follow On Conversations At…

Smart Bar
DB2 Tools and Performance Analytics
Mainframe Area on Expo Floor

Tech Talks
Five Steps to Powerful Database Experience
Mainframe Area on Expo Floor

Influencing Our Roadmap
Winning with CA

Take the opportunity to influence our product development. Help ensure that what we deliver is what you need and want.

Agile Development

CA Communities Ideation
§ Submit your ideas on communities.ca.com
§ Vote & comment on ideas that are important to you
§ CA Product Management reviews ideas and updates status as they move through the lifecycle
§ “Currently Planned” idea status indicates inclusion in Agile Backlog or Product Roadmap

Customer Validation
§ Register to participate in:
  – Live demos / end-of-sprint reviews
  – Private, members-only online community
  – Pre-release onsite testing and support (beta)
  – Upgrade support from SWAT team
§ How to register: https://validate.ca.com

Agile Development Transformation
Driving significant business value for our customers!

Speed, Quality, Performance

UK customer Standard Life benefits from CA agile process

251 unique customers participated in 56 product releases during a year

99.5% reduction in cost
98% reduction in month-end cycle time

45 products released against zero-defect policy
20% decrease in support issues

For Informational Purposes Only
Terms of this Presentation

© 2015 CA. All rights reserved. All trademarks referenced herein belong to their respective companies. The presentation provided at CA World 2015 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer references relate to customer's specific use and experience of CA products and solutions so actual results may vary.

Certain information in this presentation may outline CA’s general product direction. This presentation shall not serve to (i) affect the rights and/or obligations of CA or its licensees under any existing or future license agreement or services agreement relating to any CA software product; or (ii) amend any product documentation or specifications for any CA software product. This presentation is based on current information and resource allocations as of November 18, 2015, and is subject to change or withdrawal by CA at any time without notice. The development, release and timing of any features or functionality described in this presentation remain at CA’s sole discretion.

Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA product release referenced in this presentation, CA may make such release available to new licensees in the form of a regularly scheduled major product release. Such release may be made available to licensees of the product who are active subscribers to CA maintenance and support, on a when and if-available basis. The information in this presentation is not deemed to be incorporated into any contract.

Q&A