Analyze Big Data Faster and Store it Cheaper
Dominick Huang – CenterPoint Energy
Henry Le – Utegration
Russell Hull – SAP


ABOUT CENTERPOINT ENERGY, INC.

Ø  Publicly traded on New York Stock Exchange

Ø  Headquartered in Houston, Texas

Ø  Over 5000 square miles of electric transmission and distribution service area

Ø  Assets total more than $22 billion

Ø  Over 8,700 employees

Ø  CNP & its predecessor companies in business for over 130 years

Ø  Domestic Energy Delivery

Ø  Operate, Serve, and Grow

Ø  Smart Grid Enabled

Ø  Twenty-Eight State Geography

Ø  Over Five Million Metered Customers

Ø  2.3 million Smart Meters

Ø  4000 Miles of Transmission

Ø  47,000 Miles of Distribution

Ø  Electric Transmission & Distribution

Ø  Natural Gas Distribution

Ø  Competitive Natural Gas Sales and Services

CenterPoint Energy Proprietary and Confidential

AGENDA

Ø  Key Drivers and Strategy of HANA Initiative

Ø  Use Case – Smart Meter Big Data Analytics

Ø  Technology Overview

Ø  POC Results

Ø  Value and Comparison

KEY DRIVERS FOR HANA INITIATIVES

Ø  SAP HANA as CNP strategic platform for critical transactional applications and Analytics

Ø  Cost effective solution to manage and contain data storage growth

Ø  Analytics platform simplification and consolidation to HANA

Ø  Key technology enabler for future business solutions

Ø  Maximize CNP investment on HANA license (40TB)

Ø  Enable business resiliency implementation for CRM/ECC/BPC

Ø  Leverage HANA in-memory capability for real time analytics

STRATEGY – 3 YEAR HANA ROADMAP

Ø  Technical Migration and Consolidation
    Ø  Migrate critical business applications (SAP and Mainframe)
    Ø  Consolidate Analytics solutions (BW, ISAS, eMA, etc.) onto HANA

Ø  HANA Platform Optimization
    Ø  Enhance performance of core business process and mass business functions
    Ø  Enable real-time reporting from the HANA (in-memory) database

Ø  HANA Platform Innovation
    Ø  Innovative solutions to align with long-term business strategy and roadmap
    Ø  SIMPLE Finance, Predictive Asset Health Analytics, Situational Awareness, Internet of Things, Predictive Analytics for customer services, etc.

USE CASE – SMART METER BIG DATA ANALYTICS

BUSINESS CHALLENGE

•  1+ PB of Smart Meter Data

•  2.3MM Smart Meters taking readings every 15 minutes, creating 225MM Readings per day, or over 800 Billion Readings in a Year.

•  Regulatory requirements require historical readings to be available for 10 years.

•  Uncompressed Data Growth of 8 TB per month and over 1 PB in a 10 year period.

•  Current DW technology is approaching End of Life

•  Massive amounts of data are stored in a proprietary vendor solution that is hard to manage and has a significantly high total cost of ownership.

•  Need a cost effective solution for today's analytics, regulatory requirements and preparation for future use cases.
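The volume figures above can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-envelope check: it assumes exactly one reading per meter per 15-minute interval and a flat 8 TB of growth per month, and the function names are illustrative. Under that single-reading assumption the daily count lands near the slide's 225MM, while the yearly count comes out near 80 billion and the ten-year count near 800 billion, so the slide's yearly figure may reflect several channels per meter or the full retention window.

```python
# Figures taken from the slide.
METERS = 2_300_000            # smart meters in service
INTERVAL_MINUTES = 15         # reading interval
GROWTH_TB_PER_MONTH = 8       # uncompressed data growth
RETENTION_YEARS = 10          # regulatory retention period


def readings_per_day(meters: int, interval_minutes: int) -> int:
    """Readings per day, assuming one reading per meter per interval."""
    intervals_per_day = 24 * 60 // interval_minutes
    return meters * intervals_per_day


def storage_tb(growth_tb_per_month: float, years: int) -> float:
    """Uncompressed storage after `years` of constant monthly growth."""
    return growth_tb_per_month * 12 * years


per_day = readings_per_day(METERS, INTERVAL_MINUTES)
print(f"readings per day:      {per_day:,}")
print(f"readings per year:     {per_day * 365:,}")
print(f"readings in 10 years:  {per_day * 365 * RETENTION_YEARS:,}")
print(f"storage after 10 years: {storage_tb(GROWTH_TB_PER_MONTH, RETENTION_YEARS):,.0f} TB")
```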


DATA TIER SOLUTION
DATA VOLUME MANAGEMENT: MULTI-TEMPERATURE DATA APPROACH

Non-Active Data Concept

Providing lower TCO by optimized data volume management

hot
•  Data is read and/or written frequently
•  In memory
•  No restrictions, all features available

warm
•  Infrequent access
•  On disk, no need to keep in memory all the time
•  No restrictions, all features available

cold
•  Sporadic access
•  Not stored in HANA DB; stored in Near-line Storage
•  Restricted to NLS capabilities
•  NLS Management for read-only data

BUSINESS CASE – CAPEX & OPEX SAVINGS

Smart Meter Data grows more than 100 TB/year, 1 PB+ in 10 years

[Chart: Projected Total Spend (Cumulative & Estimated), $0 to $25 Millions, 2016 to 2026. Series: HANA O&M, HANA Capital, NZ O&M, NZ Capital. Scenarios: Business as usual vs. Move to HANA/Hadoop. Annotations: Capex Saving, O&M Saving, Projected Savings.]

[Chart: Projected Data Capacity (TB), Projected Growth, 2016 to 2025. Data labels: 280, 380, 480, 580, 680, 780, 880, 980, 1080, 1180.]

75% Capex and Opex saving
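The capacity chart's data labels (280 through 1180) rise by 100 TB per step, which matches the stated growth of more than 100 TB/year. The sketch below is a minimal linear model of that projection. It assumes the ten labels map one-to-one onto the years 2016 to 2025, and the function names are illustrative rather than anything from the presentation.

```python
BASE_YEAR = 2016
BASE_CAPACITY_TB = 280       # first data label on the chart (assumed to be 2016)
GROWTH_TB_PER_YEAR = 100     # step between consecutive data labels


def projected_capacity_tb(year: int) -> int:
    """Projected capacity for `year` under constant linear growth."""
    if year < BASE_YEAR:
        raise ValueError("the projection starts in 2016")
    return BASE_CAPACITY_TB + GROWTH_TB_PER_YEAR * (year - BASE_YEAR)


def first_year_reaching(target_tb: int) -> int:
    """First year in which the projected capacity reaches `target_tb`."""
    year = BASE_YEAR
    while projected_capacity_tb(year) < target_tb:
        year += 1
    return year


for year in range(2016, 2026):
    print(year, projected_capacity_tb(year), "TB")
print("1 PB (1000 TB) reached in", first_year_reaching(1000))
```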

SOLUTION BENEFITS

Ø  Cost effective HOT + WARM + COLD data management strategy leveraging HANA data compression and data tiering technology

Ø  Simplified Big Data ownership by combining SAP HANA, Dynamic Tiering and Hadoop into a single landscape.

Ø  Single Database Experience. Query execution utilizes SDA and automatically accesses data stored in HANA, Dynamic Tiering and Hadoop/Vora depending on location of data.

Ø  Data movement automated between storage tiers using the Data Lifecycle Manager (DLM).

Ø  Foundation for advanced predictive analytics and future business capabilities

Ø  Instant real time analytics via HANA

Ø  75% savings in storage cost compared to current solution.

Ø  Data tiering technology (Dynamic Tiering, Hadoop) to manage data size and growth.

Ø  Seamless integration with Hadoop allows data scientists to use the HANA toolset to access and manage Hadoop data

Ø  Ability to charge business based on the data being stored and performance requirements

TECHNOLOGY REVIEW

SAP Big Data Platform

Tier 0 (Memory) – Speed Layer

Tier 1 (SAN, ..)

Tier 2 (Hadoop) – Batch Layer

NEW SMART METER ANALYTICS ARCHITECTURE

Current Architecture: Netezza, zOS

Planned Architecture – Storage Tiers (Costs and Performance), numbered 1 to 3:

•  36 TB HANA EDW: 13 months of data are stored in HANA for fast analytics

•  50 TB Dynamic Tiering Extended Storage: 26 months of data are stored in DT (Sybase IQ)

•  750 TB Hadoop (Vora): 10 years of meter data is stored in Hadoop. The plan is to use SAP HANA Vora to access the data

Application: BusinessObjects / SAS / Custom Application

Data movement between tiers: Aggregation, Aging, DLM
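The retention windows on this slide (13 months in HANA, 26 months in DT, 10 years in Hadoop) suggest an age-based placement rule. The sketch below is one possible reading, not the presenters' implementation. It assumes the windows are cumulative age cutoffs counted in whole calendar months, and it maps HANA, DT and Hadoop to the hot, warm and cold tiers. The function names are hypothetical.

```python
from datetime import date

HOT_MONTHS = 13      # HANA window from the slide
WARM_MONTHS = 26     # Dynamic Tiering window from the slide
COLD_MONTHS = 120    # Hadoop window: 10 years


def age_in_months(reading_date: date, today: date) -> int:
    """Whole calendar months between the reading and today (days ignored)."""
    return (today.year - reading_date.year) * 12 + (today.month - reading_date.month)


def tier_for(reading_date: date, today: date) -> str:
    """Return 'hot', 'warm', 'cold' or 'expired' for a reading (assumed cutoffs)."""
    age = age_in_months(reading_date, today)
    if age < 0:
        raise ValueError("reading date is in the future")
    if age < HOT_MONTHS:
        return "hot"       # kept in HANA memory
    if age < WARM_MONTHS:
        return "warm"      # Dynamic Tiering extended storage
    if age < COLD_MONTHS:
        return "cold"      # Hadoop
    return "expired"       # beyond the 10 year retention requirement


today = date(2016, 6, 15)
for d in (date(2016, 1, 1), date(2015, 5, 20), date(2014, 4, 1), date(2006, 6, 1)):
    print(d, "->", tier_for(d, today))
```

In the architecture above, a rule of this kind would be configured in DLM rather than written by hand. The code only makes the cutoffs explicit.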

DYNAMIC TIERING

SAP HANA DYNAMIC TIERING
DISK-BACKED COLUMN STORE EXTENSION TO HANA FOR WARM DATA MANAGEMENT

Ø  SAP Dynamic Tiering is a warm store: a traditional disk based database system fully integrated into HANA.

Ø  Based upon Sybase IQ: Column Store & Disk based

Ø  Reduced TCO by lowering HANA memory footprint

Ø  All HANA functions are available. Read/Write/Update

Ø  Single Database experience: All DB access requests are managed through the HANA platform.

Ø  Centralized operation control: All administration tasks are handled through the HANA interface.

WHAT IS APACHE HADOOP?

HADOOP TECHNICAL ARCHITECTURE – HADOOP CLUSTER

SAP VORA - HANA/HADOOP INTEGRATION
WHAT’S INSIDE AND WHAT DOES IT DO?

SAP HANA Vora is an in-memory query engine which leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.

Goals: Democratize Data Access; Make Precision Decisions; Simplify Big Data Ownership

•  Drill Downs on HDFS
•  Mashup API Enhancements
•  Compiled Queries
•  HANA-Spark Adapter
•  Unified Landscape
•  Open Programming
•  Any Hadoop Clusters

SAP DATA LIFECYCLE MANAGER (DLM)

SAP DATA TIERING ARCHITECTURE

[Diagram: HANA (Processing Engines, Index Server, In-Memory Stores, XS Engine, Data Lifecycle Manager (DLM), SDA (Virtual Table)); Dynamic Tiering (Extended Storage, Files); Hadoop (HDFS, Files, Spark, Spark SQL, HANA Spark Controller, Vora). Flow labels: DLM Reads Data from HANA; DLM Writes Data; DLM Writes Data to ORC File; Upload Table into Vora.]

POC REVIEW

POC OBJECTIVES

Ø  Research and test SAP HANA Data Tiering technology, i.e. DLM (Data Life Cycle Management), Dynamic Tiering, Vora Hadoop Integration

Ø  Evaluate Hadoop technology, understand Hadoop ecosystem and TCO

Ø  Test SAP VORA - HANA and Hadoop integration technology

Ø  Develop and validate solution options for several critical 2016 projects: Smart Meter Analytics, customer document repository for Mainframe Migration

Ø  Build CNP in-house expertise in Hadoop and SAP HANA/Hadoop integration technology

Ø  Identify use case and innovation opportunities at CNP

POC ENVIRONMENT AND TEST CASES

Ø  POC Team
    •  CenterPoint Energy / Utegration (Lead and Architects); SAP (CoE, PE, Global ITP); HP (Hardware); IBM (IBM Hadoop and Cloud)

Ø  Environment
    Ø  Hardware
        •  HP Lab: Hadoop 12 nodes cluster, CS500 HANA 2 TB, HANA Dynamic Tiering Node
        •  IBM BigInsights Cloud
    Ø  Software
        •  SAP HANA SPS10, DLM, Dynamic Tiering, VORA
        •  Hortonworks HDP Hadoop, Red Hat Linux
        •  IBM Apache Hadoop with BigSQL

Ø  Test Cases
    •  Data Load - Extract 800 GB, 7 Billion Smart Meter records from Netezza and ISAS, load data into HANA (Meter data scrambled to protect data security)
    •  DLM – Use DLM tool to move data from HANA to Dynamic Tiering Extended Storage and Hadoop
    •  Run queries across all data tiers and measure performance
    •  Load, query and display 19 million PDFs of Customer Bills (Dummy PDF files used, no real customer data)

POC SUCCESS CRITERIA

Ø  Data Tiering – Move data among different tiers including HANA, DT and Hadoop

Ø  Run SQL queries within and across data tiers

Ø  Performance – Measure response time for each data tier

Ø  Data Compression – Evaluate compression ratio of HANA, DT and Hadoop

Ø  SAP DLM – Utilize the tool to move data from Hot to Warm and Cold tier

Ø  Customer document storage – Store and retrieve PDF documents within one second

Ø  Comparison of storage costs: HANA, DT (Dynamic Tiering Extended Storage) and Hadoop

POC TEST RESULTS

Test areas: Hadoop; HANA/DT/Spark/Vora; DLM

HDP Customer Bill Store and Retrieval
→ 40 ms response time to search and display a document from 19 million PDFs

HDP Batch data load via SQOOP into Hadoop
→ 4 min 24 s to load 2.5 million records (single thread); 1 min 10 s (10 threads)

Data load from HANA to HDP Hadoop via VORA
→ Total of 6.2 GB ORC files stored in HDFS against original size of 172 GB.
→ Compression Rate: 9 (3 copies in HDFS)

Move data from HANA to DT
→ 289 million records moved from HANA to DT
→ 670K records per minute

Move data from HANA to Hadoop via VORA into HDFS
→ 1.57 billion records moved from HANA to Hadoop
→ 22 million records per minute

Run aggregation query across SAP HANA, HDP Hadoop & DT (~4 billion records):

[Chart: Query Response Time [s], axis 0 to 400; bar values 0.2, 2.6, 360, 19]
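The headline numbers from these results can be reproduced with simple ratios. The sketch below is a plain arithmetic check with illustrative function names. It assumes the reported compression rate of 9 divides the original size by the ORC size multiplied by the three HDFS copies, which is the reading that matches the slide's figures.

```python
def compression_ratio(original_gb: float, stored_gb: float, replicas: int = 1) -> float:
    """Original size divided by the physical footprint across all replicas."""
    return original_gb / (stored_gb * replicas)


def records_per_second(records: int, seconds: float) -> float:
    """Average load throughput."""
    return records / seconds


def minutes_to_move(records: int, records_per_minute: float) -> float:
    """Elapsed minutes implied by a record count and a sustained rate."""
    return records / records_per_minute


# ORC compression: 172 GB in HANA became 6.2 GB of ORC files in HDFS.
print(f"single copy:  {compression_ratio(172, 6.2):.1f}x")
print(f"three copies: {compression_ratio(172, 6.2, replicas=3):.1f}x")

# Sqoop load of 2.5 million records: 4 min 24 s single thread, 1 min 10 s with 10 threads.
single = records_per_second(2_500_000, 4 * 60 + 24)
parallel = records_per_second(2_500_000, 1 * 60 + 10)
print(f"Sqoop: {single:,.0f} rec/s vs {parallel:,.0f} rec/s ({parallel / single:.1f}x)")

# DLM moves: implied elapsed time at the reported rates.
print(f"HANA to DT:     {minutes_to_move(289_000_000, 670_000) / 60:.1f} hours")
print(f"HANA to Hadoop: {minutes_to_move(1_570_000_000, 22_000_000):.0f} minutes")
```

By the same arithmetic, ten Sqoop threads deliver a speedup of a little under 4x rather than 10x.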

VALUE AND COMPARISON BETWEEN DATA TIERS

COMPARISON BETWEEN DATA TIERS

Columns: Component, Performance, Cost Factor, Volume, Processing

HANA
    Cost Factor: $$$$
    Volume: Up to 10s of TBs (no technical limit)
    •  ACID compliant
    •  SQL, SQLScript, graph, time series, spatial, text, …

Dynamic Tiering or Sybase IQ
    Cost Factor: $$
    Volume: 100s of TB integrated in HANA; several PBs with Sybase IQ
    •  ACID compliant
    •  SQL

Hadoop – Spark/Vora
    Cost Factor: $
    Volume: 100s of PB or more
    •  ANSI SQL compliant
    •  Read-only SQL when used from HANA via SDA
    •  15 times less expensive than T1 storage
    •  Transformations and Actions
    •  Performance can be improved significantly by increasing compute nodes and using SSD, at higher cost

Hadoop – Vora in Memory
    Cost Factor: $$
    Volume: 100s of TB (depending on available memory in Hadoop cluster)
    •  Data loaded in memory to achieve better performance
    •  Read-only SQL when used from HANA via SDA

RECOMMENDED USE CASES – SHORT TERM

Columns: Component, Recommended Use Case

HANA
    •  Managing up to several TBs of high value data
    •  Very high processing performance required
    •  SAP HANA native processing features (PAL, ..) required
    •  OLTP with many fine-granular updates needed

Dynamic Tiering
    •  Managing up to several PBs of data at T2/T3 storage cost
    •  High performance for complex queries required
    •  Deep SAP HANA integration required (single database experience)
    •  Updates and deletes required

Hadoop - Spark
    •  Managing up to 100s of PBs of data at T4 storage cost, 15 times less expensive than T1 storage
    •  Read-only sufficient (bulk load, no fine granular writes)
    •  Comparatively low-cost storage important
    •  Loose integration of administration and life-cycle management acceptable

Hadoop - Vora
    •  High OLAP query performance on Hadoop
    •  Additional query features (hierarchies)
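The recommendations above can be read as a first-match decision rule on write requirements and query needs. The sketch below is one hypothetical encoding of that reading, not a rule stated in the presentation. The `Workload` fields, the order of the checks and the function name are all assumptions, and data volume and storage cost are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a data set's access requirements."""
    needs_oltp: bool = False        # many fine-granular updates (OLTP)
    needs_updates: bool = False     # occasional updates and deletes
    needs_fast_olap: bool = False   # high OLAP query performance on Hadoop data


def recommend(workload: Workload) -> str:
    """Return the component whose listed use case matches first (assumed order)."""
    if workload.needs_oltp:
        return "HANA"               # "OLTP with many fine-granular updates needed"
    if workload.needs_updates:
        return "Dynamic Tiering"    # "Updates and deletes required"
    if workload.needs_fast_olap:
        return "Hadoop - Vora"      # "High OLAP query performance on Hadoop"
    return "Hadoop - Spark"         # "Read-only sufficient"


print(recommend(Workload(needs_oltp=True)))
print(recommend(Workload(needs_updates=True)))
print(recommend(Workload(needs_fast_olap=True)))
print(recommend(Workload()))
```

A real placement decision would also weigh the volume and cost columns of the comparison table, which this sketch ignores.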

THANK YOU

Contact information:

•  Dominick Huang
   Sr. Manager, Enterprise Technology & Architecture
   CenterPoint Energy
   [email protected]

•  Russell Hull
   Chief Support Architect
   SAP America
   [email protected]

•  Henry Le
   VP of Analytics
   Utegration Inc.
   [email protected]

FOLLOW US

Thank you for your time

Follow us at @ASUG365

APPENDIX

CNP HANA LANDSCAPE - ANALYTICS (BW + OW)

[Diagram: HANA landscape built from 0.5 TB existing blades and 2 TB new HP nodes, plus a 2 TB failover blade. Systems shown: HIP (PRD), 36 TB (Memory); HIQ (QA) and HID (DEV), 12 TB; HIS (SBX), 0.25 TB; Situation Awareness, MfM Testing & other Apps, 4.5 TBs. Legend: Analytics (BW + OW); ES = Extended Storage (NLS/DT/Hadoop).]

HADOOP ARCHITECTURE

RECOMMENDED HADOOP USE CASES – MID & LONG TERM

MAJOR HADOOP DISTRIBUTIONS

HADOOP ECOSYSTEM