Analyze Big Data Faster and Store it Cheaper
Dominick Huang - CenterPoint Energy
Henry Le - Utegration
Russell Hull - SAP
ABOUT CENTERPOINT ENERGY, INC.
Ø Publicly traded on New York Stock Exchange
Ø Headquartered in Houston, Texas
Ø Over 5000 square miles of electric transmission and distribution service area
Ø Assets total more than $22 billion
Ø Over 8,700 employees
Ø CNP & its predecessor companies in business for over 130 years
Ø Domestic Energy Delivery
Ø Operate, Serve, and Grow
Ø Smart Grid Enabled
Ø Twenty-Eight State Geography
Ø Over Five Million Metered Customers
Ø 2.3 million Smart Meters
Ø 4000 Miles of Transmission
Ø 47,000 Miles of Distribution
Ø Electric Transmission & Distribution
Ø Natural Gas Distribution
Ø Competitive Natural Gas Sales and Services
CenterPoint Energy Proprietary and Confidential
AGENDA
Ø Key Drivers and Strategy of HANA Initiative
Ø Use Case – Smart Meter Big Data Analytics
Ø Technology Overview
Ø POC Results
Ø Value and Comparison
KEY DRIVERS FOR HANA INITIATIVES
Ø SAP HANA as CNP strategic platform for critical transactional applications and Analytics
Ø Cost effective solution to manage and contain data storage growth
Ø Analytics platform simplification and consolidation to HANA
Ø Key technology enabler for future business solutions
Ø Maximize CNP investment on HANA license (40TB)
Ø Enable business resiliency implementation for CRM/ECC/BPC
Ø Leverage HANA in-memory capability for real time analytics
STRATEGY - 3 YEAR HANA ROADMAP
Ø Technical Migration and Consolidation
• Migrate critical business applications (SAP and Mainframe)
• Consolidate Analytics solutions (BW, ISAS, eMA, etc.) onto HANA
Ø HANA Platform Optimization
• Enhance performance of core business process and mass business functions
• Enable real-time reporting from the HANA (in-memory) database
Ø HANA Platform Innovation
• Innovative solutions to align with long-term business strategy and roadmap
• SIMPLE Finance, Predictive Asset Health Analytics, Situational Awareness, Internet of Things, Predictive Analytics for customer services, etc.
BUSINESS CHALLENGE
• 1+ PB of Smart Meter Data
• 2.3 MM Smart Meters taking readings every 15 minutes, creating 225 MM readings per day, or over 80 billion readings in a year (over 800 billion across 10 years).
• Regulatory requirements require historical readings to be available for 10 years.
• Uncompressed data growth of 8 TB per month and over 1 PB in a 10 year period.
• Current DW technology is approaching End of Life.
• Massive amounts of data are stored in a proprietary vendor solution that is hard to manage and has a significantly high total cost of ownership.
• Need a cost-effective solution for today's analytics, regulatory requirements and preparation for future use cases.
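The volume figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes exactly one reading per meter per 15-minute interval and a 365-day year; both are simplifying assumptions, so the results are approximate and will not match the rounded slide figures exactly.

```python
# Back-of-the-envelope check of the smart meter volumes.
# Assumptions (not from the slides): one reading per meter per interval,
# 365-day years.
METERS = 2_300_000          # smart meters
READ_INTERVAL_MIN = 15      # minutes between readings
RETENTION_YEARS = 10        # regulatory retention period
GROWTH_TB_PER_MONTH = 8     # uncompressed growth

readings_per_meter_per_day = 24 * 60 // READ_INTERVAL_MIN
readings_per_day = METERS * readings_per_meter_per_day
readings_per_year = readings_per_day * 365
readings_retained = readings_per_year * RETENTION_YEARS
uncompressed_tb = GROWTH_TB_PER_MONTH * 12 * RETENTION_YEARS

print(f"readings per day:      {readings_per_day:,}")
print(f"readings per year:     {readings_per_year:,}")
print(f"readings in {RETENTION_YEARS} years:  {readings_retained:,}")
print(f"uncompressed TB in {RETENTION_YEARS} years: {uncompressed_tb:,}")
```

Under these assumptions the daily count comes to roughly 221 million, the yearly count to roughly 81 billion, and ten years of uncompressed growth to just under 1 PB.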
DATA TIER SOLUTION
DATA VOLUME MANAGEMENT: MULTI-TEMPERATURE DATA APPROACH
Non-Active Data Concept
Providing lower TCO by optimized data volume management
Ø Hot
• Data is read and/or written frequently
• In memory
• No restrictions, all features available
Ø Warm
• Infrequent access
• On disk, no need to keep in memory all the time
• No restrictions, all features available
Ø Cold
• Sporadic access
• Not stored in HANA DB; stored in Near-line Storage
• Restricted to NLS capabilities
• NLS management for read-only data
BUSINESS CASE - CAPEX & OPEX SAVINGS
Smart Meter Data grows more than 100 TB/year, 1 PB+ in 10 years
Ø 75% Capex and Opex saving
[Figure, Projected Savings: Projected Total Spend (Cumulative & Estimated), in $ millions (axis $0 to $25), 2016 to 2026. Series: HANA O&M, HANA Capital, NZ O&M, NZ Capital. "Business as usual" compared with "Move to HANA/Hadoop", with Capex Saving and O&M Saving marked.]
[Figure, Projected Growth: Projected Data Capacity (TB), 2016 to 2025 (axis 0 to 1400). Data labels: 280, 380, 480, 580, 680, 780, 880, 980, 1080, 1180.]
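The capacity chart's data labels (280 TB rising to 1,180 TB) are consistent with the stated growth of about 100 TB per year. A minimal sketch of that linear projection follows; it assumes the ten labels correspond to the years 2016 to 2025 in order and that 1 PB is taken as 1,000 TB.

```python
# Linear capacity projection, assuming the chart's ten data labels map to
# 2016..2025 in order (an assumption; the extraction lost the pairing).
BASE_YEAR = 2016
BASE_CAPACITY_TB = 280
GROWTH_TB_PER_YEAR = 100


def projected_capacity_tb(year: int) -> int:
    """Return the projected smart meter data capacity in TB for a year."""
    if year < BASE_YEAR:
        raise ValueError("projection starts at the base year")
    return BASE_CAPACITY_TB + GROWTH_TB_PER_YEAR * (year - BASE_YEAR)


projection = {year: projected_capacity_tb(year) for year in range(2016, 2026)}

# First year in which the projection reaches 1 PB (taken as 1,000 TB).
first_year_over_1pb = min(y for y, tb in projection.items() if tb >= 1000)

for year, tb in projection.items():
    print(year, tb)
print("reaches 1 PB in", first_year_over_1pb)
```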
SOLUTION BENEFITS
Ø Cost-effective HOT + WARM + COLD data management strategy leveraging HANA data compression and data tiering technology
Ø Simplified Big Data ownership by combining SAP HANA, Dynamic Tiering and Hadoop into a single landscape.
Ø Single Database Experience. Query execution uses SDA and automatically accesses data stored in HANA, Dynamic Tiering and Hadoop/Vora, depending on the location of the data.
Ø Data movement between storage tiers is automated using the Data Lifecycle Manager (DLM).
Ø Foundation for advanced predictive analytics and future business capabilities
Ø Instant real-time analytics via HANA
Ø 75% savings in storage cost compared to current solution.
Ø Data tiering technology (Dynamic Tiering, Hadoop) to manage data size and growth.
Ø Seamless integration with Hadoop allows data scientists to use the HANA toolset to access and manage Hadoop data
Ø Ability to charge business based on the data being stored and performance requirements
NEW SMART METER ANALYTICS ARCHITECTURE
Current Architecture
• Netezza
• zOS
Planned Architecture
Ø Application: BusinessObjects / SAS / Custom Application
Ø Storage Tiers (Costs and Performance)
• Tier 0 (Memory), Speed Layer: 36 TB HANA EDW. 13 months of data are stored in HANA for fast analytics.
• Tier 1 (SAN, ..): 50 TB Dynamic Tiering Extended Storage. 26 months of data are stored in DT (Sybase IQ).
• Tier 2 (Hadoop), Batch Layer: 750 TB Hadoop (Vora). 10 years of meter data is stored in Hadoop. The plan is to use SAP HANA Vora to access the data.
Ø DLM: Aggregation, Aging
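The retention windows in this architecture (13 months in HANA, 26 months in Dynamic Tiering, 10 years in Hadoop) amount to an age-based placement rule. The sketch below is a hypothetical illustration of such a rule, not the DLM configuration used at CNP: it assumes the windows are cumulative age cut-offs and returns the fastest tier that would still hold a reading. The function and constant names are invented for the example.

```python
from datetime import date

# Age cut-offs in months, read as cumulative windows (an assumption).
HOT_MONTHS = 13    # HANA, in memory
WARM_MONTHS = 26   # Dynamic Tiering extended storage


def age_in_months(reading_date: date, today: date) -> int:
    """Whole calendar months between the reading and today (day ignored)."""
    return (today.year - reading_date.year) * 12 + (today.month - reading_date.month)


def tier_for(reading_date: date, today: date) -> str:
    """Return the fastest tier expected to hold a reading of this age."""
    age = age_in_months(reading_date, today)
    if age < 0:
        raise ValueError("reading date is in the future")
    if age < HOT_MONTHS:
        return "HANA"
    if age < WARM_MONTHS:
        return "Dynamic Tiering"
    return "Hadoop"


today = date(2016, 6, 1)
for d in (date(2016, 1, 15), date(2015, 1, 1), date(2012, 1, 1)):
    print(d, "->", tier_for(d, today))
```

Using calendar months rather than days keeps the rule aligned with monthly aging jobs; a day-precise rule would need a different age function.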
DYNAMIC TIERING
Ø SAP Dynamic Tiering is a warm store: a traditional disk-based database system fully integrated into HANA.
Ø Based upon Sybase IQ: Column Store & Disk based
Ø Reduced TCO by lowering HANA memory footprint
Ø All HANA functions are available. Read/Write/Update
Ø Single Database experience: All DB access requests are managed through the HANA platform.
Ø Centralized operation control: All administration tasks are handled through the HANA interface.
SAP VORA - HANA/HADOOP INTEGRATION
WHAT'S INSIDE AND WHAT DOES IT DO?
SAP HANA Vora is an in-memory query engine which leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.
Ø Democratize Data Access
Ø Make Precision Decisions
Ø Simplify Big Data Ownership
• Drill Downs on HDFS
• Mashup API Enhancements
• Compiled Queries
• HANA-Spark Adapter
• Unified Landscape
• Open Programming
• Any Hadoop Clusters
SAP DATA TIERING ARCHITECTURE
[Diagram]
Ø HANA: Processing Engines, Index Server, In-Memory Stores, Dynamic Tiering (Extended Storage files), XS Engine, Data Lifecycle Manager (DLM), SDA (Virtual Table)
Ø Hadoop: HDFS (files), Spark, HANA Spark Controller, Spark SQL, Vora
Ø Data flows: DLM Reads Data from HANA; DLM Writes Data; DLM Writes Data to ORC File; Upload Table into Vora
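The data flow in this diagram (read aged data from the hot store, write it to a colder store, then drop it from the hot store) can be illustrated with a small sketch. This is not the DLM API: plain Python lists stand in for the HANA table and the colder target, and the function name `relocate_aged_rows` and the row layout are hypothetical.

```python
from datetime import date


def relocate_aged_rows(hot_rows: list, cold_rows: list, cutoff: date) -> int:
    """Move rows dated before `cutoff` from hot_rows to cold_rows.

    Rows are written to the cold store before they are removed from the
    hot store, so a failure in between leaves a duplicate, not a loss.
    Returns the number of rows moved.
    """
    aged = [row for row in hot_rows if row["reading_date"] < cutoff]
    keep = [row for row in hot_rows if row["reading_date"] >= cutoff]
    cold_rows.extend(aged)   # write to the colder tier first
    hot_rows[:] = keep       # then shrink the hot tier in place
    return len(aged)


# Hypothetical sample rows; real meter data would have many more columns.
hot = [
    {"meter_id": 1, "reading_date": date(2014, 3, 1), "kwh": 1.2},
    {"meter_id": 1, "reading_date": date(2016, 3, 1), "kwh": 0.9},
    {"meter_id": 2, "reading_date": date(2015, 1, 1), "kwh": 2.4},
]
cold = []

moved = relocate_aged_rows(hot, cold, cutoff=date(2015, 6, 1))
print(moved, "rows moved;", len(hot), "remain hot;", len(cold), "now cold")
```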
POC OBJECTIVES
Ø Research and test SAP HANA Data Tiering technology, i.e. DLM (Data Life Cycle Management), Dynamic Tiering, Vora Hadoop Integration
Ø Evaluate Hadoop technology, understand Hadoop ecosystem and TCO
Ø Test SAP VORA - HANA and Hadoop integration technology
Ø Develop and validate solution options for several critical 2016 projects: Smart Meter Analytics, customer document repository for Mainframe Migration
Ø Build CNP in-house expertise in Hadoop and SAP HANA/Hadoop integration technology
Ø Identify use case and innovation opportunities at CNP
POC ENVIRONMENT AND TEST CASES
Ø POC Team
• CenterPoint Energy / Utegration (Lead and Architects); SAP (CoE, PE, Global ITP); HP (Hardware); IBM (IBM Hadoop and Cloud)
Ø Environment
Ø Hardware
• HP Lab: Hadoop 12 nodes cluster, CS500 HANA 2 TB, HANA Dynamic Tiering Node
• IBM BigInsights Cloud
Ø Software
• SAP HANA SPS10, DLM, Dynamic Tiering, VORA
• Hortonworks HDP Hadoop, Red Hat Linux
• IBM Apache Hadoop with BigSQL
Ø Test Cases
• Data Load - Extract 800 GB, 7 billion Smart Meter records from Netezza and ISAS, load data into HANA (Meter data scrambled to protect data security)
• DLM - Use DLM tool to move data from HANA to Dynamic Tiering Extended Storage and Hadoop
• Run queries across all data tiers and measure performance
• Load, query and display 19 million PDFs of Customer Bills (Dummy PDF files used, no real customer data)
POC SUCCESS CRITERIA
Ø Data Tiering - Move data among different tiers including HANA, DT and Hadoop
Ø Run SQL queries within and across data tiers
Ø Performance - Measure response time for each data tier
Ø Data Compression - Evaluate compression ratio of HANA, DT and Hadoop
Ø SAP DLM - Utilize the tool to move data from Hot to Warm and Cold tier
Ø Customer document storage - Store and retrieve PDF documents within one second
Ø Comparison of storage costs: HANA, DT (Dynamic Tiering Extended Storage) and Hadoop
POC TEST RESULTS
Hadoop
Ø HDP Customer Bill Store and Retrieval
• 40 ms response time to search and display a document from 19 million PDFs
Ø HDP Batch data load via SQOOP into Hadoop
• 4 min 24 s to load 2.5 million records (single thread); 1 min 10 s (10 threads)
Ø Data load from HANA to HDP Hadoop via VORA
• Total of 6.2 GB ORC files stored in HDFS against original size of 172 GB
• Compression Rate: 9 (3 copies in HDFS)
DLM
Ø Move data from HANA to DT
• 289 million records moved from HANA to DT
• 670K records per minute
Ø Move data from HANA to Hadoop via VORA into HDFS
• 1.57 billion records moved from HANA to Hadoop
• 22 million records per minute
HANA/DT/Spark/Vora
Ø Run aggregation query across SAP HANA, HDP Hadoop & DT (~4 billion records)
[Figure: Query Response Time [s], axis 0 to 400. Data labels: 0.2, 2.6, 360, 19.]
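The load and compression figures reported above can be related to each other with simple arithmetic. The sketch below assumes that the 6.2 GB of ORC files is the size of a single copy and that HDFS keeps three replicas; under that reading the effective ratio lands close to the reported compression rate of 9. It also derives how long the two DLM moves would take at the stated per-minute rates.

```python
# Arithmetic on the reported POC figures. Assumption: 6.2 GB is one copy
# of the ORC files and HDFS stores three replicas of it.

def records_per_second(records: int, seconds: float) -> float:
    return records / seconds

# SQOOP batch load: 2.5 million records
single_thread = records_per_second(2_500_000, 4 * 60 + 24)   # 4 min 24 s
ten_threads = records_per_second(2_500_000, 1 * 60 + 10)     # 1 min 10 s
speedup = ten_threads / single_thread

# ORC compression: 172 GB original, 6.2 GB stored
ORIGINAL_GB = 172
ORC_GB = 6.2
HDFS_REPLICAS = 3
raw_ratio = ORIGINAL_GB / ORC_GB
effective_ratio = ORIGINAL_GB / (ORC_GB * HDFS_REPLICAS)

# DLM moves: implied duration at the stated rates
dt_move_minutes = 289_000_000 / 670_000
hadoop_move_minutes = 1_570_000_000 / 22_000_000

print(f"single thread: {single_thread:,.0f} records/s")
print(f"10 threads:    {ten_threads:,.0f} records/s ({speedup:.2f}x)")
print(f"compression:   {raw_ratio:.1f}x per copy, {effective_ratio:.1f}x with replicas")
print(f"HANA to DT:     about {dt_move_minutes:.0f} minutes")
print(f"HANA to Hadoop: about {hadoop_move_minutes:.0f} minutes")
```

Ten threads give a speedup well short of 10x in this calculation, which suggests the load was not limited by thread count alone.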
COMPARISON BETWEEN DATA TIERS
Ø HANA
• Cost Factor: $$$$
• Volume: Up to 10s of TBs (no technical limit)
• Processing: ACID compliant; SQL, SQLScript, graph, time series, spatial, text, …
Ø Dynamic Tiering or Sybase IQ
• Cost Factor: $$
• Volume: 100s of TB integrated in HANA; several PBs with Sybase IQ
• Processing: ACID compliant; SQL
Ø Hadoop - Spark/Vora
• Cost Factor: $; 15 times less expensive than T1 storage
• Volume: 100s of PB or more
• Processing: ANSI SQL compliant; read-only SQL when used from HANA via SDA; Transformations and Actions
• Performance: can be improved significantly by increasing compute nodes and using SSD, at higher cost
Ø Hadoop - Vora in Memory
• Cost Factor: $$
• Volume: 100s of TB (depending on available memory in Hadoop cluster)
• Processing: read-only SQL when used from HANA via SDA
• Performance: data loaded in memory to achieve better performance
RECOMMENDED USE CASES - SHORT TERM
Ø HANA
• Managing up to several TBs of high value data
• Very high processing performance required
• SAP HANA native processing features (PAL, ..) required
• OLTP with many fine-granular updates needed
Ø Dynamic Tiering
• Managing up to several PBs of data at T2/T3 storage cost
• High performance for complex queries required
• Deep SAP HANA integration required (single database experience)
• Updates and deletes required
Ø Hadoop - Spark
• Managing up to 100s of PBs of data at T4 storage cost, 15 times less expensive than T1 storage
• Read-only sufficient (bulk load, no fine-granular writes)
• Comparatively low-cost storage important
• Loose integration of administration and life-cycle management acceptable
Ø Hadoop - Vora
• High OLAP query performance on Hadoop
• Additional query features (hierarchies)
THANK YOU
Contact information:
• Dominick Huang, Sr. Manager, Enterprise Technology & Architecture, CenterPoint Energy
• Russell Hull, Chief Support Architect, SAP America
• Henry Le, VP of Analytics, Utegration Inc.
CNP HANA LANDSCAPE - ANALYTICS (BW + OW)
[Diagram: HANA node layout. Legend: existing blade, new HP node, ES = Extended Storage (NLS/DT/Hadoop). Labeled systems: HIP (PRD), 36 TB (memory), Analytics (BW + OW), built from 2 TB nodes with a 2 TB failover blade; HIQ (QA) and HID (DEV), 12 TB, built from 2 TB nodes; HIS (SBX), 0.25 TB; Situation Awareness, MfM Testing & other apps, 4.5 TB, built from 0.5 TB nodes. Each of the three groups of 2 TB nodes is attached to ES (NLS/DT/Hadoop).]