Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
Simplifying Big Data
Best Practices for On-Premises and Cloud Architectures
CON8746
Paul Kent, Vice President Big Data, SAS
Jean-Pierre Dijcks, Master Product Manager Big Data, Oracle
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Big Data Platform options from Oracle
• Oracle Big Data Appliance
  – Simple to Deploy
  – Fast out of the Box
  – Lower Cost than DIY Clusters
• Oracle Big Data Cloud Service
  – Same architecture as on-premises
  – High performance data processing
  – Enables Big Data SQL in Oracle Public Cloud
• Oracle Big Data SQL
  – Oracle SQL across ALL your data
  – Secure access to data in NoSQL, Hadoop and Oracle Database
  – High performance on un-modeled data
• Oracle Exadata
  – Best platform for all Oracle Database workloads
  – Unique software that maximizes Oracle Database
  – Standardized, optimized, hardened end-to-end
100% Upward Compatibility with On-Premises Enables Coexistence and Migration
[Diagram: Oracle Cloud and Private Cloud, linked by Coexistence and Migration: Same Architecture, Same Standards, Same Products]
Program Agenda
1 Right Engine – Right Job
2 Right Platform for the Engine
3 Configuring for Mixed Workloads
4 Real-World Examples with SAS
A simple set of criteria
[Chart: RDBMS, NoSQL DB and Hadoop each scored from 0 to 5 on ten criteria grouped under Performance, Security and Cost: Concurrency, Complex Query Response Times, Single Record Read/Write Performance, Bulk Write Performance, Privileged User Security, General User Security, Governance Tools, System per TB Cost, Backup per TB Cost, Skills Acquisition Cost]
High-level Technology Comparison

                             HDFS                   NoSQL                      RDBMS
Ingest
  Data Type                  Chunk                  Record                     Transaction
  Write Type                 Synchronous            Eventually Consistent      ACID Compliant
  Data Preparation           No Parsing             No Parsing                 Parsing and Validation
DR
  DR Type                    Second Cluster         Node Replica               Second RDBMS
  DR Unit                    File                   Record                     Transaction
  DR Timing                  Batch                  Record                     Transaction
Access
  Complex Analytics?         Yes                    No                         Yes
  Query Speed                Slow                   Fast for simple questions  Fast
  # of Data Access Methods   One (full table scan)  One (index lookup)         Many (Optimized)

                             Affordable Scale       Low Predictable Latency    Flexible Performance
Big Data Appliance X5-2 Hardware
• Sun Oracle X5-2L Servers, each with
  – 2 * 18 core Intel Xeon E5-2699 v3
  – 128 GB DDR4 Memory
  – 96 TB SAS Disk Space
• 3 InfiniBand switches for all internal Hadoop and Spark traffic
• 1 Cisco Management Switch
Note: Support for Cloudera CDH Data Hub included
Hadoop Hardware – Pricing Trends with BDA (List)
[Chart: Price per TB (disk raw), Price per Core and Price for GB/Mem across BDA X2-2, BDA X3-2, BDA X4-2 and BDA X5-2; vertical axis from $0 to $2,500]
Hardware Profiles for Hadoop
Storage
• Dense drives provide flexibility as well as low cost per TB
• HDFS supports dense drives with more scalability enhancements in CDH 5.5
• Recommendation: Storage is cheap, ignore it for sizing
CPU
• “Hadoop” nodes are increasingly running mixed workloads and “external processes”:
  – HDFS
  – ETL & Stream Processing
  – Machine Learning
• Recommendation: Size CPUs per node towards the high-end core counts
Memory
• Mixed workloads and external processes do need memory
• Large DIMMs do still command a premium in 2-socket servers
• Recommendation: Size memory to the mid-spectrum due to hockey stick effects. This is changing!
Hardware Profiles for Hadoop – Conclusion
• Stop worrying about disk space. It is cheap and available.
  – SSD? Once real data tiering is available, or for NoSQL Databases
• Worry instead about:
  – Network
    • Expensive and often the forgotten bottleneck
    • Think about data ingest and the required capacity
  – CPU
    • Mixed workloads will require more CPUs within the node
    • Everyone wants to run compute at the same time
  – Memory
    • Be conscious of actual needs and restrictions
    • Not all SW can make use of lots of memory
Be wary of myths and lore. The hardware world has changed…
Instead, do the math and calculate the bottlenecks and usages you expect
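The advice to do the math on network capacity can be illustrated with a back-of-envelope calculation. The sketch below is not from the slides: the function names, the daily ingest volume, the load window, the node count and the replication factor of 3 are all assumed values, and treating every HDFS replica as network traffic is a deliberate simplification.

```python
def required_ingest_gbps(daily_ingest_tb, window_hours, replication=3):
    """Cluster-wide network throughput (Gbit/s) needed to land the daily
    ingest inside the load window. Assumption: every replica of every
    block crosses the network once."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    total_bits = daily_ingest_tb * 1e12 * 8 * replication
    return total_bits / (window_hours * 3600) / 1e9


def per_node_gbps(cluster_gbps, nodes):
    """Average share of that traffic per node, assuming an even spread."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return cluster_gbps / nodes


# Hypothetical workload: 10 TB per day, 4-hour window, 18 nodes.
cluster = required_ingest_gbps(daily_ingest_tb=10, window_hours=4)
print(f"cluster-wide: {cluster:.1f} Gbit/s")
print(f"per node:     {per_node_gbps(cluster, 18):.2f} Gbit/s")
```

Comparing the per-node figure with the link speed actually available on each node shows quickly whether ingest, rather than disk, is the likely bottleneck.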
Virtualization Profiles for Hadoop
Performance / Data Storage
[Diagram labels: Disk, CPU, VM1, VM2]
• Hypervisor overhead is typically low, but the IO overhead has impact, which is why SR-IOV is applied to ensure high throughput
• Retain “direct attached disks” for Hadoop workloads
• Goal: Data permanently lives in the “Hadoop nodes”
Virtualization Profiles for Hadoop
Flexibility / Application Hosting
[Diagram labels: Disk, CPU, VM-C1, VM-C2, VM-S1, VM-S2]
• Drive towards flexibility at the expense of performance
• Goal: Spin up environments and the data quickly
Recommendations and Future?
• If you don’t need them, don’t use them
  – Remember you will need to apply OS updates to all VMs, and when they have data, this may not be desirable
• Don’t create many small VMs, rather create large VMs
  – Pay attention to the number of VMs on a physical node, and consider what happens if the physical node goes offline (data loss?)
• Futures
  – Docker?
Sharing and Managing Resources
• Note: this is not about “security/isolation” but instead about multiple workloads (tenants) on a single cluster
  – Hence we are ignoring VMs
• Two main ways (outside of VM/Docker) of preventing resource grabs:
  – Linux Control Groups (cgroups)
  – Yet Another Resource Negotiator (YARN)
YARN
• Capacity Scheduler
  – Designed for multiple tenants
  – Based on resource queues
  – Enables administrators to distribute resources to apps and users
• Fair Scheduler
  – Designed to average out resources per application
  – Does support queues, but as a minimum guarantee
  – Lower maintenance, but less control
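The difference between the two schedulers can be seen in a toy model. The sketch below is only an illustration of the two allocation ideas described above; it does not use or reproduce any YARN API, and the function names, queue names and numbers are hypothetical.

```python
def capacity_split(total_vcores, queue_capacity_pct):
    """Capacity-style allocation: each queue receives the fixed percentage
    an administrator configured for it, regardless of how many apps run."""
    if abs(sum(queue_capacity_pct.values()) - 100) > 1e-9:
        raise ValueError("queue capacities must add up to 100 percent")
    return {q: total_vcores * pct / 100 for q, pct in queue_capacity_pct.items()}


def fair_split(total_vcores, running_apps):
    """Fair-style allocation: resources are averaged out over whatever
    applications are currently running."""
    if not running_apps:
        return {}
    share = total_vcores / len(running_apps)
    return {app: share for app in running_apps}


# Hypothetical 100-vcore cluster.
print(capacity_split(100, {"etl": 60, "adhoc": 40}))
print(fair_split(100, ["job-a", "job-b", "job-c", "job-d"]))
```

In the first call the split is decided up front by the administrator; in the second it shifts every time an application starts or finishes, which is the "lower maintenance, but less control" trade-off.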
Why cgroups?
• YARN does require developers to write applications in a specific way
  – Sometimes it is quicker and simpler to not adhere to YARN
• Sometimes the requirements are not overly complex
  – cgroups enables “guarantees” by “hard partitioning” of the nodes at the Linux level
  – cgroups are static but create a simple to administer and robust separation of resources
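A static "hard partition" of a node's CPUs can be sketched as follows. This assumes the cgroup v2 interface, where a group's `cpu.max` file holds a quota and a period in microseconds; the slides do not say which cgroup version was used. The group names and the 24/12 split of a 36-core node are hypothetical, and the code only computes the values rather than writing to the cgroup filesystem.

```python
def cpu_max_for_cores(cores, period_us=100_000):
    """Return the 'quota period' string for a cgroup v2 cpu.max file that
    caps a group at the given number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return f"{int(cores * period_us)} {period_us}"


# Hypothetical static partition of one 2 x 18-core node.
partitions = {"hadoop": 24, "analytics": 12}
settings = {name: cpu_max_for_cores(cores) for name, cores in partitions.items()}

for name, value in settings.items():
    print(f"{name}: cpu.max = {value}")
```

Because the quotas are fixed, neither group can grab the other's cores, which is the simple but inflexible separation the slide describes.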
4 Segments
• SAS and Oracle Partnership
• Data Lake Dream
• Customer #1 – Telco (Asia)
• Customer #2 – Banking
• Things you might not know…
Big Data – Dream IT. Build IT. Realize IT
Paul Kent, Vice President, Big Data, SAS
Andy Mendelsohn, Senior Vice President, Database Server Technologies, Oracle Corporation
Maureen Chew, Oracle Corporation
Gary Granito, Oracle Corporation
488-2013
Copyright © 2012, SAS Institute Inc. All rights reserved.
SAS AND ORACLE
WORKING TOGETHER TO CREATE CUSTOMER VALUE
• Joint R&D development and Product Management teams in Cary and Redwood Shores
• Focus on driving SAS technology components to run natively in Oracle database
• Joint performance engineering optimizations
• Template physical architectures developed based on use-cases
• Physically tested and benchmarked together
• Reduction in physical effort
• Overall reduction in lifecycle costs
• Best Practice papers
• SAS and Oracle Engineers provide joint "Sizing and Architecture Analysis and Design"
ORACLE ENGINEERED SYSTEMS FOR
[Diagram labels: SuperCluster; ExaData; ExaLogic; Virtual Compute Appliance; Big Data Appliance; Database Backup, Recovery, Logging Appliance; ZFS Storage Appliance]
Future State: Data Lake Pattern
[Diagram labels:
 200+ Data Sources: Demand Deposits Data, Credit Applications Data, Credit Card Transactions, Mortgage Data
 Finance: Forecasting, Profitability, Asset Tracking, Others
 Risk: Client Scanning, Operational Exposure, Commercial, Consumer
 Regulatory: Federal, SEC, Compliance Reporting, Others
 Fraud: Money Laundering, Check Fraud, Loan Fraud
 Tools and users: Data Marts, SAS, Cognos, QlikView, Query Tool, Math Rack Analysts]
SAS BIG DATA ON BIG DATA APPLIANCE
• Flexible Architectural options for SAS deployments
• Can run on Starter, Half and Full configurations
• Optionally select nodes “N, N-1, N-2, …” for additional SAS Services such as SAS Compute Tier, SAS MidTier
• Optionally select node subset “N, N-1, N-2, N-3, …” for more dedicated resources for SAS Analytic Compute Environment by shifting Big Data Appliance roles
• Option to selectively add more memory on a per node basis depending on specific workload distribution
SAS VISUAL ANALYTICS, HIGH-PERFORMANCE ANALYTIC COMPUTE ENVIRONMENT CO-LOCATED WITH HADOOP
[Diagram, STARTER BDA: SAS Midtier; SAS Visual Analytics; Metadata Server; SAS Compute; SAS HPA Root Node; …]
SAS VISUAL ANALYTICS, HIGH-PERFORMANCE ANALYTIC COMPUTE ENVIRONMENT CO-LOCATED WITH HADOOP
[Diagram, FULL RACK BDA: SAS Midtier; Metadata Server; SAS Compute; SAS HPA Root Node; …; LASR Worker 17 with HDFS Data 17; LASR Worker 18 with HDFS Data 18]
TELCO (ASIA) CHALLENGES
THE SITUATION…
• Increasing need for Data
• More data held at more granular levels
• Expansion of Analytics users in the enterprise
[Diagram labels:
 Sources: RAS, NOC, SOC, RAS, CEM, Collection, EPQM, HR, Supply Chain
 Storage areas: DI Temp Space, Staging, CMDM, CI Temp Space, VA Workspace, EM Workspace, ADHOC (ADM/ABTs), ADM / ABTs
 Users: XXXXX BCA, XXXXX CLM, PLDT Home, PLDT EICB, XXXXX Network, IRM / Fraud, Finance]
TELCO (ASIA) BIG DATA ARCHITECTURE
IMPLEMENTATION APPROACH
[Diagram labels:
 Phase 0, Phase 1, Phase 2
 ADM & ABT DI Jobs (ETL SAS Target), SAN Storage
 ADM & ABT DI Jobs (ETL Hadoop Target)
 ADM DI Jobs using Sqoop (Hive)
 ABT DI Jobs (ELT Hadoop Target) (Pass Through + Explicit SQL + Hive Optimization)
 In Hadoop Processing
 EXADATA
 BDA - CLOUDERA HADOOP: ADM (Hive), ABTs (Hive)]
TELCO (ASIA) DATA MANAGEMENT ON HADOOP
SAMPLE JOB FLOW - ADM
• Phase 0: Source Table -> Extract Data (Implicit SQL to partitioned table) -> Table Loader (Proc Append) -> Target Table (SAS)
• Phase 1: Source Table -> Extract Data (Implicit SQL to partitioned table) -> Table Loader (Proc Append) -> Target Table (Hive)
• Phase 2: Source Table -> Sqoop Data (Oozie & Sqoop Import) -> Target Table (Hive)
TELCO (ASIA) DATA MANAGEMENT ON HADOOP
OPTIMIZED USING HADOOP SQOOP+OOZIE VIA CUSTOM SAS DI SQOOP TRANSFORM

Subject Area: Analytic Data Mart (ADM)

                    Load Duration (hours)       Improvement   %             Delta Load          Delta Load
Table               pre-optim.   post-optim.    (hours)       Improvement   Aggregation Level   Data Size (GB)
Fact Table 1        0.7000       0.2383         0.4617        65.95%        DAILY                 5.60
Fact Table 2        1.9500       0.5031         1.4469        74.20%        WEEKLY               18.50
Fact Table 3        3.8167       1.1969         2.6197        68.64%        MONTHLY              37.20
Fact Table 4        6.0667       0.6214         5.4453        89.76%        DAILY                 7.70
Fact Table 5        0.5333       0.0617         0.4717        88.44%        DAILY                 1.70
Fact Table 6        0.2500       0.0528         0.1972        78.89%        DAILY                 0.73
Fact Table 7        0.4833       0.1011         0.3822        79.08%        WEEKLY                2.20
Fact Table 8        0.9667       0.2597         0.7069        73.13%        MONTHLY               6.30
Fact Table 9        5.7000       0.0697         5.6303        98.78%        DAILY                 1.70
Fact Table 10       0.9333       0.1094         0.8239        88.27%        WEEKLY                4.10
Fact Table 11       1.7333       0.2517         1.4817        85.48%        MONTHLY               8.10
Fact Table 12       2.0833       0.2033         1.8800        90.24%        DAILY                10.10
Fact Table 13       4.4000       0.8367         3.5633        80.98%        WEEKLY               39.80
Fact Table 14       8.1333       2.6169         5.5164        67.82%        MONTHLY             104.80
Fact Table 15       0.4833       0.0447         0.4386        90.75%        DAILY                 0.75
Fact Table 16       1.1000       0.0842         1.0158        92.35%        WEEKLY                1.80
Fact Table 17       1.3500       0.1008         1.2492        92.53%        MONTHLY               3.60
Fact Table 18       0.3833       0.0950         0.2883        75.22%        DAILY                 2.40
Fact Table 19       0.6833       0.3014         0.3819        55.89%        WEEKLY               14.70
Fact Table 20       4.8333       1.1042         3.7292        77.16%        MONTHLY              50.90
Dimension Table 1   8.1000       2.9511         5.1489        63.57%        N/A                 213.70
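The improvement columns follow directly from the pre- and post-optimization durations. The sketch below recomputes them for two rows of the table; the function name is hypothetical, and because the slide's figures appear to have been derived from unrounded durations, a recomputed value can differ from the published one in the last digit.

```python
def improvement(pre_hours, post_hours):
    """Return (hours saved, percent improvement) for one table load."""
    if pre_hours <= 0:
        raise ValueError("pre_hours must be positive")
    saved = pre_hours - post_hours
    return saved, 100 * saved / pre_hours


# Durations copied from the table: Fact Table 9 and Dimension Table 1.
for name, pre, post in [("Fact Table 9", 5.7000, 0.0697),
                        ("Dimension Table 1", 8.1000, 2.9511)]:
    saved, pct = improvement(pre, post)
    print(f"{name}: saved {saved:.4f} h ({pct:.2f}%)")
```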
TELCO (ASIA) DATA MANAGEMENT ON HADOOP
OUTCOMES
• Reuse existing workflows
• Retarget outputs to Hadoop Friendly Formats
• Selectively upgrade processing to Hadoop Optimal
• Batch Window Dragon Slayed!
• Infrastructure no longer challenged by explosive growth
BANK EUROPE REFERENCE DIAGRAM
[Diagram labels:
 HPA/TKGrid Root Node: gbbdap01
 HPA/TKGrid Work Nodes: gbbdap02 to gbbdap09 and gbbdap19 to gbbdap27
 SAS HPDM: gbhpap01
 SAS Clients (EM, EG, SAS Studio), SAS/ACCESS to Hadoop, SSH
 VA/VS SMP, HBASE/Others
 Hadoop PROD Nodes: NM/DN (Node Manager/Data Node), NN
 EP (Embedded Process) shown on the Hadoop nodes]
SAS® VISUAL ANALYTICS EXPLORER
• Data exploration at massive scale
• Intuitive visual analytics
SAS® VISUAL STATISTICS
• Descriptive and Predictive Modeling
• Model comparison
• Dynamic group-by processing
SAS® IN-MEMORY STATISTICS FOR HADOOP
• In-Memory Statistics for Hadoop: Programming interface for SAS model development
BANK EUROPE THINGS OF NOTE
1. SAS was not licensed for all nodes in the BDA
  • SAS EP jobs scheduled by YARN will have full visibility to data across the cluster
  • SASHDAT (SAS High Performance Data Binary Format) needs data sets to stay on the nodes licensed for SAS; some attention with Hadoop Balancer required.
2. SAS was not licensed for all cores on the nodes it was licensed for
  • SAS Licensing Posture has improved – you can license say 200 cores on a 1000 core cluster and control limits with CGROUPS and YARN
SPD Engine with Hadoop
• Support for running on MapR 4.0.2
• Support for Code Accelerator
• Enhanced WHERE pushdown: AND, OR, NOT, parenthesis, range operators and in-lists
• Parallel write support can improve write performance up to 40%
• Optionally uses Apache Curator/Zookeeper as a distributed lock server. No more physical lock files.
libname spdat spde '/user/dodeca' hdfshost=default;
SAS GRID MANAGER FOR HADOOP
[Diagram labels:
 Traditional Grid Manager: Grid control node, Compute nodes, LSF, Shared File System, Run SAS job
 Grid Manager for Hadoop: YARN Resource Manager, YARN NM (x3), Local Disks, Shared File System, Run SAS job]
Leverage the processing power of the cluster
• SAS runs completely in Hadoop cluster
• YARN with Oozie prioritizes and schedules SAS jobs in cluster
• All existing SAS Grid Manager integrated (e.g. EM) capabilities
• Reduction of SAN storage
• No separate SAS compute tier