Microsoft APS Deep Dive

62

Transcript of Microsoft APS Deep Dive

Page 1:

Page 2:

Microsoft APS Deep Dive

Matt Goswell, Franz Robeller

DBI-B311

Page 3:

The modern data warehouse

(Diagram): self-service, collaboration, corporate, predictive, and mobile consumption; extract, transform, load; a single query model with data quality and master data management; and non-relational, relational, analytical, and streaming data sources, internal and external.

Page 4:

About Analytics Platform System

Pre-built hardware appliance

Windows Server 2012 R2 + SQL Server 2014

Massively Parallel Processing (MPP) to scale to 6 PB

In-memory columnstore for 100x speed improvement

Dedicated region for HDInsight (Hadoop)

Integrated query model joining relational and HDInsight (Hadoop)

Available from HP and Dell

SQL Server Parallel Data Warehouse & HDInsight in a single appliance

Page 5:

Integrate relational + non-relational

Query relational and Hadoop in parallel

Single query

No need to ETL Hadoop data into DW

Query Hadoop with existing T-SQL skills

Integrated query with PolyBase in SQL PDW (diagram): a single query spans relational data and semi-/unstructured/streaming data in Hadoop, and PolyBase returns one SQL result set.

Page 6:

What is Hadoop?

Distributed, scalable system on commodity hardware

Composed of a few parts:

HDFS – Distributed file system

MapReduce – Programming model

Others: HBase, R, Pig, Hive, Flume, Mahout, Avro, Zookeeper

(Diagram) Hadoop = MapReduce + HDFS: MapReduce (job scheduling/execution system) runs over HDFS (Hadoop Distributed File System); around the core sit HBase (column DB), Hive, Mahout, Oozie, Sqoop, HBase/Cassandra/Couch/MongoDB, Avro, ZooKeeper, Pig, Flume, Cascading, R, Ambari, and HCatalog.

Page 7:

Hadoop alone is NOT the answer to all challenges

(Diagram) Moving HDFS (Hadoop) data into the warehouse before analysis requires ETL; T-SQL users must learn new skills; and the Hadoop ecosystem must be built, integrated, managed, maintained, and supported. Meanwhile, "new" data sources keep arriving: devices, web, sensor, and social.

Page 8:

Hardware Architecture Overview (PDW+HDI)

Hardware details

One standard node type: 2x 8-core Intel processors, 256 GB memory

Uses newest InfiniBand connectivity (FDR, 56 Gb/sec)

Uses JBODs: Windows Server 2012 R2 technologies manage the JBOD drives to achieve the same level of reliability and robustness as we know from SAN solutions

Backup (BU) and Load Servers (LS) are not in the appliance: customers can use their own hardware and more than one BU or LS for high availability

Scale unit concept: the base unit is the minimum configuration and populates the rack with networking; a scale unit adds capacity in increments of 2-3 compute nodes and related storage; a passive unit increases high-availability (HA) capacity by adding more spares

(Diagram: base unit with hosts HST01-HST04 and HSA01-HSA04 attached to JBODs via direct-attached SAS, plus InfiniBand and Ethernet.)

Page 9:

Virtual Machine Architecture Overview (PDW Region)

General details
• All hosts run Windows Server 2012 R2 Standard
• All virtual machines run Windows Server 2012 R2 Standard as a guest operating system
• All fabric and workload activity happens in Hyper-V virtual machines
• Fabric virtual machines and CTL share one server, lowering overhead costs especially for small topologies
• PDW Agent runs on all hosts and all virtual machines and collects appliance health data on fabric and workload
• DWConfig and Admin Console continue to exist; minor extensions expose host-level information
• Windows Storage Spaces handles mirroring and spares and enables use of lower-cost DAS (JBODs)

PDW workload details
SQL Server 2014 Enterprise Edition (PDW build) control node and compute nodes for the PDW workload

Storage details
2 files on 2 LUNs per filegroup, 8 filegroups per compute node; each LUN is configured as RAID 1; large numbers of spindles are used in parallel

Software details
(Diagram: the base unit hosts HST01/HST02 with the fabric and control VMs (CTL, WDS, AD01, VMM, AD02) and HSA01/HSA02 with Compute 1 and Compute 2, attached to the JBOD via direct-attached SAS, plus InfiniBand and Ethernet.)
• Control node VM: Windows Server 2012 R2 Standard; PDW engine; DMS Manager; SQL Server 2014 Enterprise Edition (PDW build); shell databases just as in older versions
• Compute node VM: Windows Server 2012 R2 Standard; DMS Core; SQL Server 2014 Enterprise Edition (PDW build)

Page 10:

SQL Server PDW 2014 Control Architecture: Cost-Based Query Optimizer

(Diagram: SELECT statements arrive at the control node, where the engine service compiles them against a "shell appliance" (SQL Server) and sends plan steps to each compute node (SQL Server).)

• PDW 2014 uses SQL Server on the control node to run a "shell appliance"
• Every database with all its objects exists in the shell appliance as an empty "shell," lacking the user data (which sits on all the compute nodes)
• Every DDL operation is executed against both the shell and the compute nodes
• Large parts of basic RDBMS functionality are now provided by that shell:
  • Authentication and authorization of queries, but also the full security system
  • Schema binding
  • Metadata catalog

Page 11:

Virtual Machine Architecture Overview (HDI Region)

General details
• All hosts run Windows Server 2012 R2 Standard
• All virtual machines run Windows Server 2012 R2 Standard as a guest operating system
• All HDI workload activity happens in Hyper-V virtual machines
• Lower overhead costs, especially for small topologies
• Windows Storage Spaces handles mirroring and spares and enables use of lower-cost DAS (JBODs) rather than SAN

HDI workload details
Windows HDI head, security, and management nodes plus data nodes for the HDI workload

Storage details
• 16 data disks per data node
• No RAID 1; single drives only!
• But each file is stored 3 times in HDFS

Software details
(Diagram: the base unit hosts HST03/HST04 with HMN01, HHN01, and HSN01, and HSA03/HSA04 with Data 1-4, attached to the JBOD via direct-attached SAS. Each runs Windows Server 2012 R2 Standard and the Windows HDI distribution software (version 2.x).)

Page 12:

Disk Layout: LUNs and Filegroups/Files (PDW Region)

(Diagram: one JBOD. Disks 1-8 hold Node 1's Distribution A files 1-2 and Distribution B files 1-2, continuing through Distribution H files 1-2 on disks 29-36, with Node 2 laid out the same way; TempDB and Log pairs sit alongside the distribution LUNs; disks 65-70 hold fabric storage (VHDXs for nodes) and hot spares.)

Design details
• Each LUN is composed of two drives in a RAID 1 mirroring configuration
• Distributions are split across two files/LUNs
• TempDB and Log are spread across all 16 LUNs
• No fixed TempDB or log size allocation
• VHDXs are on JBODs to ensure HA
• Disk I/O is further parallelized
• Bandwidth: 2 cables with 4x 6 Gbit/sec lanes each

Page 13:

Distribution storage

• PDW stores eight distributions and one replicated store per compute node.
• This value is fixed; it cannot be configured dynamically nor preconfigured differently in the factory, and it does not change between smaller and larger appliances.
• Each distribution is stored on separate physical disks in the JBOD.
• Replicated tables are striped across all disks in the JBOD.
• The physical location for each distribution is controlled via SQL Server filegroups at the compute node level.

Data distribution layout (diagram): Compute node 1 and Compute node 2 each hold distributions A through H plus the replicated store. A table's layout is chosen when it is created, as sketched below.
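A minimal T-SQL sketch (table and column names are hypothetical, not from the deck) of how the CREATE TABLE options select between the two layouts:

-- Hash-distributed: rows are spread across the eight distributions per
-- compute node based on a hash of the chosen distribution column.
CREATE TABLE dbo.FactSales
(
    SaleKey     bigint        NOT NULL,
    CustomerKey int           NOT NULL,
    Amount      decimal(18,2) NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerKey));

-- Replicated: a full copy is kept on every compute node, striped across
-- all disks in the JBOD.
CREATE TABLE dbo.DimRegion
(
    RegionKey  int          NOT NULL,
    RegionName nvarchar(50) NOT NULL
)
WITH (DISTRIBUTION = REPLICATE);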

Page 14:

Distribution storage

Distribution of data
• Data skew occurs when the rows in one distribution are disproportionate to the count of rows in the other distributions.
• Generally this happens because the value chosen as the distribution key occurs significantly more often than other values in the data set.
• The following graph shows a real-world scenario where a table was distributed on an IP address.
• We should expect the distribution to be even, given that the IP is unique.
• In this situation a number of connections came via a proxy server, so the IP address was identical for a number of different connections.
• Skew will affect performance and limit storage capacity by creating a hot spot for CPU and storage on a single compute node. One way to check for skew is sketched below.

(Graph: distribution of data across distributions.)
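A minimal sketch of checking for skew, assuming the hypothetical dbo.FactSales table from the previous page; DBCC PDW_SHOWSPACEUSED reports rows and space per distribution, so a lopsided row count points at a hot spot:

-- Returns one result row per distribution; compare the row counts
-- across distributions to spot skew.
DBCC PDW_SHOWSPACEUSED ("dbo.FactSales");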

Page 15:

Hardware architecture

Networking (diagram): redundant InfiniBand and Ethernet fabrics connect everything. Rack #1 holds the PDW region (control node, passive node, and compute nodes with economical disk storage) and the HDInsight region (master node, passive node, and compute nodes with economical disk storage). Rack #2 holds the HDI extension base units and HDI active scale units (passive node plus compute nodes with economical disk storage).

Active Unit: addition of two or three compute nodes, depending on OEM hardware configuration, and related storage

Passive Unit: host available to accept the workload from the associated workload nodes

Failover Node: high availability for the rack

Page 16:

HP configuration

(Rack layout diagram.) Raw capacity by topology: ¼ rack 15.1 TB; ½ rack 30.2 TB; ¾ rack 45.3 TB; full rack 60.4 TB; 1¼ racks 75.5 TB; 1½ racks 90.6 TB; 2 racks 120.8 TB; 3 racks 181.2 TB.

PDW Backplane (6U): redundant InfiniBand; redundant Ethernet; management and control (active); rack failover node (passive)

Base Unit (7U): 2 HP 1U servers (16 cores each, 32 total); 5U JBOD with 1 TB drives; user data capacity 75 TB

Scale Unit (7U): same build as the base unit: 2 HP 1U servers (16 cores each, 32 total); 5U JBOD with 1 TB drives; user data capacity 75 TB

Extension Base Unit (5U): redundant InfiniBand; redundant Ethernet; rack failover node (passive)

Extension Base Unit (7U): 2 HP 1U servers (16 cores each, 32 total); 5U JBOD with 1 TB drives; user data capacity 75 TB

Reserved Space (8U/9U): Data Integration Platform server; passive unit (adds a failover node); future expansion

Rack contents: rack 1 holds the control node and failover node plus JBODs 1-4 serving compute nodes 1-8; rack 2 holds a failover node plus JBODs 5-8 serving compute nodes 9-16; rack 3 holds a failover node plus JBODs 9-12 serving compute nodes 17-24.

Page 17:

Dell/Quanta configuration

(Rack layout diagram.) Raw capacity by topology: 1/3 rack 22.6 TB; 2/3 rack 45.3 TB; full rack 67.9 TB.

Base Unit (6U): redundant InfiniBand; redundant Ethernet; management and control (active); rack failover node (passive)

Base Unit (10U): 3 servers in a 2U enclosure (16 cores each, 48 total); 2 JBODs, 4U each, with 1 TB drives; user data capacity 79 TB

Scale Unit (10U): 3 servers in a 2U enclosure (16 cores each, 48 total); 2 JBODs, 4U each, with 1 TB drives; user data capacity 79 TB

Reserved Space (6U): passive unit (adds a failover node); future expansion

Rack contents: control node and failover node plus JBODs serving compute nodes in groups of three (e.g., JBODs 1-6 serving compute nodes 1-9).

Page 18:

Factoring in Compression: HP
Capacity in TB by compression ratio (rows) and # compute nodes with 1 TB drives (columns)

Ratio |    2     4     6     8    10    12    16    20    24    32    40    48    56
    1 | 15.1  30.2  45.3  60.4  75.5  90.6   121   151   181   242   302   362   423
    2 | 30.2  60.4  90.6   121   151   181   242   302   362   483   604   725   846
    3 | 45.3  90.6   136   181   227   272   362   453   544   725   906  1087  1268
    4 | 60.4   121   181   242   302   362   483   604   725   966  1208  1450  1691
    5 | 75.5   151   227   302   378   453   604   755   906  1208  1510  1812  2114
    6 | 90.6   181   272   362   453   544   725   906  1087  1450  1812  2174  2537
    7 |  106   211   317   423   529   634   846  1057  1268  1691  2114  2537  2960
    8 |  121   242   362   483   604   725   966  1208  1450  1933  2416  2899  3382
    9 |  136   272   408   544   680   815  1087  1359  1631  2174  2718  3262  3805
   10 |  151   302   453   604   755   906  1208  1510  1812  2416  3020  3624  4228
   11 |  166   332   498   664   831   997  1329  1661  1993  2658  3322  3986  4651
   12 |  181   362   544   725   906  1087  1450  1812  2174  2899  3624  4349  5074
   13 |  196   393   589   785   982  1178  1570  1963  2356  3141  3926  4711  5496
   14 |  211   423   634   846  1057  1268  1691  2114  2537  3382  4228  5074  5919
   15 |  227   453   680   906  1133  1359  1812  2265  2718  3624  4530  5436  6342
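A worked check of the table (ours, not on the slide): each HP compute node contributes roughly 7.55 TB of raw user capacity, so a cell is approximately nodes × 7.55 TB × compression ratio. For example, 8 nodes at a 5:1 ratio gives 8 × 7.55 × 5 ≈ 302 TB, matching the table.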

Page 19:

Compression: Dell & Quanta
Capacity in TB by compression ratio (rows) and # compute nodes with 1 TB drives (columns)

Ratio |    3     6     9    12    15    18    21    24    27    36    45    54
    1 |   23    45    68    91   113   136   159   181   204   272   340   408
    2 |   45    91   136   181   227   272   317   362   408   544   680   815
    3 |   68   136   204   272   340   408   476   544   612   815  1019  1223
    4 |   91   181   272   362   453   544   634   725   815  1087  1359  1631
    5 |  113   227   340   453   566   680   793   906  1019  1359  1699  2039
    6 |  136   272   408   544   680   815   951  1087  1223  1631  2039  2446
    7 |  159   317   476   634   793   951  1110  1268  1427  1903  2378  2854
    8 |  181   362   544   725   906  1087  1268  1450  1631  2174  2718  3262
    9 |  204   408   612   815  1019  1223  1427  1631  1835  2446  3058  3669
   10 |  227   453   680   906  1133  1359  1586  1812  2039  2718  3398  4077
   11 |  249   498   747   997  1246  1495  1744  1993  2242  2990  3737  4485
   12 |  272   544   815  1087  1359  1631  1903  2174  2446  3262  4077  4892
   13 |  294   589   883  1178  1472  1767  2061  2356  2650  3533  4417  5300
   14 |  317   634   951  1268  1586  1903  2220  2537  2854  3805  4757  5708
   15 |  340   680  1019  1359  1699  2039  2378  2718  3058  4077  5096  6116

Page 20:

Compression vs. Scan Performance
Seconds to scan a 1 TB table at 200 MB/second per distribution

# Compute Nodes |      2       3       4       6       8       9
# Distributions |     16      24      32      48      64      72

Ratio
    1           | 327.68  218.45  163.84  109.23   81.92   72.82
    2           | 163.84  109.23   81.92   54.61   40.96   36.41
    3           | 109.23   72.82   54.61   36.41   27.31   24.27
    4           |  81.92   54.61   40.96   27.31   20.48   18.20
    5           |  65.54   43.69   32.77   21.85   16.38   14.56
    6           |  54.61   36.41   27.31   18.20   13.65   12.14
    7           |  46.81   31.21   23.41   15.60   11.70   10.40
    8           |  40.96   27.31   20.48   13.65   10.24    9.10
    9           |  36.41   24.27   18.20   12.14    9.10    8.09
   10           |  32.77   21.85   16.38   10.92    8.19    7.28
   11           |  29.79   19.86   14.89    9.93    7.45    6.62
   12           |  27.31   18.20   13.65    9.10    6.83    6.07
   13           |  25.21   16.80   12.60    8.40    6.30    5.60
   14           |  23.41   15.60   11.70    7.80    5.85    5.20
   15           |  21.85   14.56   10.92    7.28    5.46    4.85
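A worked check (ours, not on the slide): with eight distributions per compute node, all scanned in parallel, scan time = (1 TB ÷ compression ratio) ÷ (# distributions × 200 MB/s). For 2 nodes (16 distributions) at ratio 1, that is 1,048,576 MB ÷ (16 × 200 MB/s) = 327.68 seconds, the top-left cell; doubling either the compression ratio or the node count halves the scan time.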

Page 21:

Demo – Overview of the APS appliance

Matt Goswell

Page 22:

Appliance Domains/Regions

General details
• Physical hosts
• Virtual machines required to maintain the appliance infrastructure and workload virtual machine configurations
• Windows Storage Spaces handles mirroring and spares

PDW workload details
SQL Server 2014 Enterprise Edition (PDW build) control node and compute nodes for the PDW workload

Storage details
2 files on 2 LUNs per filegroup, 8 filegroups; each LUN configured as RAID 1; takes advantage of a large number of spindles in parallel

HDI workload details
Windows HDI (100% Apache Hadoop) based on Hortonworks HDP; head, security, and management nodes plus data nodes

(Diagram) Fabric Domain: CTL, WDS, AD01, VMM, and AD02 on HST01/HST02. PDW Workload Region: Compute 1 and Compute 2 on HSA01/HSA02 with their JBOD over direct-attached SAS. HDI Workload Region: HMN01, HHN01, and HSN01 on HST03/HST04, and Data 1-4 on HSA03/HSA04 with their JBOD. All units share InfiniBand and Ethernet.

Page 23:

Fabric Virtual Machines

Fabric Active Directory (AD01 and AD02), general details
• Manages access between physical hosts
• Required to support the clusters
• Holds the appliance Active Directory and DNS
• VMs are stored on local disks, not on CSVs; for redundancy there are two AD VMs on two physical machines

Windows Deployment Services (WDS), general details (new in v2 AU3)
• Used to deploy Windows operating systems over the appliance network
• VM is stored on CSVs

SC Virtual Machine Manager (VMM), general details
• Manages the configuration/image of virtual machines within the appliance
• VM is stored on CSVs

(Diagram: base unit as on the preceding slides.)

Page 24:

PDW Workload Virtual Machines – HST01

Control Node (CTL), general details
• Client connections always go through the control node
• Contains no persistent user data
• Contains metadata: system metadata and user shell databases
• Parallel Data Warehouse advantages: processes SQL requests; prepares the execution plan; orchestrates distributed execution
• Local SQL Server processes the final query plan and aggregates results
• Runs the "Data Movement Service" (DMS): manages connectivity to compute nodes and manages query execution
• Runs the "Azure Data Management Gateway" (since v2 AU3), which enables query from cloud to on-premises through APS
• VM is stored on CSVs

(Diagram: base unit as on the preceding slides.)

Page 25:

PDW Workload Virtual Machines – HSAxx

Compute Nodes, general details
• Each MPP node is a highly tuned symmetric multi-processing (SMP/NUMA) node with standard interfaces
• Provides dedicated hardware, database, and storage
• Runs SQL Server 2014 Enterprise Edition (PDW build)
• Runs the Data Movement Service (DMS)
• VMs are stored on CSVs

Just a Bunch Of Disks (JBOD), general details
• Storage managed by Windows Server 2012 R2 Storage Spaces
• Cross-connected via dual 4x 6 Gb/sec SAS connections

(Diagram: base unit as on the preceding slides.)

Page 26:

HDI Workload Virtual Machines – HST03

General details
• Client connections always go through the head node

Head Node (HHN01): Ambari Agent; Namenode (1,2,3); JobTracker; History Server; HiveServer (1,2,3); Hive Metastore; Oozie Service; WebHCat Server; and more

Secure Gateway Node (HSN01): IIS; Developer Dashboard; Secure Gateway

Management Node (HMN01): IIS; Ambari Agent; SQL Server

Data Node (HDN001-HDNxxx): Ambari Agent; Datanode; TaskTracker

(Diagram: base unit as on the preceding slides.)

Page 27:

Failover Functionality
Failover Cluster Manager starts a virtual machine on a new host after failure.

Details
• Cluster Shared Volumes enable all nodes to access the LUNs on the JBOD as long as at least one of the hosts attached to the JBOD is active; uses the SMB3 protocol
• One cluster across the whole appliance
• Virtual machine images are automatically started on a new host in the event of failover
• Rules are enforced by affinity and anti-affinity maps
• Failback continues to be through CSS
• Uses Windows Failover Cluster Manager
• Adding a Passive Unit increases HA capacity: it enables another virtual machine to fail without disabling the appliance
• All hosts connected to a single JBOD cannot fail over

Sample: PDW Region (Base Unit – HP) (diagram: the workload and fabric VMs from a failed host restart on the surviving hosts, including the HST03 passive unit.)

Page 28:

Replace Node

Details
• Single type of node; the sole differentiator is storage attached vs. storage unattached
• Execution commonality regardless of which host is being replaced
• Workloads migrate with their virtual machines
• Replace Node follows a subset of the bare-metal provisioning using WDS and executes the APS Setup.exe with the replace-node action specified, along with the necessary information targeting the replacement node
• Workload virtual machines do not have to be re-provisioned
• Workload virtual machines are failed back using Windows Failover Cluster Manager
• Failback still incurs a small downtime
• There may be a small performance impact from failed-over compute nodes; documentation will suggest failing back
• Currently not using Live Migration for failover and failback

Sample: PDW Region (Base Unit – HP) (diagram)

Page 29:

Add Unit: Scaling from 2 to 56 Nodes

Details
• Addition to the appliance is in the form of one or more scale units
• The IHV owns installation and cabling of new scale units
• Software provisioning consists of three phases: bare-metal provisioning of new nodes (online since AU1); provisioning of workload virtual machines (online since AU1); redistribution of data (offline)
• CSS assistance (may have to help prepare user data): tools to validate the environment/data transition; develop a strategy for successful addition
• Deleting old data: partition switching from the largest tables; CRTAS (CREATE REMOTE TABLE AS SELECT) to move data off the appliance temporarily

Sample: PDW Region (Base Unit – HP) (diagram: a scale unit adds HSA03/HSA04 with Compute 3 and Compute 4 and their JBOD.)

The PDW region must have enough free space to redistribute the largest table.

Page 30:

Supported Extensions
All actions performed by CSS

• Add Unit: data scale unit
• Add Region: Hadoop region only
• Replace Node: hardware failure
• Replace VM: VM corruption

Page 31:

Project PolyBase
• Background: research done by the Gray Systems Lab, led by Technical Fellow David DeWitt
• High-level goals for V2:
  • Seamless integration with Hadoop via regular T-SQL
  • Enhancing the PDW query engine to process data coming from the Hadoop Distributed File System (HDFS)
  • Fully parallelized query processing for high-performance data import and export from HDFS
  • Integration with various Hadoop implementations: Hadoop on Windows Server, Hortonworks, and Cloudera

Page 32:

Project PolyBase
• Both are distributed systems
• Parallel data access between PDW and Hadoop
• Different goals and internal architecture
• Combined power of Big Data integration

(Diagram: the PDW control node with its compute nodes alongside the Hadoop name node with its data nodes.)

Page 33:

Project PolyBase
• Direct parallel data access between PDW compute nodes and Hadoop data nodes
• Support for all HDFS file formats
• Introducing "structure" on the "unstructured" data

SQL in, results out (diagram): (1) the query is submitted to PDW; (2) HDFS blocks are read from Hadoop in parallel; (3) results are returned.

SQL in, results stored in HDFS (diagram): the query runs the same way, but the results are written back to HDFS instead of being returned.

Page 34:

Project PolyBase next steps
• Cost-based decision on how much data needs to be pushed to PDW
• SQL operations on HDFS data pushed into Hadoop as MapReduce jobs

(Diagram: (1) SQL is submitted to PDW; (2) a map job is pushed into Hadoop's MapReduce; (3-6) HDFS blocks are processed and exchanged with the database; (7) results are returned.)

Page 35:

External table
Introducing structure to semi-/unstructured data
• Representation of data residing in Hadoop/HDFS
• Introduces new T-SQL syntax
• The syntax is very similar to that of a regular table

Page 36:

CREATE External Table Syntax

--Create a new external table in SQL Server PDW
CREATE EXTERNAL TABLE [ database_name . [ dbo ] . | dbo. ] table_name
    ( <column_definition> [ ,...n ] )
    WITH (
        LOCATION = 'hdfs_folder_or_filepath',
        DATA_SOURCE = external_data_source_name,
        FILE_FORMAT = external_file_format_name
        [ , <reject_options> [ ,...n ] ]
    )
[;]

<reject_options> ::=
{
    | REJECT_TYPE = value | percentage
    | REJECT_VALUE = reject_value
    | REJECT_SAMPLE_VALUE = reject_sample_value
}

1. EXTERNAL indicates an external table
2. LOCATION: required location of the Hadoop file or folder
3. DATA_SOURCE: required data source definition of the Hadoop cluster
4. FILE_FORMAT: file format options associated with data import from HDFS (for example, arbitrary field delimiters and reject-related thresholds)

Page 37:

External table - Sample

--STEP 1: Create an external data source for Hadoop
-- DROP EXTERNAL DATA SOURCE FXR_TEST_DSRC;
CREATE EXTERNAL DATA SOURCE FXR_TEST_DSRC
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://192.168.210.145:8020',
    JOB_TRACKER_LOCATION = '192.168.210.145:50300'
    -- default job tracker ports: 8021 = Cloudera; 50300 = HDInsight
);

--STEP 2: Create an external file format for a Hadoop text-delimited file.
-- DROP EXTERNAL FILE FORMAT FXR_Test_Format;
CREATE EXTERNAL FILE FORMAT FXR_Test_Format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = N';',
        USE_TYPE_DEFAULT = TRUE,
        STRING_DELIMITER = ''
    )
);

Page 38:

External table - Sample (cont.)

--STEP 3: Create a new external table in SQL Server PDW
DROP EXTERNAL TABLE Test;
GO
CREATE EXTERNAL TABLE Test (
    name                 nvarchar(17),
    startzeitpunkt       nvarchar(35),
    endzeitpunkt         varchar(35),
    flms_system_realtime nvarchar(19),
    dummy                nvarchar(19) NULL,
    Counter1DTonDur      nvarchar(19),
    Counter1DMileage     nvarchar(19),
    dummy2               nvarchar(2) NULL
)
WITH (
    LOCATION = '/user/fxr47511/pdwtest',
    DATA_SOURCE = FXR_TEST_DSRC,
    FILE_FORMAT = FXR_Test_Format,
    REJECT_TYPE = value,
    REJECT_VALUE = 1000
);
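Once created, the external table is queried like any other table; a minimal sketch of our own (not on the slide), reusing the Test table defined above:

-- Rows are read from HDFS in parallel at query time; the external table
-- can also be joined to ordinary PDW tables in the same statement.
SELECT TOP 10 name, startzeitpunkt, endzeitpunkt
FROM Test;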

Page 39:

External table
Considerations
• Data can be changed or removed at any time on the Hadoop side
• PDW V2 does not guarantee any form of concurrency control or isolation level
• The same query may return different results if the data is changed on the Hadoop/HDFS side between two query runs
• A query may fail if the data is removed or relocated
• The location of the data residing on the external cluster is validated every time a user selects from it

Page 40:

Demo – Integrating data from PDW and HDI
Matt Goswell

Page 41:

HDFS Bridge
• Direct HDFS access
• Functional part of the Data Movement Service
• Hides HDFS complexity
• HDFS file types are supported by use of the appropriate RecordReader interface

(Diagram: on each PDW node, DMS hosts an HDFS Bridge alongside SQL Server; the bridges talk directly to HDFS on the Hadoop cluster.)

Page 42:

HDFS Bridge – Data transformation

(Diagram: the PDW engine's load manager and the DMS manager on the control node coordinate the DMS instances on the compute nodes; each DMS pipeline runs converter, sender, receiver, and writer stages next to SQL Server, with the HDFS Bridges reading from the Hadoop cluster.)

1. A query against an external table is executed in PDW.
2. The HDFS Bridge reads data blocks by using Hadoop's RecordReader interface.
3. Each row is converted for bulk insert and hashed based on the distribution column.
4. The hashed row is sent to the appropriate node's receiver for loading.
5. The row is bulk inserted into the destination table.

Page 43:

Setup and monitoring
• Java runtime libraries are included with the APS v2 (AU1+) software and are installed automatically when you install a Hadoop region in the APS appliance
• Security: a static user account is created on Hadoop
• Hadoop connectivity must be enabled in PDW: exec sp_configure 'hadoop connectivity', 1
• List of external tables: shown in the SSDT object explorer, and available in sys.pdw_external_tables (queried below)
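A quick sketch of listing the external tables through the catalog view named above (the view comes from the slide; the query itself is ours):

-- Enumerate the external tables defined on the appliance.
SELECT *
FROM sys.pdw_external_tables;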

Page 44:

Enabling Hadoop connectivity

exec sp_configure 'hadoop connectivity', 1

Parameter values and what they enable support for:
• 0: disables Hadoop connectivity
• 1: Hortonworks (HDP 1.3) for Windows Server; HDInsight on Analytics Platform System (version AU1) (HDP 1.3); Azure blob storage on Microsoft Azure (WASB[S]) (AU1)
• 2: Hortonworks (HDP 1.3) for Linux
• 3: Cloudera CDH 4.3 for Linux
• 4: Hortonworks (HDP 2.0) for Windows Server; HDInsight on Analytics Platform System (AU2) (HDP 2.0); Azure blob storage on Microsoft Azure (WASB[S]) (AU2)
• 5: Hortonworks (HDP 2.0) for Linux
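A minimal sketch of switching the setting, assuming the usual sp_configure pattern of following the change with RECONFIGURE (any appliance restart steps are out of scope here):

-- Enable connectivity option 4: HDP 2.0 for Windows Server / HDInsight on APS.
EXEC sp_configure 'hadoop connectivity', 4;
RECONFIGURE;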

Page 45:

Columnstore overview

Terminology
• A clustered columnstore index is comprised of two parts: a columnstore and a deltastore
• Data is compressed into segments, ideally ~1 million rows each (subject to system resource availability)
• A collection of segments representing a set of entire rows is called a row group
• The minimum unit of I/O between disk and memory is a segment
• Execution in batch mode (as opposed to traditional row mode) moves multiple rows between iterators: ~1,000 rows at a time
• Dictionaries (primary and secondary) are used to store additional metadata about segments

(Diagram: columns C1-C6 organized into row groups of segments in the columnstore, alongside a delta (row) store.)

Page 46:

Columnstore overview

Enhancements beyond SQL Server 2012
• Columnstore indexes are now clustered; only a single index exists for a table:
  • Clustered index + non-clustered columnstore is not supported
  • Clustered columnstore + non-clustered row store index is not supported
• Full DML (insert, update, delete, select) is supported directly on the columnstore; a previous workaround involved maintaining a separate secondary row store table with UNION ALL
• All PDW data types are supported in columnstore indexes:
  • Decimal with precision greater than 18 was not supported in SQL Server 2012
  • Binary/varbinary was not supported in SQL Server 2012
  • Datetimeoffset with scale greater than 2 was not supported in SQL Server 2012
• Query processing:
  • Batch mode hash join spill implemented (previously this would revert to row mode)
  • Aggregations without GROUP BY supported
• Some limitations still apply:
  • Avoid string data types for filtering or join conditions
  • Some SQL clauses, for example ROW_NUMBER() / RANK() etc. OVER (PARTITION BY ... ORDER BY ...)

A sketch of creating such a table follows.
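A minimal T-SQL sketch of creating a PDW table with a clustered columnstore index at creation time (the table reuses the FactInternetSales_Column name from the next slide; the column list is ours):

-- A hash-distributed table stored as a clustered columnstore index; each
-- distribution gets its own columnstore plus a deltastore for small inserts.
CREATE TABLE dbo.FactInternetSales_Column
(
    OrderKey    bigint        NOT NULL,
    CustomerKey int           NOT NULL,
    SalesAmount decimal(18,2) NOT NULL
)
WITH (
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);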

Page 47:

Columnstore overview

More than compression – batch mode
• SQL Server 2012 implemented batch mode processing to handle rows a batch at a time in addition to a row at a time; SQL Server 2008 and before only had row processing
• Typically batches of about 1,000 rows are moved between iterators
• Significantly less CPU is required because the average number of instructions per row decreases
• Batch mode processing:
  • Hash Join/Aggregate are supported
  • Merge Join, Nested Loop Join, and Stream Aggregate are not supported

SELECT COUNT(*) FROM FactInternetSales_Column  -- 352 ms
SELECT COUNT(*) FROM FactInternetSales_Row     -- 6704 ms

(Screenshots: batch mode scan example; row mode scan example.)

Page 48:

Columnstore index design

DMVs show data change
• INSERTED a single record into a table with a clustered columnstore index
• Screenshots taken from the DMV sys.pdw_nodes_column_store_row_groups (subset of total rows returned); a matching query is sketched below
• Before the row is inserted (screenshot)
• After the single row is inserted: a deltastore has been created and the single row is represented there (screenshot)
• After REBUILD (screenshot: the segment row count has increased by 1)
• REORGANIZE only moves "Closed" deltastore segments into a "Compressed" status; REBUILD affects the entire index (or an entire partition of the index)
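A minimal sketch of querying the DMV named above; the column list is an assumption based on the fields the next slide discusses, so the exact columns may differ by version:

-- One row per row group per distribution; state_description distinguishes
-- COMPRESSED row groups from OPEN/CLOSED deltastores.
SELECT row_group_id, state_description, total_rows
FROM sys.pdw_nodes_column_store_row_groups
ORDER BY row_group_id;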

Page 49:

Columnstore index design

Columnstore data movement
• The state_description field has three states: COMPRESSED, OPEN, and CLOSED
  • COMPRESSED represents a row group that is stored in columnstore format
  • OPEN represents a deltastore that is accepting new rows
  • CLOSED represents a full deltastore ready for REORGANIZE
• When inserting 102,400 rows or more in a single batch into a columnstore index distribution, the data is compressed automatically
• When inserting 102,399 rows or fewer in a single batch, the data is stored in the deltastore
• The actual maximum number of rows per deltastore is 1,048,576, at which point it is CLOSED
  • This is also the ideal segment size that SQL Server will try to create when first building a columnstore index from a table
  • When the index build encounters memory pressure, DOP is reduced first and then the segment size is reduced
• Only the REBUILD statement can compress a deltastore that is not in the CLOSED state; neither REORGANIZE nor the Tuple Mover process has any effect on an OPEN deltastore (see the sketch below)
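A minimal T-SQL sketch of the two maintenance statements, reusing the hypothetical FactInternetSales_Column table from earlier:

-- REORGANIZE compresses only CLOSED deltastores; OPEN deltastores are untouched.
ALTER INDEX ALL ON dbo.FactInternetSales_Column REORGANIZE;

-- REBUILD recompresses the entire index (or one partition), including any
-- rows still sitting in an OPEN deltastore.
ALTER INDEX ALL ON dbo.FactInternetSales_Column REBUILD;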

Page 50:

Demo – Exploring some of the APS management views and metadata
Matt Goswell

Page 51:

Get Started Today!
Sign up for a free architectural design session with your Microsoft representative

Learn about the Microsoft Analytics Platform System at www.microsoft.com/aps

Try HDInsight at www.microsoft.com/bigdata

Try SQL Server for data warehousing in Windows Azure VMs at www.windowsazure.com

Try SQL Server 2014 at www.microsoft.com/en-us/sqlserver/sql-server-2014.aspx

Page 52:

DBI-B310 Microsoft Analytics Platform System Overview

Related content

DBI-B337 Polybase in the Modern Data Warehouse

We’re going to be at Ask the Experts

Page 53:

27 Hands-on Labs + 8 Instructor-led Labs in Hall 7

DBI Track resources

Free SQL Server 2014 Technical Overview e-book
microsoft.com/sqlserver and Amazon Kindle Store

Free online training at Microsoft Virtual Academy
microsoftvirtualacademy.com

Try new Azure data services previews!
Azure Machine Learning, DocumentDB, and Stream Analytics

Page 54:

Resources

Learning

Microsoft Certification & Training Resources

www.microsoft.com/learning

Developer Network

http://developer.microsoft.com

TechNet

Resources for IT Professionals

http://microsoft.com/technet

Sessions on Demand

http://channel9.msdn.com/Events/TechEd

Page 55:

Please Complete An Evaluation Form
Your input is important!

TechEd Schedule Builder: CommNet station or PC
TechEd Mobile app: phone or tablet
QR code

Page 56:

Evaluate this session

Page 57:

© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Page 58:

When performance matters
A financial customer required a powerful analytics platform… to improve performance, deliver an enhanced level of customer service, and handle terabytes of data and an unprecedented level of query complexity.

The right solution: Microsoft Analytics Platform System

Complex query: SQL Server 2008 R2, 4 hours (£600); APS, 15 seconds (£0.625): 960x FASTER

Page 59:

PDW Configuration Manager

Appliance Topology:
• Password Reset
• Time Zone
• Network

Parallel Data Warehouse Topology:
• Certificate
• Firewall
• PDW Service Status
• Instant File Initialization
• Restore Master Database

HDInsight Topology:
• Certificate
• Firewall
• HDI Service Status
• User Management

Page 60:

Admin Console

• Dashboard
• Queries Activity
• Load Activity
• Backup and Restore
• Active Locks
• Active Sessions
• Alerts
• Appliance State

https://controlnodeipaddress

Page 61:

Admin Console

• HDFS
• Health
• Map/Reduce
• Storage
• Performance Monitor

For each HDI node:
• OS
• Data

Page 62:

Performance figures
Actual performance figures for data export from PDW

Method        Performance   Notes
DWSQL         10 Gb/hr      Single thread
SQLCMD        10 Gb/hr      Single thread
PDE \ CRTAS   500 Gb/hr     Spoke performance
BCP           56 Gb/hr      Single thread
PolyBase      280 Gb/hr     3 VM data nodes, 8 compute nodes
SSIS          46 Gb/hr      Per stream

PDW = HP Full Rack V2 Appliance (8 nodes).
Spoke = HP DL980, 4 processors (32 cores), 1 TB RAM.
Data Nodes = VMs built within the spoke DL980 (2 cores, 64 GB RAM).