Microsoft APS Deep Dive

62

Transcript of Microsoft APS Deep Dive

Page 1:

Page 2:

Microsoft APS Deep Dive

Matt Goswell, Franz Robeller

DBI-B311

Page 3:

The modern data warehouse

(Diagram): self-service, collaboration, corporate, predictive, and mobile consumption; extract, transform, load; a single query model with data quality and master data management; and non-relational, relational, analytical, and streaming data sources, internal and external.

Page 4:

About Analytics Platform System

Pre-built hardware appliance

Windows Server 2012 R2 + SQL Server 2014

Massively Parallel Processing (MPP) to scale to 6 PB

In-memory columnstore for 100x speed improvement

Dedicated region for HDInsight (Hadoop)

Integrated query model joining relational and HDInsight (Hadoop)

Available from HP and Dell

SQL Server Parallel Data Warehouse & HDInsight in a single appliance

Page 5:

Integrate relational + non-relational

Query relational and Hadoop in parallel

Single query

No need to ETL Hadoop data into DW

Query Hadoop with existing T-SQL skills

Integrated query with PolyBase in SQL PDW (diagram): a single query spans relational data and semi-/unstructured/streaming data in Hadoop, and PolyBase returns one SQL result set.

Page 6:

What is Hadoop?

Distributed, scalable system on commodity hardware

Composed of a few parts:

HDFS – Distributed file system

MapReduce – Programming model

Others: HBase, R, Pig, Hive, Flume, Mahout, Avro, Zookeeper

(Diagram) Hadoop = MapReduce + HDFS: MapReduce (job scheduling/execution system) runs over HDFS (Hadoop Distributed File System); around the core sit HBase (column DB), Hive, Mahout, Oozie, Sqoop, HBase/Cassandra/Couch/MongoDB, Avro, ZooKeeper, Pig, Flume, Cascading, R, Ambari, and HCatalog.

Page 7:

Hadoop alone is NOT the answer to all challenges

(Diagram) Moving HDFS (Hadoop) data into the warehouse before analysis requires ETL; T-SQL users must learn new skills; and the Hadoop ecosystem must be built, integrated, managed, maintained, and supported. Meanwhile, "new" data sources keep arriving: devices, web, sensor, and social.

Page 8:

Hardware Architecture Overview (PDW+HDI)

Hardware details

One standard node type: 2x 8-core Intel processors, 256 GB memory

Uses newest InfiniBand connectivity (FDR, 56 Gb/sec)

Uses JBODs: Windows Server 2012 R2 technologies manage the JBOD drives to achieve the same level of reliability and robustness as we know from SAN solutions

Backup (BU) and Load Servers (LS) are not in the appliance: customers can use their own hardware and more than one BU or LS for high availability

Scale unit concept: the base unit is the minimum configuration and populates the rack with networking; a scale unit adds capacity in increments of 2-3 compute nodes and related storage; a passive unit increases high-availability (HA) capacity by adding more spares

(Diagram: base unit with hosts HST01-HST04 and HSA01-HSA04 attached to JBODs via direct-attached SAS, plus InfiniBand and Ethernet.)

Page 9:

Virtual Machine Architecture Overview (PDW Region)

General details
• All hosts run Windows Server 2012 R2 Standard
• All virtual machines run Windows Server 2012 R2 Standard as a guest operating system
• All fabric and workload activity happens in Hyper-V virtual machines
• Fabric virtual machines and CTL share one server, lowering overhead costs especially for small topologies
• PDW Agent runs on all hosts and all virtual machines and collects appliance health data on fabric and workload
• DWConfig and Admin Console continue to exist; minor extensions expose host-level information
• Windows Storage Spaces handles mirroring and spares and enables use of lower-cost DAS (JBODs)

PDW workload details
SQL Server 2014 Enterprise Edition (PDW build) control node and compute nodes for the PDW workload

Storage details
2 files on 2 LUNs per filegroup, 8 filegroups per compute node; each LUN is configured as RAID 1; large numbers of spindles are used in parallel

Software details
(Diagram: the base unit hosts HST01/HST02 with the fabric and control VMs (CTL, WDS, AD01, VMM, AD02) and HSA01/HSA02 with Compute 1 and Compute 2, attached to the JBOD via direct-attached SAS, plus InfiniBand and Ethernet.)
• Control node VM: Windows Server 2012 R2 Standard; PDW engine; DMS Manager; SQL Server 2014 Enterprise Edition (PDW build); shell databases just as in older versions
• Compute node VM: Windows Server 2012 R2 Standard; DMS Core; SQL Server 2014 Enterprise Edition (PDW build)

Page 10:

SQL Server PDW 2014 Control Architecture: Cost-Based Query Optimizer

(Diagram: SELECT statements arrive at the control node, where the engine service compiles them against a "shell appliance" (SQL Server) and sends plan steps to each compute node (SQL Server).)

• PDW 2014 uses SQL Server on the control node to run a "shell appliance"
• Every database with all its objects exists in the shell appliance as an empty "shell," lacking the user data (which sits on all the compute nodes)
• Every DDL operation is executed against both the shell and the compute nodes
• Large parts of basic RDBMS functionality are now provided by that shell:
  • Authentication and authorization of queries, but also the full security system
  • Schema binding
  • Metadata catalog

Page 11:

Virtual Machine Architecture Overview (HDI Region)

General details
• All hosts run Windows Server 2012 R2 Standard
• All virtual machines run Windows Server 2012 R2 Standard as a guest operating system
• All HDI workload activity happens in Hyper-V virtual machines
• Lower overhead costs, especially for small topologies
• Windows Storage Spaces handles mirroring and spares and enables use of lower-cost DAS (JBODs) rather than SAN

HDI workload details
Windows HDI head, security, and management nodes plus data nodes for the HDI workload

Storage details
• 16 data disks per data node
• No RAID 1; single drives only!
• But each file is stored 3 times in HDFS

Software details
(Diagram: the base unit hosts HST03/HST04 with HMN01, HHN01, and HSN01, and HSA03/HSA04 with Data 1-4, attached to the JBOD via direct-attached SAS. Each runs Windows Server 2012 R2 Standard and the Windows HDI distribution software (version 2.x).)

Page 12:

Disk Layout: LUNs and Filegroups/Files (PDW Region)

(Diagram: one JBOD. Disks 1-8 hold Node 1's Distribution A files 1-2 and Distribution B files 1-2, continuing through Distribution H files 1-2 on disks 29-36, with Node 2 laid out the same way; TempDB and Log pairs sit alongside the distribution LUNs; disks 65-70 hold fabric storage (VHDXs for nodes) and hot spares.)

Design details
• Each LUN is composed of two drives in a RAID 1 mirroring configuration
• Distributions are split across two files/LUNs
• TempDB and Log are spread across all 16 LUNs
• No fixed TempDB or log size allocation
• VHDXs are on JBODs to ensure HA
• Disk I/O is further parallelized
• Bandwidth: 2 cables with 4x 6 Gbit/sec lanes each

Page 13:

Distribution storage

• PDW stores eight distributions and one replicated store per compute node.
• This value is fixed; it cannot be configured dynamically nor preconfigured differently in the factory, and it does not change between smaller and larger appliances.
• Each distribution is stored on separate physical disks in the JBOD.
• Replicated tables are striped across all disks in the JBOD.
• The physical location for each distribution is controlled via SQL Server filegroups at the compute node level.

Data distribution layout (diagram): Compute node 1 and Compute node 2 each hold distributions A through H plus the replicated store. A table's layout is chosen when it is created, as sketched below.
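A minimal T-SQL sketch (table and column names are hypothetical, not from the deck) of how the CREATE TABLE options select between the two layouts:

-- Hash-distributed: rows are spread across the eight distributions per
-- compute node based on a hash of the chosen distribution column.
CREATE TABLE dbo.FactSales
(
    SaleKey     bigint        NOT NULL,
    CustomerKey int           NOT NULL,
    Amount      decimal(18,2) NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerKey));

-- Replicated: a full copy is kept on every compute node, striped across
-- all disks in the JBOD.
CREATE TABLE dbo.DimRegion
(
    RegionKey  int          NOT NULL,
    RegionName nvarchar(50) NOT NULL
)
WITH (DISTRIBUTION = REPLICATE);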

Page 14:

Distribution storage

Distribution of data
• Data skew occurs when the rows in one distribution are disproportionate to the count of rows in the other distributions.
• Generally this happens because the value chosen as the distribution key occurs significantly more often than other values in the data set.
• The following graph shows a real-world scenario where a table was distributed on an IP address.
• We should expect the distribution to be even, given that the IP is unique.
• In this situation a number of connections came via a proxy server, so the IP address was identical for a number of different connections.
• Skew will affect performance and limit storage capacity by creating a hot spot for CPU and storage on a single compute node. One way to check for skew is sketched below.

(Graph: distribution of data across distributions.)
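A minimal sketch of checking for skew, assuming the hypothetical dbo.FactSales table from the previous page; DBCC PDW_SHOWSPACEUSED reports rows and space per distribution, so a lopsided row count points at a hot spot:

-- Returns one result row per distribution; compare the row counts
-- across distributions to spot skew.
DBCC PDW_SHOWSPACEUSED ("dbo.FactSales");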

Page 15:

Hardware architecture

Networking (diagram): redundant InfiniBand and Ethernet fabrics connect everything. Rack #1 holds the PDW region (control node, passive node, and compute nodes with economical disk storage) and the HDInsight region (master node, passive node, and compute nodes with economical disk storage). Rack #2 holds the HDI extension base units and HDI active scale units (passive node plus compute nodes with economical disk storage).

Active Unit: addition of two or three compute nodes, depending on OEM hardware configuration, and related storage

Passive Unit: host available to accept the workload from the associated workload nodes

Failover Node: high availability for the rack

Page 16:

HP configuration

(Rack layout diagram.) Raw capacity by topology: ¼ rack 15.1 TB; ½ rack 30.2 TB; ¾ rack 45.3 TB; full rack 60.4 TB; 1¼ racks 75.5 TB; 1½ racks 90.6 TB; 2 racks 120.8 TB; 3 racks 181.2 TB.

PDW Backplane (6U): redundant InfiniBand; redundant Ethernet; management and control (active); rack failover node (passive)

Base Unit (7U): 2 HP 1U servers (16 cores each, 32 total); 5U JBOD with 1 TB drives; user data capacity 75 TB

Scale Unit (7U): same build as the base unit: 2 HP 1U servers (16 cores each, 32 total); 5U JBOD with 1 TB drives; user data capacity 75 TB

Extension Base Unit (5U): redundant InfiniBand; redundant Ethernet; rack failover node (passive)

Extension Base Unit (7U): 2 HP 1U servers (16 cores each, 32 total); 5U JBOD with 1 TB drives; user data capacity 75 TB

Reserved Space (8U/9U): Data Integration Platform server; passive unit (adds a failover node); future expansion

Rack contents: rack 1 holds the control node and failover node plus JBODs 1-4 serving compute nodes 1-8; rack 2 holds a failover node plus JBODs 5-8 serving compute nodes 9-16; rack 3 holds a failover node plus JBODs 9-12 serving compute nodes 17-24.

Page 17:

Dell/Quanta configuration

(Rack layout diagram.) Raw capacity by topology: 1/3 rack 22.6 TB; 2/3 rack 45.3 TB; full rack 67.9 TB.

Base Unit (6U): redundant InfiniBand; redundant Ethernet; management and control (active); rack failover node (passive)

Base Unit (10U): 3 servers in a 2U enclosure (16 cores each, 48 total); 2 JBODs, 4U each, with 1 TB drives; user data capacity 79 TB

Scale Unit (10U): 3 servers in a 2U enclosure (16 cores each, 48 total); 2 JBODs, 4U each, with 1 TB drives; user data capacity 79 TB

Reserved Space (6U): passive unit (adds a failover node); future expansion

Rack contents: control node and failover node plus JBODs serving compute nodes in groups of three (e.g., JBODs 1-6 serving compute nodes 1-9).

Page 18:

Factoring in Compression: HP
Capacity in TB by compression ratio (rows) and # compute nodes with 1 TB drives (columns)

Ratio |    2     4     6     8    10    12    16    20    24    32    40    48    56
    1 | 15.1  30.2  45.3  60.4  75.5  90.6   121   151   181   242   302   362   423
    2 | 30.2  60.4  90.6   121   151   181   242   302   362   483   604   725   846
    3 | 45.3  90.6   136   181   227   272   362   453   544   725   906  1087  1268
    4 | 60.4   121   181   242   302   362   483   604   725   966  1208  1450  1691
    5 | 75.5   151   227   302   378   453   604   755   906  1208  1510  1812  2114
    6 | 90.6   181   272   362   453   544   725   906  1087  1450  1812  2174  2537
    7 |  106   211   317   423   529   634   846  1057  1268  1691  2114  2537  2960
    8 |  121   242   362   483   604   725   966  1208  1450  1933  2416  2899  3382
    9 |  136   272   408   544   680   815  1087  1359  1631  2174  2718  3262  3805
   10 |  151   302   453   604   755   906  1208  1510  1812  2416  3020  3624  4228
   11 |  166   332   498   664   831   997  1329  1661  1993  2658  3322  3986  4651
   12 |  181   362   544   725   906  1087  1450  1812  2174  2899  3624  4349  5074
   13 |  196   393   589   785   982  1178  1570  1963  2356  3141  3926  4711  5496
   14 |  211   423   634   846  1057  1268  1691  2114  2537  3382  4228  5074  5919
   15 |  227   453   680   906  1133  1359  1812  2265  2718  3624  4530  5436  6342
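A worked check of the table (ours, not on the slide): each HP compute node contributes roughly 7.55 TB of raw user capacity, so a cell is approximately nodes × 7.55 TB × compression ratio. For example, 8 nodes at a 5:1 ratio gives 8 × 7.55 × 5 ≈ 302 TB, matching the table.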

Page 19:

Compression: Dell & Quanta
Capacity in TB by compression ratio (rows) and # compute nodes with 1 TB drives (columns)

Ratio |    3     6     9    12    15    18    21    24    27    36    45    54
    1 |   23    45    68    91   113   136   159   181   204   272   340   408
    2 |   45    91   136   181   227   272   317   362   408   544   680   815
    3 |   68   136   204   272   340   408   476   544   612   815  1019  1223
    4 |   91   181   272   362   453   544   634   725   815  1087  1359  1631
    5 |  113   227   340   453   566   680   793   906  1019  1359  1699  2039
    6 |  136   272   408   544   680   815   951  1087  1223  1631  2039  2446
    7 |  159   317   476   634   793   951  1110  1268  1427  1903  2378  2854
    8 |  181   362   544   725   906  1087  1268  1450  1631  2174  2718  3262
    9 |  204   408   612   815  1019  1223  1427  1631  1835  2446  3058  3669
   10 |  227   453   680   906  1133  1359  1586  1812  2039  2718  3398  4077
   11 |  249   498   747   997  1246  1495  1744  1993  2242  2990  3737  4485
   12 |  272   544   815  1087  1359  1631  1903  2174  2446  3262  4077  4892
   13 |  294   589   883  1178  1472  1767  2061  2356  2650  3533  4417  5300
   14 |  317   634   951  1268  1586  1903  2220  2537  2854  3805  4757  5708
   15 |  340   680  1019  1359  1699  2039  2378  2718  3058  4077  5096  6116

Page 20:

Compression vs. Scan Performance
Seconds to scan a 1 TB table at 200 MB/second per distribution

# Compute Nodes |      2       3       4       6       8       9
# Distributions |     16      24      32      48      64      72

Ratio
    1           | 327.68  218.45  163.84  109.23   81.92   72.82
    2           | 163.84  109.23   81.92   54.61   40.96   36.41
    3           | 109.23   72.82   54.61   36.41   27.31   24.27
    4           |  81.92   54.61   40.96   27.31   20.48   18.20
    5           |  65.54   43.69   32.77   21.85   16.38   14.56
    6           |  54.61   36.41   27.31   18.20   13.65   12.14
    7           |  46.81   31.21   23.41   15.60   11.70   10.40
    8           |  40.96   27.31   20.48   13.65   10.24    9.10
    9           |  36.41   24.27   18.20   12.14    9.10    8.09
   10           |  32.77   21.85   16.38   10.92    8.19    7.28
   11           |  29.79   19.86   14.89    9.93    7.45    6.62
   12           |  27.31   18.20   13.65    9.10    6.83    6.07
   13           |  25.21   16.80   12.60    8.40    6.30    5.60
   14           |  23.41   15.60   11.70    7.80    5.85    5.20
   15           |  21.85   14.56   10.92    7.28    5.46    4.85
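A worked check (ours, not on the slide): with eight distributions per compute node, all scanned in parallel, scan time = (1 TB ÷ compression ratio) ÷ (# distributions × 200 MB/s). For 2 nodes (16 distributions) at ratio 1, that is 1,048,576 MB ÷ (16 × 200 MB/s) = 327.68 seconds, the top-left cell; doubling either the compression ratio or the node count halves the scan time.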

Page 21:

Demo – Overview of the APS appliance

Matt Goswell

Page 22:

Appliance Domains/Regions

General details
• Physical hosts
• Virtual machines required to maintain the appliance infrastructure and workload virtual machine configurations
• Windows Storage Spaces handles mirroring and spares

PDW workload details
SQL Server 2014 Enterprise Edition (PDW build) control node and compute nodes for the PDW workload

Storage details
2 files on 2 LUNs per filegroup, 8 filegroups; each LUN configured as RAID 1; takes advantage of a large number of spindles in parallel

HDI workload details
Windows HDI (100% Apache Hadoop) based on Hortonworks HDP; head, security, and management nodes plus data nodes

(Diagram) Fabric Domain: CTL, WDS, AD01, VMM, and AD02 on HST01/HST02. PDW Workload Region: Compute 1 and Compute 2 on HSA01/HSA02 with their JBOD over direct-attached SAS. HDI Workload Region: HMN01, HHN01, and HSN01 on HST03/HST04, and Data 1-4 on HSA03/HSA04 with their JBOD. All units share InfiniBand and Ethernet.

Page 23:

Fabric Virtual Machines

Fabric Active Directory (AD01 and AD02), general details
• Manages access between physical hosts
• Required to support the clusters
• Holds the appliance Active Directory and DNS
• VMs are stored on local disks, not on CSVs; for redundancy there are two AD VMs on two physical machines

Windows Deployment Services (WDS), general details (new in v2 AU3)
• Used to deploy Windows operating systems over the appliance network
• VM is stored on CSVs

SC Virtual Machine Manager (VMM), general details
• Manages the configuration/image of virtual machines within the appliance
• VM is stored on CSVs

(Diagram: base unit as on the preceding slides.)

Page 24:

PDW Workload Virtual Machines – HST01

Control Node (CTL), general details
• Client connections always go through the control node
• Contains no persistent user data
• Contains metadata: system metadata and user shell databases
• Parallel Data Warehouse advantages: processes SQL requests; prepares the execution plan; orchestrates distributed execution
• Local SQL Server processes the final query plan and aggregates results
• Runs the "Data Movement Service" (DMS): manages connectivity to compute nodes and manages query execution
• Runs the "Azure Data Management Gateway" (since v2 AU3), which enables query from cloud to on-premises through APS
• VM is stored on CSVs

(Diagram: base unit as on the preceding slides.)

Page 25:

PDW Workload Virtual Machines – HSAxx

Compute Nodes, general details
• Each MPP node is a highly tuned symmetric multi-processing (SMP/NUMA) node with standard interfaces
• Provides dedicated hardware, database, and storage
• Runs SQL Server 2014 Enterprise Edition (PDW build)
• Runs the Data Movement Service (DMS)
• VMs are stored on CSVs

Just a Bunch Of Disks (JBOD), general details
• Storage managed by Windows Server 2012 R2 Storage Spaces
• Cross-connected via dual 4x 6 Gb/sec SAS connections

(Diagram: base unit as on the preceding slides.)

Page 26:

HDI Workload Virtual Machines – HST03

General details
• Client connections always go through the head node

Head Node (HHN01): Ambari Agent; Namenode (1,2,3); JobTracker; History Server; HiveServer (1,2,3); Hive Metastore; Oozie Service; WebHCat Server; and more

Secure Gateway Node (HSN01): IIS; Developer Dashboard; Secure Gateway

Management Node (HMN01): IIS; Ambari Agent; SQL Server

Data Node (HDN001-HDNxxx): Ambari Agent; Datanode; TaskTracker

(Diagram: base unit as on the preceding slides.)

Page 27:

Failover Functionality
Failover Cluster Manager starts a virtual machine on a new host after failure.

Details
• Cluster Shared Volumes enable all nodes to access the LUNs on the JBOD as long as at least one of the hosts attached to the JBOD is active; uses the SMB3 protocol
• One cluster across the whole appliance
• Virtual machine images are automatically started on a new host in the event of failover
• Rules are enforced by affinity and anti-affinity maps
• Failback continues to be through CSS
• Uses Windows Failover Cluster Manager
• Adding a Passive Unit increases HA capacity: it enables another virtual machine to fail without disabling the appliance
• All hosts connected to a single JBOD cannot fail over

Sample: PDW Region (Base Unit – HP) (diagram: the workload and fabric VMs from a failed host restart on the surviving hosts, including the HST03 passive unit.)

Page 28:

Replace Node

Details
• Single type of node; the sole differentiator is storage attached vs. storage unattached
• Execution commonality regardless of which host is being replaced
• Workloads migrate with their virtual machines
• Replace Node follows a subset of the bare-metal provisioning using WDS and executes the APS Setup.exe with the replace-node action specified, along with the necessary information targeting the replacement node
• Workload virtual machines do not have to be re-provisioned
• Workload virtual machines are failed back using Windows Failover Cluster Manager
• Failback still incurs a small downtime
• There may be a small performance impact from failed-over compute nodes; documentation will suggest failing back
• Currently not using Live Migration for failover and failback

Sample: PDW Region (Base Unit – HP) (diagram)

Page 29:

Add Unit: Scaling from 2 to 56 Nodes

Details
• Addition to the appliance is in the form of one or more scale units
• The IHV owns installation and cabling of new scale units
• Software provisioning consists of three phases: bare-metal provisioning of new nodes (online since AU1); provisioning of workload virtual machines (online since AU1); redistribution of data (offline)
• CSS assistance (may have to help prepare user data): tools to validate the environment/data transition; develop a strategy for successful addition
• Deleting old data: partition switching from the largest tables; CRTAS (CREATE REMOTE TABLE AS SELECT) to move data off the appliance temporarily

Sample: PDW Region (Base Unit – HP) (diagram: a scale unit adds HSA03/HSA04 with Compute 3 and Compute 4 and their JBOD.)

The PDW region must have enough free space to redistribute the largest table.

Page 30:

Supported Extensions
All actions performed by CSS

• Add Unit: data scale unit
• Add Region: Hadoop region only
• Replace Node: hardware failure
• Replace VM: VM corruption

Page 31:

Project PolyBase
• Background: research done by the Gray Systems Lab, led by Technical Fellow David DeWitt
• High-level goals for V2:
  • Seamless integration with Hadoop via regular T-SQL
  • Enhancing the PDW query engine to process data coming from the Hadoop Distributed File System (HDFS)
  • Fully parallelized query processing for high-performance data import and export from HDFS
  • Integration with various Hadoop implementations: Hadoop on Windows Server, Hortonworks, and Cloudera

Page 32:

Project PolyBase
• Both are distributed systems
• Parallel data access between PDW and Hadoop
• Different goals and internal architecture
• Combined power of Big Data integration

(Diagram: the PDW control node with its compute nodes alongside the Hadoop name node with its data nodes.)

Page 33:

Project PolyBase
• Direct parallel data access between PDW compute nodes and Hadoop data nodes
• Support for all HDFS file formats
• Introducing "structure" on the "unstructured" data

SQL in, results out (diagram): (1) the query is submitted to PDW; (2) HDFS blocks are read from Hadoop in parallel; (3) results are returned.

SQL in, results stored in HDFS (diagram): the query runs the same way, but the results are written back to HDFS instead of being returned.

Page 34:

Project PolyBase next steps
• Cost-based decision on how much data needs to be pushed to PDW
• SQL operations on HDFS data pushed into Hadoop as MapReduce jobs

(Diagram: (1) SQL is submitted to PDW; (2) a map job is pushed into Hadoop's MapReduce; (3-6) HDFS blocks are processed and exchanged with the database; (7) results are returned.)

Page 35:

External table
Introducing structure to semi-/unstructured data
• Representation of data residing in Hadoop/HDFS
• Introduces new T-SQL syntax
• The syntax is very similar to that of a regular table

Page 36:

CREATE External Table Syntax

--Create a new external table in SQL Server PDW
CREATE EXTERNAL TABLE [ database_name . [ dbo ] . | dbo. ] table_name
    ( <column_definition> [ ,...n ] )
    WITH (
        LOCATION = 'hdfs_folder_or_filepath',
        DATA_SOURCE = external_data_source_name,
        FILE_FORMAT = external_file_format_name
        [ , <reject_options> [ ,...n ] ]
    )
[;]

<reject_options> ::=
{
    | REJECT_TYPE = value | percentage
    | REJECT_VALUE = reject_value
    | REJECT_SAMPLE_VALUE = reject_sample_value
}

1. EXTERNAL indicates an external table
2. LOCATION: required location of the Hadoop file or folder
3. DATA_SOURCE: required data source definition of the Hadoop cluster
4. FILE_FORMAT: file format options associated with data import from HDFS (for example, arbitrary field delimiters and reject-related thresholds)

Page 37:

External table - Sample

--STEP 1: Create an external data source for Hadoop
-- DROP EXTERNAL DATA SOURCE FXR_TEST_DSRC;
CREATE EXTERNAL DATA SOURCE FXR_TEST_DSRC
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://192.168.210.145:8020',
    JOB_TRACKER_LOCATION = '192.168.210.145:50300'
    -- default job tracker ports: 8021 = Cloudera; 50300 = HDInsight
);

--STEP 2: Create an external file format for a Hadoop text-delimited file.
-- DROP EXTERNAL FILE FORMAT FXR_Test_Format;
CREATE EXTERNAL FILE FORMAT FXR_Test_Format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = N';',
        USE_TYPE_DEFAULT = TRUE,
        STRING_DELIMITER = ''
    )
);

Page 38:

External table - Sample (cont.)

--STEP 3: Create a new external table in SQL Server PDW
DROP EXTERNAL TABLE Test;
GO
CREATE EXTERNAL TABLE Test (
    name                 nvarchar(17),
    startzeitpunkt       nvarchar(35),
    endzeitpunkt         varchar(35),
    flms_system_realtime nvarchar(19),
    dummy                nvarchar(19) NULL,
    Counter1DTonDur      nvarchar(19),
    Counter1DMileage     nvarchar(19),
    dummy2               nvarchar(2) NULL
)
WITH (
    LOCATION = '/user/fxr47511/pdwtest',
    DATA_SOURCE = FXR_TEST_DSRC,
    FILE_FORMAT = FXR_Test_Format,
    REJECT_TYPE = value,
    REJECT_VALUE = 1000
);
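Once created, the external table is queried like any other table; a minimal sketch of our own (not on the slide), reusing the Test table defined above:

-- Rows are read from HDFS in parallel at query time; the external table
-- can also be joined to ordinary PDW tables in the same statement.
SELECT TOP 10 name, startzeitpunkt, endzeitpunkt
FROM Test;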

Page 39:

External table
Considerations
• Data can be changed or removed at any time on the Hadoop side
• PDW V2 does not guarantee any form of concurrency control or isolation level
• The same query may return different results if the data is changed on the Hadoop/HDFS side between two query runs
• A query may fail if the data is removed or relocated
• The location of the data residing on the external cluster is validated every time a user selects from it

Page 40:

Demo – Integrating data from PDW and HDI
Matt Goswell

Page 41:

HDFS Bridge
• Direct HDFS access
• Functional part of the Data Movement Service
• Hides HDFS complexity
• HDFS file types are supported by use of the appropriate RecordReader interface

(Diagram: on each PDW node, DMS hosts an HDFS Bridge alongside SQL Server; the bridges talk directly to HDFS on the Hadoop cluster.)

Page 42:

HDFS Bridge – Data transformation

(Diagram: the PDW engine's load manager and the DMS manager on the control node coordinate the DMS instances on the compute nodes; each DMS pipeline runs converter, sender, receiver, and writer stages next to SQL Server, with the HDFS Bridges reading from the Hadoop cluster.)

1. A query against an external table is executed in PDW.
2. The HDFS Bridge reads data blocks by using Hadoop's RecordReader interface.
3. Each row is converted for bulk insert and hashed based on the distribution column.
4. The hashed row is sent to the appropriate node's receiver for loading.
5. The row is bulk inserted into the destination table.

Page 43:

Setup and monitoring
• Java runtime libraries are included with the APS v2 (AU1+) software and are installed automatically when you install a Hadoop region in the APS appliance
• Security: a static user account is created on Hadoop
• Hadoop connectivity must be enabled in PDW: exec sp_configure 'hadoop connectivity', 1
• List of external tables: shown in the SSDT object explorer, and available in sys.pdw_external_tables (queried below)
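A quick sketch of listing the external tables through the catalog view named above (the view comes from the slide; the query itself is ours):

-- Enumerate the external tables defined on the appliance.
SELECT *
FROM sys.pdw_external_tables;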

Page 44:

Enabling Hadoop connectivity

exec sp_configure 'hadoop connectivity', 1

Parameter values and what they enable support for:
• 0: disables Hadoop connectivity
• 1: Hortonworks (HDP 1.3) for Windows Server; HDInsight on Analytics Platform System (version AU1) (HDP 1.3); Azure blob storage on Microsoft Azure (WASB[S]) (AU1)
• 2: Hortonworks (HDP 1.3) for Linux
• 3: Cloudera CDH 4.3 for Linux
• 4: Hortonworks (HDP 2.0) for Windows Server; HDInsight on Analytics Platform System (AU2) (HDP 2.0); Azure blob storage on Microsoft Azure (WASB[S]) (AU2)
• 5: Hortonworks (HDP 2.0) for Linux
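A minimal sketch of switching the setting, assuming the usual sp_configure pattern of following the change with RECONFIGURE (any appliance restart steps are out of scope here):

-- Enable connectivity option 4: HDP 2.0 for Windows Server / HDInsight on APS.
EXEC sp_configure 'hadoop connectivity', 4;
RECONFIGURE;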

Page 45:

Columnstore overview

Terminology
• A clustered columnstore index is comprised of two parts: a columnstore and a deltastore
• Data is compressed into segments, ideally ~1 million rows each (subject to system resource availability)
• A collection of segments representing a set of entire rows is called a row group
• The minimum unit of I/O between disk and memory is a segment
• Execution in batch mode (as opposed to traditional row mode) moves multiple rows between iterators: ~1,000 rows at a time
• Dictionaries (primary and secondary) are used to store additional metadata about segments

(Diagram: columns C1-C6 organized into row groups of segments in the columnstore, alongside a delta (row) store.)

Page 46:

Columnstore overview

Enhancements beyond SQL Server 2012
• Columnstore indexes are now clustered; only a single index exists for a table:
  • Clustered index + non-clustered columnstore is not supported
  • Clustered columnstore + non-clustered row store index is not supported
• Full DML (insert, update, delete, select) is supported directly on the columnstore; a previous workaround involved maintaining a separate secondary row store table with UNION ALL
• All PDW data types are supported in columnstore indexes:
  • Decimal with precision greater than 18 was not supported in SQL Server 2012
  • Binary/varbinary was not supported in SQL Server 2012
  • Datetimeoffset with scale greater than 2 was not supported in SQL Server 2012
• Query processing:
  • Batch mode hash join spill implemented (previously this would revert to row mode)
  • Aggregations without GROUP BY supported
• Some limitations still apply:
  • Avoid string data types for filtering or join conditions
  • Some SQL clauses, for example ROW_NUMBER() / RANK() etc. OVER (PARTITION BY ... ORDER BY ...)

A sketch of creating such a table follows.
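A minimal T-SQL sketch of creating a PDW table with a clustered columnstore index at creation time (the table reuses the FactInternetSales_Column name from the next slide; the column list is ours):

-- A hash-distributed table stored as a clustered columnstore index; each
-- distribution gets its own columnstore plus a deltastore for small inserts.
CREATE TABLE dbo.FactInternetSales_Column
(
    OrderKey    bigint        NOT NULL,
    CustomerKey int           NOT NULL,
    SalesAmount decimal(18,2) NOT NULL
)
WITH (
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);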

Page 47:

Columnstore overview

More than compression – batch mode
• SQL Server 2012 implemented batch mode processing to handle rows a batch at a time in addition to a row at a time; SQL Server 2008 and before only had row processing
• Typically batches of about 1,000 rows are moved between iterators
• Significantly less CPU is required because the average number of instructions per row decreases
• Batch mode processing:
  • Hash Join/Aggregate are supported
  • Merge Join, Nested Loop Join, and Stream Aggregate are not supported

SELECT COUNT(*) FROM FactInternetSales_Column  -- 352 ms
SELECT COUNT(*) FROM FactInternetSales_Row     -- 6704 ms

(Screenshots: batch mode scan example; row mode scan example.)

Page 48:

Columnstore index design

DMVs show data change
• INSERTED a single record into a table with a clustered columnstore index
• Screenshots taken from the DMV sys.pdw_nodes_column_store_row_groups (subset of total rows returned); a matching query is sketched below
• Before the row is inserted (screenshot)
• After the single row is inserted: a deltastore has been created and the single row is represented there (screenshot)
• After REBUILD (screenshot: the segment row count has increased by 1)
• REORGANIZE only moves "Closed" deltastore segments into a "Compressed" status; REBUILD affects the entire index (or an entire partition of the index)
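A minimal sketch of querying the DMV named above; the column list is an assumption based on the fields the next slide discusses, so the exact columns may differ by version:

-- One row per row group per distribution; state_description distinguishes
-- COMPRESSED row groups from OPEN/CLOSED deltastores.
SELECT row_group_id, state_description, total_rows
FROM sys.pdw_nodes_column_store_row_groups
ORDER BY row_group_id;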

Page 49:

Columnstore index design

Columnstore data movement
• The state_description field has three states: COMPRESSED, OPEN, and CLOSED
  • COMPRESSED represents a row group that is stored in columnstore format
  • OPEN represents a deltastore that is accepting new rows
  • CLOSED represents a full deltastore ready for REORGANIZE
• When inserting 102,400 rows or more in a single batch into a columnstore index distribution, the data is compressed automatically
• When inserting 102,399 rows or fewer in a single batch, the data is stored in the deltastore
• The actual maximum number of rows per deltastore is 1,048,576, at which point it is CLOSED
  • This is also the ideal segment size that SQL Server will try to create when first building a columnstore index from a table
  • When the index build encounters memory pressure, DOP is reduced first and then the segment size is reduced
• Only the REBUILD statement can compress a deltastore that is not in the CLOSED state; neither REORGANIZE nor the Tuple Mover process has any effect on an OPEN deltastore (see the sketch below)
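A minimal T-SQL sketch of the two maintenance statements, reusing the hypothetical FactInternetSales_Column table from earlier:

-- REORGANIZE compresses only CLOSED deltastores; OPEN deltastores are untouched.
ALTER INDEX ALL ON dbo.FactInternetSales_Column REORGANIZE;

-- REBUILD recompresses the entire index (or one partition), including any
-- rows still sitting in an OPEN deltastore.
ALTER INDEX ALL ON dbo.FactInternetSales_Column REBUILD;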

Page 50:

Demo – Exploring some of the APS management views and metadata
Matt Goswell

Page 51:

Get Started Today!
Sign up for a free architectural design session with your Microsoft representative

Learn about the Microsoft Analytics Platform System at www.microsoft.com/aps

Try HDInsight at www.microsoft.com/bigdata

Try SQL Server for data warehousing in Windows Azure VMs at www.windowsazure.com

Try SQL Server 2014 at www.microsoft.com/en-us/sqlserver/sql-server-2014.aspx

Page 52:

DBI-B310 Microsoft Analytics Platform System Overview

Related content

DBI-B337 Polybase in the Modern Data Warehouse

We’re going to be at Ask the Experts

Page 53:

27 Hands-on Labs + 8 Instructor-led Labs in Hall 7

DBI Track resources

Free SQL Server 2014 Technical Overview e-book
microsoft.com/sqlserver and Amazon Kindle Store

Free online training at Microsoft Virtual Academy
microsoftvirtualacademy.com

Try new Azure data services previews!
Azure Machine Learning, DocumentDB, and Stream Analytics

Page 54:

Resources

Learning

Microsoft Certification & Training Resources

www.microsoft.com/learning

Developer Network

http://developer.microsoft.com

TechNet

Resources for IT Professionals

http://microsoft.com/technet

Sessions on Demand

http://channel9.msdn.com/Events/TechEd

Page 55:

Please Complete An Evaluation Form
Your input is important!

TechEd Schedule Builder: CommNet station or PC
TechEd Mobile app: phone or tablet
QR code

Page 56:

Evaluate this session

Page 57:

© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Page 58:

When performance matters
A financial customer required a powerful analytics platform… to improve performance, deliver an enhanced level of customer service, and handle terabytes of data and an unprecedented level of query complexity.

The right solution: Microsoft Analytics Platform System

Complex query: SQL Server 2008 R2, 4 hours (£600); APS, 15 seconds (£0.625): 960x FASTER

Page 59:

PDW Configuration Manager

Appliance Topology:
• Password Reset
• Time Zone
• Network

Parallel Data Warehouse Topology:
• Certificate
• Firewall
• PDW Service Status
• Instant File Initialization
• Restore Master Database

HDInsight Topology:
• Certificate
• Firewall
• HDI Service Status
• User Management

Page 60:

Admin Console

• Dashboard
• Queries Activity
• Load Activity
• Backup and Restore
• Active Locks
• Active Sessions
• Alerts
• Appliance State

https://controlnodeipaddress

Page 61:

Admin Console

• HDFS
• Health
• Map/Reduce
• Storage
• Performance Monitor

For each HDI node:
• OS
• Data

Page 62:

Performance figures
Actual performance figures for data export from PDW

Method        Performance   Notes
DWSQL         10 Gb/hr      Single thread
SQLCMD        10 Gb/hr      Single thread
PDE \ CRTAS   500 Gb/hr     Spoke performance
BCP           56 Gb/hr      Single thread
PolyBase      280 Gb/hr     3 VM data nodes, 8 compute nodes
SSIS          46 Gb/hr      Per stream

PDW = HP Full Rack V2 Appliance (8 nodes).
Spoke = HP DL980, 4 processors (32 cores), 1 TB RAM.
Data Nodes = VMs built within the spoke DL980 (2 cores, 64 GB RAM).