U-SQL Query Execution and Performance Basics (SQLBits 2016)
Michael Rys, Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
Simplified U-SQL Job Workflow
[Diagram] Components: Job Front End, Job Scheduler, Compiler Service, Job Queue, Job Manager, U-SQL Catalog, YARN, U-SQL Runtime (vertex execution)
Phases: Job submission, Job execution
U-SQL Compilation Process
[Diagram] Compiler & Optimizer, U-SQL Metadata Service
Compilation output (in job folder): C#, C++, Algebra, managed dll, unmanaged dll, other files (system files, deployed resources)
Deployed to Vertices
Job Status in Visual Studio
UX Job State: Preparing, Queued, Running, Finalizing, Ended (Succeeded, Failed, Cancelled)
Underlying job states: New, Compiling, Queued, Scheduling, Starting, Running, Ended
Preparing: The script is being compiled by the Compiler Service.
Queued: All jobs enter the queue. Are there enough ADLAUs to start the job? If yes, then allocate those ADLAUs for the job.
Running / Finalizing: The U-SQL runtime is now executing the code on 1 or more ADLAUs or finalizing the outputs.
Ended: The job has concluded.
The Job Queue
The queue is ordered by job priority.
Lower numbers -> higher priority.
1 = highest.
Running jobs
When a job is at the top of the queue, it will start running.
Defaults: Max Running Jobs = 3, Max Tokens per job = 20, Max Queue Size = 200
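The ordering rule above (lower number = higher priority, 1 = highest) can be illustrated with a small sketch. This is not the service's implementation. The class name, the job names, and the first-come-first-served tie-break within one priority are assumptions made for illustration. Only the queue size default comes from the slide.

```python
import heapq
from itertools import count

MAX_QUEUE_SIZE = 200  # default quoted on the slide


class JobQueue:
    """Toy priority queue: the lowest priority number is served first."""

    def __init__(self, max_size=MAX_QUEUE_SIZE):
        self._heap = []
        self._arrival = count()  # assumption: FIFO order within the same priority
        self._max_size = max_size

    def submit(self, name, priority):
        if len(self._heap) >= self._max_size:
            raise RuntimeError("queue is full")
        heapq.heappush(self._heap, (priority, next(self._arrival), name))

    def pop_next(self):
        _, _, name = heapq.heappop(self._heap)
        return name

    def __len__(self):
        return len(self._heap)


queue = JobQueue()
queue.submit("nightly-report", priority=1000)  # hypothetical job names
queue.submit("urgent-fix", priority=1)
queue.submit("backfill", priority=500)

order = [queue.pop_next() for _ in range(len(queue))]
print(order)  # ['urgent-fix', 'backfill', 'nightly-report']
```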
Priority Doesn’t Preempt Running Jobs
Queued: X. X has Pri=1.
Running: A, B, C. These are all running and have very low priority (pri=1000).
X will NOT preempt running jobs. X will have to wait.
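A minimal sketch of the non-preemption rule, assuming a scheduler that only fills free slots and never evicts a running job. The function name and data layout are hypothetical. The limit of 3 running jobs is the default quoted in the deck.

```python
import heapq

MAX_RUNNING_JOBS = 3  # default quoted in the deck


def schedule(running, queued, max_running=MAX_RUNNING_JOBS):
    """Start queued jobs only while free slots exist; never evict running jobs.

    running: list of (name, priority) tuples that are already executing
    queued:  list of (name, priority) tuples that are waiting
    Returns (running, still_waiting).
    """
    running = list(running)
    heap = [(priority, index, name) for index, (name, priority) in enumerate(queued)]
    heapq.heapify(heap)
    while heap and len(running) < max_running:
        priority, _, name = heapq.heappop(heap)
        running.append((name, priority))
    waiting = [(name, priority) for priority, _, name in sorted(heap)]
    return running, waiting


# A, B, C run at pri=1000; X arrives with pri=1 and still has to wait.
running, waiting = schedule([("A", 1000), ("B", 1000), ("C", 1000)], [("X", 1)])
print(waiting)  # [('X', 1)]

# Once A finishes, a slot is free and X starts.
running.remove(("A", 1000))
running_after, waiting_after = schedule(running, waiting)
print(running_after)  # [('B', 1000), ('C', 1000), ('X', 1)]
```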
Resources
Blue items: the output of the compiler
Grey items: U-SQL runtime bits
Download all the resources
Download a specific resource
The Job Folder
Inside the Default ADL Store:
/system/jobservice/jobs/Usql/YYYY/MM/DD/hh/mm/JOBID
/system/jobservice/jobs/Usql/2016/01/20/00/00/17972fc2-4737-48f7-81fb-49af9a784f64
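The folder pattern can be reproduced with a short helper. This is a sketch only: the function name is hypothetical, and which timestamp the service actually uses (for example submission time, and in which time zone) is an assumption that the slide does not state.

```python
from datetime import datetime

JOB_ROOT = "/system/jobservice/jobs/Usql"  # prefix shown on the slide


def job_folder(timestamp: datetime, job_id: str) -> str:
    """Build a path of the form .../YYYY/MM/DD/hh/mm/JOBID."""
    return f"{JOB_ROOT}/{timestamp:%Y/%m/%d/%H/%M}/{job_id}"


# Reproduces the example path from the slide.
example = job_folder(datetime(2016, 1, 20, 0, 0), "17972fc2-4737-48f7-81fb-49af9a784f64")
print(example)
```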
Query Execution
Plans, Vertices, Stages, Parallelism, ADLAUs
Query Life
[Diagram] Visual Studio, Portal / API, Front-End Service, Compiler, Optimizer, Job Scheduler & Queue, Vertex Scheduling, Runtime
Job Execution
Parallelism: 100 (ADLAUs)
Work composed of 12K Vertices
U-SQL Script -> Job Graph
Logical -> Physical Plan
Each square is “a vertex” and represents a fraction of the total.
Vertexes in each SuperVertex (aka “Stage”) are doing the same operation on a different part of the same data. Visualized as a “Job Graph”.
ADLAU = Azure Data Lake Analytics Unit
Parallelism N = N ADLAUs
1 ADLAU ~= a VM with 2 cores and 6 GB of memory
Execution with Requested Parallelism
Requested Parallelism = 1 (reserve enough to do 1 vertex at a time)
Requested Parallelism = 4 (reserve enough to do 4 vertices at a time)
Notes
The next stage can start before the previous one has finished.
It may not be possible to use all the reserved parallelism during a Stage.
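To make the effect of requested parallelism concrete, here is a toy model of a single stage. It assumes each vertex is handed to whichever slot frees up first. The function name and the vertex durations are made up for illustration. The sketch does not model overlapping stages, so it only illustrates the second note.

```python
import heapq


def stage_runtime(vertex_seconds, parallelism):
    """Greedy schedule: each vertex goes to the slot that becomes free first."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    slots = [0.0] * parallelism  # time at which each slot is next free
    heapq.heapify(slots)
    for duration in vertex_seconds:
        start = heapq.heappop(slots)
        heapq.heappush(slots, start + duration)
    return max(slots)


stage = [10] * 8  # hypothetical stage: 8 vertices of 10 seconds each

print(stage_runtime(stage, parallelism=1))  # 80.0, one vertex at a time
print(stage_runtime(stage, parallelism=4))  # 20.0, four vertices at a time

# A stage with only 2 vertices cannot use 4 reserved slots:
print(stage_runtime([10, 10], parallelism=4))  # 10.0, same as with parallelism=2
```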
Notes
The Job Resources are copied to each vertex.
JOB RESOURCES
Stage Details
252 Pieces of work
AVG Vertex execution time
4.3 Billion rows
Data Read & Written
Super Vertex = Stage
Automatic Vertex retry
ORANGE: A vertex failed … but was retried automatically
Overall Stage Completed Successfully
Vertex Execution View
All the vertexes
Filter which vertexes to see
The Critical Path
Vertex Relationships
The vertex on the bottom depends on the output of the vertex at the top.
Critical Path
The dependency chain of vertexes that kept the job running to the very end.
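One way to read the critical path definition is as the longest dependency chain through the vertex graph, measured in execution time. The sketch below computes it for a tiny hand-made graph. The vertex names and durations are hypothetical, and it assumes every vertex starts as soon as its inputs are ready, which ignores queueing for ADLAUs.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(durations, depends_on):
    """Return (path, total_seconds) for the longest dependency chain."""
    finish = {}    # earliest finish time of each vertex
    previous = {}  # the dependency that finished last, per vertex
    for vertex in TopologicalSorter(depends_on).static_order():
        deps = depends_on.get(vertex, ())
        latest = max(deps, key=lambda d: finish[d], default=None)
        start = finish[latest] if latest is not None else 0
        finish[vertex] = start + durations[vertex]
        previous[vertex] = latest
    total = max(finish.values())
    end = max(finish, key=finish.get)
    path = []
    while end is not None:  # walk back from the last vertex to finish
        path.append(end)
        end = previous[end]
    return list(reversed(path)), total


durations = {"extract_0": 30, "extract_1": 50, "aggregate": 20, "output": 5}
depends_on = {
    "extract_0": [],
    "extract_1": [],
    "aggregate": ["extract_0", "extract_1"],
    "output": ["aggregate"],
}

path, total = critical_path(durations, depends_on)
print(path, total)  # ['extract_1', 'aggregate', 'output'] 75
```

Shortening any vertex that is not on this chain (here `extract_0`) would not make the job finish sooner.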
Efficiency
Cost vs Latency
JobCost = 5c + (minutes × ADLAUs × ADLAU cost per minute)
Allocation
Allocating 10 ADLAUs for a 10 minute job.
Cost = 10 min * 10 ADLAUs = 100 ADLAU minutes
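The cost arithmetic can be written out as a small sketch. The per-ADLAU-minute rate used here is a made-up placeholder, not a published price, and reading the flat term "5c" as 5 cents per job is an assumption. Amounts are kept in whole cents to avoid rounding noise.

```python
def adlau_minutes(minutes, adlaus):
    """Billable quantity: allocated ADLAUs multiplied by job duration."""
    return minutes * adlaus


def job_cost_cents(minutes, adlaus, cents_per_adlau_minute, flat_fee_cents=5):
    """JobCost = flat fee + (minutes x ADLAUs x cost per ADLAU minute).

    flat_fee_cents=5 assumes the slide's "5c" means 5 cents per job.
    """
    return flat_fee_cents + adlau_minutes(minutes, adlaus) * cents_per_adlau_minute


print(adlau_minutes(10, 10))  # 100 ADLAU minutes, as on the slide

HYPOTHETICAL_RATE = 2  # cents per ADLAU minute, placeholder value only
print(job_cost_cents(10, 10, HYPOTHETICAL_RATE))  # 205
```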
[Chart, x-axis: Time]
Blue line: Allocated
Over Allocation: Consider using fewer ADLAUs
You are paying for the area under the blue line
You are only using the area under the red line
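The two areas can be compared numerically. The following sketch assumes a constant allocation and one sample of active vertices per minute. The sample values are invented for illustration and are not taken from the chart.

```python
def utilization(allocated, active_per_minute):
    """Fraction of the paid-for area (allocated) that was actually used."""
    paid = allocated * len(active_per_minute)
    used = sum(min(active, allocated) for active in active_per_minute)
    return used / paid


# Hypothetical 10 minute job with 10 ADLAUs allocated throughout.
active = [10, 10, 6, 1, 1, 4, 10, 3, 1, 1]

print(utilization(10, active))  # 0.47
```

A low ratio like this suggests the job would cost less with fewer ADLAUs, at the price of a longer runtime during the busy minutes.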
Profile isn’t loaded
Profile is loaded now
Click Resource usage
Blue: Allocation
Red: Actual running
Dips down to 1 active vertex at these times
Smallest estimated time when given 2425 ADLAUs
1410 seconds = 23.5 minutes
Model with 100 ADLAUs
8709 seconds = about 145.2 minutes
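The two modelled points can be compared in terms of ADLAU-seconds, which is what the allocation is billed on. This is plain arithmetic on the numbers from the slide. It assumes the full allocation is held for the whole estimated runtime.

```python
def adlau_seconds(adlaus, seconds):
    """Allocated ADLAUs multiplied by estimated runtime."""
    return adlaus * seconds


fast = adlau_seconds(2425, 1410)  # smallest estimated time
lean = adlau_seconds(100, 8709)   # model with 100 ADLAUs

print(fast)  # 3419250
print(lean)  # 870900

speedup = 8709 / 1410     # how much sooner the 2425-ADLAU run finishes
cost_ratio = fast / lean  # how much more allocation it pays for
print(round(speedup, 2), round(cost_ratio, 2))
```

Under these assumptions the larger allocation finishes roughly six times sooner but pays for roughly four times as many ADLAU-seconds, which is the cost versus latency trade-off in numbers.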
http://aka.ms/AzureDataLake