LINQ to HPC: Developing Big Data Applications on Windows HPC Server
Session code: WSV205
Saptak Sen, Senior Product Manager, Microsoft Technical Computing
Session Objectives and Takeaways
Session Objective(s):
• Understand Microsoft's solution for Big Data
• How to use and develop LINQ to HPC applications
• Demo of LINQ to HPC/DSC on HPC, Microsoft’s solutions for unstructured Big Data
Key Takeaways:
1. LINQ to HPC/DSC provide a highly productive stack for writing big data applications.
2. Demos
   1. Data management with DSC
   2. Application development in LINQ to HPC
   3. Application management of LINQ to HPC applications
Characteristics of Big Data
• Large Data Volume: 100s of TBs to 10s of PBs
• New Economics: large scale processing and analytics at unprecedented low cost (hardware and software)
• New Technologies: distributed parallel processing frameworks; easy to scale on commodity hardware; MapReduce-style programming models
• Non-Traditional Data Types: unstructured; weak relational schema; text, images, videos, logs
• New Data Sources: sensors, devices, traditional applications, web servers, public data
• New Questions & New Insights: How popular is my product? What is the best ad to serve? Is this a fraudulent transaction?
Example: Traditional e-commerce data flow
New exploratory e-commerce data flow
Introduction to LINQ to HPC
Developing Big Data applications for HPC Server
Example: find web pages from many log files
// Parse every non-comment line into a LogEntry.
var logentries = from line in logs
                 where !line.StartsWith("#")
                 select new LogEntry(line);
// Keep only the accesses made by user "sen".
var user = from access in logentries
           where access.user.EndsWith(@"\sen")
           select access;
// Count accesses per page.
var accesses = from access in user
               group access by access.page into pages
               select new UserPageCount("sen", pages.Key, pages.Count());
// Keep .htm pages, most visited first.
var htmAccesses = from access in accesses
                  where access.page.EndsWith(".htm")
                  orderby access.count descending
                  select access;
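The slide does not show `LogEntry` or `UserPageCount`, so the query above cannot be run on its own. The following is a minimal sketch that runs the same four steps locally with plain LINQ to Objects instead of LINQ to HPC. The two record types, the `"<user> <page>"` line format, and the `LogQueryDemo` wrapper are assumptions made for illustration; they are not taken from the session.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record types: the slide uses LogEntry and UserPageCount
// without showing them, so these shapes are assumptions.
public class LogEntry
{
    public readonly string user;
    public readonly string page;

    // Assumed line format: "<user> <page>", separated by a single space.
    public LogEntry(string line)
    {
        var fields = line.Split(' ');
        user = fields[0];
        page = fields[1];
    }
}

public class UserPageCount
{
    public readonly string user;
    public readonly string page;
    public readonly int count;

    public UserPageCount(string user, string page, int count)
    {
        this.user = user;
        this.page = page;
        this.count = count;
    }
}

public static class LogQueryDemo
{
    // Same four query steps as the slide, run over an in-memory sequence.
    public static List<UserPageCount> HtmAccesses(IEnumerable<string> logs)
    {
        var logentries = from line in logs
                         where !line.StartsWith("#")
                         select new LogEntry(line);
        var user = from access in logentries
                   where access.user.EndsWith(@"\sen")
                   select access;
        var accesses = from access in user
                       group access by access.page into pages
                       select new UserPageCount("sen", pages.Key, pages.Count());
        var htmAccesses = from access in accesses
                          where access.page.EndsWith(".htm")
                          orderby access.count descending
                          select access;
        return htmAccesses.ToList();
    }

    public static void Main()
    {
        var logs = new[]
        {
            "# comment line",
            @"CORP\sen /index.htm",
            @"CORP\sen /index.htm",
            @"CORP\sen /about.htm",
            @"CORP\sen /logo.png",
            @"CORP\lee /index.htm",
        };
        foreach (var a in HtmAccesses(logs))
            Console.WriteLine("{0} {1} {2}", a.user, a.page, a.count);
    }
}
```

With the sample lines in `Main`, the comment line, the `.png` access, and the access by another user are filtered out, leaving two `.htm` pages ordered by count.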
LINQ query transformed into computation graph
[Figure: the query becomes a pipeline of stages: Input, Compute, Compute and resort, Compute and resort, Output.]
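One plausible reading of the "compute and resort" stages is the usual pattern for a distributed group-by: count locally on each partition, redistribute the partial counts by key, then merge. The sketch below simulates that pattern in a single process with plain LINQ to Objects. It is an illustration under that assumption, not the plan LINQ to HPC actually generates, and every name in it is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StagedCountDemo
{
    // Stage 1 (compute): each partition counts its own pages locally.
    public static Dictionary<string, int> CountLocally(IEnumerable<string> pages)
    {
        return pages.GroupBy(p => p).ToDictionary(g => g.Key, g => g.Count());
    }

    // Stage 2 (resort): route each partial count to a bucket chosen by its key,
    // so that all partial counts for one page end up in the same bucket.
    public static List<KeyValuePair<string, int>>[] Repartition(
        IEnumerable<Dictionary<string, int>> partials, int bucketCount)
    {
        var buckets = new List<KeyValuePair<string, int>>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
            buckets[i] = new List<KeyValuePair<string, int>>();
        foreach (var partial in partials)
            foreach (var kv in partial)
                buckets[(kv.Key.GetHashCode() & int.MaxValue) % bucketCount].Add(kv);
        return buckets;
    }

    // Stage 3 (compute): each bucket sums the partial counts per page.
    public static Dictionary<string, int> Merge(IEnumerable<KeyValuePair<string, int>> bucket)
    {
        return bucket.GroupBy(kv => kv.Key)
                     .ToDictionary(g => g.Key, g => g.Sum(kv => kv.Value));
    }

    // Runs the three stages in order and gathers the per-bucket results.
    public static Dictionary<string, int> CountPages(string[][] partitions, int bucketCount)
    {
        var partials = partitions.Select(p => CountLocally(p)).ToList();
        var buckets = Repartition(partials, bucketCount);
        return buckets.SelectMany(b => Merge(b))
                      .ToDictionary(kv => kv.Key, kv => kv.Value);
    }

    public static void Main()
    {
        var partitions = new[]
        {
            new[] { "/a.htm", "/b.htm", "/a.htm" },
            new[] { "/a.htm", "/c.htm" },
        };
        foreach (var kv in CountPages(partitions, 2).OrderBy(kv => kv.Key))
            Console.WriteLine("{0} {1}", kv.Key, kv.Value);
    }
}
```

Because every partial count for a given page lands in exactly one bucket, the merged dictionaries never share a key, and the final totals match what a single global group-by would produce.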
LINQ to HPC Job: Directed Acyclic Graph (DAG) of vertices
[Figure: a DAG with inputs, processing vertices, edges (files), and outputs.]
Executes DAGs by mapping vertices to Distributed Vertex Hosts
[Figure: the same DAG, with its processing vertices mapped onto free compute resources.]
HPC + LINQ to HPC Job Overview
[Figure: an application that calls LINQ to HPC APIs, the HPC head node, the DSC, and the HPC compute nodes running the Graph Manager and Vertex Hosts.]
1. The application submits a LINQ to HPC job.
2a. A LINQ to HPC job starts 1 basic task, assigning a node as the DGM.
2b. The LINQ to HPC job also starts a set of parametric sweep tasks across the rest of the nodes as DVH.
3a. Graph Manager starts/stops vertices.
3b. LINQ to HPC vertices read and write files.
HPC + LINQ to HPC Job Overview (continued)
[Figure: HPC compute nodes, where the Graph Manager starts/stops Dryad vertices (3a) and the vertices on each Vertex Host read and write files (3b).]
Vertices in logical computation graph
• Graph manager starts vertices on Vertex Hosts
• Preferentially schedules vertices near input files
When the input is already on the cluster, this can make local IO the common case.
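The scheduling preference described above can be sketched as a simple placement rule: pick a free node that already holds the vertex's input file, and fall back to any free node otherwise. This is a minimal illustration of the idea, not the Graph Manager's actual algorithm. The `ChooseNode` helper and the node names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LocalityPlacementDemo
{
    // Picks a node for a vertex: prefer a free node that already holds a
    // replica of the vertex's input file; otherwise fall back to any free node.
    // Returns null when no node is free.
    public static string ChooseNode(
        ISet<string> nodesHoldingInput, IEnumerable<string> freeNodes)
    {
        var free = freeNodes.ToList();
        var local = free.FirstOrDefault(n => nodesHoldingInput.Contains(n));
        return local ?? free.FirstOrDefault();
    }

    public static void Main()
    {
        // Hypothetical cluster state: the input file is replicated on cn2 and cn3.
        var replicas = new HashSet<string> { "cn2", "cn3" };
        Console.WriteLine(ChooseNode(replicas, new[] { "cn1", "cn3" })); // cn3: input is local
        Console.WriteLine(ChooseNode(replicas, new[] { "cn1", "cn4" })); // cn1: no local replica free
    }
}
```

A real scheduler would also weigh load and queue time; the sketch only shows why data that already lives on the cluster tends to be read locally.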
More on HPC + LINQ to HPC mechanics
[Figure: an application that calls LINQ to HPC APIs, the HPC head node, the DSC, and the HPC compute nodes running the LINQ to HPC Graph Manager and LINQ to HPC Vertex Hosts.]
1. The application publishes to a share: (1) the binaries for the LINQ to HPC job and (2) an XML description of the LINQ to HPC graph.
2a. A LINQ to HPC job starts 1 basic task, assigning a node as the DGM. The DGM reads the XML description of the graph from the share and calls DSC to locate the files referenced in the XML.
2b. The LINQ to HPC job also starts a set of parametric sweep tasks across the rest of the nodes as DVH. The DVH loads the binaries for this LINQ to HPC job from the share and executes them according to commands from the DGM.
Deployment Steps
DSC NODE ADD sen-cn1 /TEMPPATH:c:\Dryad\HpcTemp /DATAPATH:c:\Dryad\HpcData /SERVICE:sen-hn
Demo: Adding a new node using the HPC Management Tool
LINQ to HPC Object Model
Hello World!
using System;
using System.Linq;
using Microsoft.Hpc.Linq;

namespace MyProgram
{
    class Program
    {
        static void Main(string[] args)
        {
            // Connect to the cluster through its head node.
            var config = new HpcLinqConfiguration("MyHpcClusterHeadNode");
            var context = new HpcLinqContext(config);

            // Read the DSC file set "MyTextData" as lines and project each line to its length.
            var lengths = context.FromDsc<LineRecord>("MyTextData")
                                 .Select(r => r.Line.Length);

            Console.WriteLine("The maximum line length is {0}", lengths.Max());
        }
    }
}
Demo: Analyzing data using LINQ to HPC
Managing data and HPC cluster
HPC Server administration basics:
• Managing the job queue
• How to identify the user that submitted jobs
• Canceling a runaway job
Distributed Storage Catalog (DSC) specific tasks:
• Monitor disk usage tracked by DSC on each node
• View how the DSC file set maps to NTFS across nodes
• Identify the nodes where files are replicated
Quick overview of the software components that made this possible.
• Programming models: LINQ to HPC (NEW)
• Distributed runtimes: MPI, SOA, LINQ to HPC runtime
• Cluster and cloud services: HPC provisioning, management, etc.
• Platform: Windows Server, Azure*
• DSC (Distributed Storage Catalog): binds individual NTFS shares together to support the LINQ to HPC distributed runtime
* Future support planned
How LINQ to HPC and Parallel Data Warehouse complement each other
Customer needs for Big Data lie on a spectrum.
• One extreme is analytics targeting a traditional data warehouse. The analyst knows the cube he or she wants to build, and the analyst knows the data sources.
• The other extreme is analyzing raw unstructured data. The analyst does not know exactly what the data contains, nor what cube would be justified. The analyst needs to do ad-hoc analyses that may never be run again.
HPC Server targets the raw unstructured data extreme.
Microsoft already has great data platform assets
PowerPivot, SQL Server Integration Services (SSIS), Parallel Data Warehouse (PDW), …
HPC+LINQ to HPC’s focus on raw unstructured data analytics enables new solutions that incorporate multiple assets.
E.g., analyze raw unstructured data using HPC+LINQ to HPC, then pipe it to SSIS and apply the rest of the BI stack.
Microsoft Big Data End-to-End
[Figure: end-to-end data flow. Data sources: sensors, devices, apps, bots, crawlers, ERP, CRM, LOB. Processing and storage: HPC Server, Hadoop, Integration Services, SQL EDW, data marts, SSAS, SSRS. Consumers: data and compute intensive HPC apps, interactive reports, performance scorecards, PowerPivot, embedded BI apps.]
For more information
Download HPC Server 2008 R2 Evaluation Copy Today – microsoft.com/hpc
Download Service Pack 2 Beta - connect.microsoft.com
HPC Server Hands-on Labs – microsoft.com/hpc -> Technical Resources
Product Demo Station – in the Server and Cloud Section
HPC Server Certification Exam - microsoft.com/learning/en/us/exam.aspx?ID=70-690
Find Me Later At… twitter: @saptak
Resources
• Sessions On-Demand & Community: www.microsoft.com/teched
• Microsoft Certification & Training Resources: www.microsoft.com/learning
• Resources for IT Professionals: http://microsoft.com/technet
• Resources for Developers: http://microsoft.com/msdn
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.