IS-4082, Real-Time insight in Big Data – Even faster using HSA, by Norbert Heusser
Transcript of IS-4082, Real-Time insight in Big Data – Even faster using HSA, by Norbert Heusser
REAL-TIME INSIGHT IN BIG DATA EVEN FASTER USING HSA
| REAL-TIME INSIGHT IN BIG DATA | November 19, 2013 | CONFIDENTIAL 2
AGENDA
WHAT ARE BIG DATA AND PARSTREAM
TECHNICAL ARCHITECTURE
HSA USAGE
What are Big Data and ParStream
What is Big Data? COMMON SENSE FROM WIKIPEDIA
“Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, analysis and visualization.”
WHAT BIG DATA IS NOT
Big Data is NOT Storage of large datasets
A COMMON MISTAKE
REAL-TIME IN BIG DATA IS A TWO-DIMENSIONAL PROBLEM
Sub-second response times
Continuous extremely fast data load and availability
ANALYTICS LANDSCAPE BIG DATA ANALYTICS REQUIRES NEW TECHNOLOGICAL SOLUTIONS
[Figure: analytics landscape chart. Axis labels: data volume (Gigabyte, Terabyte, Petabyte) and Big Data response time, ranging from Real-Time to Lag Time (< 1..10 ms, 10..100 ms, 1 s, 10 s, 1 min, 10 min, 1 h). Technologies shown: Complex Event Processing, In-Memory DB, OLTP Reporting, OLAP, Operational Data, Map Reduce Batches (NoSQL), Massively Parallel (MPP) Real-Time. Analytics classes shown: Operations Analytics, Stream Analytics, Real-Time Analytics, Batch Analytics. ParStream is marked as a point on the chart.]
PARSTREAM IS A UNIQUE PRODUCT
• Analyze and Filter Billions of Records
• Query Data Structures with 1000’s of columns
• Get Answers in Milliseconds without Cubes
• Execute 1000’s of Concurrent Queries
PARSTREAM EMPOWERS CUSTOMERS TO REALIZE NEW BUSINESS OPPORTUNITIES EVOLVING WITH BIG DATA
High Performance Index
Column Store
In-Memory Technology
High-Speed Import
Scalability
Clustering
Real-Time Queries
Technical Architecture
ARCHITECTURE BUILDING BLOCKS
• Columnar Storage
• In Memory Technology
• Shared Nothing Architecture
• Standard Interfaces
• User Defined Functions
• Unique High Performance Compressed Index
PARSTREAM IS THE BIG DATA ANALYTICS PLATFORM BASED ON A UNIQUE HIGH PERFORMANCE COMPRESSED INDEX
SQL/JDBC/ODBC
C++ UDF API
Real-Time Analytics Engine
Compressed Index
MPP
In-Memory & Disc Technology
Partitioning
Shared Nothing
Fast Columnar Storage
PARALLEL ARCHITECTURE
• STANDARD DW ARCHITECTURE
  ‒ Long Query Runtime
  ‒ Frequent Full Table Scans
  ‒ Data is at Least 1 Day Old
• PARSTREAM ARCHITECTURE
  ‒ Each Query Uses Multiple Processor Cores
  ‒ Query execution using compressed indices
  ‒ Continuous Import Assures Timeliness of Data
PARSTREAM OVERCOMES LIMITATIONS OF TRADITIONAL DW ARCHITECTURES
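The slides do not say how the compressed index is laid out. To illustrate why answering a filter from an index can avoid a full table scan, here is a minimal sketch that assumes a plain, uncompressed bitmap index with one bitmask per distinct column value. The functions `build_bitmap_index` and `rows_matching` and the sample data are hypothetical and are not ParStream's API.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose set bits are the matching rows."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_matching(bitmask):
    """Expand a bitmask back into an ascending list of row numbers."""
    rows = []
    row = 0
    while bitmask:
        if bitmask & 1:
            rows.append(row)
        bitmask >>= 1
        row += 1
    return rows


# Hypothetical example data: two columns of a five-row table.
country = ["DE", "US", "DE", "FR", "US"]
device = ["phone", "phone", "tablet", "phone", "tablet"]

country_idx = build_bitmap_index(country)
device_idx = build_bitmap_index(device)

# WHERE country = 'DE' AND device = 'phone' becomes a single bitwise AND;
# the column values themselves are not scanned at query time.
hits = rows_matching(country_idx["DE"] & device_idx["phone"])
print(hits)
```

A real index would also compress the bitmasks and partition them so that several cores can work on one query, which this sketch leaves out.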
[Figure: comparison of the two architectures. Labels: Nightly Batch Import, Query, Query, Parallel Import, HPCI.]
TRADITIONAL DATABASE QUERY EXECUTION STATIC QUERY EXECUTION
SQL-Statement → Parser → Parsed-Statement → Optimizer/Planner → ExecutionPlan → Executor
MODULAR EXECUTION TREE
• Parsed query descriptions are transformed into execution trees
• Optimizer distributes execution operations to available hardware
• Data-locality and current load are used for allocation
• During query execution optimizer can re-allocate if beneficial
• Optimizer continuously refines allocation based on past queries
• Flow based execution control
• Each ExecNode processes blocks of data
• Data transfer between nodes using queues
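The last three points (flow-based control, nodes that process blocks, queues between nodes) can be sketched as follows. This is only an illustration under stated assumptions: each node runs in its own thread, a feeder thread plays the role of the fetch step, and the names `exec_node`, `run_branch`, `keep_large` and `double` are hypothetical, not ParStream's actual ExecNode interface.

```python
import queue
import threading

DONE = object()  # sentinel: no more blocks will arrive


def exec_node(operation, inbox, outbox):
    """Apply `operation` to every block from `inbox` and pass the result on."""
    while True:
        block = inbox.get()
        if block is DONE:
            outbox.put(DONE)  # propagate shutdown to the next node
            return
        outbox.put(operation(block))


def run_branch(blocks, operations, queue_size=2):
    """Run one branch: a feeder plus one thread per operation, linked by queues."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(operations) + 1)]

    def feed():
        for block in blocks:
            queues[0].put(block)  # waits while the first node is behind
        queues[0].put(DONE)

    threads = [threading.Thread(target=feed)]
    for i, operation in enumerate(operations):
        threads.append(
            threading.Thread(target=exec_node, args=(operation, queues[i], queues[i + 1]))
        )
    for thread in threads:
        thread.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


# Hypothetical operations standing in for filter, calc and aggregation.
def keep_large(block):
    return [value for value in block if value > 10]


def double(block):
    return [value * 2 for value in block]


partials = run_branch([[5, 12, 20], [30, 1], [11]], [keep_large, double, sum])
total = sum(partials)  # a final aggregate step combines the per-block results
print(partials, total)
```

The bounded queues give a simple form of flow control: a fast node has to wait once the queue to a slower node is full, so blocks never pile up without limit.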
ATOMIC OPERATIONS COMBINED USING QUEUES
[Figure: Execution Tree. A top-level aggregate node and a sort node are fed by four parallel branches, each made up of fetch, filter, calc and aggregation nodes.]
HSA Usage
[Figure: Execution Tree (repeated). A top-level aggregate node and a sort node are fed by four parallel branches, each made up of fetch, filter, calc and aggregation nodes.]
ARCHITECTURE ALLOWS USAGE OF DIFFERENT PROCESSING UNITS
• Each atomic operation may be processed using any available compute resource
• Dynamic workload assignment during query execution
• Overall workload management ensures optimal resource usage
ANY PART OF THE QUERY MAY BE EXECUTED INDIVIDUALLY
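One way to picture dynamic workload assignment is a greedy rule that gives each operation to the compute unit with the earliest estimated finish time. The sketch below is an assumption-laden illustration, not the optimizer described in the talk: the cost model (queued work plus a fixed penalty when the data is not local), the `ComputeUnit` class, the `assign` function and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComputeUnit:
    name: str
    queued_work: float = 0.0  # estimated seconds of work already assigned


def assign(op_cost, data_home, units, transfer_penalty):
    """Give one operation to the unit with the lowest estimated finish time."""

    def added_cost(unit):
        # Assumed model: running away from the data costs a fixed penalty.
        penalty = 0.0 if unit.name == data_home else transfer_penalty
        return op_cost + penalty

    best = min(units, key=lambda unit: unit.queued_work + added_cost(unit))
    best.queued_work += added_cost(best)  # record the new load on that unit
    return best.name


# Four equal operations whose data lives next to the CPU (made-up numbers).
units = [ComputeUnit("cpu"), ComputeUnit("gpu")]
placement = [assign(1.0, "cpu", units, transfer_penalty=0.25) for _ in range(4)]
print(placement)
```

With these numbers the work alternates between the two units once the local one is busy. A larger transfer penalty keeps more operations on the unit that already holds the data.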
[Figure: one execution branch with fetch, filter, calc and aggregation nodes.]
PROBLEMS USING TRADITIONAL GPU COMPUTE UNITS
• Target scenario Real-Time BIG DATA
  ‒ Processing huge amounts of data
  ‒ Dynamically changing data
  ‒ Interactive response time
• Part of the data fixed in GPU memory
  ‒ Input data transferred once via PCI during loading
  ‒ Transfer of result via PCI during execution
• Data resident in main memory
  ‒ Offload of computational task to GPU
  ‒ Transfer in and out via PCI during execution
• Global data needs to be transferred to GPU too
• Global data needs to be synchronized
• Latency based on blockwise processing
• Different programming models
THE TRANSFER AND COMMUNICATION PROBLEM
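A rough worked example shows why the transfer matters for interactive response times. Every number below is assumed for illustration only (block size, result size, compute time and bus bandwidth are not from the talk), and the functions `offload_time` and `shared_memory_time` are hypothetical.

```python
def offload_time(data_bytes, result_bytes, compute_s, bus_bytes_per_s):
    """Time for one offloaded operation when input and result cross the bus."""
    transfer_s = (data_bytes + result_bytes) / bus_bytes_per_s
    return transfer_s + compute_s


def shared_memory_time(compute_s):
    """Time for the same operation when no copy is needed."""
    return compute_s


# Assumed figures: 800 MB of input, 8 MB of results, 50 ms of compute,
# and an effective bus bandwidth of 8 GB/s.
with_copy = offload_time(800e6, 8e6, compute_s=0.05, bus_bytes_per_s=8e9)
without_copy = shared_memory_time(0.05)
print(f"with copy: {with_copy:.3f} s, without copy: {without_copy:.3f} s")
```

Under these assumptions the copy takes about twice as long as the computation itself, so the offload only pays off if the compute saving outweighs that overhead.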
[Figure: the fetch, calc, aggregation and filter operations.]
HSA SOLVES ALL OUR PROBLEMS
• No data transfer required
• Shared page table support
• Coherent memory regions
• User-level command queueing
• Hardware scheduling
• Bold allows uniform programming model
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.