High Performance Distributed Computing

14
Henri Bal Vrije Universiteit Amsterdam High Performance Distributed Computing

description

High Performance Distributed Computing. Henri Bal Vrije Universiteit Amsterdam. Outline. 1. Development of the field 2. Highlights VU-HPDC group 3. Links to data science cycle 4. Conclusions. Developments. Multiple types of data explosions : - PowerPoint PPT Presentation

Transcript of High Performance Distributed Computing

Page 1: High Performance Distributed  Computing

Henri BalVrije Universiteit Amsterdam

High Performance Distributed Computing

Page 2: High Performance Distributed  Computing

Outline

1. Development of the field 2. Highlights VU-HPDC group 3. Links to data science cycle

4. Conclusions

Page 3: High Performance Distributed  Computing

Developments

• Multiple types of data explosions:– Big data: huge processing/transportation

demands– Complex heterogeneous data

LOFAR: ~15 PB/yearSKA: >300 PB/year, exascale processing

Complex data

Page 4: High Performance Distributed  Computing

Developments

• Infrastructure explosion– High complexity: heterogeneous systems with

diversity of processors, systems, networks

Page 5: High Performance Distributed  Computing

VU HPDC GROUP• Bridge the gap between demanding

applications and complex infrastructure• Distributed programming systems for

– Clusters, grids, clouds– Accelerators (GPUs) – Heterogeneous systems (``Jungles”) – Clouds & mobile devices

• Applications: multimedia, semantic web, model checking, games, astronomy, astrophysics, climate modeling ….

Page 6: High Performance Distributed  Computing

Highlights VU-HPDC group

1st Prize: SCALE 2008

AAAI-VC 2007 DACH 2008 - BS DACH 2008 - FT

3rd Prize: ISWC 2008 1st Prize: SCALE 2010 EYR 2011Sustainability award

Solved Awari 2002

Page 7: High Performance Distributed  Computing

Links to data science cycle

Understand and decide

Analyze and model

Store and process

ReasoningKnowledge representati

on

MultimediaRetrieval

Modeling and

simulationMachine Learning

Information Retrieval

Decision Theory

PerceptionCognition

VisualAnalytics

DistributedProcessing

Large Scale Databases

SoftwareEng.

System / Network

Eng.

Distributed reasoning

Jungle computingMapReduce

Page 8: High Performance Distributed  Computing

Reasoning – Semantic Web

• Make the Web smarter by injecting meaning so that machines can “understand” it.o initial idea by Tim Berners-Lee in 2001

• Now attracted the interest of big IT companies

Page 9: High Performance Distributed  Computing

Google Example

Page 10: High Performance Distributed  Computing

Google Example

Page 11: High Performance Distributed  Computing

Distributed Reasoning• WebPIE: web-scale distributed reasoner

doing full materialization• QueryPIE: distributed reasoning with

backward-chaining + pre-materialization of schema-triples

• DynamiTE: maintains materialization after updates (additions & removals)

Challenge: real-time incremental reasoning on web scale, combining new (streaming) data & existing historic data

With: Jacopo Urbani, Alessandro Margara, Frank van Harmelen

COMMIT/

Page 12: High Performance Distributed  Computing

Glasswing: MapReduceon Accelerators

• Use accelerators as a mainstream feature• Massive out-of-core data sets• Scale vertically & horizontally• Code portability using OpenCL• Maintain MapReduce abstraction

With: Ismail El Helw, Rutger Hofman

Page 13: High Performance Distributed  Computing

Glasswing Pipeline

• Overlaps computation, communication & disk access

• Supports multiple buffering levels

Page 14: High Performance Distributed  Computing

Evaluation of Glasswing

• Glasswing uses CPU, memory & disk resources more efficiently than Hadoop

• Compute-bound applications benefit dramatically from GPUs

• Better scalability than Hadoop• Runs on a variety of accelerators• E.g. k-means clustering:

– 8.5× (1 node) vs.15.5 × (64 nodes) vs. 107 × (GPU node)