High Performance Computing in Our Everydays

Peter Wittek

Swedish School of Library and Information Science, University of Borås

10/10/11

Outline

1 What Is New in HPC?

2 Supporting Frameworks

3 Computational Requirements of Digital Libraries

4 A Workflow in Cloud HPC

5 Experimental Results

6 Open Issues

7 Conclusions

What Is New in HPC?

Cloud HPC

Cloud computing: think of it as a utility
E.g., you get to use 10 small computer instances for $0.82 an hour

Your computer instances do not necessarily correspond to actual computers

Virtualization
Demo: ReactOS

Latest contestant in cloud computing: HPC
Not ordinary computer instances

What Is New in HPC?

Massive Parallelism

Figure: Floating-Point Operations per Second for the CPU and GPU

What Is New in HPC?

Massive Parallelism

Figure: CPU versus GPU architecture (diagram labels: Control, ALU, Cache, DRAM)

Streaming hardware
Explicit memory management

What Is New in HPC?

Massive Parallelism

Parallel versus distributed computing
Distributed nodes do not share the memory:
  Connected through network;
  Calculations may run in a parallel fashion;
  Other nodes do not see what one node has computed;
  Nodes may fail.

What Is New in HPC?

Why You Should Care

Digital libraries and HPC?
No need for upfront investment;
Go beyond full-text search;
Machine learning;
Pattern matching;
Social media and graph mining;

You can define a new field
Freedom

Supporting Frameworks

Why Is Distributed Computing Hard?

Take an example: creating an inverted index
An inverted index is at the core of search engines
A simple example:

  term1: (doc1,freq11), (doc5,freq51)
  term2: (doc1,freq12), (doc3,freq32), (doc6,freq62)

Naïve approach to parallelize:
  Have an indexer at each node;
  Distribute documents to nodes;
  Let nodes broadcast the lists (Message Passing Interface, MPI).
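
The naïve approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nodes are simulated inside one process, the broadcast is replaced by a plain list of partial indexes, and the toy documents and the names index_locally and merge_broadcasts are hypothetical, not taken from the talk.

```python
from collections import Counter, defaultdict


def index_locally(docs):
    """Indexer on one node: term -> list of (doc_id, frequency)."""
    postings = defaultdict(list)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            postings[term].append((doc_id, freq))
    return postings


def merge_broadcasts(partial_indexes):
    """Merge the posting lists received from every node into one index."""
    merged = defaultdict(list)
    for partial in partial_indexes:
        for term, plist in partial.items():
            merged[term].extend(plist)
    return {term: sorted(plist) for term, plist in merged.items()}


# Hypothetical toy collection, already distributed to two simulated nodes.
node_docs = [
    {"doc1": "cloud hpc cloud", "doc2": "gpu hpc"},
    {"doc3": "cloud gpu gpu"},
]

# Each node indexes its own documents; the "broadcast" is simulated by
# collecting all partial indexes in one list.
partials = [index_locally(docs) for docs in node_docs]
index = merge_broadcasts(partials)

print(index["cloud"])  # [('doc1', 2), ('doc3', 1)]
```

Even in this toy version every node would have to receive and merge every other node's lists, and nothing handles a node that fails halfway through.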

Supporting Frameworks

MapReduce

Published in 2004 by Google researchers
Since then it has become widespread in data-intensive processing
Core idea: keep things simple, you can do two things:
  Map: Send out chunks of data and then do something on them
  Reduce: Collect chunks of data and do something on them while collecting

Intermediate data structure: key-value pairs
The framework should also take care of the mundane tasks, such as failing nodes, network latency, etc.

Supporting Frameworks

A MapReduce Inverted Indexer

The task is: formulate your problem in MapReduce terms
Map: gets a chunk of text. Emits:
  Key: term
  Value: document id and corresponding frequency

Reduce: Merges by key
There might be a different number of map and reduce tasks
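
A minimal single-process sketch of this formulation, assuming whitespace tokenization and a toy collection. The names map_chunk, reduce_term and run_job are hypothetical; a real framework would run the map and reduce tasks on different nodes and perform the grouping step itself.

```python
from collections import Counter, defaultdict


def map_chunk(doc_id, text):
    """Map: emit (term, (doc_id, frequency)) pairs for one chunk of text."""
    for term, freq in Counter(text.lower().split()).items():
        yield term, (doc_id, freq)


def reduce_term(term, values):
    """Reduce: merge all postings that share the same key (term)."""
    return term, sorted(values)


def run_job(chunks):
    """Simulate the framework: run maps, group by key, then run reduces."""
    grouped = defaultdict(list)
    for doc_id, text in chunks:
        for term, posting in map_chunk(doc_id, text):
            grouped[term].append(posting)
    return dict(reduce_term(term, values) for term, values in grouped.items())


# Hypothetical input: one (document id, text) pair per map task.
chunks = [
    ("doc1", "cloud hpc cloud"),
    ("doc3", "cloud gpu gpu"),
    ("doc6", "gpu"),
]
index = run_job(chunks)

for term in sorted(index):
    print(term, index[term])
```

The map and reduce functions never refer to each other or to the number of nodes, which is what lets the framework choose a different number of map and reduce tasks.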

Supporting Frameworks

Another MapReduce Example

Sometimes it is worth bypassing the reduce phase
Then we do not need to emit key-value pairs at all

Distributed GPU random projection
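
A map-only job of this kind can be sketched as follows. Several things here are assumptions rather than details from the talk: the projection matrix uses random +1/-1 entries scaled by 1/sqrt(n_out), every worker rebuilds the same matrix from a shared seed, and the arithmetic runs in plain Python on the CPU, where a GPU version would use a matrix multiplication instead. The function names and the toy vectors are hypothetical.

```python
import random


def make_projection(n_in, n_out, seed=0):
    """Build an n_out x n_in matrix of +1/-1 entries scaled by 1/sqrt(n_out).

    Using a shared seed lets every worker rebuild the same matrix locally.
    """
    rng = random.Random(seed)
    scale = n_out ** -0.5
    return [
        [rng.choice((-1.0, 1.0)) * scale for _ in range(n_in)]
        for _ in range(n_out)
    ]


def map_project(chunk, projection):
    """Map-only task: project every vector in the chunk.

    Each output depends on a single input vector, so there is nothing to
    group by key and no reduce phase.
    """
    return [
        [sum(r * x for r, x in zip(row, vec)) for row in projection]
        for vec in chunk
    ]


# Hypothetical data: two chunks of 4-dimensional document vectors.
chunks = [
    [[1.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, 1.0]],
    [[0.0, 0.0, 0.0, 0.0]],
]
projection = make_projection(n_in=4, n_out=2, seed=42)

# Each chunk could be handled by a different node; results are just stored.
projected = [map_project(chunk, projection) for chunk in chunks]
print(projected)
```

Because the chunks are independent, the outputs can be written straight to storage in whatever order the map tasks finish.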

Supporting Frameworks

Exploiting GPU Resources

Low-level frameworks: CUDA and OpenCL
They certainly do not make GPUs much friendlier
Higher-level libraries: BLAS, cuSPARSE
As long as you know maths...

Supporting Frameworks

Overcoming GPU Obstacles

GPU MapReduce
Academic projects: Mars, GPMR

GPU-aware MapReduce: extend existing frameworks
Develop extensive middleware

Computational Requirements of Digital Libraries

Digital Preservation

Future-proofing document collections
Emulation
Migration

Workflows are often tremendously compute-intensive

Computational Requirements of Digital Libraries

Machine Learning and Advanced Services

Digital collections and social networks
A step towards digital curation

SaaS approach to digital curation

Indexing by Lucene/Nutch
Collection-level metadata extraction by Mahout

A Workflow in Cloud HPC

A Middleware Architecture

Figure: Middleware architecture (diagram labels: Support Services with document processes, context search and data mining; MapReduce Engine; Policy Enforcement; Archival Storage Interface; Middleware; Grid or Cloud Storage; Grid or Cloud Computing)

A middleware to make adoption by DL practitioners easier
Moving towards computational science

Experimental Results

Cost

Figure: Comparison of average cost of computations with different collection sizes (x-axis: Number of Processing Cores, ticks at 1, 4, 10, 20, 40, 80; y-axis: Average Cost in USD, 0 to 0.08; legend: 100, 1000, 10000)

Experimental Results

Running time

Figure: Comparison of running times with different collection sizes (x-axis: Number of Processing Cores, ticks at 1, 4, 10, 20, 40, 80; y-axis: Running Time (Mins), 0 to 8000; legend: 100, 1000, 10000)

Open Issues

Obstacles to Adoption

Persistence and high-reliability
MapReduce
Not just a technological issue

Service-level agreement
Particularly problematic
Another EU FP7 project working on it: SLA@SOI
Niche for alternative cloud providers

Difficulty of integration

Conclusions

Acknowledgment

Work has been funded by Sustaining Heritage Access through Multivalent ArchiviNg (SHAMAN), an EU FP7 large integrated project.
http://shaman-ip.eu/shaman/

Additional funding has been received from Amazon Web Services.
http://aws.amazon.com/

Conclusions

Summary

Cloud and HPC: a solution looking for a problem
Digital libraries

Computational requirements
Expertise
Complexity and integration

Contact: [email protected]