Big Data

1 Activity in Big Data 26/03/2012


Big Data at the Barcelona Supercomputing Center

Transcript of Big Data

Page 1: Big Data

Activity in Big Data, 26/03/2012

Page 2: Big Data

Previous work in Big Data

Page 3: Big Data

Scenario

Application placement and scheduling: MapReduce

Data management: Key-Value storage

Target applications: Data Analytics, Bioinformatics
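The scenario pairs MapReduce processing with key-value storage. As a rough illustration of that programming model (not of Hadoop's API or of any BSC code), the sketch below runs a word count through explicit map, shuffle, and reduce steps in plain Python. All function names are hypothetical, and the result dictionary merely stands in for a key-value store.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Reduce each key's values to one result; the dict acts as a key-value table."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def word_count_reducer(word, counts):
    """Sum the partial counts emitted for one word."""
    return sum(counts)


def run_job(records):
    pairs = map_phase(records, word_count_mapper)
    return reduce_phase(shuffle(pairs), word_count_reducer)


result = run_job(["big data at BSC", "Big Data and MapReduce"])
```

In a real cluster the map and reduce calls run on different nodes and the shuffle moves data over the network; deciding where those tasks run is the placement and scheduling problem named on this slide.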

Page 4: Big Data


Big Data Papers

High level performance goals and Big Data

• Resource-aware Adaptive Scheduling for MapReduce Clusters. J. Polo, C. Castillo, D. Carrera, Y. Becerra, I. Whalley, M. Steinder, J. Torres, E. Ayguadé. In the ACM/IFIP/USENIX 12th International Middleware Conference (Middleware 2011).

• Performance-Driven Task Co-Scheduling for MapReduce Environments. J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguadé, M. Steinder, I. Whalley. In the 12th IEEE/IFIP Network Operations and Management Symposium (NOMS 2010).

Hybrid Hardware and Big Data

• Speeding Up Distributed MapReduce Applications Using Hardware Accelerators. Y. Becerra, V. Beltran, D. Carrera, M. González, J. Torres and E. Ayguadé. In the 38th International Conference on Parallel Processing (ICPP 2009).

• Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters. J. Polo, D. Carrera, Y. Becerra, V. Beltran, J. Torres, E. Ayguadé. In the 39th International Conference on Parallel Processing (ICPP 2010).

Big Data and Energy

• Towards Energy-Efficient Management of MapReduce Workloads. J. Polo, Y. Becerra, D. Carrera, V. Beltran, J. Torres and E. Ayguadé. In the 1st International Conference on Energy-Efficient Computing and Networking (e-Energy 2010).

• GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks. Í. Goiri, K. Le, T. D. Nguyen, J. Guitart, J. Torres, and R. Bianchini. In the European Conference on Computer Systems (EuroSys 2012).

Page 5: Big Data

Ongoing research in Big Data

Page 6: Big Data

New challenges in Big Data: our vision

[Figure: execution time plotted against data volume (GBs to PBs). Labels: "Conventional Storage Systems"; "Large data sets, growing too big for conventional storage/tools"; "New requirements for real-time decisions".]

Page 7: Big Data

New challenges in Big Data: our approach

[Figure: execution time plotted against data volume (GBs to PBs). Labels: "Conventional Storage Systems"; "MapReduce & NoSQL"; "In-memory".]

Page 8: Big Data

Ongoing research projects

Goal | Use case | Collaborators | Technology involved

MapReduce & NoSQL

Snapshot isolation (support for online data generation) | Data Analytics | IBM | Hadoop & Cassandra

High-level performance goals and automatic query configuration | Data Analytics and Bioinformatics (support for drug discovery) | Life Science Dept. (BSC) | Hadoop & Cassandra

Automatic configuration and data organization to meet high-level performance goals | Bioinformatics (support for drug discovery) | Life Science Dept. (BSC) | Cassandra

In-Memory

In-memory bioinformatics workflows (index construction, alignment, sorting, data processing) | Bioinformatics (genomic sequencing) | IBM and Life Science Dept. (BSC) | PIMD
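The first project lists snapshot isolation as a way to keep analytics running while new data is being generated. The slides do not say how this is built on Hadoop and Cassandra, so the following is only a generic multi-version sketch of the idea: each transaction reads the versions committed before it started, and a write fails if another transaction committed the same key in the meantime. The classes `VersionedStore`, `Transaction`, and `ConflictError` are hypothetical names invented for this illustration.

```python
class ConflictError(Exception):
    """Raised when a concurrent transaction already committed the same key."""


class VersionedStore:
    """In-memory key-value store that keeps every committed version of a key."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # timestamp of the latest commit

    def begin(self):
        """Start a transaction whose snapshot is the current commit timestamp."""
        return Transaction(self, self._clock)

    def read_at(self, key, ts):
        """Return the newest value committed at or before ts, or None."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def commit(self, txn):
        """First committer wins: reject writes to keys changed after txn began."""
        for key in txn.writes:
            versions = self._versions.get(key)
            if versions and versions[-1][0] > txn.start_ts:
                raise ConflictError(key)
        self._clock += 1
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered until commit

    def get(self, key):
        """Read own buffered write first, otherwise read from the snapshot."""
        if key in self.writes:
            return self.writes[key]
        return self.store.read_at(key, self.start_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store.commit(self)


store = VersionedStore()
loader = store.begin()
loader.put("reads", 100)
loader.commit()

analytics = store.begin()        # long-running query takes its snapshot here
ingest = store.begin()
ingest.put("reads", 250)         # online data generation continues
ingest.commit()

seen_by_analytics = analytics.get("reads")   # still the snapshot value
seen_by_new_txn = store.begin().get("reads")  # sees the newer commit
```

The analytics transaction keeps a stable view of the data without blocking the ingest path, which is the property that makes snapshot isolation attractive for online data generation.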

Page 9: Big Data

Next planned research in Big Data

Page 10: Big Data

New challenges in Big Data: our approach

[Figure: execution time plotted against data volume (GBs to PBs). Labels: "Conventional Storage Systems"; "MapReduce & NoSQL"; "In-memory"; "Storage Hierarchy Management"; "RDBMS"; "In-memory application".]

Page 11: Big Data

Our Big Data resource management picture

Resource Management

Application placement and scheduling (multi-job performance goals, resource awareness, hybrid hardware)

Data Management: automatic data organization and configuration (NoSQL/In-Memory/Hierarchy management)

Highly Scalable NoSQL

In-Memory DB

Heterogeneous Compute Nodes

Storage Hierarchy: Mix of Mechanical + Flash + SCM

Data Analytics

Drug Discovery

Air Quality Forecasting

Genomic Sequencing

Business Intelligence

SQL

To meet performance goals such as:

Consistency, Availability, Partition Tolerance, Energy Consumption, Response Time
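To make "multi-job performance goals" concrete, here is a minimal sketch of how a scheduler might turn a completion-time goal into a slot allocation. It is not the algorithm from the papers listed earlier: it assumes every task of a job takes the same average time, ignores data locality and hardware differences, and serves the job with the tightest goal first. The function names and the job figures are made up for illustration.

```python
import math


def slots_needed(pending_tasks, avg_task_seconds, seconds_to_goal):
    """Estimate how many parallel slots finish the pending tasks by the goal."""
    if pending_tasks <= 0:
        return 0
    if seconds_to_goal <= 0:
        # Goal already missed: the best we can do is run every task at once.
        return pending_tasks
    estimate = math.ceil(pending_tasks * avg_task_seconds / seconds_to_goal)
    return min(pending_tasks, max(1, estimate))


def allocate(jobs, total_slots):
    """Hand out slots to jobs, tightest completion-time goal first.

    Each job is a tuple: (name, pending_tasks, avg_task_seconds, seconds_to_goal).
    """
    allocation = {}
    free = total_slots
    for name, pending, avg, goal in sorted(jobs, key=lambda job: job[3]):
        granted = min(free, slots_needed(pending, avg, goal))
        allocation[name] = granted
        free -= granted
    return allocation


# Illustrative numbers only, not measurements.
jobs = [
    ("alignment", 40, 60.0, 1200.0),
    ("analytics", 100, 30.0, 600.0),
]
plan = allocate(jobs, total_slots=6)
```

With six slots the analytics job receives the five it needs and the alignment job gets the remaining one, so it will likely miss its goal. A resource-aware scheduler would re-estimate as tasks complete and could also weigh energy or heterogeneous hardware, which this sketch leaves out.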

Page 12: Big Data

Collaboration with other BSC departments

[Figure: layered architecture. Labels: "Heterogeneous Application Flows (Domain Specific, Differentiated Resource Requirements)"; "eDSL Prog. Models"; "Legacy Code (MPI)"; "Data-centric Resource Manager"; "Custom Data Mgmt."; "In-mem Key/Val"; "NoSQL"; "FileSystems"; "Persistent Objects"; "Compute nodes"; "Storage: Mix of Mechanical + Flash + SCM".]

Page 13: Big Data

Autonomic & eBusiness Group

Page 14: Big Data

Group Goal

To research autonomic and intelligent resource management for today's business applications.

Autonomic and Intelligent Resource Management

Cloud Computing

Big Data

Business Analytics

High Performance Computing

Sustainable Computing

The objective is to create new middleware-level components that provide holistic solutions for some of the new IT challenges in the industry.

Page 15: Big Data

Current main interrelated areas

Middleware that provides a holistic solution

Workload Management

Massively Distributed Data Stores

Embedded Domain Specific Languages for HPC

Exploiting Heterogeneous Hardware

BLO-driven Management

High performance architectures for Big Data

Online predictors

Service-aware VM Management

Energy-aware Management

Page 16: Big Data

Group members

www.bsc.es/eBusiness