Barcelona Supercomputing Center


Page 1: Barcelona Supercomputing Center

Barcelona Supercomputing Center

Page 2: Barcelona Supercomputing Center

Barcelona Supercomputing Center

• The BSC-CNS objectives:

• R&D in Computer Sciences, Life Sciences and Earth Sciences.

• Supercomputing support to external research.

• BSC-CNS is a consortium that includes :

• the Spanish Government (MEC) – 51%

• the Catalonian Government (DIUE) – 37%

• the Technical University of Catalonia (UPC) – 12%

• 300 people

Page 3: Barcelona Supercomputing Center

Research areas

• Influence the way machines are built, programmed and used

• Through demonstration, ideas, cooperation with manufacturers & products

e-science

Programming models

• Evolving standards (OpenMP x.y)

• Prototyping infrastructure (Mercurium, Nanos library, …)

• Dependences/data-flow (StarSs for Cell, SMP, GPU, Grid)

• Hierarchical/hybrid (MPI/SMPSs, NestedSs, …)

• Software Distributed Shared Memory

• Use of Transactional memory

Resource management

• OS scheduling: resource/power aware job scheduling, dynamic load balancing

• Scalable file systems

• Efficient execution on distributed computing environments: GRIDSs @ MN/RES, Grid I/O, heterogeneous workloads

• Management for next-generation data centers: virtualization

Performance analysis

• Tracing: scalable/online, sampling

• Visualization: Paraver

• Automatic analysis: spectral, clustering, …

• Methodologies and training material

• Integration with other tools

Prediction and evaluation infrastructure

• Dimemas: multiscale simulation

• Interconnection network: overlap, contention, …

• Node and microarchitecture level simulators: MPsim, TaskSim

• Architecture support for programming models and runtimes

Users: Earth Sciences, Life Sciences, Engineering apps

Page 4: Barcelona Supercomputing Center

Programming models

• Implementations on top of other low level run times, FPGAs, OpenCL

• Granularity control

• Locality aware scheduling

• Application porting to hybrid MPI/StarSs and comparison with other models

• Load balancing in nested/hybrid implementations

• Instrumentation and analysis for task-based systems

StarSs

• CellSs

• SMPSs

• GPUSs

• GridSs

• ClearSpeedSs

• ClusterSs

• CompSs (Java)

#pragma css task input(A, B) output(C)
void vadd3 (float A[BS], float B[BS], float C[BS]);
#pragma css task input(sum, A) output(B)
void scale_add (float sum, float A[BS], float B[BS]);
#pragma css task input(A) inout(sum)
void accum (float A[BS], float *sum);

for (i=0; i<N; i+=BS)   // C=A+B
    vadd3 (&A[i], &B[i], &C[i]);
...
for (i=0; i<N; i+=BS)   // sum(C[i])
    accum (&C[i], &sum);
...
for (i=0; i<N; i+=BS)   // B=sum*E
    scale_add (sum, &E[i], &B[i]);
...
for (i=0; i<N; i+=BS)   // A=C+D
    vadd3 (&C[i], &D[i], &A[i]);
...
for (i=0; i<N; i+=BS)   // E=C+F
    vadd3 (&C[i], &F[i], &E[i]);
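
The task annotations above declare, for each argument, whether a task reads it (input), writes it (output) or both (inout). The following is a toy sketch of how a runtime could turn those declarations into ordering constraints between tasks. It is not the StarSs implementation: the names task_arg, task_graph, graph_add_task and graph_depends are hypothetical, blocks are identified only by their start address, and every conflict is kept as a dependence, including write-after-read ones that a real runtime may be able to avoid.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 64   /* arbitrary capacity for this sketch */
#define MAX_ARGS   4

typedef enum { ARG_IN, ARG_OUT, ARG_INOUT } arg_dir;

typedef struct {
    const void *addr;  /* start address of the block the task touches */
    arg_dir dir;       /* direction declared in the task clause */
} task_arg;

typedef struct {
    int ntasks;
    int nargs[MAX_TASKS];
    task_arg args[MAX_TASKS][MAX_ARGS];
    /* dep[later][earlier] != 0: 'later' must wait for 'earlier' */
    unsigned char dep[MAX_TASKS][MAX_TASKS];
} task_graph;

static int arg_writes(arg_dir d) { return d == ARG_OUT || d == ARG_INOUT; }

/* Two accesses to the same block conflict unless both only read it. */
static int args_conflict(const task_arg *x, const task_arg *y)
{
    return x->addr == y->addr && (arg_writes(x->dir) || arg_writes(y->dir));
}

/* Register a task in program order; returns its id, or -1 if it does not fit. */
int graph_add_task(task_graph *g, const task_arg *args, int nargs)
{
    assert(g != NULL && args != NULL);
    if (g->ntasks >= MAX_TASKS || nargs < 0 || nargs > MAX_ARGS)
        return -1;

    int id = g->ntasks++;
    g->nargs[id] = nargs;
    for (int a = 0; a < nargs; a++)
        g->args[id][a] = args[a];

    /* Compare against every earlier task (quadratic, fine for a sketch). */
    for (int prev = 0; prev < id; prev++) {
        g->dep[id][prev] = 0;
        for (int a = 0; a < nargs; a++)
            for (int b = 0; b < g->nargs[prev]; b++)
                if (args_conflict(&g->args[id][a], &g->args[prev][b]))
                    g->dep[id][prev] = 1;
    }
    return id;
}

/* Nonzero if task 'later' has to wait for task 'earlier'. */
int graph_depends(const task_graph *g, int later, int earlier)
{
    return g->dep[later][earlier];
}
```

To use it, zero-initialize a task_graph and register one task per call in program order, passing the block addresses each call touches. With a single block per array, the second vadd3 call (A=C+D) has to wait for the first one, because it reads C and overwrites A, but it is independent of the accum and scale_add tasks and could run concurrently with them.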

Page 5: Barcelona Supercomputing Center

Performance tools

• Analysis of applications at large scale

• Maximize ratio of captured information / emitted data

• Intelligent online data reduction

• Mixed instrumentation and sampling

• Advanced modeling/prediction of sequential computation behavior

• Memory behavior

• Use classification techniques of hardware counter metrics to identify potentially interesting transformations

CPI stack model for sequential computation parts
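
The slide names the CPI stack model without defining it. In its usual sense, a CPI stack splits the cycles per instruction of a code region into additive components (a base component plus one per stall cause), so the components sum to the total CPI. A minimal sketch under that assumption follows; the cpi_component type, the cpi_stack function and the component labels are hypothetical, and how cycles are attributed to each component from hardware counters is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;           /* label, e.g. "base" or "memory" (hypothetical) */
    unsigned long long cycles;  /* cycles attributed to this component */
} cpi_component;

/*
 * Writes each component's CPI contribution into out[0..n-1] and
 * returns their sum, i.e. the total CPI of the region.
 */
double cpi_stack(const cpi_component *comp, size_t n,
                 unsigned long long instructions, double *out)
{
    assert(comp != NULL && out != NULL);
    assert(instructions > 0);

    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        out[i] = (double)comp[i].cycles / (double)instructions;
        total += out[i];
    }
    return total;
}
```

For example, a region that retires 1000 instructions with 1000 base cycles, 500 memory-stall cycles and 250 branch-stall cycles yields contributions of 1.0, 0.5 and 0.25, for a total CPI of 1.75.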