XTREEMOS APPLICATION EXECUTION MANAGEMENT: A SCALABLE APPROACH

Ramon Nou, Jacobo Giralt, Julita Corbalan, Enric Tejedor, J.Oriol Fitó, Josep M. Perez, Toni Cortes

Barcelona Supercomputing Center (BSC – CNS)

XtreemOS is funded by the European Commission through the Information Society Technology under contract IST-FP6-033576.



Outline
• XtreemOS Overview
• Application Execution Manager
• Job Execution Flow
• Monitoring
• Performance and scalability
• Job Execution
• Job Status
• Future


XtreemOS overview
• What is it?
  • A Linux-based operating system that supports Virtual Organizations for Grids.
• Several layers


XtreemOS overview
• Some key features:
  • Makes the Grid easy to use (like Linux).
  • Highly scalable.
  • Fault tolerant.
  • Able to run interactive jobs.
  • Extensible.
• 3 node types (each can be replicated):
  • Core
  • Resource
  • Client


Application Execution Manager
• Job management, monitoring, and resource management.
• Access point to submit and control jobs.
• Distributed and asynchronous.
• Extensible.
• Linux concepts in the Grid world:
  • Process-Thread paradigm.
  • Signals.


Application Execution Manager
• Several distributed services:
  • Job Manager.
  • Execution Manager.
  • Reservation Manager.
  • …
• Semantics:
  • JobUnit: set of processes of a Job running on a resource.
  • Job: set of JobUnits, identified by a JobID. [Process-Thread]
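The Job / JobUnit semantics can be pictured as a small data structure. The sketch below is only an illustration: the class and field names (`resource`, `pids`, `add_process`, `all_pids`) are hypothetical and do not come from the XtreemOS code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobUnit:
    """Processes of one job that run on a single resource (hypothetical fields)."""
    resource: str
    pids: List[int] = field(default_factory=list)


@dataclass
class Job:
    """A set of JobUnits identified by a JobID."""
    job_id: str
    units: Dict[str, JobUnit] = field(default_factory=dict)

    def add_process(self, resource: str, pid: int) -> None:
        # One JobUnit per resource; processes on the same resource share it.
        unit = self.units.setdefault(resource, JobUnit(resource))
        unit.pids.append(pid)

    def all_pids(self) -> List[int]:
        # Flatten the JobUnits, much as a process groups its threads.
        return [pid for unit in self.units.values() for pid in unit.pids]


job = Job("job-1")
job.add_process("node-a", 101)
job.add_process("node-a", 102)
job.add_process("node-b", 201)
```

Here `job` holds two JobUnits (one per resource) and three processes, all reachable through the single JobID.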


Job Execution Flow

[Sequence diagram]
• Participants: User, XOSD JobMng, XOSD ExecMng, JobDirectory, RSS, Any XOSD, Kernel.
• Labelled steps, in order of appearance: JID = createJob(JSDL); JID; runJob(JID); getResources(JSDL); Schedules & Executes process; Job finished (all processes finished).
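The call sequence in the diagram (createJob, runJob, getResources, then process execution) can be sketched as follows. This is a rough illustration under stated assumptions: the diagram text does not say which component issues each call, so the wiring below is a guess; the JSDL document is reduced to a plain dict; and every class and method name other than the three call names is hypothetical.

```python
import itertools


class ResourceService:
    """Stand-in for the resource selection step (RSS in the diagram)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def get_resources(self, jsdl):
        # Assumption: the job description only asks for a number of processes.
        return self.nodes[: jsdl["processes"]]


class ExecManager:
    """Stand-in for an ExecMng that starts processes on its own node."""

    def __init__(self, node):
        self.node = node
        self.started = []

    def start_process(self, jid, command):
        self.started.append((jid, command))
        return f"{jid}@{self.node}"


class JobManager:
    """Stand-in for a JobMng: the access point used to create and run jobs."""

    def __init__(self, rss, exec_managers):
        self.rss = rss
        self.exec_managers = exec_managers   # node name -> ExecManager
        self.job_directory = {}              # JID -> job description
        self._ids = itertools.count(1)

    def create_job(self, jsdl):
        jid = f"job-{next(self._ids)}"
        self.job_directory[jid] = jsdl
        return jid

    def run_job(self, jid):
        jsdl = self.job_directory[jid]
        nodes = self.rss.get_resources(jsdl)
        return [self.exec_managers[n].start_process(jid, jsdl["command"])
                for n in nodes]


nodes = ["node-a", "node-b", "node-c"]
job_mng = JobManager(ResourceService(nodes), {n: ExecManager(n) for n in nodes})
jid = job_mng.create_job({"command": "hostname", "processes": 2})
handles = job_mng.run_job(jid)
```

The user only ever handles the JID; resource selection and process start-up happen behind `run_job`.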


Monitoring
• System metrics.
• User-defined metrics.
• Different levels of information.
• Buffering.
• Each service maintains its own monitoring information (SCOPE):
  • ExecMng has information about processes.
  • JobMng has information about jobs.
  • ResMng has information about resources.
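A minimal sketch of per-scope monitoring with buffering is given below. The slide gives no detail on either mechanism, so everything here is an assumption made for illustration: the `MonitoredService` class, the flush threshold, and the metric names are all hypothetical.

```python
from typing import Any, Dict, List, Tuple


class MonitoredService:
    """One service (ExecMng, JobMng, ResMng) holding metrics for its own scope."""

    def __init__(self, scope: str, buffer_size: int = 3) -> None:
        self.scope = scope                 # "process", "job" or "resource"
        self.buffer_size = buffer_size     # hypothetical flush threshold
        self._buffer: List[Tuple[str, str, Any]] = []
        self._store: Dict[str, Dict[str, Any]] = {}

    def record(self, entity: str, metric: str, value: Any) -> None:
        # Buffer the sample; write the buffer out once it is full.
        self._buffer.append((entity, metric, value))
        if len(self._buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        for entity, metric, value in self._buffer:
            self._store.setdefault(entity, {})[metric] = value
        self._buffer.clear()

    def query(self, entity: str) -> Dict[str, Any]:
        # Flush first so a query always sees the latest samples.
        self.flush()
        return dict(self._store.get(entity, {}))


services = {s.scope: s for s in (MonitoredService("process"),
                                 MonitoredService("job"),
                                 MonitoredService("resource"))}
services["job"].record("job-1", "state", "running")
services["process"].record("pid-101", "cpu_seconds", 1.5)
```

A query about a job goes only to the service with the "job" scope; the other services hold nothing about it.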


Performance & scalability
• Key points:
  • Collaboration with the Linux kernel.
  • No central storage (DHTs).
  • Can be replicated.
  • Does not search for the best global scheduling, only for a good-enough local scheduling.
• What is the performance without DHTs?
  • Typical VO: a small (100 nodes) local grid.


Job Execution
• O(X²):
  • Resource management is needed for each submitted process.
• All processes belong to the same job (in other systems they would be independent jobs).
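The slide does not state what X counts or where the quadratic term comes from. As an illustration only, assume X is the number of submitted processes and that the resource management step for the i-th process costs about i units of work (for example, because it takes the already placed processes into account). Under that assumed cost model the total is 1 + 2 + … + X = X(X + 1)/2, which is O(X²):

```python
def total_resource_steps(x: int) -> int:
    """Total work when the i-th submitted process costs i units (assumed model)."""
    return sum(i for i in range(1, x + 1))


for x in (10, 100, 1000):
    # The explicit sum matches the closed form x(x + 1)/2.
    print(x, total_resource_steps(x), x * (x + 1) // 2)
```

Under this model, submitting ten times as many processes costs roughly one hundred times as much work.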


Job status
• Query information about all processes of a job with low overhead.
• Look up the finished status of a job in 0.012 seconds (0.014 in GT5) without contacting the ExecMngs.
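One way a finished-status lookup can avoid contacting the ExecMngs is for the job-level service to keep its own count of running processes and be notified when each one ends. The slides do not describe the actual mechanism, so the sketch below is an assumption; `JobStatusTable` and its methods are hypothetical names.

```python
class JobStatusTable:
    """Job-level view: remaining process count per JobID (assumed design)."""

    def __init__(self):
        self._remaining = {}

    def register(self, jid, process_count):
        self._remaining[jid] = process_count

    def process_finished(self, jid):
        # Called when an execution service reports that a process ended.
        if self._remaining[jid] > 0:
            self._remaining[jid] -= 1

    def is_finished(self, jid):
        # Answered from local state only: a job is finished when all of
        # its processes have finished.
        return self._remaining[jid] == 0


table = JobStatusTable()
table.register("job-1", 2)
table.process_finished("job-1")
still_running = not table.is_finished("job-1")
table.process_finished("job-1")
done = table.is_finished("job-1")
```

The lookup is a single dictionary access, independent of how many resources the job ran on.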


Future improvements
• Reduced internal communication times.
• Caching to reduce overhead.
• Some conclusions:
  • Kernel collaboration with «middleware» is important.
  • DHTs (not evaluated) are a good option to distribute data.
  • But still no high performance.
  • Including the concept 1 Job -> n Processes gives the user a lot of benefits.
    • Easy to understand, easy to manage.
