XTREEMOS APPLICATION EXECUTION MANAGEMENT: A SCALABLE APPROACH
Ramon Nou, Jacobo Giralt, Julita Corbalan, Enric Tejedor, J. Oriol Fitó, Josep M. Perez, Toni Cortes
Barcelona Supercomputing Center (BSC – CNS)
XtreemOS is funded by the European Commission through the Information Society Technology under contract IST-FP6-033576.
Outline
• XtreemOS Overview
• Application Execution Manager
• Job Execution Flow
• Monitoring
• Performance and scalability
  • Job Execution
  • Job Status
• Future
XtreemOS overview
• What is it?
  • A Linux-based operating system that supports Virtual Organizations (VOs) for the Grid.
  • Organized in several layers.
XtreemOS overview
• Some key features:
  • Makes the Grid as easy to use as Linux.
  • Highly scalable.
  • Fault tolerant.
  • Able to run interactive jobs.
  • Extensible.
• Three node types (each can be replicated):
  • Core
  • Resource
  • Client
Application Execution Manager
• Job management, monitoring and resource management.
• Access point to submit and control jobs.
• Distributed and asynchronous.
• Extensible.
• Linux concepts in the Grid world:
  • Process-Thread paradigm.
  • Signals.
Application Execution Manager
• Several distributed services:
  • Job Manager.
  • Execution Manager.
  • Reservation Manager.
  • …
• Semantics:
  • JobUnit: the set of processes of a Job running on one resource.
  • Job: a set of JobUnits, identified by a JobID. [Process-Thread paradigm]
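The Job/JobUnit semantics, together with the "Signals" concept from the previous slide, can be pictured with a small data model. This is a minimal sketch under stated assumptions: the class and method names (JobUnit, Job, processes, send_signal) are hypothetical and are not the XtreemOS API, and signal delivery is only recorded here rather than forwarded to remote nodes.

```python
import signal
from dataclasses import dataclass, field


@dataclass
class JobUnit:
    """Hypothetical model: the processes of one job that run on a single resource."""
    resource: str
    pids: list[int] = field(default_factory=list)


@dataclass
class Job:
    """Hypothetical model: a set of JobUnits identified by a JobID."""
    job_id: str
    units: list[JobUnit] = field(default_factory=list)

    def processes(self) -> list[tuple[str, int]]:
        # Flatten the job into (resource, pid) pairs, much like the threads of one process.
        return [(u.resource, pid) for u in self.units for pid in u.pids]

    def send_signal(self, signum: int) -> list[tuple[str, int, int]]:
        # Fan a Linux-style signal out to every process of every JobUnit.
        # Deliveries are only recorded; a real system would forward them to each node.
        return [(res, pid, signum) for res, pid in self.processes()]


job = Job("job-42", [JobUnit("nodeA", [101, 102]), JobUnit("nodeB", [201])])
deliveries = job.send_signal(signal.SIGTERM)  # one delivery per process of the job
```

Signalling the job as a whole, rather than each process individually, is what lets one JobID behave like a single Linux process with many threads.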
Job Execution Flow
[Sequence diagram. Participants: User, XOSD JobMng, XOSD ExecMng, JobDirectory, RSS, any XOSD, Kernel. Messages, in order:
1. JID = createJob(JSDL), which returns the JID
2. runJob(JID)
3. getResources(JSDL)
4. Schedules & executes process
5. Job finished (all processes finished)]
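The message sequence in the flow diagram can be simulated in a few lines. This is a minimal sketch, not the XtreemOS implementation: the classes JobMng and ExecMng only mirror the service names on the slide, a plain dict stands in for the JSDL document (which is XML in practice) and for the JobDirectory, a callable stands in for the RSS getResources(JSDL) call, and process launch by the kernel is replaced by handing out fake PIDs.

```python
import itertools


class ExecMng:
    """Hypothetical per-node execution manager; stands in for the kernel launch."""
    _pids = itertools.count(1000)

    def __init__(self, node):
        self.node = node

    def execute(self, jid, jsdl):
        # In XtreemOS the process would be scheduled and executed by the kernel.
        return next(self._pids)


class JobMng:
    """Hypothetical job manager following the createJob/runJob sequence."""

    def __init__(self, rss, exec_mngs):
        self.rss = rss              # callable standing in for getResources(JSDL)
        self.exec_mngs = exec_mngs  # node name -> ExecMng
        self.jobs = {}              # dict standing in for the JobDirectory
        self._ids = itertools.count(1)

    def create_job(self, jsdl):
        jid = f"job-{next(self._ids)}"
        self.jobs[jid] = {"jsdl": jsdl, "running": set(), "finished": False}
        return jid

    def run_job(self, jid):
        job = self.jobs[jid]
        for node in self.rss(job["jsdl"]):
            pid = self.exec_mngs[node].execute(jid, job["jsdl"])
            job["running"].add((node, pid))

    def process_exited(self, jid, node, pid):
        job = self.jobs[jid]
        job["running"].discard((node, pid))
        # The job is finished only when all of its processes have finished.
        job["finished"] = not job["running"]


nodes = {n: ExecMng(n) for n in ("nodeA", "nodeB")}
jm = JobMng(rss=lambda jsdl: ["nodeA", "nodeB"], exec_mngs=nodes)
jid = jm.create_job({"executable": "/bin/hostname"})
jm.run_job(jid)
```

Splitting creation from execution, as createJob/runJob do, gives the user a JID before any resource is chosen, so the job can be referenced and controlled from that point on.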
Monitoring
• System metrics.
• User-defined metrics.
• Different levels of information.
• Buffering.
• Each service maintains its own monitoring information (its SCOPE):
  • ExecMng has information about processes.
  • JobMng has information about jobs.
  • ResMng has information about resources.
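Per-service scopes combined with buffering can be illustrated as below. This is a sketch under assumptions: MonitoringScope, record, flush and query are hypothetical names, and the policy (publish once a fixed number of samples has accumulated) is just one simple way to buffer. The point it shows is that each service answers queries from its own scope, without a central store.

```python
from collections import defaultdict


class MonitoringScope:
    """Hypothetical per-service monitoring store; each service keeps only its own scope."""

    def __init__(self, service, buffer_size=3):
        self.service = service               # e.g. "ExecMng", "JobMng", "ResMng"
        self.buffer_size = buffer_size
        self._buffer = []                    # samples not yet published
        self.published = defaultdict(list)   # entity id -> list of (metric, value)

    def record(self, entity, metric, value):
        # Buffer samples locally instead of publishing each measurement immediately.
        self._buffer.append((entity, metric, value))
        if len(self._buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        for entity, metric, value in self._buffer:
            self.published[entity].append((metric, value))
        self._buffer.clear()

    def query(self, entity):
        # Answered from this service's own scope only; buffered samples appear after a flush.
        return list(self.published[entity])


exec_scope = MonitoringScope("ExecMng", buffer_size=2)
exec_scope.record("pid:101", "cpu_time", 1.5)
exec_scope.record("pid:101", "rss_kb", 2048)  # second sample triggers a flush
```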
Performance & scalability
• Key points:
  • Collaboration with the Linux kernel.
  • No central storage (DHTs).
  • Can be replicated.
  • Does not search for the best global scheduling, only for a good enough local scheduling.
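The "good enough local scheduling" idea can be contrasted with a global optimum using a first-fit rule. This is a minimal sketch, assuming a first-fit policy over the resources a node already knows about. The slides do not state which policy XtreemOS uses, and Resource and good_enough are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    free_cpus: int
    free_mem_mb: int


def good_enough(candidates, cpus, mem_mb):
    """Return the first known resource that satisfies the request, or None.

    Hypothetical first-fit policy: it stops at the first adequate resource
    instead of ranking every resource in the grid to find a global optimum.
    """
    for res in candidates:
        if res.free_cpus >= cpus and res.free_mem_mb >= mem_mb:
            return res
    return None


local = [Resource("nodeA", 1, 512), Resource("nodeB", 4, 4096), Resource("nodeC", 16, 65536)]
choice = good_enough(local, cpus=2, mem_mb=1024)  # nodeB, although nodeC has more capacity
```

Stopping at the first adequate candidate keeps the decision cheap and local, which matters more for scalability than picking the best node in the whole grid.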
• What is the performance without DHTs?
  • Typical VO: a small (100-node) local grid.
Job Execution
• O(X²): resource management is needed for each submitted process.
• All processes belong to the same job (in other systems they would be independent jobs).
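One way a quadratic trend can arise is shown below. The assumption is ours, not the slide's: resource management for the k-th process submitted to a job costs time proportional to k, for example because it touches the state of the processes already in that job. With a constant c, the total time for X processes is then:

```latex
T(X) = \sum_{k=1}^{X} c\,k = c\,\frac{X(X+1)}{2} \in O(X^2)
```

For X = 100 processes this gives 5050c, compared with 100c if each process cost a constant c.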
Job status
• Information about all the processes of a job is retrieved with low overhead.
• The job's finished status is looked up in 0.012 seconds (0.014 seconds in GT5) without contacting the ExecMngs.
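One plausible way to answer status queries without contacting the ExecMngs is sketched below. It assumes that each ExecMng pushes process state changes to the JobMng, which caches them per job. JobStatusCache and its methods are hypothetical names, and the slide does not say this is how XtreemOS achieves the 0.012-second lookup.

```python
from collections import Counter


class JobStatusCache:
    """Hypothetical JobMng-side cache of the state of every process of each job."""

    def __init__(self):
        self._procs = {}  # jid -> {pid: state}

    def update(self, jid, pid, state):
        # Pushed by an ExecMng whenever one of the job's processes changes state.
        self._procs.setdefault(jid, {})[pid] = state

    def summary(self, jid):
        # Information about all processes of the job, answered locally.
        return Counter(self._procs.get(jid, {}).values())

    def is_finished(self, jid):
        # Finished only if the job has processes and every one of them has finished.
        states = self._procs.get(jid, {})
        return bool(states) and all(s == "finished" for s in states.values())


cache = JobStatusCache()
cache.update("job-1", 101, "running")
cache.update("job-1", 102, "finished")
```

The trade-off is push traffic on every state change in exchange for status lookups that need no round trip to the nodes running the processes.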
Future improvements
• Reduce internal communication times.
• Caching to reduce overhead.
• Some conclusions:
  • Kernel collaboration with the «middleware» is important.
  • DHTs (not evaluated) are a good option for distributing data.
  • But performance is still not high.
  • Including the concept 1 Job -> n Processes gives the user many benefits:
    • Easy to understand, easy to manage.