BEACON
Edited: Dec 11th
Summary
• Principle
• Scenarios
• Existing Technologies
Logical Representation of BEACON and Beacon end-points (Beeps)
[Figure: BEACON and the Exposé backplanes exchange Notifications and Commands with each end-point: the Job and Resource manager, the RAS system, and the enclaves (Application, Runtime systems) running on nodes (CPU, OS).]
Logical Representation of BEACON
[Figure: nodes (CPU) host a System partition and enclaves (Application, Runtime systems). Each of these exchanges Notifications and Commands with a Local BEACON. The Local BEACONs connect to Global BEACONs, which exchange Notifications and Commands with the Job and Resource manager and the RAS channel.]
BEACON Principle
[Figure: four nodes, each with a Local BEACON and a Global BEACON, grouped into two enclaves.]
• Two daemons help with failure containment, fault isolation, and security
• The Global Beacon is created when the node boots up; during startup it connects to the global beacons on other active nodes
• The Local Beacon is launched with the job in an enclave; it connects to the global beacon on the same node
Beacon Related Services
[Figure: layered stack with the following labels.
Clients: OS, Runtime, Applications, RMS, RAS, Enclave services, EXPOSE
BEACON API
BEACON Services: Translators, Response management, Logger, Query management?
BEACON Transport: Unreliable channel, Reliable channel
Networks: IP multicast, TCP/IP, PAMI (BG/Q), IBM machine?, uGNI (XK6), Aries (XC30)?]
BEACON Events
• Beacon will support two types of data
– Internal events (subscriptions, Beacon maintenance, announcements, etc.)
– External events (notifications, commands)
• Internal events can be produced and consumed by Beacon and its services
• External events are produced and consumed by all Beeps
• ? Do we need discrete and stream events? Stream throttling? Scenarios?
BEACON Event Format
Priority:
– reliable or not
– discrete or stream (if needed)
Payload:
– generated and interpreted by Beeps
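The format above could be sketched in Python as follows. This is only an illustration under assumptions: the `topic` field, the one-byte flag layout, and the names `BeaconEvent`, `encode`, and `decode` are hypothetical and are not defined by the slides.

```python
from dataclasses import dataclass
from enum import Enum


class Delivery(Enum):
    UNRELIABLE = 0
    RELIABLE = 1


class Kind(Enum):
    DISCRETE = 0
    STREAM = 1  # only relevant if stream events turn out to be needed


@dataclass(frozen=True)
class BeaconEvent:
    topic: str          # hypothetical routing key
    delivery: Delivery  # "reliable or not"
    kind: Kind          # "discrete or stream"
    payload: bytes      # opaque to BEACON; generated and interpreted by Beeps


def encode(event: BeaconEvent) -> bytes:
    """Pack the priority flags into one byte, then the topic, then the payload."""
    topic = event.topic.encode("utf-8")
    if len(topic) > 255:
        raise ValueError("topic too long for the one-byte length field")
    flags = event.delivery.value | (event.kind.value << 1)
    return bytes([flags, len(topic)]) + topic + event.payload


def decode(raw: bytes) -> BeaconEvent:
    """Inverse of encode(); the payload is returned untouched."""
    flags, topic_len = raw[0], raw[1]
    topic = raw[2:2 + topic_len].decode("utf-8")
    payload = raw[2 + topic_len:]
    return BeaconEvent(topic, Delivery(flags & 1), Kind((flags >> 1) & 1), payload)
```

BEACON never inspects `payload` in this sketch, which matches the idea that only the Beeps generate and interpret it.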
BEACON Start-up
• Discovery and Topology
– The Discovery and Topology daemon will reside on a permanent node (similar to the service node in BG)
– It will help establish the topology of the global Beacon daemons; global daemons will contact it for parent discovery
– Scalable, resilient (replication)
– Topology options are still being researched:
• Small degree
• Small diameter
• High resilience
• Multiple paths
• CHORD and other P2P topologies are candidates
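Parent discovery could look roughly like the sketch below. It assumes a plain k-ary tree in join order, which is only one possible small-degree topology and is not one the slides commit to. The class `DiscoveryDaemon` and its methods are hypothetical, and replication of the daemon itself is left out.

```python
from typing import Dict, List, Optional


class DiscoveryDaemon:
    """Hands each joining global daemon a parent, forming a k-ary tree."""

    def __init__(self, degree: int = 2) -> None:
        if degree < 1:
            raise ValueError("degree must be at least 1")
        self.degree = degree
        self.members: List[str] = []              # global daemons in join order
        self.parent: Dict[str, Optional[str]] = {}

    def register(self, node: str) -> Optional[str]:
        """Return the parent for `node`; the first daemon to join is the root."""
        if node in self.parent:                   # a restarting daemon keeps its slot
            return self.parent[node]
        index = len(self.members)
        parent = None if index == 0 else self.members[(index - 1) // self.degree]
        self.members.append(node)
        self.parent[node] = parent
        return parent

    def depth(self, node: str) -> int:
        """Number of hops from `node` up to the root."""
        hops = 0
        while self.parent[node] is not None:
            node = self.parent[node]
            hops += 1
        return hops
```

With degree k, no daemon has more than k children and the depth grows logarithmically with the number of nodes. A tree offers a single path between any two daemons, so it does not address the "multiple paths" or "high resilience" goals.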
BEACON Transport
• BEACON transport can deliver events reliably or unreliably
• Unreliable delivery: no delivery guarantees
• Reliable delivery: reliability will need to be end-to-end across a distributed chain of agents (a higher-level protocol than TCP)
• Event Buffering
– Required because every event message has a Time-To-Live (TTL)
– TTL is set by the publisher (from 0 for immediate to a few minutes?)
– Producer produces events, but the subscriber disappears before the event reaches it: the event is dropped after its TTL
– Producer produces events, but the subscription has not yet propagated through the system: the event will be sent to the subscriber (by the logger) if the TTL is still valid
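The buffering rules above can be illustrated with a small in-memory sketch. It makes several assumptions: time is passed in explicitly as seconds, a TTL of 0 means "deliver only to subscribers present at publish time", and the names `BufferedEvent` and `EventBuffer` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BufferedEvent:
    topic: str
    payload: str
    published_at: float  # seconds
    ttl: float           # seconds, set by the publisher

    def valid(self, now: float) -> bool:
        """True while the event may still be delivered to a late subscriber."""
        return now - self.published_at < self.ttl


class EventBuffer:
    def __init__(self) -> None:
        self.events: List[BufferedEvent] = []
        self.subscribers: Dict[str, List[Callable[[BufferedEvent], None]]] = {}

    def publish(self, event: BufferedEvent) -> None:
        # Deliver to everyone already subscribed, then keep a copy if TTL > 0.
        for deliver in self.subscribers.get(event.topic, []):
            deliver(event)
        if event.ttl > 0:
            self.events.append(event)

    def subscribe(self, topic: str, deliver: Callable[[BufferedEvent], None],
                  now: float) -> None:
        # Drop expired events, then replay what is still valid for this topic.
        self.subscribers.setdefault(topic, []).append(deliver)
        self.events = [e for e in self.events if e.valid(now)]
        for event in self.events:
            if event.topic == topic:
                deliver(event)
```

A subscriber that joins three seconds after publication would receive an event published with a five-second TTL but not one published with a one-second TTL.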
BEACON Services
• Use the Beacon API (no other point-to-point messaging)
• Translators – Translate events so that they can be understood semantically between Beeps
• Response Management – Manages responses and coordinates different entities following recovery plans
• Logger – Logs external events and re-publishes events whose TTL has not expired for new (or restarting) subscribers; duplicate events (re-published by the logger) will not be re-delivered to subscribers
• ? Query Management – Manages queries within the BEACON framework ?
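One way the logger's duplicate suppression might work is to give every event a unique identifier and filter on it at delivery time. The sketch below assumes such identifiers exist; `DeliveryFilter`, `Logger`, and the `event_id` field are hypothetical names, and TTL handling is omitted to keep the focus on duplicates.

```python
from typing import List, Set, Tuple


class DeliveryFilter:
    """Sits in front of one subscriber and drops events it has already seen."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()
        self.delivered: List[str] = []

    def receive(self, event_id: str, payload: str) -> bool:
        if event_id in self.seen_ids:
            return False              # duplicate re-published by the logger
        self.seen_ids.add(event_id)
        self.delivered.append(payload)
        return True


class Logger:
    """Records external events and replays them for a (re)starting subscriber."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []

    def record(self, event_id: str, payload: str) -> None:
        self.log.append((event_id, payload))

    def replay(self, subscriber: DeliveryFilter) -> int:
        """Re-publish every logged event; return how many were actually new."""
        return sum(subscriber.receive(eid, payload) for eid, payload in self.log)
```

A subscriber that already received an event live sees it only once, even if the logger re-publishes it later.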
Translators
• The translators do not perform actions; they just read an event and publish a new event, using state information to translate the payload
• Subscribers would have to subscribe to events coming from the translators
• For any system that does a mapping and/or allocation, we need a translator that can reverse the mapping.
• For ARGO, we will build a specific translator only when there is no other software in the process stack performing that translation (e.g. If MPI can tell that rank Y is failing when 0x1234 fails, then we do not need a translator for that)
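A translator that reverses a mapping could be sketched as below. The example assumes a rank-to-node allocation table and dictionary-shaped events; both are illustrative, as is the class name `ReverseMappingTranslator`.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class ReverseMappingTranslator:
    """Turns node-failure events into rank-failure events.

    The allocator knows rank -> node; the translator needs node -> ranks.
    """

    def __init__(self, rank_to_node: Dict[int, int]) -> None:
        self.node_to_ranks: DefaultDict[int, List[int]] = defaultdict(list)
        for rank, node in sorted(rank_to_node.items()):
            self.node_to_ranks[node].append(rank)

    def translate(self, event: dict) -> List[dict]:
        """Read one event and return the new events to publish; take no action."""
        ranks = self.node_to_ranks.get(event["node"], [])
        return [{"topic": "rank-failure", "rank": rank, "at": event["at"]}
                for rank in ranks]
```

The translator only reads state and emits events, so subscribers still have to subscribe to the translated topic to see its output.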
Example Scenario
Example scenario
• A fan has failed. This will cause several nodes and switches to fail within 5 seconds. The failure will affect several jobs and the network. Some of the jobs can take preventive measures to handle node failures; others cannot.
– Fan controller issues event “fan 17245 failed at 00:00:00”
– “Translator process” A subscribes to “fan failures in the system”, picks up this message, and issues several messages of the form “node 175 will fail at 00:00:05”
– “Translator process” B subscribes to “node failures in the system”, picks up this message, and issues the message “node 73 of enclave foo will fail at 00:00:05”
– The enclave manager C subscribes to “node failures in enclave foo”, picks up this message, and issues messages of the form “process with rank 25 in MPI_COMM_WORLD will fail at 00:00:05”
Example scenario
Ideally speaking,
• Translator A uses information on the physical system topology; it could also use information on the current system health
• Translator B uses information on the nodes allocated to each enclave (by the global resource manager)
• Translator C uses information on the mapping of MPI processes to the nodes (by the partition manager)
Practically speaking,
• Creation of translators might be scenario based
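The fan-failure chain can be traced end to end with a toy publish/subscribe bus. Everything here is an assumption made for illustration: the `Bus` class, the topic names, and the three lookup tables (which stand in for the physical topology, the enclave allocation, and the MPI process mapping). Times are seconds after 00:00:00.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple


class Bus:
    """Minimal synchronous pub/sub bus that also records what was published."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.log: List[Tuple[str, dict]] = []

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        self.log.append((topic, event))
        for handler in list(self.handlers[topic]):
            handler(event)


# Hypothetical state tables used by the translators.
FAN_COOLS_NODES = {17245: [175]}               # physical topology
NODE_IN_ENCLAVE = {175: ("foo", 73)}           # allocation by the resource manager
RANKS_ON_ENCLAVE_NODE = {("foo", 73): [25]}    # MPI process mapping
LEAD_TIME = 5                                  # seconds until dependent nodes fail


def wire_translators(bus: Bus) -> None:
    def translator_a(ev: dict) -> None:
        for node in FAN_COOLS_NODES.get(ev["fan"], []):
            bus.publish("node-failure", {"node": node, "at": ev["at"] + LEAD_TIME})

    def translator_b(ev: dict) -> None:
        if ev["node"] in NODE_IN_ENCLAVE:
            enclave, local = NODE_IN_ENCLAVE[ev["node"]]
            bus.publish("node-failure/" + enclave,
                        {"enclave": enclave, "node": local, "at": ev["at"]})

    def enclave_manager_c(ev: dict) -> None:
        for rank in RANKS_ON_ENCLAVE_NODE.get((ev["enclave"], ev["node"]), []):
            bus.publish("rank-failure/" + ev["enclave"], {"rank": rank, "at": ev["at"]})

    bus.subscribe("fan-failure", translator_a)
    bus.subscribe("node-failure", translator_b)
    bus.subscribe("node-failure/foo", enclave_manager_c)


bus = Bus()
wire_translators(bus)
bus.publish("fan-failure", {"fan": 17245, "at": 0})
```

After the last line, `bus.log` holds the four events of the scenario in order, ending with the rank-level failure prediction for rank 25 at second 5.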
Beacon Scenarios
Double bit error: detected/uncorrectable
Application and library can both handle it; the Response Manager decides which one does the correction
Example of application: bag of tasks, each task calling linear algebra functions or FFTs (ABFT version)
Double bit error: detected/uncorrectable
In App: App handles
[Figure: two sequence diagrams across the App, Lib, OS, and MemCont layers.
“Classic” way: Register @Handler; Mem access or Scrubbing; Hardware interrupt; Progress is stopped; Invocation of signal handler; Handler fixes or not; Handler returns to OS; OS returns control to App.
Beacon way (App-level handler, Lib-level handler, Response Manager, Beacon): Mem access; Hardware interrupt; Progress is stopped; OS uses API to ask response; Manager decides App should fix; Invocation of signal handler; Handler fixes or not; App handler returns to OS; OS returns control to App. OS needs to accept multiple handlers.]
Double bit error: detected/uncorrectable
In Lib: Lib handles
[Figure: sequence diagram across the App, Lib, OS, and MemCont layers.
Beacon way (App-level handler, Lib-level handler, Response Manager, Beacon): Mem access; Hardware interrupt; Progress is stopped; OS uses API to ask response; Manager decides Lib should fix; Invocation of signal handler; Handler fixes or not; Lib handler returns to OS; OS returns control to Lib.]
Double bit error: detected/uncorrectable
In Lib: App handles
[Figure: sequence diagram across the App, Lib, OS, and MemCont layers.
Beacon way (App-level handler, Lib-level handler, Response Manager, Beacon): Mem access; Hardware interrupt; Progress is stopped; OS uses API to ask response; Manager decides App should fix; Invocation of signal handler; Handler fixes or not; App handler returns to OS; OS returns control to App.]
Note that the correction may be attempted in the Lib first, and if the Lib does not succeed then the application handler could be called. The corresponding diagram could be built from this one and the previous one.
Response Management (RM)
• Entities that subscribe to and receive events will want to respond with actions
• A response management framework will need to manage response/recovery authorizations in a systematic manner without compromising system stability
• Phases of the BEACON software: each BEACON-enabled software will have the following phases:
1. Announcement of capabilities: entities have to announce their response capabilities for various events. Responses are declared on a per-event basis by every component
2. Exchange of events: publish and subscribe to events; receive events
3. Responding to events: RM will implement a response plan, decide who should take action, and publish the corresponding events. The response/recovery sequence is listed in an admin-provided data file
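The first and third phases could fit together as in the sketch below. The format of the admin-provided data file is not specified in the slides, so the `event : responder, responder` layout is invented for illustration, as are `parse_plan` and `CapabilityRegistry`.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical admin-provided plan: responders are listed in the order to try.
PLAN_TEXT = """
# event : responders, in the order they should be tried
double-bit-error : library, application
node-failure     : application, migration-manager
"""


def parse_plan(text: str) -> Dict[str, List[str]]:
    """Parse the plan text into {event: [responder, ...]}."""
    plan: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # strip comments and blank lines
        if not line:
            continue
        event, responders = line.split(":", 1)
        plan[event.strip()] = [r.strip() for r in responders.split(",") if r.strip()]
    return plan


class CapabilityRegistry:
    """Collects the per-event response capabilities that components announce."""

    def __init__(self) -> None:
        self.capabilities: Dict[str, Set[str]] = {}

    def announce(self, component: str, events: Iterable[str]) -> None:
        for event in events:
            self.capabilities.setdefault(event, set()).add(component)

    def candidates(self, plan: Dict[str, List[str]], event: str) -> List[str]:
        """Responders from the plan that actually announced a capability."""
        capable = self.capabilities.get(event, set())
        return [c for c in plan.get(event, []) if c in capable]
```

Only components that both appear in the plan and have announced a capability for the event are considered, in the order the administrator listed them.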
Response Management
Response manager
– Tracks when components connect and exit
– One exists per enclave. We might add a global response manager, if needed
– Will subscribe to events of topic = “auth-requested”
– Will publish events of topic = “auth-response”, which indicate whether a software component has permission to start recovery
– “auth-response” events are also called commands
Response Manager Protocol in case of multiple recovery options
[Figure: message exchange over BEACON between a Fault-Tolerant Application, the Migration Manager (MM), and the Response Manager.]
1. The application and the MM each receive event foo
2. Each publishes “Want Auth for foo”
3. The Response Manager publishes “Auth granted” to (1) App; (2) MM
4. Publish “Recovery Started”
5. Publish “Recovery Failed”
6. Publish “Recovery Started”
7. Publish “Recovery Completed”
Response plan: try app first, then migration
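The fallback behaviour of this protocol can be simulated in a few lines. The sketch assumes the Response Manager authorizes one component at a time and moves on when a recovery fails; the slides do not state whether grants are issued one by one or as an ordered list. The function `run_response_plan` and the `attempt` callback are hypothetical.

```python
from typing import Callable, List, Set, Tuple


def run_response_plan(event: str,
                      plan: List[str],
                      requesters: Set[str],
                      attempt: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Walk the plan in order and return the messages that would be published.

    plan       -- components in the order they should be tried
    requesters -- components that published "Want Auth" for the event
    attempt    -- stand-in for the actual recovery; returns True on success
    """
    published: List[Tuple[str, str]] = []
    for component in plan:
        if component not in requesters:
            continue                              # never asked for authorization
        published.append(("auth-response", component))
        published.append(("Recovery Started", component))
        if attempt(component):
            published.append(("Recovery Completed", component))
            break                                 # recovered; stop walking the plan
        published.append(("Recovery Failed", component))
    return published
```

Calling it with the plan `["application", "migration-manager"]` and an `attempt` that fails for the application reproduces steps 3 to 7 of the protocol: the application starts and fails, then the migration manager starts and completes.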
Query management
• Currently, no scenarios seem to require this feature
– Wait-and-see approach; reliable BEACON provides a foundation to build this on anyway
Existing Technologies
• Characterization of the system architecture to be used in the ARGO project
• Looked at existing technologies (Astrolabe, Google Dapper, IBM Elastic subscribe)
– Nothing can be picked up and used as is, since most are designed for the internet: they use gossip protocols and do not offer reliable delivery
• Other potential technologies under investigation
– CIFTS, AMQP, EVPATH
EVPATH