Process control daemon

Process Control Daemon For Embedded Linux Platforms. Speaker: Hai Shalom. rt-embedded.com/pcd
  • Date posted: 15-Dec-2014

Transcript of Process control daemon

  • 1. Process Control Daemon For Embedded Linux Platforms
    Speaker: Hai Shalom
    rt-embedded.com/pcd

2. Background review: What were the reasons that led to the development of PCD.
PCD project review: Features and high-level overview of the project.
Live demonstration.
Q & A.
Agenda
3. Does your product have a process controller?
Does your product automatically recover after a crash?
Do you think your product's boot time is fast enough?
Are you using methods other than printf to debug a crashed application?
Are you familiar with all the processes which are running in your product and their dependencies?
Some questions
4. Most of you probably answered "No" to at least one question.
People who answered "Yes" to all questions are probably using PCD already!
Let's review some facts about Embedded Linux-based products.
What were your answers?
5. Done by scripts (rcS, rc.*). These are great, but might be:

  • Not optimal for embedded / not deterministic:
      • Limited ways to synchronize dependent processes (delay).
      • Limited ways to verify successful start of a process.
      • No error checking (usually).
      • No formal way to define dependencies.
      • Difficult to start processes in parallel.
  • Not trivial to understand, maintain and extend:
      • Require additional shell scripting expertise.
      • Tend to be long and unreadable.
      • Plenty of commented-out code, old remarks, different code styles.

System start up
Looks familiar?
25. A crashed program just terminates, usually after printing "Segmentation fault".

  • Now what?
  • Where is the debug information?

Kernel crashes are assumed to be handled by the system's watchdog.
Signal handlers are not always implemented correctly.

  • It is unsafe to call printf, and many other functions, from inside a signal handler.

The system remains unstable and unusable.

  • End user must power-cycle (again?).

Crash handling and recovery
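
To make the "Now what?" question concrete, here is a minimal sketch of how a supervising parent can at least tell a crash from a normal exit. It is not PCD code: the helper names `describe_exit` and `run_crashing_child` are made up for illustration, and the child simply sends SIGSEGV to itself to stand in for a real fault.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_crashing_child() -> int:
    """Run a child that raises SIGSEGV on itself; return its return code."""
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        stderr=subprocess.DEVNULL,  # hide any crash report the child prints
    )
    return result.returncode


if __name__ == "__main__":
    print(describe_exit(run_crashing_child()))
```

Without a parent that inspects the termination status like this, the only trace of the crash is whatever the dying process managed to print.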
31. No central management entity.

  • init is the parent of all processes.

You must know a process's PID in order to signal or kill it.
Each process must manage its own children.

  • A child process inherits its parent's priority.
  • Parents must retrieve a child's exit status, or else we end up with zombies.

Process management
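
The zombie point can be illustrated with a short sketch, assuming a POSIX system where `os.fork` is available. The function name `spawn_and_reap` is hypothetical; the calls themselves (`os.fork`, `os.waitpid`, `os.WIFEXITED`, `os.WEXITSTATUS`) are standard library wrappers around the usual system calls.

```python
import os


def spawn_and_reap(exit_status: int) -> int:
    """Fork a child that exits at once, then collect its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: leave immediately, skipping the parent's cleanup handlers.
        os._exit(exit_status)
    # Parent: until waitpid() is called, the dead child remains in the
    # process table as a zombie. Collecting the status removes it.
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    raise RuntimeError("child did not exit normally")


if __name__ == "__main__":
    print(spawn_and_reap(7))
```

Every process that starts children has to repeat this bookkeeping, which is the burden a central process manager is meant to take over.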
33. A customer reports a crash in the field or in their lab tests:

  • There is no standard method for generating and collecting remote debug information.
  • When a process terminates abnormally, all its information goes away and no log is saved.
  • You might be on the next flight to the customer's lab.

Field/Remote debugging
36. A great (and free) solution: PCD
What is PCD?
37. What is PCD?
PCD (Process Control Daemon) is an open-source, lightweight, system-level process manager for Embedded Linux-based products (consumer electronics, network devices, etc.).
The PCD provides a complementary service for any Embedded Linux-driven product.
Designed and implemented by Hai Shalom during his employment at Texas Instruments, for the next-gen Puma5 cable chipset.
Released to open source as part of his M.Sc. degree research.
PCD is a proven solution that already drives millions of devices worldwide.
38. System startup: PCD starts up the system in an efficient, synchronized and deterministic manner.
Process management: A centralized entity that controls and monitors all processes, and provides an API to manage them.
System recovery: A configurable, per-process recovery action is taken in case of a crash.
Debug information: PCD provides a detailed crash log in case of a program error.
PCD features at a high level
39. How does it work?
What are the advantages of products with PCD?
40. Rule blocks replace/extend traditional shell scripts.
Each rule defines a single process.
Rule inter-dependency is well defined.
PCD Scripts: Rule blocks
[Diagram: a PCD script file containing Rule 1, Rule 2 and Rule 3, which define Process 1, Process 2 and Process 3.]
41. Very simple and readable syntax.
Easy to extend and maintain.
Each rule block is based on the same template and contains the following details:

  • What is the process name and parameters?
  • When to start it (depends on event)?
  • What is the required priority?
  • What is the completion event?
  • How much time to wait for it to complete?
  • What to do in case of a crash?

PCD Scripts: Rule blocks
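
The template above can be pictured as a small record with one field per question. The sketch below is only a model in Python, not PCD's script syntax: the class `Rule`, its field names, the event strings, and the `validate` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """One rule block; every field name here is hypothetical."""
    name: str
    command: str                 # process name and parameters
    start_event: Optional[str]   # when to start it; None means "at boot"
    priority: int                # required scheduling priority
    completion_event: str        # what counts as "completed"
    timeout_s: float             # how long to wait for completion
    failure_action: str          # what to do in case of a crash


def validate(rule: Rule) -> List[str]:
    """Return the problems found in a rule (empty if it looks sane)."""
    problems = []
    if not rule.command.strip():
        problems.append("command is empty")
    if rule.timeout_s <= 0:
        problems.append("timeout must be positive")
    if rule.failure_action not in {"restart", "reboot", "recover", "ignore"}:
        problems.append(f"unknown failure action: {rule.failure_action}")
    return problems


if __name__ == "__main__":
    # Example values are invented; they do not come from a real product.
    web = Rule("webserver", "/usr/sbin/httpd -f", "network.done",
               0, "process_ready", 5.0, "restart")
    print(validate(web))
```

Because every rule has the same shape, a checker like `validate` can reject a malformed rule before anything is started, which is hard to do with free-form shell scripts.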
53. Event Driven System Startup
Once all rules are parsed, the PCD builds a dependency graph database.
PCD starts each rule at the right time.
PCD continuously monitors the system.
[Diagram: the PCD starting a graph of dependent rules, down to the last rule.]
54. The "right time" is when a Start event has occurred:

  • Another rule or set of rules has completed successfully, or;
  • A resource has been created (network device, file).

A Completion event occurs when the attached process:

  • Has exited with the correct status, or;
  • Sent a "Process ready" event to the PCD, or;
  • Created a resource, or;
  • Was running for a specified amount of time, or;
  • Was created.

A Completion event of one rule could be the Start event of another rule.
Event Driven System Startup
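
The start/completion chaining described above can be sketched as a tiny simulation. This is not how PCD is implemented; it is a simplified model that assumes every started rule completes successfully, and the function name `startup_waves`, the rule names, and the `"<rule>.done"` event strings are invented for the example.

```python
from typing import Dict, List, Set


def startup_waves(start_events: Dict[str, Set[str]],
                  completion_event: Dict[str, str]) -> List[List[str]]:
    """Group rules into waves. A rule joins a wave once all of its start
    events have occurred; rules in one wave could start in parallel."""
    occurred: Set[str] = set()
    pending = set(start_events)
    waves: List[List[str]] = []
    while pending:
        ready = sorted(r for r in pending if start_events[r] <= occurred)
        if not ready:
            # Remaining rules wait on events that nothing will ever emit.
            raise ValueError(f"rules can never start: {sorted(pending)}")
        waves.append(ready)
        pending -= set(ready)
        # The completion event of one rule may be the start event of another.
        occurred |= {completion_event[r] for r in ready}
    return waves


if __name__ == "__main__":
    rules = {
        "logger": set(),
        "network": set(),
        "webserver": {"network.done", "logger.done"},
    }
    events = {name: f"{name}.done" for name in rules}
    print(startup_waves(rules, events))
```

In this example `logger` and `network` have no start events and land in the first wave together, while `webserver` waits for both completion events.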
61. Dependencies between processes are well defined.
Rules are started as soon as their start event occurs.
No need for non-deterministic delays between starting processes.
Rules without inter-dependency are started in parallel.
Improved user experience and product reputation (fast product!).
Reduced startup time
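
A small worked example shows where the saving comes from. The sketch assumes an acyclic dependency graph and enough CPU to run independent processes truly in parallel; the durations are made-up numbers, and the function names are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List


def serial_time(durations: Dict[str, float]) -> float:
    """Startup time when every process is started one after another."""
    return sum(durations.values())


def parallel_time(durations: Dict[str, float],
                  deps: Dict[str, List[str]]) -> float:
    """Startup time when each process starts as soon as its dependencies
    are done: the length of the longest dependency chain."""

    @lru_cache(maxsize=None)
    def finish(name: str) -> float:
        # A process starts when its slowest dependency has finished.
        start = max((finish(d) for d in deps.get(name, [])), default=0.0)
        return start + durations[name]

    return max(finish(name) for name in durations)


if __name__ == "__main__":
    durations = {"logger": 1.0, "network": 3.0, "webserver": 2.0}
    deps = {"webserver": ["network", "logger"]}
    print(serial_time(durations), parallel_time(durations, deps))
```

With these numbers the serial start takes 6.0 time units, while the event-driven start takes 5.0, because `logger` runs alongside `network` instead of before it. Fixed delays inserted between script steps would only widen the gap.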
62. Enhanced stability and robustness
[Diagram: a process crash is signaled to the PCD, whose rule selects one of: restart, recover (run a recovery rule), ignore, reboot.]
63. Enhanced stability and robustness
Enhanced monitoring of processes and recovery in case of failure.
Each rule defines what to do in case its process crashes:

  • Restart the process: Usually for non-critical services such as a web server, or processes that can recover by restarting themselves.
  • Reboot the system: In case of a fatal, non-recoverable error.
  • Initiate a recovery rule.
  • Ignore: The same behavior as without PCD.
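
The four per-rule choices above map naturally onto a small dispatch. The following is a sketch only: `FailureAction` and `handle_crash` are hypothetical names, and the real actions (re-executing a process, rebooting) are replaced by log entries so the example has no side effects.

```python
from enum import Enum
from typing import List


class FailureAction(Enum):
    RESTART = "restart"
    REBOOT = "reboot"
    RECOVER = "recover"   # start a dedicated recovery rule
    IGNORE = "ignore"


def handle_crash(rule_name: str, action: FailureAction,
                 log: List[str]) -> None:
    """Record what a supervisor would do for a crashed rule."""
    if action is FailureAction.RESTART:
        log.append(f"restart {rule_name}")
    elif action is FailureAction.REBOOT:
        log.append("reboot system")
    elif action is FailureAction.RECOVER:
        log.append(f"start recovery rule for {rule_name}")
    else:
        log.append(f"ignore crash of {rule_name}")


if __name__ == "__main__":
    log: List[str] = []
    handle_crash("webserver", FailureAction.RESTART, log)
    handle_crash("dsp_firmware", FailureAction.REBOOT, log)
    print(log)
```

Keeping the policy in the rule rather than in each program means the recovery behavior can be changed without touching the crashed application.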