The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”
![Page 1: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/1.jpg)
The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”
Alva L. Couch, [email protected]
Daniels, [email protected]
Tufts University, Medford MA USA
http://www.eecs.tufts.edu/~couch/maelstrom
![Page 2: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/2.jpg)
Network Troubleshooting
Target problem: automate network troubleshooting.
Starting point: list of services to assure.
Easy part: how to assure one service.
Hard part: precedences between assurance tasks.
![Page 3: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/3.jpg)
Dreams
Quicker response to network problems.
More collaboration.
Less “boring” work.
Acceptable losses!
![Page 4: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/4.jpg)
Silly Example
A: network up
B: nis bound
C: commands work
D: home dirs available
E: nfs server up
![Page 5: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/5.jpg)
Ideal: “Plug and Play” Automation
Grab the scripts that you need from others.
Scripts all just “get along” and work together.
Make script writers work harder.
So administrators’ work is easier!
![Page 6: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/6.jpg)
Let’s Automate!
Suppose we write scripts A, B, C, D, E to check and repair the corresponding functions.
Normally, we’d have to remember to run them in the order “A B E D C”.
We’d usually do that by predeclaring precedences: B:A means “B must follow A”.
![Page 7: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/7.jpg)
Predeclaring Precedences
A: network up
B: nis bound
C: commands work
D: home dirs available
E: nfs server up
Precedences: B:A, C:B, C:D, D:A, D:E, E:A
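When precedences are declared up front like this, a fixed run order is a topological sort of the declarations. A minimal sketch in Python, using the standard-library `graphlib` module (Python 3.9+); the helper names `run_order` and `respects` are illustrative assumptions and not part of any tool described in these slides:

```python
from graphlib import TopologicalSorter

# Declared precedences from the slide: "B:A" means "B must follow A".
PRECEDENCES = ["B:A", "C:B", "C:D", "D:A", "D:E", "E:A"]

def run_order(precedences):
    """Return one execution order that respects every 'X:Y' declaration."""
    graph = {}
    for decl in precedences:
        later, earlier = decl.split(":")
        graph.setdefault(later, set()).add(earlier)  # node -> its predecessors
        graph.setdefault(earlier, set())
    return list(TopologicalSorter(graph).static_order())

def respects(order, precedences):
    """True if, for each 'X:Y', Y is placed before X in the given order."""
    pos = {name: i for i, name in enumerate(order)}
    return all(pos[d.split(":")[1]] < pos[d.split(":")[0]] for d in precedences)

order = run_order(PRECEDENCES)
print(order)                                   # one valid order (several exist)
print(respects(list("ABEDC"), PRECEDENCES))    # the order quoted on the slides
```

The cost of this approach is the one the slides go on to criticize: the `PRECEDENCES` list has to be known and kept current by hand.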
![Page 8: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/8.jpg)
Predeclaring Precedences Is a Pain!
Must know precedences beforehand.
Must update precedences whenever you add or remove a script.
In some cases, precedences are unknown or dynamic!
In this case, any fixed order is an ineffective procedure for troubleshooting some problems.
![Page 9: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/9.jpg)
One Ineffective Procedure
F: filesystem OK
G: fsck command in filesystem OK
No fixed order will ensure success.
“Chicken and Egg” problem!
![Page 10: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/10.jpg)
Discovering Order
Suppose that A, B, C, D, E are crafted so that:
  They fail robustly when called at the wrong time.
  They tell you when they fail.
  They don’t undo each other’s actions.
Then we may infer their required execution order from their behavior rather than declaring precedences beforehand!
![Page 11: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/11.jpg)
Permutation Embedding
For a set of objects x1, x2, …, xn, all permutations of the objects are embedded (as subsequences, not necessarily contiguous) in the string containing n-1 copies of x1, …, xn, followed by x1.
E.g., x1,…,xn, x1,…,xn, …, x1,…,xn, x1 (n-1 copies of x1,…,xn, then a trailing x1).
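The embedding claim can be checked by brute force for small n. The sketch below builds the string of n-1 copies plus a trailing first element and tests every permutation as a subsequence; the function names `embedding` and `is_subsequence` are illustrative assumptions:

```python
from itertools import permutations

def embedding(items):
    """n-1 copies of the items, followed by the first item."""
    items = list(items)
    return items * (len(items) - 1) + items[:1]

def is_subsequence(needle, haystack):
    """True if needle occurs in haystack in order, with gaps allowed."""
    it = iter(haystack)
    return all(x in it for x in needle)  # 'in' consumes the iterator

seq = embedding("ABCDE")
print("".join(seq), len(seq))
print(all(is_subsequence(p, seq) for p in permutations("ABCDE")))
```

The trailing x1 matters: the fully reversed permutation uses one element from each copy and needs the final x1 to finish.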
![Page 12: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/12.jpg)
Example of Permutation Embedding
Embedding: ABCDEABCDEABCDEABCDEA
Permutations found in the embedding as subsequences:
ABCDE
BACDE
ACBDE
…
ECDBA
DECBA
EDCBA
![Page 13: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/13.jpg)
Exploiting Permutation Embedding
Don’t record precedences.
Try scripts in embedding sequence.
Record successes.
Don’t repeat trials that succeed.
Retry scripts that fail.
Until all succeed!
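The procedure above can be simulated in a few lines. This is a sketch under stated assumptions, not the mael implementation: the `needs` table stands in for real script behavior (a script "succeeds" only once its prerequisites have succeeded), and the loop itself never reads it as precedence data; it only observes success or failure.

```python
def embedding(items):
    """n-1 copies of the items, followed by the first item."""
    items = list(items)
    return items * (len(items) - 1) + items[:1]

def discover_order(initial_order, needs):
    """Walk the embedding; return (order of successes, number of trials).

    `needs` (hypothetical) maps a script to the scripts that must have
    succeeded before it can succeed.
    """
    done, trials = [], 0
    for script in embedding(initial_order):
        if script in done:
            continue                  # don't repeat trials that succeeded
        trials += 1
        if all(dep in done for dep in needs.get(script, ())):
            done.append(script)       # record the success
        if len(done) == len(initial_order):
            break                     # until all succeed
    return done, trials

# Simulated behavior matching the precedences of the silly example.
NEEDS = {"B": "A", "C": "BD", "D": "AE", "E": "A"}
print(discover_order("ABCDE", NEEDS))
```

Starting from the order A B C D E, the simulated scripts succeed in the order A B E D C.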
![Page 14: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/14.jpg)
Discovering Order (1)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A
![Page 15: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/15.jpg)
Discovering Order (2)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B
![Page 16: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/16.jpg)
Discovering Order (3)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B
![Page 17: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/17.jpg)
Discovering Order (4)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B
![Page 18: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/18.jpg)
Discovering Order (5)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B E
![Page 19: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/19.jpg)
Discovering Order (6)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B E
![Page 20: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/20.jpg)
Discovering Order (7)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B E
![Page 21: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/21.jpg)
Discovering Order (8)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B E
![Page 22: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/22.jpg)
Discovering Order (9)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B E D
![Page 23: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/23.jpg)
Discovering Order (10)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B E D
![Page 24: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/24.jpg)
Discovering Order (11)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B E D
![Page 25: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/25.jpg)
Discovering Order (12)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B E D
![Page 26: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/26.jpg)
Discovering Order (13)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABCDE ABCDE ABCDE ABCDE A
Discovered order: A B E D C
![Page 27: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/27.jpg)
Effect of Initial Ordering
Efficiency depends upon initial ordering of tasks.
Best case: initial order is appropriate order.
Worst case: initial order is opposite to appropriate order.
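To make the dependence on initial ordering concrete, the sketch below counts script runs for several starting orders. It is a simulation under assumptions: `NEEDS` encodes the silly example's precedences as simulated script behavior, and `CHAIN` is an assumed linear dependency (A before B before C before D before E) used to illustrate a fully reversed start.

```python
def count_trials(initial_order, needs):
    """Number of script runs until every script has succeeded."""
    n = len(initial_order)
    sequence = list(initial_order) * (n - 1) + list(initial_order[:1])
    done, trials = set(), 0
    for script in sequence:
        if script in done:
            continue                      # successes are never retried
        trials += 1
        if set(needs.get(script, "")) <= done:
            done.add(script)
    return trials

NEEDS = {"B": "A", "C": "BD", "D": "AE", "E": "A"}   # silly example
CHAIN = {"B": "A", "C": "B", "D": "C", "E": "D"}     # assumed linear chain

for order in ("ABEDC", "ABDEC", "ABCDE", "CBDEA"):
    print(order, count_trials(order, NEEDS))
print("EDCBA on a chain:", count_trials("EDCBA", CHAIN))
```

With a matching initial order each script runs once (n runs); with a reversed order on a chain, each pass fixes only one script, giving n + (n-1) + … + 1 = n(n+1)/2 runs.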
![Page 28: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/28.jpg)
Best Case: ABEDC
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABEDC ABEDC ABEDC ABEDC A
Discovered order: A B E D C
![Page 29: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/29.jpg)
Worst Case: CBDEA (1)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: CBDEA CBDEA CBDEA CBDEA C
Discovered order: A
![Page 30: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/30.jpg)
Worst Case: CBDEA (2)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: CBDEA CBDEA CBDEA CBDEA C
Discovered order: A B E
![Page 31: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/31.jpg)
Worst Case: CBDEA (3)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: CBDEA CBDEA CBDEA CBDEA C
Discovered order: A B E D
![Page 32: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/32.jpg)
Worst Case: CBDEA (4)
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: CBDEA CBDEA CBDEA CBDEA C
Discovered order: A B E D C
![Page 33: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/33.jpg)
How Bad Can Bad Get?
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: EDCBA EDCBA EDCBA EDCBA E
Discovered order: A B C D E
![Page 34: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/34.jpg)
Small Errors Mean Small Inefficiencies
[Figure: precedence graph on nodes A, B, C, D, E]
Execution order: ABDEC ABDEC ABDEC ABDEC A
Discovered order: A B E D C
![Page 35: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/35.jpg)
Implementation: The Maelstrom
The mael command is a dispatcher for a set of scripts.
Input is a list of commands to try.
Mael tries to make them all succeed (exit code 0).
Nonzero exit code means failure; try again.
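The dispatching idea can be sketched with Python's `subprocess` module. This is not mael (which is written in Perl 5) and does not reproduce its interface; the function name `dispatch`, the argv-list input format, and the two demo commands are assumptions made for illustration.

```python
import os
import subprocess
import sys
import tempfile

def dispatch(commands):
    """Try commands in embedding order until each has exited with code 0.

    `commands` is a list of argv lists. Returns (succeeded, still_failing),
    with `succeeded` in the order in which the commands first succeeded.
    """
    n = len(commands)
    schedule = list(range(n)) * (n - 1) + list(range(n))[:1]
    succeeded = []
    for i in schedule:
        if i in succeeded:
            continue                              # already succeeded: skip
        if subprocess.run(commands[i]).returncode == 0:
            succeeded.append(i)                   # exit code 0 means success
    failing = [commands[i] for i in range(n) if i not in succeeded]
    return [commands[i] for i in succeeded], failing

# Demo: the first command fails until the second has created a marker file.
marker = os.path.join(tempfile.mkdtemp(), "marker")
needs_marker = [sys.executable, "-c",
                f"import os, sys; sys.exit(0 if os.path.exists({marker!r}) else 1)"]
makes_marker = [sys.executable, "-c", f"open({marker!r}, 'w').close()"]

ok, failed = dispatch([needs_marker, makes_marker])
print(len(ok), "succeeded,", len(failed), "still failing")
```

Listing the dependent command first forces one failed trial before the order sorts itself out, which is the behavior the slides describe.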
![Page 36: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/36.jpg)
Seeding the Storm
Can give mael hints and other information about its commands.
B:A - I think command B should be tried after A.
B::A - B cleans up after A. B must be retried if it succeeds before A.
B:::A - I know B will only succeed after A.
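The three hint forms differ only in the number of colons, so they are easy to tell apart mechanically. The parser below is a hypothetical sketch based solely on the forms shown on this slide; mael's real input syntax may accept more than this, and the names `Hint`, `MEANING`, and `parse_hint` are assumptions.

```python
import re
from typing import NamedTuple

class Hint(NamedTuple):
    command: str    # left-hand side, e.g. "B"
    strength: int   # number of colons: 1, 2 or 3
    after: str      # right-hand side, e.g. "A"

# Paraphrase of the slide's wording for each hint strength.
MEANING = {
    1: "suggestion: try the command after the other one",
    2: "cleanup: retry the command if it succeeded before the other one",
    3: "requirement: the command only succeeds after the other one",
}

_HINT = re.compile(r"^(\w+)(:{1,3})(\w+)$")

def parse_hint(text):
    """Parse 'B:A', 'B::A' or 'B:::A' into a Hint; raise ValueError otherwise."""
    match = _HINT.match(text.strip())
    if match is None:
        raise ValueError(f"not a hint: {text!r}")
    return Hint(match.group(1), len(match.group(2)), match.group(3))

for text in ("B:A", "B::A", "B:::A"):
    hint = parse_hint(text)
    print(text, "->", MEANING[hint.strength])
```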
![Page 37: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/37.jpg)
Command Requirements
Maelstrom only functions correctly if the commands that it dispatches are:
  Aware: commands know whether they failed.
  Homogeneous: commands that change the same system attribute change it in the same way.
  (Convergent: commands that discover that goals are already met do nothing.)
![Page 38: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/38.jpg)
How Difficult Is It to Write a Conforming Maelstrom Script?
Easy part: awareness.
  Local to the script.
  Insert enough branches to check for script preconditions.
Hard part: homogeneity.
  Global convergence criterion.
  All scripts must agree on desired effects.
![Page 39: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/39.jpg)
Form of Maelstrom Script
Check all preconditions necessary for script function.
If preconditions are not present, fail.
Else try to fix a problem.
  If that seems to work, succeed.
  Else fail.
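The script form above maps onto a small template. The sketch below is an assumption-laden illustration, not code from the Maelstrom distribution: `preconditions`, `needs_repair`, and `repair` are hypothetical callables a script author would supply, and the early return when nothing needs repair reflects the "convergent" property mentioned earlier in the talk rather than this slide.

```python
def maelstrom_script(preconditions, needs_repair, repair):
    """Return an exit code: 0 for success, 1 for failure.

    preconditions: list of callables returning bool (hypothetical).
    needs_repair:  callable returning True while the goal is unmet.
    repair:        callable that attempts the fix.
    """
    if not all(check() for check in preconditions):
        return 1      # preconditions not present: fail, to be retried later
    if not needs_repair():
        return 0      # goal already met: do nothing
    repair()          # try to fix the problem
    return 1 if needs_repair() else 0   # succeed only if the fix seems to work

# Demo with an in-memory stand-in for system state (illustrative only).
state = {"network": True, "nis_bound": False}
code = maelstrom_script(
    preconditions=[lambda: state["network"]],
    needs_repair=lambda: not state["nis_bound"],
    repair=lambda: state.update(nis_bound=True),
)
print(code)   # a real script would pass this value to sys.exit()
```

Because the script reports failure whenever its preconditions are missing, a dispatcher can safely call it at the wrong time and simply try again later.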
![Page 40: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/40.jpg)
Engineering Maelstrom Scripts
No preconditions for the script as a software unit.
Safe to run in any sequence with other scripts.
Only thing in doubt: homogeneity.
Do scripts agree on what to do?
![Page 41: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/41.jpg)
Imperfect Storms
::, ::: help compensate for imperfect command behavior.
A::B - A and B aren’t homogeneous and A should be done last, even if B succeeded last.
A:::B - A isn’t aware that it needs B, so do B first.
![Page 42: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/42.jpg)
A Lesson Learned
Causality is a myth in a sufficiently complex system.
Cannot determine what will happen in general.
Can determine what repaired a specific problem.
This is not the same as what caused the problem.
![Page 43: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/43.jpg)
Not Causal, but Operational
Impossible to determine true precedences between tasks by direct observation (Sandnes).
Easy to determine an order that satisfies unknown precedences.
![Page 44: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/44.jpg)
But Wait, There’s More! In the Paper:
Decision trees represent best practices.
Mael’s commands can represent decision trees.
Mael replaces make’s global precedence knowledge with dynamic probes during commands.
Can implement make in mael.
![Page 45: The Maelstrom: Network Service Troubleshooting Via “Ineffective Procedures”](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814be7550346895db8c026/html5/thumbnails/45.jpg)
Status and Availability
http://www.eecs.tufts.edu/~couch/maelstrom
Platform: Perl 5.
Portable to most any system.
Intensively tested on a “precedence simulator” that simulates behavior of troubleshooting scripts.
Working on script content now.