Life of a Cell
Woes and Wins
Dexter "Kim" Kimball ([email protected]) 2
The Conundrum
Distribute -- on-line -- millions of pages of aircraft maintenance documentation in a system that the FAA requires to be foolproof:
– No downtime
– All data identical for every mechanic worldwide. “Always”
Business Risks
An airplane cannot leave the gate if maintenance documentation is unavailable.
An airplane stuck at the gate causes the airline to lose lots of money (system wide)
Hasn’t been done before
Business Drivers
Faster access to documentation translates to millions of dollars a year in recovered revenue
– No such thing as “I did that yesterday, I’ll just wing it” – documents change daily
– New document is printed and carried aboard the aircraft (or you’re busted)
– Search times and print times must be low
Business Drivers
Consistency of documentation eliminates “flip flop” maintenance costs
– I use procedure A and perform X
– Downline – old documents ... “Hey, who did that? But uh oh I can fix it.” Procedure B
– Downline – new documents, Procedure A ....
Business Drivers
• Safety
– An incident involving a fatality drops ticket sales by 50% for two weeks
– If the incident cannot be explained, ticket sales remain off until it is
– US Airways 737 (1994?), Pittsburgh, almost put airline out of business
– Airline people really do care about the people they’re responsible for
The Plan
Be the first airline to gain competitive advantage by going to 100% online documentation
Retire microfilm/microfiche completely
Don’t lose shirt
The Technologies
• Excalibur Technologies “EFS” (Electronic File System)
• Transarc AFS 3.3
• HP Servers
• Bunch’o’stuff to convert manuals to TIF
• Windows 3.1 target user platform
The Process
• Scan microfiche/film manual pages to TIF
• EFS: OCR TIFs
• AFS: Store TIF pages
• EFS: Index TIFs (OCR output), keyword indexes
• AFS: Store index
• AFS: Replicate to strategically placed fileservers
• Mechanics and engineers:
– Click on index icon (File cabinet)
– Keyword search
– EFS client on Windows 3.1 desktop requests data from EFS server running on AFS fileserver
World wide airline, world wide cell
• Fileserver locations decided by
– Location on corporate backbone
– Connectivity from other linestations (smaller airports)
– Number of linestations that can be served from location
– Paranoia (over designed by 2x)
Domestic Fileserver Locations
[Map: AFS Fileserver Locations and their Fileservice Regions]
Airport codes shown: BOI, PIT, IAD, BWI, MIA, IAH, IND
Workstation counts shown (n per fileservice region): 181, 130, 189, 96, 75, 373
Legend:
– Large location (> 50 workstations): fileserver location. n is the total number of workstations in the region.
– Medium location (8-workstations): AFS client only. No local fileservers.
– Small location (< 8 workstations): AFS client only. No local fileservers.
Basic U.S. map with airport codes courtesy of Roger Blundell
End User Workstations
• Every hangar -- many per “dock”
• Every gate – 2x, independent LANs
• Every engineering department
• Facilities for support of in-air aircraft
(World wide)
AFS Client Locations
• Minimal
– No supported Windows 3.1 AFS client
– EFS client requests data from AFS client
Number of users
• 40000 human users
– “I forgot my password” puts airline out of business
• 1500 workstations
– Workstation hostname is “user” and is written on front of workstation
Woes and Wins
• Network – shoving data into your LAN
• Replication management
– Who is authorized
– You want me to release how many volumes?
– vos release times
• FAA – the system will not go down! All replicas will be identical
• Let’s use a really big cache for Seattle!
Woe: Network
How to get 300 – 600 GB of data to fileserver for initial load of ROs
– Slow links to small airports
– Slow links to international server locations
– Fast links heavily trafficked
– vos release can beat the * out of a network
– An airline is always in operation – no magic window of opportunity
Win: Network
• Can’t use vos release
• Hey, we have lots of those airplane things
– Load local (SFO) fileserver array with disks, set up vicep’s
– vos addsite to fileserver/array; vos release
– vgexport – OS says bye to volume groups
– vos remsite; remove drives
– Fly to wherever; vgimport, vos addsite / vos release. Rio, anyone?
Woes: Replication Management
15000 RW volumes, all replicated
• Who’s authorized to issue vos release?
• Which volumes to release? EFS randomly places data ...
• How many volumes did you say to release?
Win: Replication Management
• Authorization/automation
– Per fleet per manual vosrel PTS group
– PTS group on every relevant volume root node
– User interface writes record to work queue, a file in /afs
  • Requester; manual/index; priority
– Fileserver cron job compares requester with vosrel PTS group, figures out volume list, performs vos release -localauth
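The queue-and-cron flow above can be sketched roughly as follows. This is an illustration only, not the airline's implementation: the `requester;manual;priority` record layout, the manual names, and the in-memory group and volume tables are all assumptions made for the example. A real job would look up PTS group membership and run the commands. This sketch only builds the `vos release` argument lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseRequest:
    requester: str  # who asked for the release
    manual: str     # manual/index to release (naming is hypothetical)
    priority: int   # assumed convention: lower number = more urgent


def parse_queue(text):
    """Parse 'requester;manual;priority' records, one per line (assumed format)."""
    requests = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        requester, manual, priority = (field.strip() for field in line.split(";"))
        requests.append(ReleaseRequest(requester, manual, int(priority)))
    # sorted() is stable, so equal priorities keep their queue order
    return sorted(requests, key=lambda r: r.priority)


def plan_releases(requests, vosrel_groups, volumes_by_manual):
    """Return vos command argument lists for authorized requests only.

    vosrel_groups maps a manual to the set of users in its vosrel group;
    volumes_by_manual maps a manual to its volume names. Both stand in for
    lookups the real cron job would do against PTS and the volume tree.
    """
    commands = []
    for req in requests:
        if req.requester not in vosrel_groups.get(req.manual, set()):
            continue  # requester is not in this manual's vosrel group
        for volume in volumes_by_manual.get(req.manual, []):
            commands.append(["vos", "release", "-id", volume, "-localauth"])
    return commands
```

Keeping authorization in the cron job, rather than in the user interface, means the queue file itself does not need to be trusted: an unauthorized record is simply ignored.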
Woe: Replication Management
• Which volumes to release?
– Well known volume tree and consistent naming conventions
– Release all volumes for requested manual
– Who cares, really? How many can there be?
  • Sometimes 4000+ volumes per night
  • vos release is slowish – doesn’t check to see if volume is unchanged; looks at contents
  • Release cycle > 24 hours, queue issue. OW!
Win: Replication Management
• Filter release requests
– Compare RO dates, RW dates – if RW not changed and all ROs same date, skip it
  • Filter: 3 seconds
  • vos release “no op” – 30 seconds
– Small fraction of volumes for given manual are actually changed
  • Sometimes 0 changed; sometimes < 1%; usually small fraction of total
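A minimal sketch of the date filter described above, assuming the RW and RO update times have already been collected (for example from the volume headers). The `VolumeDates` record and the rule "RW update time not later than the RO update time means unchanged" are assumptions for illustration; the slides do not give the exact comparison.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VolumeDates:
    name: str
    rw_updated: datetime  # last update time of the RW volume
    ro_updated: tuple     # last update time of each RO replica


def needs_release(vol):
    """Skip only if the RW is unchanged and every RO carries the same date."""
    if not vol.ro_updated:
        return True  # no replicas seen yet: release
    all_ros_same = len(set(vol.ro_updated)) == 1
    rw_unchanged = vol.rw_updated <= min(vol.ro_updated)
    return not (all_ros_same and rw_unchanged)


def to_release(volumes):
    """Names of the volumes that still need a vos release, in input order."""
    return [v.name for v in volumes if needs_release(v)]
```

Using the figures on the slides, 4000 no-op releases at about 30 seconds each come to roughly 33 hours if run one after another, which fits a release cycle of more than 24 hours. Filtering the same 4000 volumes at about 3 seconds each takes roughly 3.3 hours, plus real releases for the small fraction that changed.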
Woe: FAA – the system will not fail!!
• FAA requires 100% uptime, else won’t approve system and airline can go fish
• Yeah, right!
Win: FAA – the system will not fail!!
• Data outage vs. system outage
• Replication, of course
• Multiple configurations for EFS client
– Crude failover
• No data outage for six years and counting
– Well, there were a couple of times when ... but we fixed that ...
Woe: FAA – replicas will be identical
• Several million RW files X 5 replicas
• Have to prove that all files are identical across the 5 ROs for a given volume
Win: FAA – replicas will be identical
• Tree crawler!
• A little cheesy – “ls -l | cksum” each directory in volume and compare results
• Known “bad case” looked for 6x per day
• Key “fs setserverprefs” – I prefer you, now you, now you, now you
• Dedicated client, no mounted .backups
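The tree crawler idea can be sketched as below. This is not the original tool: the slides describe `ls -l | cksum` per directory, with the client steered to each replica in turn via `fs setserverprefs`. The sketch swaps in a SHA-256 digest over each directory's sorted (kind, name, size) entries and simply takes one directory root per replica as input. Function names and the choice of fields are assumptions.

```python
import hashlib
import os


def directory_digest(path):
    """Digest of one directory's listing: sorted (kind, name, size) entries."""
    h = hashlib.sha256()
    with os.scandir(path) as it:
        for entry in sorted(it, key=lambda e: e.name):
            is_dir = entry.is_dir(follow_symlinks=False)
            kind = "d" if is_dir else "f"
            # Directory sizes vary by filesystem, so only file sizes are hashed
            size = 0 if is_dir else entry.stat(follow_symlinks=False).st_size
            h.update(f"{kind}\t{entry.name}\t{size}\n".encode())
    return h.hexdigest()


def crawl(root):
    """Map each directory (relative to root) to its listing digest."""
    digests = {}
    for dirpath, dirnames, _ in os.walk(root):
        dirnames.sort()
        digests[os.path.relpath(dirpath, root)] = directory_digest(dirpath)
    return digests


def compare_replicas(roots):
    """Relative directories whose digests differ, or are missing, across roots."""
    crawls = [crawl(root) for root in roots]
    all_dirs = set().union(*crawls)
    return sorted(d for d in all_dirs if len({c.get(d) for c in crawls}) > 1)
```

Like the original, this compares listings rather than file contents, so it is cheap but would miss two files of equal size with different bytes.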
Woe: Let’s use a really big cache
• It seemed like a really good idea
– 20% files changed per quarter -- < 2%/week
– Average file size 10K
– Oops, the indexes are monolithic and 300 MB ... but don’t change often
– Let’s try a 12 GB cache!
• “Hello? I’ve got twenty minutes to turn the shuttle. It takes fifteen minutes to ...”
Win: Let’s not use a really big cache
• AFS client (still I believe?) chokes on large cache
– 12 GB =~ 1,200,000 cache “Vfiles”
– At garbage collection time, cache purge looks for LRU
– Gee, that takes a long time. Is the machine dead?
– Let’s try a 3 GB cache!
• (Worked indefinitely from 3.3 through 3.6)
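The Vfile count above follows from the 10K average file size quoted earlier. A small sketch of that arithmetic, assuming roughly one cache file per average-sized cached file and decimal units (both simplifications for illustration, not a statement about how the cache manager sizes its cache):

```python
GB = 10**9  # decimal gigabyte, to match the slide's round numbers


def estimated_cache_files(cache_bytes, avg_file_bytes=10_000):
    """Rough count of cache files if each holds one average-sized file."""
    return cache_bytes // avg_file_bytes


big = estimated_cache_files(12 * GB)   # 1,200,000 entries to scan for LRU
small = estimated_cache_files(3 * GB)  # 300,000 entries
```

Under these assumptions the 3 GB cache leaves a quarter as many entries for the purge to walk when it looks for the least recently used file.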
Other smidgeons
• vos release manager
– Does volume need to be released?
– Are all the relevant fileservers available?
– Is there a sync site for the VLDB?
– Do it
– Did it?
  • Check VLDB entry
  • Compare dates
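The checklist above maps naturally onto a small guard function. The sketch below is hypothetical: each check is passed in as a callable so that the real probes (fileserver status, VLDB sync site, VLDB entry and date comparison) can be plugged in. None of the names come from the original tool.

```python
def managed_release(volume, needs_release, servers_available,
                    vldb_has_sync_site, do_release, verify):
    """Run the pre-checks, release, then confirm. Returns a status string."""
    if not needs_release(volume):
        return "skipped"
    if not servers_available(volume):
        return "deferred: fileserver unavailable"
    if not vldb_has_sync_site():
        return "deferred: no VLDB sync site"
    do_release(volume)
    # "Did it?": verify() stands in for checking the VLDB entry and dates
    return "released" if verify(volume) else "failed verification"
```

Deferring when a fileserver or the sync site is unavailable avoids starting a release that would leave the replicas out of step with each other.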
Other smidgeons
• Data reasonableness checks
– Do files pointed to by index actually exist?
– If not, do not vos rel the index
– Avoids the data outage of “empty index” – for example *(bad day)*
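A sketch of the reasonableness check, assuming the index can be reduced to a list of file paths relative to a data root. The real EFS index format is not described in the slides, so that list and the function names are assumptions.

```python
import os


def missing_targets(index_paths, data_root):
    """Paths named by the index that do not exist as files under data_root."""
    return [p for p in index_paths
            if not os.path.isfile(os.path.join(data_root, p))]


def safe_to_release_index(index_paths, data_root):
    """Release only a non-empty index whose every target exists."""
    return bool(index_paths) and not missing_targets(index_paths, data_root)
```

Treating an empty index as unsafe covers the “empty index” case directly: an index that points at nothing is never released.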
Other smidgeons
• popcache
– Index files: monolithic and large
– Fileservers: overseas, slow networks
– Initial search of newly released index could take many minutes
– Cat indexes to /dev/null every five minutes
  • If index unchanged, local cached copy is used
  • If index changed, pulled from fileserver and user doesn’t pay penalty for first search
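The popcache trick amounts to reading each index file end to end and throwing the bytes away, so that the client cache is already warm when a mechanic searches. A minimal sketch of that read loop follows. The function name and return value are assumptions, and scheduling (every five minutes, for example from cron) is left outside the code.

```python
def pop_cache(paths, chunk_size=1 << 20):
    """Read each file to the end and discard the data.

    Returns a dict of path -> bytes read, or None if the file could not
    be read (for example, fileserver unreachable).
    """
    result = {}
    for path in paths:
        total = 0
        try:
            with open(path, "rb") as f:
                while chunk := f.read(chunk_size):
                    total += len(chunk)
        except OSError:
            total = None
        result[path] = total
    return result
```

Reading in fixed-size chunks keeps memory use flat even for a 300 MB monolithic index.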
Other smidgeons
• Anyone here ever have these?
– AFS is complaining about the network, so AFS broke the network
  • AFS is the network’s canary in a cage
– We could do the whole thing with NFS!
– AFS isn’t POSIX compliant. Yay DFS!
– A file lock resides on disk. File in RO volume can’t be locked. (Oh yes it can.)
– HP T500 goes to sleep?
– We could do the whole thing on a Kenmore!
Outcome: AFS Rules
• The airline became the first airline (and may still be the only) to place 100% of its aircraft maintenance documentation on line
• The system has run reliably for 5+ years
– So of course it’s time to replace it
• There are three server locations in the US, one each in Europe, Hong Kong, Narita, Sydney, Montevideo, Rio de J
• Mechanics no longer mash the microfilm reader
This system was enabled by AFS