ARGOS Investigation


  • Active Response Gravity Offload System (ARGOS)

    Subject Drop Mishap

    JSC 13-0006

    IRIS S-2013-022-00002

    Close Call Mishap

    Date of Mishap: January 16, 2013

    Date of Report: March 22, 2013


    Final Report Findings and Recommendation 3/22/2013

    Gail Chapline, Joe Anderson, Mary Cerimele, Mike Cooke, Mike Foreman, John Haas, Art Knell, Asher Lieberman, John Ruppert

    NASA/JSC ACTIVE RESPONSE GRAVITY OFFLOAD SYSTEM (ARGOS) INVESTIGATION BOARD REPORT

    NR


    Table of Contents

    1 Executive Summary
    2 Acknowledgments
    3 Background
    4 Investigation Board Objectives
    5 ARGOS Description
    6 Investigation
        6.1 Interviews
        6.2 Mechanical System
        6.3 Hardware Inspection
            6.3.1 Running Torque Measurements
            6.3.2 Incremental Disassembly and Sample Collection
            6.3.3 Detailed Inspection of Major Components
        6.4 Fault Tree
        6.5 Control System
        6.6 Electronic System
            6.6.1 E-Stop System
            6.6.2 Z-Axis Motor Controller
            6.6.3 CAN Bus Interface
            6.6.4 Computer Running Trick Simulation
        6.7 Software
            6.7.1 Background
            6.7.2 ARGOS Hoist Control System Background
            6.7.3 Software Validation
            6.7.4 ARGOS Software Configuration Management
            6.7.5 Software Regression Testing
            6.7.6 Fault Detection Software Logic
            6.7.7 Test Data Review
        6.8 Safety and Hazard Analysis
            6.8.1 Safety
            6.8.2 Hazard Analysis
        6.9 Engineering Processes, Roles and Responsibilities
    7 Findings and Recommendations
        7.1 General Findings
        7.2 Proximate/Root Causes and Contributing Factors
            7.2.1 Proximate Cause
            7.2.2 Intermediate Cause
            7.2.3 Root Cause
            7.2.4 Contributing Factors
        7.3 Specific Findings
            7.3.1 Findings Specific to Mechanical System Design
            7.3.2 Findings Specific to the Z-axis Controller System
            7.3.3 Findings Specific to Software Design
            7.3.4 Findings Specific to Safety and Hazards
    8 References
        8.1 Appointment Letter
        8.2 Materials Chemical Analysis Report
        8.3 Materials Metallurgical Analysis Report
        8.4 ARGOS Startup Checklist


    List of Figures

    Figure 1: Handrail Involved in the Incident
    Figure 2: ARGOS System Picture
    Figure 3: Inline Lifting Components
    Figure 4: Yates Shock Absorber
    Figure 5: Spring/Damper Festo Muscle
    Figure 6: STI Load Cell
    Figure 7: VNCHI Gimbal Assembly
    Figure 8: Genie Assembly without Rope
    Figure 9: ARGOS Z-Axis Heavy Lift Assembly Overview
    Figure 10: ARGOS Z-Axis Heavy Lift Assembly Exploded View
    Figure 11: Torque Measurement Site
    Figure 12: Torque Measurements for Various Knob Positions
    Figure 13: ARGOS Project Created Fault Tree
    Figure 14: High Level Control Loop
    Figure 15: Electronics System Block Diagram
    Figure 16: ARGOS Control System Block Diagram
    Figure 17: Slide from EMU TRR Software
    Figure 18: ARGOS Startup Checklist from day of incident
    Figure 19: ARGOS Startup Checklist Path Verification (step 54)
    Figure 20: ARGOS Trick Simulation Control Screen capture from day of incident


    List of Tables

    Table 1: ARGOS Evolution of Mechanical System
    Table 2: Listing of Major Components within Focus Area
    Table 3: Trick Simulation Source Code Files
    Table 4: Fault Detection for the Hazard of Un-commanded Motion
    Table 5: Test Parameters Recorded by the ARGOS Trick Simulation


    1 Executive Summary

    On January 16, 2013, during a test in the ARGOS with a test participant in a pressurized Modified Advanced Crew Escape Suit (ACES), the test participant was unintentionally dropped approximately 12 to 18 inches (30.5 to 45.7 cm) in the vertical (-Z) direction. Although the test subject did not suffer significant injuries, the potential for a serious injury was present: serious injury can occur even over shorter distances, because the test subject has less time to react. Based on the length of cable released, the unintended drop could have been as much as 4 to 5 feet (1.2 to 1.5 m). The test structure and a handrail mock-up also sustained slight damage. The Software, Robotics and Simulation Division (ER) initiated an internal investigation, which included two tests on the hardware with no test subject, interviews with test participants, and a preliminary report. A close call was filed to record the incident.

    On January 21, 2013, the Director of Engineering convened an Engineering Investigation Board to investigate the close call and identify its causes and any contributing factors. The Board was also charged with developing recommendations to prevent a similar incident.

    The Investigation Board found that the incident was most likely caused by partial gearbox binding/jamming, which produced an undesired motor controller response. The motor controller is COTS hardware, and the vendor provided little to no information on how it performs its function, making it essentially a black box in the system control loop. In combination with the hardware design issues found during the investigation, the controller commanded a high-velocity downward motion, resulting in the test subject free-falling onto a test mockup (handrail).


    2 Acknowledgments

    The Board would like to acknowledge the many discussions with Larry Dungan and Paul Valle regarding the ARGOS. Without their help, their answers to endless questions, and their thorough knowledge of the ARGOS, the Board could not have completed the investigation.

    The Board would also like to thank several consultants to the Board: Monty Carroll and Ray Morales, for their invaluable expertise on control systems; Duane Pierson and Linda Shackelford, from the Institutional Review Board (IRB), for their discussions regarding human safety and hazardous environments; and Irene Piatek and Charlene Curtis, for discussions on Engineering project management work instructions.


    3 Background

    On January 16, 2013, during a test in the ARGOS with a test participant in a pressurized Modified Advanced Crew Escape Suit (ACES), the test participant was unintentionally dropped in the vertical (-Z) direction. Although the participant fell only approximately 12 to 18 inches (30.5 to 45.7 cm), 4 to 5 feet (1.2 to 1.5 m) of wire rope was driven off the drum in the -Z direction during this event. The test began normally, and the drop occurred approximately nine minutes into the test. At the time, the participant was translating along a handrail in a horizontal position (body parallel to the ground) to simulate microgravity; see cover page image. The handrail was mounted on pallets, raising it approximately 24 to 36 inches (61 to 91 cm) above the floor. The participant landed on the handrail, permanently bending it; see Figure 1.

    Figure 1: Handrail Involved in the Incident

    The drop was followed by a slight roll to the right, and then the gimbal mechanism fell on top of the participant's back. The entire event took approximately 0.5 seconds. The participant had minor injuries (bruising) that required no medical attention, and the test facility sustained minor damage (a bent handrail). The incident was classified as a close call. The test was terminated, and suit and ARGOS personnel assisted the participant out of the ACES suit.

    The ARGOS team initiated a preliminary investigation. Two tests were conducted, both approved by the Test Safety Officer and Software, Robotics and Simulation Division management. With no weight on the system, the ARGOS configuration GUI was launched and the system was enabled to check the motor controller. After enabling, the system drifted slowly downward about 2 inches (5 cm) during the first two seconds, then suddenly moved rapidly downward 15 inches (38 cm) before an operator initiated a manual emergency stop. Next, the cable drum was turned by hand and was subjectively noted to be stiffer and more difficult to rotate than normal. The unloaded ARGOS was then manually jogged both up and down at a motor velocity of 30 rpm, using the ARGOS Configuration GUI in the unsuited gear ratio. Again, the ARGOS engineers noted that during the jog commands the system exhibited abnormal behavior, as follows:

    1. Decreased ability to hold constant velocity
    2. Sluggish acceleration
    3. Sluggish deceleration

    During the event, none of the enabled safety stops prevented the test subject from impacting the handrail. Had the handrail not stopped the drop, the test subject could have fallen as much as 4 to 5 feet (1.2 to 1.5 m).

    4 Investigation Board Objectives

    The Board's primary objective is to gather the facts, identify the cause(s) and contributing factors of the ARGOS incident, and recommend appropriate actions to prevent a similar incident from occurring again. The Board comprised members from the Engineering Directorate, Safety and Mission Assurance Directorate, Mission Operations Directorate, and Crew Office; see appointment letter EA-13-001.

    5 ARGOS Description

    The goal of the Active Response Gravity Offload System (ARGOS), shown in Figure 2, is to develop the technology for a facility that simulates the reduced-gravity environments found in low Earth orbit, in the proximity of asteroids, and on the lunar and Martian surfaces. ARGOS is used to evaluate unsuited and suited human performance in ambulation, exploration, and EVA tasks at different offloads and with different interfaces, including various gimbals and harnesses. The tasks intended to characterize human performance on ARGOS include treadmill walking, incline walking, and jogging; overground walking; jumping; exploration-type EVA tasks; and other dynamic human movements.

    The project initially used an X-axis (horizontal translation) and Z-axis (vertical translation/offload) commercial off-the-shelf (COTS) mechanical system, which was later expanded to a larger X-, Y- (horizontal translation), and Z-axis system. A custom Z-axis mechanical system was then designed, prototyped, and tested, and was subsequently combined with the COTS X- and Y-axis mechanical system.

    The evolution of the ARGOS mechanical systems is shown in Table 1. The NASA Standard for Lifting Devices and Equipment (NASA-STD-8719.9) and other industry standards (ASME B30.2, Overhead and Gantry Cranes, and ASME B30.5, Mobile and Locomotive Cranes) have been used as guidelines, but no voluntary consensus standard specifies the design or operation of a ground-based, human-rated robotic system. Testing of the system started with simple activation of the motor and progressed to static weight testing, use of a Stewart platform, and finally a human in the loop. Performance data collected from ARGOS Generation 1 led to the development of an improved Generation 2 ARGOS.

    Date         Mechanical System          Testing
    8/26/2008    COTS X and Z               Human Interaction
    2/13/2009    COTS XYZ                   Human Interaction
    4/9/2009     Generation 1 Custom Z      Non-Human Interaction (Stewart 6-DOF platform gait simulator)
    7/24/2009    Generation 1 Custom Z      Human Interaction
    9/28/2009    Generation 1 Custom Z      Human Attached
    1/19/2010    COTS XY, Gen 1 Custom Z    Human Interaction
    4/8/2010     COTS XY, Gen 1 Custom Z    Human Attached
    6/20/2011    Generation 2 Custom XY     Human Interaction
    9/30/2011    Generation 2 Custom Z      Human Interaction
    11/7/2011    Generation 2 Custom XYZ    Human Interaction
    3/1/2012     Generation 2 Custom XYZ    Human Attached

    Table 1: ARGOS Evolution of Mechanical System


    The Generation 2 ARGOS system has two different gear ratios (Unsuited and Suited). The Unsuited gear ratio provides the capability to offload up to 300 lbf (1334 N) with high dynamic capabilities. The Suited gear ratio provides the capability to offload up to 750 lbf (3336 N) with low dynamic capabilities. The system works by providing a constant force offload through an overhead motion control system. The Generation 2 ARGOS system provides a wider range of capabilities for robotic, rover, and human space flight testing. The following sections provide descriptions of the major sub-systems.

    Figure 2: ARGOS System Picture


    E-stop System

    An emergency shutdown can be activated by any of the following:

    1. Manual activation of the e-stop by the test team.
    2. Automatic activation of the e-stop by the motor controller in the event of a system fault requiring an emergency stop.
    3. Automatic activation of the e-stop by the limit switch system. In each direction of travel, the system is equipped with two limit switches, as required by the NASA crane standard. This e-stop can occur only if the first limit switch has failed.
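    The three e-stop sources listed above can be summarized as a simple decision function. The Python sketch below is illustrative only; it assumes, as an interpretation of the text, that the first limit switch in each direction produces a normal stop and that only the second (backup) switch triggers the e-stop. The names StopAction and evaluate_stop are hypothetical and do not come from the ARGOS software.

```python
from enum import Enum


class StopAction(Enum):
    """Hypothetical outcomes of evaluating the stop inputs each control cycle."""
    CONTINUE = "continue"
    NORMAL_STOP = "normal stop"        # assumed: first (primary) limit switch reached
    EMERGENCY_STOP = "emergency stop"  # any of the three e-stop sources active


def evaluate_stop(
    manual_estop: bool,
    controller_fault: bool,
    first_limit_tripped: bool,
    second_limit_tripped: bool,
) -> StopAction:
    """Combine the three e-stop sources described in the text.

    Any single source is sufficient for an e-stop. The second limit switch is
    reached only if the first one failed to stop travel, which is why the
    limit-switch e-stop can occur only after a first-switch failure.
    """
    if manual_estop or controller_fault or second_limit_tripped:
        return StopAction.EMERGENCY_STOP
    if first_limit_tripped:
        return StopAction.NORMAL_STOP
    return StopAction.CONTINUE


# A failed primary switch lets travel continue until the backup switch trips.
print(evaluate_stop(False, False, False, True))  # StopAction.EMERGENCY_STOP
```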

    Subject Force Input

    Due to the dynamic movement capabilities of ARGOS, forces can be induced into the person whose weight is being offloaded if an emergency shutdown is required. These forces are not seen during normal operations. The potential forces are very difficult to analyze, and a very conservative worst-case analysis indicates they could reach approximately 2698 lbf (12 kN). The ARGOS team worked with human performance experts to determine the force levels that would result if a world-class athlete jumped upward and the system e-stop activated at the worst possible time, just after the athlete left the ground. The probability of this is very small, and most people or systems could not achieve the required kinetic energy; however, this case was considered and the hazard was controlled. Figure 3 illustrates the components in the lifting path, with the exception of the gimbal assembly.

    Figure 3: Inline Lifting Components

    The OSHA limit for fall protection at the hook attachment point is 1800 lbf (8 kN) (OSHA 29 CFR Parts 1910 and 1926). To prevent such forces from transferring into the human, a Yates shock absorber (shown in Figure 4), a COTS product used in climbing fall protection, is installed in line with the lifting cable. The Yates part number is 602. The shock absorbers deploy when forces exceed 450 lbf (2 kN), a value based on the manufacturer's design and data and confirmed with deployment tests. The forces transmitted into the human or robot therefore would not exceed 450 lbf (2 kN), one-fourth of the allowed OSHA force. Over the past four years of testing, none of these devices has deployed during human testing.
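    The force figures in this section can be cross-checked with a few lines of Python. This sketch converts each quoted pound-force value to kilonewtons (1 lbf = 4.4482216152605 N) and computes the ratio of the Yates deployment force to the OSHA limit; the variable names are illustrative only.

```python
N_PER_LBF = 4.4482216152605  # newtons per pound-force (exact conversion factor)


def lbf_to_kn(force_lbf: float) -> float:
    """Convert pound-force to kilonewtons."""
    return force_lbf * N_PER_LBF / 1000.0


# Values quoted in the text, in lbf.
forces_lbf = {
    "worst-case e-stop analysis": 2698.0,
    "OSHA fall-protection limit": 1800.0,
    "Yates shock absorber deployment": 450.0,
}

for name, lbf in forces_lbf.items():
    print(f"{name:32s} {lbf:7.0f} lbf = {lbf_to_kn(lbf):5.1f} kN")

# Fraction of the OSHA limit that can reach the human once the Yates deploys.
fraction_of_osha = (
    forces_lbf["Yates shock absorber deployment"] / forces_lbf["OSHA fall-protection limit"]
)
print(f"Yates deployment / OSHA limit = {fraction_of_osha:.2f}")  # 0.25
```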


    Figure 4: Yates Shock Absorber

    Series Elastic Actuator (SEA)

    A series elastic actuator (SEA) provides spring and damping in the load path. An SEA adds a spring with a known spring constant in series with the manipulator to increase compliance and decrease the natural frequency. The more compliant manipulator yields better force control, allowing improved tuning of the system and increased stability. A COTS product from Festo Inc., shown in Figure 5, is used. This product is actually a pneumatic muscle used in a constant-pressure application; the device was evaluated and determined not to be a pressure system. The Festo muscle is used in the load path with a load-rated choker in parallel. Two Festo muscle lengths are available, and any combination may be placed above and below the load cell. ARGOS currently uses two Festo muscles in line with the load cell (one above and one below).
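    Why a series spring lowers the natural frequency can be illustrated with the standard single-degree-of-freedom model: springs in series combine as 1/k_eq = 1/k_1 + 1/k_2 + ..., and the undamped natural frequency is f_n = sqrt(k/m) / (2π). The Python sketch below ignores damping and the nonlinear behavior of a pneumatic muscle, and the stiffness and mass values are hypothetical placeholders, not measured ARGOS properties.

```python
import math


def series_stiffness(*stiffnesses_n_per_m: float) -> float:
    """Equivalent stiffness of springs in series: 1/k_eq = sum(1/k_i)."""
    return 1.0 / sum(1.0 / k for k in stiffnesses_n_per_m)


def natural_frequency_hz(stiffness_n_per_m: float, mass_kg: float) -> float:
    """Undamped natural frequency of a mass on a spring: sqrt(k/m) / (2*pi)."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


# Hypothetical values, for illustration only.
rigid_path_k = 200_000.0  # N/m, cable and structure without the SEA
muscle_k = 40_000.0       # N/m, each of two muscles (one above, one below the load cell)
mass = 100.0              # kg, suspended load

k_with_sea = series_stiffness(rigid_path_k, muscle_k, muscle_k)
print(f"without SEA: {natural_frequency_hz(rigid_path_k, mass):.2f} Hz")
print(f"with SEA:    {natural_frequency_hz(k_with_sea, mass):.2f} Hz")
```

    Because the series combination is always softer than its softest element, adding the muscles can only lower the natural frequency of the load path.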

    Figure 5: Spring/Damper Festo Muscle

    Load Cell

    An STI load cell with an amplified output, shown in Figure 6, provides the force measurement. The cable is double shielded and the electronics are housed in a metal box to reduce electromagnetic interference. A programmable anti-aliasing filter acts as a low-pass filter between the load cell and the A/D converter to prevent aliasing. The force measurement is sampled every millisecond and input to the control logic, which adjusts the motor output velocity needed to maintain the desired offload force during load disturbances.
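    To make the data flow concrete, here is a minimal Python sketch of one possible force-to-velocity (admittance-style) law evaluated at the 1 ms sample period given above. The proportional form, the gain, the speed limit, and the sign convention (positive = reel in, raising the load) are all assumptions for illustration; this is not the ARGOS control law, which is discussed in Section 6.5.

```python
SAMPLE_RATE_HZ = 1000.0            # one force sample per millisecond
NYQUIST_HZ = SAMPLE_RATE_HZ / 2.0  # 500 Hz: the analog anti-aliasing cutoff must sit below this


def velocity_command(
    measured_force_n: float,
    desired_offload_n: float,
    gain_m_per_s_per_n: float,
    max_speed_m_per_s: float,
) -> float:
    """Proportional force error -> hoist velocity command (hypothetical law).

    If the cable carries less than the desired offload, reel in (positive);
    if it carries more, pay out (negative). The result is clamped to a
    speed limit.
    """
    error_n = desired_offload_n - measured_force_n
    command = gain_m_per_s_per_n * error_n
    return max(-max_speed_m_per_s, min(max_speed_m_per_s, command))


# One control cycle with made-up numbers: the cable carries 20 N too little offload.
print(velocity_command(measured_force_n=980.0, desired_offload_n=1000.0,
                       gain_m_per_s_per_n=0.002, max_speed_m_per_s=0.5))
```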


    Figure 6: STI Load Cell

    Gimbal

    The Versatile Neutral Capability Horizontal Interface (VNCHI), shown in Figure 7, is attached to ARGOS via the Festo muscle and to the suited subject. Other gimbals and harness setups are available/utilized depending upon test objectives.

    The VNCHI gimbal is designed to connect a human test participant to the ARGOS in the horizontal position for microgravity simulation. The intent of the VNCHI gimbal assembly is to provide roll, pitch, and yaw rotations about the test participant's center of gravity (CG) while the participant is connected to ARGOS in the horizontal position. The gimbal attaches rigidly to the test participant's hang-gliding harness, in which the participant lies securely. Adjustments are provided to align the participant's CG with the lifting path, so that the CG is always centered under the ARGOS cable. The gimbal consists of custom aluminum 6061-T6 and 15-5 PH stainless steel parts with COTS bearings and fasteners.

    Figure 7: VNCHI Gimbal Assembly

    Emergency Egress

    In the event of a power outage or system failure that prevents ARGOS from functioning, the test participant will be removed from the system using a rolling staircase ladder. If the treadmill is in use at the time, a small staircase ladder will be placed on the treadmill deck and the participant will walk down the ladder. For a power outage with a robotic system, the load will be treated as a suspended load and removed after power has been restored to the facility.

    If the test participant becomes injured and is unable to walk down the staircase ladder, a Sky Genie variable descent device will be deployed to lower the person to the ground. The Sky Genie was used by the Space Shuttle Program for crew member emergency egress from the orbiter; it is shown in Figure 8. Before each use, the Sky Genie hardware, rope, and cables are inspected for cuts, frays, broken strands, or other visual damage. The rope is replaced after two years of use and has a shelf life of five years. The attachment point on the z-axis is rated for a 4945 lbf (22 kN) load, as required by OSHA and the vendor documentation. The Sky Genie is attached to the z-axis and lifting path by locking carabiners. The Sky Genie is a controlled descent device and is not intended for use as a fall protection system.

    Figure 8: Genie Assembly without Rope

    For testing with individuals in space suits, or with other loads where it may be preferable to remove the load with a man basket instead of the Sky Genie, a 4 x 8 ft (1.2 x 2.4 m) COTS man-lift basket attached to a forklift is used to lower the load to the ground. Personnel riding in the man basket are required to wear fall protection equipment. When this option is required, the equipment and a certified operator must be present in the ARGOS area during testing.

    Controller

    See Control System, Section 6.5

    Electronics

    See Electronics System, Section 6.6

    Software

    See Software, Section 6.7

    Mockups

    In the ARGOS test area, several floor mockups are used to simulate space station handrails, bolt torquing, and different rock surfaces and interactions. These mockups are moved into and out of the test area as needed. These mockups include rocks and therefore the hazards associated with handling rocks. The use of hand tools and battery-powered drills is part of the tasks conducted with these mockups.


    6 Investigation

    6.1 Interviews

    Only limited interviews were conducted, because the ARGOS team had taken witness statements immediately after the close call. Two interviews were conducted. The first was with the crew member who was the test subject in the close call. This test run was the test subject's first experience in the ARGOS, so he had no prior experience against which to compare the system's behavior. He also stated that, because he was in a modified ACES suit with headphones on, he was insulated both physically and from external noises. He said that although he was dropped 12 to 18 inches (30.5 to 45.7 cm) onto the handrail and the harness fell on top of him, he was not injured. He did experience a fairly hard impact from the helmet face plate hitting his jaw. He also stated that, when translating, it was difficult to tell whether resistance came from the suit (pressurized at 4.3 psi (29,650 N/m²)) or from ARGOS. Just prior to the incident, he was translating along the handrail using both pulling and pushing, but was not commanding a downward motion.

    The second interview was with the Safety and Test Operations Division subject matter expert on the lifting requirements in NASA Standard 8719.9. Most of the requirements in this document were believed to have been met by the ARGOS team, with some exceptions, specifically in the control system design and the limit switch configurations. The Board members and the subject matter expert agreed that ARGOS has unique performance requirements and that Chapter 4 was the closest-fitting lifting system in NASA Standard 8719.9 for providing guidance to the design team.

    6.2 Mechanical System

    For the purposes of this investigation, only the ARGOS Heavy Lift Z-Axis Assembly (ARGOSZAE500) is discussed; see Figure 9.

    The Heavy Lift Assembly is a NASA-designed electromechanical system whose basic function is to raise and lower a suspended object or human in response to commands issued by a force feedback control system. The object is suspended by a Hoist Cable wrapped around a spiral-cut Drum that can both rotate and translate. Rotation of the Drum changes the object's elevation, while translation (synchronized to the spiral lead) maintains a constant cable exit point and prevents cable layering. The assembly contains redundant fail-safe brakes and an integral servomotor brake that engage to prevent Drum rotation when power is removed.
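    The Drum kinematics described above reduce to two proportionalities: each revolution pays out one wrap of cable (π times the effective drum diameter, since the cable is never layered), and the Drum translates by one spiral lead per revolution so the exit point stays fixed. The Python sketch below uses hypothetical values for the drum pitch diameter and groove lead; they are not given in this section.

```python
import math


def drum_motion(cable_length_m: float, pitch_diameter_m: float, groove_lead_m: float):
    """Return (revolutions, axial translation in m) needed to pay out a cable length.

    pitch_diameter_m is the drum diameter measured to the cable centerline;
    groove_lead_m is the axial advance of the spiral groove per revolution.
    Single-layer winding is assumed, matching the no-layering design.
    """
    revolutions = cable_length_m / (math.pi * pitch_diameter_m)
    axial_translation_m = revolutions * groove_lead_m
    return revolutions, axial_translation_m


# Hypothetical drum: 0.30 m pitch diameter, 12 mm groove lead.
# 1.5 m corresponds to the upper end of the cable released during the incident.
revs, shift = drum_motion(1.5, pitch_diameter_m=0.30, groove_lead_m=0.012)
print(f"{revs:.2f} revolutions, drum translates {shift * 1000:.1f} mm")
```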

    Connected in series to the Drum is a constantly meshed two-speed transmission. The transmission uses helical-cut gear sets to reduce vibration and driveline noise, minimizing disturbance inputs to the force feedback control system. The transmission has two manually selectable gear ratios:

    1. Unsuited Gear Ratio: used for objects whose weight is less than 300 lbf (1334 N)
    2. Suited Gear Ratio: used for objects whose weight is less than 750 lbf (3336 N)

    ** The Suited/Unsuited designations do not describe the configuration of the test object.

    The gear ratios are not synchronized, so the system must be completely offloaded before a ratio is selected. The selection mechanism consists of a Shift Fork connected to a Clutch Plate with anti-friction nylon pads. The Shift Fork moves the Clutch Plate between the desired gear ratios through a spline drive, driven externally and manually by a Gear Selector Knob. Positive indication of transmission engagement is provided visually, by locking the Gear Selector Knob into position, and electronically, by end-of-travel limit switches.
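    The selection rules above (a weight limit per ratio, full offload before shifting, and positive indication of engagement) can be expressed as a small interlock check. This Python sketch is hypothetical: the function names, the no-load tolerance, the choice to prefer the Unsuited ratio whenever the weight allows it, and the requirement that both the visual and electronic indications agree are assumptions, not ARGOS requirements.

```python
from enum import Enum


class GearRatio(Enum):
    UNSUITED = "unsuited"  # objects weighing less than 300 lbf, high dynamics
    SUITED = "suited"      # objects weighing less than 750 lbf, low dynamics


UNSUITED_LIMIT_LBF = 300.0
SUITED_LIMIT_LBF = 750.0
NO_LOAD_TOLERANCE_LBF = 5.0  # hypothetical threshold for "completely offloaded"


def select_ratio(object_weight_lbf: float) -> GearRatio:
    """Pick the ratio rated for the object's weight, preferring Unsuited."""
    if object_weight_lbf < UNSUITED_LIMIT_LBF:
        return GearRatio.UNSUITED
    if object_weight_lbf < SUITED_LIMIT_LBF:
        return GearRatio.SUITED
    raise ValueError(f"{object_weight_lbf} lbf exceeds both gear ratio ratings")


def shift_permitted(cable_load_lbf: float) -> bool:
    """The ratios are not synchronized, so shift only with no load on the cable."""
    return abs(cable_load_lbf) <= NO_LOAD_TOLERANCE_LBF


def engagement_confirmed(knob_locked: bool, end_of_travel_switch_closed: bool) -> bool:
    """Require both the visual (knob locked) and electronic (limit switch) indications."""
    return knob_locked and end_of_travel_switch_closed


print(select_ratio(280.0), shift_permitted(0.0), engagement_confirmed(True, False))
# GearRatio.UNSUITED True False
```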

    Connected to the transmission is an AC servomotor manufactured by Kollmorgen, driven by an off-the-shelf motor controller and commanded by a NASA-designed control system. The control system receives object position data from an absolute encoder geared off the Drum's rotation shaft, an integral AC motor encoder, and two Drum end-of-travel limit switches.
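    Because the control system has both a motor encoder and an absolute encoder geared off the Drum, the two readings can in principle be cross-checked to detect slip or a disengaged transmission. The sketch below shows one way such a check could work; the function name, gear ratio, and tolerance are hypothetical, and this is not the check implemented in the ARGOS software.

```python
def out_of_gear(motor_rev: float, drum_rev: float, gear_ratio: float,
                tolerance_rev: float) -> bool:
    """Flag disagreement between motor and drum encoder readings.

    Both readings are rotations since a common reference point. With the
    transmission engaged, drum rotation should equal motor rotation divided
    by the gear ratio; a residual beyond the tolerance suggests slip or a
    disengaged gear.
    """
    expected_drum_rev = motor_rev / gear_ratio
    return abs(drum_rev - expected_drum_rev) > tolerance_rev


# Hypothetical 10:1 ratio and 0.05 rev tolerance.
print(out_of_gear(100.0, 10.0, 10.0, 0.05))  # False: encoders agree
print(out_of_gear(100.0, 9.0, 10.0, 0.05))   # True: drum lags the motor
```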


    Figure 9: ARGOS Z-Axis Heavy Lift Assembly Overview

    6.3 Hardware Inspection

    On January 28, 2013, the ARGOS Heavy Lift Z-Axis Assembly (ARGOSZAE500) was removed from the Heavy Lift X-Axis Assembly (ARGOSSTE502) and placed on a disassembly bench in NASA JSC Building 9. The hardware inspection team consisted of all Board members and representatives from the Software, Robotics, and Simulation Division (ER). The goal of the inspection was to evaluate the hardware for any signs of binding, seizing, or jamming using the following approach:

    - Prior to disassembly, measure the system's running torque under no load
    - Perform a complete visual inspection of all accessible rotating parts
    - Develop a focus area comprising the major components most likely to cause mechanical failures
    - Disassemble the items within the focus area incrementally to allow for visual inspection and photography

    Figure 10 shows an exploded view of the Heavy Lift Assembly and identifies the focus area and its major contributors. Each major component is identified by an item number for reference in subsequent discussions.


    Figure 10: ARGOS Z-Axis Heavy Lift Assembly Exploded View

    6.3.1 Running Torque Measurements

    Initial inspection of the hardware using external torque measurements was performed with the unit intact. Using a calibrated dial-type torque wrench, both breakaway and running torque measurements were taken on the output shaft (Drum rotation axis) for various positions of the Gear Selector Knob (see Figure 11). Measurements were performed by Board member Joe Anderson, with care taken to minimize inertial loading on the measurement device.

    Figure 11: Torque Measurement Site

    The position of the Gear Selector Knob was varied between different states to evaluate drag from the selector mechanism's rigging. Measurements for each knob position were repeated a minimum of three times, and the averages are presented in Figure 12. The results show that the lowest torque was measured in the neutral position (fewest rotating components) and that the Suited and Unsuited locked gear selections resisted with approximately 30 in-lbf (3.4 N·m). A noticeable change in torque was seen when the Unsuited selection was toggled between locked and unlocked. The cause was a rigging method that allowed the internal Shift Fork (Item 11) to be preloaded against the rotating Clutch Plate (Item 5), introducing frictional drag into the gear train. Other than this friction effect, no anomalies were discovered, and the gear train rotated smoothly under no load.
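    The averaging and unit conversion described above amount to the following. The readings in the usage line are made up for illustration and are not the Figure 12 data; the conversion factor follows from 1 lbf = 4.4482216152605 N and 1 in = 0.0254 m, and average_torque is a hypothetical helper.

```python
from statistics import mean

IN_LBF_TO_N_M = 0.1129848290276167  # 1 in-lbf in newton-metres


def average_torque(readings_in_lbf: list[float]) -> tuple[float, float]:
    """Average repeated readings (minimum of three) and convert to N*m."""
    if len(readings_in_lbf) < 3:
        raise ValueError("at least three repeated readings are required")
    avg = mean(readings_in_lbf)
    return avg, avg * IN_LBF_TO_N_M


avg, avg_nm = average_torque([29.0, 30.5, 30.5])  # hypothetical readings
print(f"{avg:.1f} in-lbf = {avg_nm:.1f} N*m")      # 30.0 in-lbf = 3.4 N*m
```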

    Figure 12: Torque Measurements for Various Knob Positions

    6.3.2 Incremental Disassembly and Sample Collection

    Following the torque measurements, the Board tasked the ER design team (Paul Valle and Dian Poncia) with starting disassembly. Disassembly proceeded incrementally: components were removed one step at a time, each followed by visual inspection and material sampling. The following collection of images (Sites 1 through 10) shows the areas of the gearbox that were noted as critical inspection points and where specific material samples were taken. See Section 8.2 for a detailed chemical analysis of the collected samples. Refer to Figure 10 for item number references.


    Inspection Site 1: This site contained excess lubricant and particulates on the Clutch Plate (Item 5). This area was of particular interest due to the increased running torque recorded during the pre-disassembly torque tests.

    Inspection Site 2: This site was used to obtain a fresh grease sample for use in setting a baseline for subsequent materials evaluation.

    Inspection Site 3: This site contained additional particulate debris on the Clutch Plate (Item 5). The Clutch Plate area is of particular interest because it transmits motor loads to the two available gear sets. Due to the close proximity of rotating components and their inherent misalignments, the likelihood of mechanical interference and debris generation is increased in this area.


    Inspection Site 4: This site contained grease and residue from the interaction of the Output Gear (Item 1) and the Unsuited Gear (Item 2).

    Inspection Site 5: This site contained grease and metallic debris caused by unintended contact between the Suited Gear (Item 4) and the Snap Ring (Item 10).

    Inspection Site 6: This site contained grease and particulates from the interaction of the Suited Gear (Item 4) and its adjacent Thrust Washer (Item 7).


    Inspection Site 7: This site contained grease and metallic particles generated by dithering between the Shifting Shaft (Item 6), its drive gear, and a closeout snap ring. By design, these items are keyed to permit torque transmission; however, excess clearance and a hardness mismatch led to galling and wear. The design specifies a spiral retaining ring at the indicated position, but the actual hardware had an open snap ring.

    Inspection Site 8: This site contained a small piece of plastic debris (Delrin) located on the RH Torque Spline (Item 12). The debris was most likely dislodged from the spline nut located on the Shift Fork (Item 11). Inspection of the nut shows signs of wear, but no significant failures.

    Inspection Site 9: This site contained a piece of plastic debris (PVC and Kapton) located on the bottom of the gearbox housing. The debris generation site is unknown, and the debris is not considered a contributor to the incident.


    Inspection Site 10: This site contained metallic debris generated by the interaction of the Hoist Cable (Item 13) and the spiral cut drum. Post inspection of the drum and cable showed no signs of detrimental wear or erosion.


    6.3.3 Detailed Inspection of Major Components

    After disassembly, the major components from the focus area were sent to the Structural Engineering Division (ES) for a closer examination:

    Table 2: Listing of Major Components within Focus Area

    As mentioned earlier, items not listed in the table above, such as the Drum, Linear Guides, Motor, and Radial Ball Bearings, were deemed non-contributors to any gearbox faults. The ER division retained control of the non-listed items but was asked not to perform any side investigations. Presented below is a summary of the major findings from the examination of the items listed above. See Section 8.3 for the complete listing of findings. Refer to Figure 10 for item number references.

    Item 1, Output Shaft 36T Gear, ARGOSZAD471: The face and outer tooth edges of the Output Gear show significant signs of wear and chipping due to unintentional contact with the Unsuited Gear's Dog Plate (Item 2).


    Item 2, 18T Gear Assembly, ARGOSZAD448 (Unsuited Gear): The face and outer edges of the Unsuited Gear's Dog Plate showed signs of unintentional contact with the Output Gear (Item 1).

    Item 3, Rush Gear 36T, ARGOSZAD455: The face and outer tooth edges of the Rush Gear 36T show significant signs of wear and chipping due to unintentional contact with the Suited Gear's Dog Plate (Item 4). Furthermore, the gear's shaft had 0.03 in. (0.076 cm) of axial free play, further increasing the potential for contact.

    Item 4, 15T Gear Assembly, ARGOSZAD446 (Suited Gear): The face and outer edges of the Suited Gear's Dog Plate showed signs of unintentional contact with the Rush Gear 36T (Item 3).


    Item 5, Clutch Plate, ARGOSZAD450: The Clutch Plate's annular-sector-shaped cutouts (6X) show signs of uneven loading. Load contact patterns generated by the Dog Plate teeth are located on the radial face, the inner diameter surface, and the outer diameter surface; ideally, all six radial surfaces would be equally loaded. Uneven loading imposes an overturning moment on both the Unsuited (Item 2) and Suited (Item 4) Gears. This unaccounted-for moment loading reduces needle bearing life and causes misalignments, leading to the mechanical interferences seen on the Unsuited and Suited Gears (Items 2 & 4), the Output Gear (Item 1), and the Rush Gear 36T (Item 3).

    Item 6, Shifting Shaft, ARGOSZAD442: The Shifting Shaft shows signs of the following:

    - Uneven loading from the Unsuited (Item 2) and Suited (Item 4) Gear needle bearings due to incompatible diameter sizing
    - Surface brinelling due to needle bearing edge loading
    - Surface wear due to inadequate surface hardness


    Items 7 and 8, Thrust Washers, 7421K26 & 7421K29: The Thrust Washers used to isolate the Unsuited (Item 2) and the Suited (Item 4) Gears from the Shifting Shaft (Item 6) experienced wear from exposed Dog Plate fasteners.

    Item 9, Key, ARGOSZAD494: The Key is used to anti-rotate the Shifting Shaft (Item 6) with respect to its drive gear. The key was hand fit during assembly to a length that allowed it to become lodged under the Unsuited Gear (Item 2). The interference is not a contributor to the incident, since no relative motion occurs between the Shifting Shaft and Unsuited Gear during Unsuited Gear operations. The interference will only be problematic for Suited Gear operations.


    Item 10, Snap Ring, VS-100: Excessive clearance between the Suited Gear (Item 4) needle bearings and the Shifting Shaft (Item 6) caused the snap ring to be side-loaded with relative motion.

    Item 11, Shift Fork, ARGOSZAD465: The Shift Fork contains anti friction pads (nylon) to engage the Clutch Plate (Item 5). During pre-disassembly running torque measurements, it was noted that the Shift Fork was preloaded into the Clutch Plate during Unsuited operations. The effect of this preload is apparent when examining the nylon pad wear patterns.


    Item 12, RH Torq Spline, ARGOSZAD467: The Torq Spline, which drives the Shift Fork (Item 11) between the Unsuited (Item 2) and Suited (Item 4) gear selections, showed no signs of failure or wear. However, examination of the spline mount design shows that an unintentional clamp-up at Location A and an interference at Location B are possible.

    Item 13, Hoist Cable, AI 4FZC: The Hoist Cable which interfaces with the Drum and the test participant showed no signs of failure or wear.


    6.4 Fault Tree

    The Board was not chartered to create an independent fault tree for this incident. However, the Board did review the fault tree that the ARGOS Project generated, shown in Figure 13.

    [Fault tree diagram. Top event: Rapid Descent of Crewmember in ACES space suit while testing in micro-g (crew member impacted hand rail). Branch labels include: output drum could not rotate; shift fork misaligned; high friction, binding, burr, or failure of shift fork pads at the shift fork to clutch interface; binding of gear box came free; incorrect software; Trick software failure; ARGOS came out of gear; CAN network communication failed; Z-axis electronics box failure; output shaft encoder failure; load cell failure; power outage or sag; safety system failure; shutdown of the motor controller output stage; failure of the ARGOS brakes; motor commanded downward by software outside of the control loop; Z-axis control loop went unstable; electronics failed due to EMI; motor controller and motor failed due to EMI; faulty cables; other unknown or new failure mode. Most branches are closed with "Stop" notes citing data checks made after the event.]

    Figure 13: ARGOS Project Created Fault Tree


    The following is a list of general findings from the Project's fault tree:

    - The fault tree correctly identified the fault, Rapid Descent of Crewmember in ACES space suit while testing in micro-g.
    - There are 14 level-1 causes in the tree. Of these, the project decided to work only the 2 paths related to binding.
    - The Board's interviews found that the project is biased toward binding as the causal path.
    - The project needs expert facilitation in developing a fault tree and the associated root cause analysis.


    6.5 Control System

    The Board was chartered to review the ARGOS Z-axis control system and controller to determine whether the incident resulted from a controller failure or a non-modeled event. Findings and recommendations are to be reported; however, the Board is not chartered to resolve issues or design a controller. To satisfy these goals, the Board met with the ARGOS control engineers and researched pertinent documentation. The investigation results follow.

    In summary, the Board's findings on controller performance conclude that there is insufficient data to state that the controller response was erroneous or that the controller was unstable. A major contributing factor to this conclusion is that the control logic of the motor controller/motor is proprietary, so no knowledge of what was occurring in this unit during the incident could be obtained. In addition, no control system simulation was developed, so off-nominal conditions such as binding and their effects could not be analyzed. Finally, the recorded test data was insufficient and did not include the outputs required to characterize the control response. In the absence of sufficient testing, modeling, and vendor information, the reason for a rapid downward controller command is indeterminate; this is discussed further in this section.

    Investigation of the ARGOS Z-axis control system and controller is based on the criterion of meeting design, development, test, and evaluation (DDT&E) processes. The DDT&E process elements include: 1) a detailed block diagram of the control system; 2) simulations for time-domain and stability analysis; 3) defined performance and stability requirements; 4) a test matrix to analyze stability and performance and to verify requirements; and 5) documentation of all work.

    After meeting with the ARGOS control engineers, the Board determined that there is no dedicated control systems document, no detailed block diagram, and no simulation of the control system. A high-level description of the control system is presented in document SRSD-11-016, Failure Modes and Effects Analysis (FMEA) Active Response Gravity Offload System Generation 2. The control system (Figure 14) consists of a proportional-derivative (PD) outer loop; an inner loop consisting of the motor controller and motor, which uses proportional-integral-derivative (PID) logic; and a load cell that registers the force on the cable. The inner loop is a black box: its PID controller is proprietary, so there is little insight into its makeup. Additional elements include the gearbox, cable drum, encoders, A/D converters, saturation limiters, hysteresis, latency, and other converters.
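    To make the loop structure described above concrete, here is a minimal sketch of a cascaded controller: a PD outer loop turns the cable force error into a velocity command, and a PID inner loop turns the velocity error into a current command. This is an assumption-laden illustration, not the ARGOS design: the report does not document what signal the outer loop commands, the motor controller's internal PID is proprietary, and the class and function names and gains are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PD:
    kp: float
    kd: float
    prev_error: float = 0.0

    def update(self, error: float, dt: float) -> float:
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.kd * derivative


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def cascade_step(outer: PD, inner: PID, target_force: float,
                 measured_force: float, measured_velocity: float,
                 dt: float) -> float:
    """One control cycle: force error -> velocity command -> current command."""
    velocity_cmd = outer.update(target_force - measured_force, dt)
    return inner.update(velocity_cmd - measured_velocity, dt)


# Hypothetical gains; dt = 1 ms.
outer, inner = PD(kp=2.0, kd=0.0), PID(kp=1.0, ki=0.0, kd=0.0)
print(cascade_step(outer, inner, 100.0, 90.0, 5.0, 0.001))  # 15.0
```

    The split matters for the discussion that follows: the outer gains are visible and tunable, while everything inside PID.update corresponds to the part of the real system that is a black box.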

    Figure 14: High Level Control Loop

    A detailed block diagram of the control system is a critical early design step and is required for analysis before the hardware is human-rated; it should have been done regardless of human rating. The block diagram defines all critical control loop parameters needed to complete the design of a system, and without it those parameters cannot be determined correctly. Even with a black box in the loop, it is possible to characterize the system through testing (at least to prove stability). It can then be determined whether the safety systems are fast enough to protect the test subject and whether the bus speed is sufficient to communicate between the blocks without causing delays. In discussions with the ARGOS control engineers, the Board learned that the control system was tuned by lowering the gain on the inner loop (the black box) and adjusting the outer loop gains until the desired performance was attained. A concern is that, because the inner loop gain is low, the outer loop gain must be higher to drive the controller during rapid changes in speed. This can saturate the amplifier on the previous stage, resulting in a nonlinear response. To fully characterize the control system, knowledge of the inner loop PID is required, or at a minimum a transfer function must be constructed. The ARGOS team stated that they tried but could not develop a model because of the nonlinear response. The Board recommends that the ARGOS team contact control system engineers in other Engineering Divisions (the Aeroscience and Flight Mechanics Division and the Structural Engineering Division) to develop a model.
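    The saturation concern raised above can be illustrated with a simple limiter. Once the outer loop output reaches its limit, raising the gain no longer raises the output, so the loop stops behaving linearly and linear tools such as transfer functions no longer fully describe it. The limit value and function names below are hypothetical.

```python
def saturate(value: float, limit: float) -> float:
    """Clamp a command to +/- limit, as an amplifier or limiter would."""
    return max(-limit, min(limit, value))


def outer_output(gain: float, error: float, limit: float) -> float:
    """Proportional outer-loop output passed through a saturating stage."""
    return saturate(gain * error, limit)


LIMIT = 10.0  # hypothetical command limit
for gain in (1.0, 2.0, 4.0, 8.0):
    print(gain, outer_output(gain, error=2.0, limit=LIMIT))
```

    With an error of 2.0, gains of 1, 2, and 4 give outputs of 2, 4, and 8, but a gain of 8 gives 10 rather than 16.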

    A simulation of the ARGOS control system is required to conduct performance analysis under off-nominal conditions and failures, Monte Carlo runs, and stability and frequency response analysis. The simulation will support ARGOS certification and provide insight into system response under off-nominal conditions. Had a simulation been developed, the incident could have been reenacted to observe the system behavior, pointing to the cause and a potential workaround. A simulation requires a detailed ARGOS block diagram with representative models of its elements. The ARGOS team instead decided to build the actual unit and test with it, which limits what can be tested, how often, and what data can be gathered. The Board recommends that the ARGOS team develop a simulation that can characterize the system through frequency response, stability analysis, constrained motion testing, interaction between the horizontal and vertical controllers, and Monte Carlo runs.
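    A Monte Carlo run of the kind recommended above can be sketched on a toy model: sample uncertain parameters (here, loop gain and latency in control cycles), simulate each case, and report the fraction that remains bounded. The first-order plant, proportional controller, parameter ranges, and divergence threshold are all hypothetical stand-ins for a real ARGOS model, which would need the detailed block diagram the Board calls for.

```python
import random


def run_case(gain: float, delay_steps: int, steps: int = 2000) -> bool:
    """Simulate a unit step on a delayed first-order loop.

    Hypothetical plant: y[k+1] = 0.95*y[k] + 0.05*u[k - delay_steps]
    Proportional controller: u[k] = gain * (1 - y[k])
    Returns True if |y| stays below 1e6 over the simulated horizon.
    """
    y = 0.0
    u_history = [0.0] * (delay_steps + 1)  # zero commands before start
    for _ in range(steps):
        u_history.append(gain * (1.0 - y))
        y = 0.95 * y + 0.05 * u_history[-1 - delay_steps]
        if abs(y) > 1e6:
            return False
    return True


def monte_carlo(runs: int, seed: int = 0) -> float:
    """Fraction of randomly sampled cases that stay bounded."""
    rng = random.Random(seed)
    stable = 0
    for _ in range(runs):
        gain = rng.uniform(1.0, 40.0)  # uncertain loop gain
        delay = rng.randint(0, 5)      # uncertain latency, in control cycles
        stable += run_case(gain, delay)
    return stable / runs


print(monte_carlo(200))
```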

    It was difficult to find documentation describing how the control system was developed and ultimately verified; in some cases, detailed documentation did not exist. There was a test matrix for the ARGOS unit; however, because it was applied to the actual unit, there were limitations on what could be tested. It is uncertain whether all the control system performance and stability requirements were tested on the ARGOS unit, which again argues for a simulation. Detailed documentation for all levels of the DDT&E process should be completed. Without this documentation it is nearly impossible to reconstruct the control system and its expected performance.

    Applying the above observations to the incident explains the inconclusive result. What can be inferred from the available data is that the motor was being commanded to move but no motion was seen (possible binding). It is assumed that the inner loop PID controller (black box) continued to increase current to the motor until the bound drivetrain broke loose. There may have been windup on the integrator term, so that once the binding was overcome, the system took time to respond. This is conjecture, since the PID implementation is proprietary. It is also plausible that the controller was unstable under the binding condition, but there is no way to determine this. Implementing the DDT&E process outlined above will reduce the likelihood of this type of event.
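    The integrator windup hypothesis above can be illustrated with a minimal sketch. While the load is bound, the measured velocity stays at zero, so the error never shrinks and the integral term keeps growing; after the binding clears, that stored integral must be unwound before the command falls, which delays the response. A clamp on the integral is one simple anti-windup scheme. The setpoint, gain, cycle time, and limit are hypothetical, and nothing here reflects the proprietary motor controller implementation.

```python
from typing import Optional


def integral_at_release(setpoint: float, binding_steps: int, ki: float,
                        dt: float, integral_limit: Optional[float] = None) -> float:
    """Integral term of a PI velocity loop after a period of binding.

    While the drivetrain is bound the measured velocity stays at zero, so
    the error never shrinks and the integral term grows every cycle
    (windup). An optional clamp, one simple anti-windup scheme, bounds it.
    """
    integral = 0.0
    for _ in range(binding_steps):
        error = setpoint - 0.0  # bound: no motion is measured
        integral += ki * error * dt
        if integral_limit is not None:
            integral = max(-integral_limit, min(integral_limit, integral))
    return integral


# Hypothetical values: unit setpoint, ki = 50, 1 ms cycle, 0.5 s of binding.
print(integral_at_release(1.0, 500, 50.0, 0.001))                      # ~25
print(integral_at_release(1.0, 500, 50.0, 0.001, integral_limit=5.0))  # 5.0
```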


    6.6 Electronic System

    The ARGOS Z-axis electronic system consists of 3 subsystems, as shown in Figure 15.

    [Figure 15 block diagram. Blocks: Computer running Trick Simulation; Motor Controller, Motor, and Gear Box; Load Cell with Load Cell Amplifier, A/D Converter, and CAN Converter (interface to load); Encoder and Gear Selector Switches with A/D Converter and CAN Converter; E-Stop System with Limit Switches and Brakes; Horizontal System (RS232); Power Distribution (5 VDC, 24 VDC, 3-phase 208 VAC). Functions shown: Control Loop, Data Collection, Safety System, Fault Detection, Node Guard heartbeat via CAN bus, WatchDog Timer.]

    Figure 15: Electronics System Block Diagram

    The following sections briefly describe each of the 3 subsystems within the Z-axis system.

    6.6.1 E-Stop System

    This is a dedicated safety system that monitors fault status from the Motor Controller, senses the upper and lower crane limit switches and the position encoder, and performs safety hazard controls (i.e., outputs to the lifting system brakes and disables the motor controller).
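    The E-Stop behavior described above reduces to a simple trip rule, sketched below. The data types and the position_in_bounds flag (standing in for an encoder-derived position check) are hypothetical. In hardware the brakes are fail-safe and engage when power is removed, so "engage brakes" here corresponds to de-energizing them.

```python
from dataclasses import dataclass


@dataclass
class EStopInputs:
    motor_controller_fault: bool
    upper_limit_tripped: bool
    lower_limit_tripped: bool
    position_in_bounds: bool  # hypothetical encoder-derived position check


@dataclass
class EStopOutputs:
    engage_brakes: bool            # True: de-energize the fail-safe brakes
    enable_motor_controller: bool  # False: drop the controller enable input


def estop_logic(inputs: EStopInputs) -> EStopOutputs:
    """Trip on any monitored fault: set the brakes and disable the drive."""
    trip = (inputs.motor_controller_fault
            or inputs.upper_limit_tripped
            or inputs.lower_limit_tripped
            or not inputs.position_in_bounds)
    return EStopOutputs(engage_brakes=trip, enable_motor_controller=not trip)


print(estop_logic(EStopInputs(False, False, True, True)))
```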


    6.6.2 Z-Axis Motor Controller

    This system is a COTS system supplied by the vendor of the Z-axis motor. It consists of 2 closed-loop control systems implemented in complex electronics: a PID control loop and a motor current control loop. The COTS system does not provide any electronic mechanism for time-synchronizing the 2 internal control loops with the outside world.

    The system provides major external status/control interfaces, which the overall ARGOS electronic system uses as follows:

    Discrete fault output and enable input

    This interface is used by the E-Stop System to perform emergency stops via the brakes and to disable the motor controller via the enable input.

    6.6.3 CAN Bus Interface

    This interface is used by the Trick Simulation to send motor control commands and to receive available status from the motor controller.

    This interface is also used to disable the motor controller when faults are detected.

    NOTE: Even though the CAN Bus is a very deterministic interface (i.e. time synchronized), the motor control loops are not synchronized with the Trick simulation computer or software.
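    Fault detection over the CAN Bus can be illustrated with a heartbeat timeout monitor, in the spirit of the Node Guard heartbeat labeled in Figure 15: each node is expected to report periodically, and a node that falls silent for longer than a timeout is flagged so that the motor controller can be disabled. The class, node names, and timeout are hypothetical; this is neither an implementation of CANopen node guarding nor the ARGOS code.

```python
class HeartbeatMonitor:
    """Track each CAN node's last heartbeat and report nodes that went silent."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now_s: float) -> None:
        self.last_seen[node] = now_s

    def stale_nodes(self, now_s: float) -> list[str]:
        return sorted(node for node, t in self.last_seen.items()
                      if now_s - t > self.timeout_s)


# Hypothetical node names and a 100 ms timeout.
monitor = HeartbeatMonitor(timeout_s=0.1)
monitor.heartbeat("motor_controller", 0.00)
monitor.heartbeat("load_cell", 0.05)
if monitor.stale_nodes(now_s=0.12):
    print("disable motor controller:", monitor.stale_nodes(0.12))
```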

    6.6.4 Computer Running Trick Simulation

    The computer which runs the ARGOS software is a COTS computer. There are 5 interfaces which are used to perform the outermost ARGOS control system and data acquisition functions by the Trick Simulation software.

    Z-axis Motor Controller Interface

    This is the CAN Bus interface as described above in the Motor Controller section.

    Z-axis Load Cell Interface

    This interface is accomplished via a CAN Bus enabled A/D converter and a load cell amplifier.

    Z-axis Gear Selection Switches Position Interface

    This interface is accomplished via a CAN Bus enabled A/D converter to read the position of the gear selector.

    Z-axis Drum Encoder Interface

    The ARGOS system has a position encoder separate from the one internal to the Z-axis Motor Controller. This interface is also a CAN Bus.

    X-axis and Y-axis Horizontal System Interface

    This computer also interfaces to the X-axis and Y-axis control systems via asynchronous RS-232 digital interfaces.

    The next section on software will address any computing resource limitations for this computer.

    The Board found no electronic system design deficiencies. However, the lack of time synchronization between the two internal control loops of the Z-axis Motor Controller and the overall Trick Simulation control loop presents a challenge to the overall control system modeling effort. See the Control System section for related findings on control system modeling.
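    The effect of unsynchronized loops can be illustrated with simple timing arithmetic. If the Trick loop writes a command every 1 ms and the motor controller's internal loop samples on its own clock, each command is picked up at the next internal tick, so the delay varies from cycle to cycle between zero and one internal period. The internal period and phase used below are hypothetical; the report does not give the motor controller's loop rates. Integer microseconds avoid floating-point rounding at tick boundaries.

```python
def pickup_latencies_us(outer_period_us: int, inner_period_us: int,
                        inner_phase_us: int, cycles: int) -> list[int]:
    """Delay from each outer-loop command write to the next inner-loop tick.

    The loops share no time base, so the inner loop samples each new
    command at whichever of its own ticks comes next.
    """
    latencies = []
    for k in range(cycles):
        write_us = k * outer_period_us
        # Ceiling division: index of the first inner tick at or after the write.
        n = -((inner_phase_us - write_us) // inner_period_us)
        next_tick_us = inner_phase_us + n * inner_period_us
        latencies.append(next_tick_us - write_us)
    return latencies


# 1 ms outer loop; hypothetical 300 us inner loop offset by 100 us.
print(pickup_latencies_us(1000, 300, 100, 3))  # [100, 0, 200]
```

    Even in this idealized case with no clock drift, the delay seen by the inner loop changes every cycle, which is one reason a model of the combined system is harder to build.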


    6.7 Software

    6.7.1 Background

    The case of a software fault causing the motor to unintentionally drive down at maximum velocity is investigated in this section of the report.

    The ARGOS Software is under development by NASA using the Trick Simulation environment to provide force feedback control system functionality as well as certain system safety parameters required to operate the ARGOS. This software is used by the ARGOS console operator to perform system setup, operation and some of the emergency fault detection and response.

    The scope of this analysis was to determine the following:

    - Approved software was in use on ARGOS during the incident
    - Approved software followed the ARGOS configuration management plan, and all modifications were approved per the plan
    - Regression testing performed on ARGOS software safety functions
    - Software fault detection logic
    - Test data recorded during the incident was representative of the system parameters being measured
    - Findings and recommendations to apply to the ARGOS software development process (see Section 8.2.3)

    6.7.2 ARGOS Hoist Control System Background

    The overall ARGOS Hoist control system works to maintain a target offload force in the lifting cable, which results in a reduced gravity (or microgravity) simulation for the test participant. The two key components of this control system are the Trick Simulation ARGOS controller and the Kollmorgen Servostar S620 motor controller, which work in conjunction with various sensors to form the overall ARGOS control system (Figure 16). All control system calculations outside the motor controller make up the ARGOS Controller, written in the Trick Simulation Environment.

    Figure 16: ARGOS Control System Block Diagram

    The ARGOS controller is implemented in the NASA Trick Simulation environment running on a CentOS Linux workstation. The control computer runs a 1 ms control cycle, commanding a Kollmorgen Servostar S620 motor controller over a CAN Bus network. The Trick Simulation provides most of the system integration: it reads the cable tension, the output drum encoder, and the gear selection switches, and it communicates with the S620 motor controller. Figure 16 should be viewed as a high-level loop of the controlling components in ARGOS. Multiple controllers are embedded in the ARGOS controller and the motor controller; these are discussed in the controls analysis of the investigation report.


    6.7.3 Software Validation

    The software validation evaluates the ARGOS Controller block described in Figure 16. This software is the NASA-developed Trick Simulation performing the force feedback control logic and a number of fault detection scenarios. The ARGOS software falls under the requirements of the Configuration Management (CM) Plan for the Active Response Gravity Offload System (ARGOS) (SRSD-08-005.A). The plan outlines requirements for using version control software to manage released versions of production code. All software modifications on ARGOS are approved through a Test Readiness Review (TRR) prior to any human offload testing. The software version being run is verified on a daily checklist performed before each day of operations on ARGOS.

    The approach to verify that approved production software was utilized on the ARGOS during the incident included evaluation of the configuration management plan steps being followed, documentation that the approved software was running, and ensuring source code modifications were approved by a TRR.

    The requirements per the ARGOS CM plan (SRSD-08-005.A) include the following:

    - Software changes are approved through a TRR
    - Each software release is given a version description and control number and is managed with a software version control application
    - A change request is processed by the ARGOS Configuration Control Board

    An ARGOS operations daily checklist (Reference 9) ensures that the production software executable is selected when running the ARGOS Control Software.

    A common Linux application was used to perform a difference check between the ARGOS source code before and after the most recent TRR that approved software modifications.
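    The report does not name the Linux application used for the difference check; a recursive diff (for example, diff -r) would be one common choice. An equivalent comparison can be sketched in Python by reading both source trees and classifying files as modified, added, or removed. The function names and the directory names in the usage line are hypothetical.

```python
from pathlib import Path


def tree_files(root: Path) -> dict[str, bytes]:
    """Map each file's path (relative to root) to its contents."""
    return {p.relative_to(root).as_posix(): p.read_bytes()
            for p in root.rglob("*") if p.is_file()}


def diff_trees(before: str, after: str) -> dict[str, list[str]]:
    """Report files modified, added, or removed between two source trees."""
    old, new = tree_files(Path(before)), tree_files(Path(after))
    return {
        "modified": sorted(f for f in old.keys() & new.keys() if old[f] != new[f]),
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
    }


# Usage (hypothetical directory names):
# print(diff_trees("argos_pre_trr", "argos_post_trr"))
```

    Comparing full file contents, rather than sizes or timestamps, ensures that a same-length edit such as changing one digit is still reported.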

    6.7.4 ARGOS Software Configuration Management

    Based on an interview with the ARGOS software developer, the most recent change to the ARGOS software was the addition of the output drum encoder velocity as a biasing term in the controller to maintain the current velocity. This modification was approved by the ARGOS EMU TRR conducted on 11/26/2012. The information provided in the TRR is shown in Figure 17, a slide from the TRR. This provided a baseline for checking that the ARGOS daily checklist was updated to the current software version and that this version was running on the ARGOS Control Computer. The daily checklist (Figure 18 and Figure 19) shows that the operator confirmed the verification step. The ARGOS computer screen capture after the incident is also consistent with the correct software path having been opened (Figure 20).
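    The velocity biasing term described above can be sketched as an additive term proportional to the output drum velocity, with its gain Kv clamped to fixed bounds. The report says only that the drum encoder velocity was added as a biasing term with a velocity gain Kv whose minimum and maximum values are limited in the GUI; the additive structure, the function names, and the bounds KV_MIN and KV_MAX are assumptions.

```python
# Hypothetical bounds; the report says Kv is limited but gives no values.
KV_MIN, KV_MAX = 0.0, 1.0


def clamp_kv(kv: float) -> float:
    """Keep the operator-entered velocity gain within its allowed range."""
    return max(KV_MIN, min(KV_MAX, kv))


def velocity_command(force_term: float, drum_velocity: float, kv: float) -> float:
    """Add a velocity bias term to the force-feedback command.

    kv * drum_velocity tends to keep the load moving at its current speed
    (letting it "coast") until the force error drives it otherwise.
    """
    return force_term + clamp_kv(kv) * drum_velocity


print(velocity_command(0.0, 2.0, 0.5))  # 1.0
print(velocity_command(0.0, 2.0, 5.0))  # 2.0 (kv clamped to 1.0)
```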

    [Slide header: NASA Johnson Space Center, Engineering Directorate; Larry K. Dungan; November 2012; Subject: Software]

    - The z-axis software has been updated to improve the realism of the offload simulation
    - New variable allows motor velocity to influence continued motion of the system, i.e., allows the load to coast until the equal and opposite force is received
    - All safety systems and controls are unchanged
    - Motor velocity graph was changed from motor RPM to linear velocity
    - Software has been fully tested with load
    - Software has been revised and released per the ARGOS CM plan
    - Procedure has been updated for new steps and revision number

    Figure 17: Slide from EMU TRR Software


    Figure 18: ARGOS Startup Checklist from day of incident

    Figure 19: ARGOS Startup Checklist Path Verification (step 54)


    Figure 20: ARGOS Trick Simulation Control Screen capture from day of incident

    The source code used to build the production software was reviewed to determine if the modifications to the software were consistent with the TRR approval. This required a review of all source files to identify changes that were implemented from the previous software version. The changes being evaluated were for the addition of a velocity based variable to be included into the ARGOS Control software and a change to how the motor velocity would be displayed to the operator in terms of load linear velocity rather than motor RPM.

    The files reviewed were the following:

    Filename                  Date Modified   Description
    ARGOSApplication.java     11/15/2012      GUI for velocity control variable
    ControlApplication.java   11/15/2012      GUI for velocity control variable
    ATM60.hh                  11/13/2012      Header file for external encoder
    ATM60.cpp                 11/14/2012      Configure external encoder to provide velocity data
    S620.cpp                  11/14/2012      Motor controller
    adaptive.tv               11/19/2012
    data_record.dr            11/28/2012      Variables saved to log file
    input.py                  12/13/2012      Setup parameters
    S_define                  11/20/2012      Main TRICK program

    Table 3: Trick Simulation Source Code Files

    Each of the modified files was consistent with the modifications approved at the ARGOS EMU TRR. The file ARGOSApplication.java included changes to show the new version number on the screen, to set a velocity gain variable, Kv, and to limit the minimum and maximum values of Kv. The file ControlApplication.java allowed the screen layout to include the new variable. The files ATM60.hh and ATM60.cpp control what information is collected from the ARGOS output drum encoder. This sensor can provide position and velocity over the CAN bus network, and the files were modified to configure the encoder to add its velocity output to the CAN network data used by the Trick Simulation ARGOS controller. The function readposition(), used to read the encoder position, was updated to also read the velocity variable. The file S620.cpp was updated to increase a variable, synccount, used in the MessageInfo() object from 10 to 50. This changed the rate at which the S620 motor amplifier provides actual motor RPM to the Trick Simulation. This motor velocity variable is not used in the ARGOS control algorithm and is used only for troubleshooting; the velocity variable used in the controller comes from the output drum encoder (ATM60). The file data_record.dr controls what the Trick Simulation environment records to a data file. The modifications all reflect the new velocity controller variable: the velocity variable from the motor controller was removed and replaced with the velocity variable from the ATM60 output drum encoder; a calculated conversion to linear velocity is logged; a calculated variable converting the motor RPM command to a linear velocity command is logged; and the velocity gain variable is logged. Rotational velocity was converted to linear velocity because linear units are more intuitive to ARGOS customers and operators when operating the system and reviewing test data.

    The file input.py is a test parameter file. It includes ARGOS system limit parameters that are configured for the test configuration. These parameters are modified after the TRR; they define the operational motion limits of the system for the current configuration and set fault detection thresholds. Monitored virtual physical limits include a virtual soft stop motion limit and a virtual hard limit. The soft limit commands zero velocity to the motor controller and allows the ARGOS operator to back out of the limit position without a system fault. The virtual hard limit causes the ARGOS Trick Simulation to command a motor controller disable, initiating the emergency ramp to zero velocity and engaging the external brakes. Parameters that control fault detection for the load cell measurement include the magnitude of unacceptable error between the target offload force and the measured force, along with a duration. In addition, measured force values below an unacceptable minimum or above an unacceptable maximum fault the system and result in the brakes locking the system. These parameters were modified due to changes in the lifting path components and in the ARGOS system height.
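
    The soft and hard limit behavior described above can be sketched as a per-cycle check. This is a minimal illustration under stated assumptions: positions are linear, a positive velocity moves toward increasing position, the LimitConfig fields are hypothetical stand-ins for values that input.py would set, and the hard-limit branch only returns a fault code where the real system would disable the motor controller and engage the brakes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LimitConfig:
    """Hypothetical stand-in for limit values configured in input.py."""
    hard_low: float
    soft_low: float
    soft_high: float
    hard_high: float


def apply_limits(position: float, v_cmd: float,
                 cfg: LimitConfig) -> Tuple[float, Optional[str]]:
    """Return (velocity command to send, fault code or None)."""
    # Virtual hard limit: fault. The real system disables the motor
    # controller, ramps to zero velocity, and engages the brakes.
    if position <= cfg.hard_low or position >= cfg.hard_high:
        return 0.0, "HARD_LIMIT"
    # Virtual soft limit: zero velocity only if the requested motion goes
    # further into the limit, so the operator can still back out.
    if position <= cfg.soft_low and v_cmd < 0.0:
        return 0.0, None
    if position >= cfg.soft_high and v_cmd > 0.0:
        return 0.0, None
    return v_cmd, None
```

    Checking direction rather than position alone is what lets the soft limit avoid a system fault: a command pointing back out of the limit passes through unchanged.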

    The file, S_define, is the main Trick Simulation control loop. This source code was modified to include the velocity component of the control system algorithm.
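
    The report describes the controller only in terms of its gains (Kpf, Kdf, and the new velocity gain, called Kv in ARGOSApplication.java and Kpv in the data file), the velocity bias from the output drum encoder, and the force clamps listed in Table 4. The Python sketch below assumes a simple structure consistent with that description: a proportional-derivative term on the force error plus the velocity bias term. The structure, the sign convention (a positive error or velocity means the hoist reels in), the Kv limits, and all names are assumptions, not the ARGOS S_define code.

```python
from dataclasses import dataclass
from typing import Tuple

FORCE_ERROR_CAP_LBF = 20.0  # Table 4: filtered force error capped at 20 lbf
KV_MIN, KV_MAX = 0.0, 1.0   # hypothetical operator limits on the velocity gain


@dataclass
class Gains:
    kpf: float  # proportional gain on force error
    kdf: float  # derivative gain on force error
    kv: float   # velocity gain (Kv / Kpv) on output drum velocity


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def velocity_command(target_lbf: float, filtered_lbf: float, prev_error_lbf: float,
                     drum_velocity: float, gains: Gains,
                     dt_s: float = 0.001) -> Tuple[float, float]:
    """One 1 ms control step; returns (commanded velocity, force error)."""
    # Table 4: a negative filtered force is set to zero.
    filtered_lbf = max(filtered_lbf, 0.0)
    # Assumed sign: positive error (tension below target) means reel in.
    # The cap is applied symmetrically here; Table 4 states only the upper cap.
    error = clamp(target_lbf - filtered_lbf, -FORCE_ERROR_CAP_LBF, FORCE_ERROR_CAP_LBF)
    d_error = (error - prev_error_lbf) / dt_s
    kv = clamp(gains.kv, KV_MIN, KV_MAX)
    # Velocity bias: part of the current drum velocity is carried forward,
    # letting the load "coast" until the force error opposes the motion.
    v_cmd = gains.kpf * error + gains.kdf * d_error + kv * drum_velocity
    return v_cmd, error
```

    With the force error at zero, the sketch still commands kv times the current drum velocity, which is the "coasting" behavior described on the TRR slide.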

    To conclude, the ARGOS software configuration management process was followed as defined in the approved Configuration Management (CM) Plan for the Active Response Gravity Offload System (ARGOS) (SRSD-08-005.A). The software has existed as a component of the overall ARGOS development and has not been identified as a specific software project by the Engineering Directorate. Under current NASA process at JSC, software developed within the Engineering Directorate would be required to follow the process described in EA-WI-035, Software Project Management and Development.

    6.7.5 Software Regression Testing

    Per the ARGOS CM plan, SRSD-08-005.A, software regression testing is based on requirements determined by the ARGOS project lead and approved at the test readiness review. The most recent change to the ARGOS software, adding a velocity gain parameter, did not document requirements for regression testing of the safety functions that perform fault detection in the system. This determination was made on the basis that none of the fault detection logic or system interfaces had been modified. Controller stability under nominal drive conditions was evaluated with test inputs that included high-velocity and impulse inputs into the force feedback system. There was no evidence of any previous testing to evaluate over-constrained operation of the software.

    6.7.6 Fault Detection Software Logic

    The software was reviewed in collaboration with the ARGOS Controls Engineer to establish whether the close call occurred due to a failure of the existing fault detection logic in the software. The fault detection logic is listed in Table 4, which evaluates the hazard of un-commanded motion for each of the causes shown. The cause that most likely resulted in the close call was not recognized when the software was developed: it is likely a case of over-constraining the physical system, such that the motor controller integration term increased motor torque until breakaway occurred and the system ran away without a chance to recover. The processing capacity of the Trick Simulation computer exceeds the demands of the control system algorithm and fault detection. Trick has a built-in capability to monitor control cycles and log when a control frame is delayed beyond the cycle time of the simulation. The ARGOS simulation cycle time is one millisecond. The ARGOS team has stated that a known, expected rate of missed frames occurs because devices on the CAN bus network periodically respond in slightly more than one millisecond, which results in three out of every 1000 frames being delayed. This is due not to processing capacity but to asynchronous hardware clocks. If the processing load of any device in the Trick Simulation caused more frames to be missed, the Trick Simulation would report these delays. No evidence of overloaded computer capacity was identified in the close call incident.
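
    The likely mechanism named above, an integrator building torque against an over-constrained load until breakaway, can be illustrated with a toy simulation. This Python sketch models a generic PI velocity loop driving a frictionless inertia that is held fixed until the commanded torque exceeds a breakaway threshold. All gains, units, and thresholds are hypothetical illustration values, not the S620 KP/Tn settings or ARGOS parameters, and the model is far simpler than the real drive train.

```python
from typing import Optional, Tuple


def simulate_windup(v_cmd: float = 1.0, kp: float = 2.0, ki: float = 20.0,
                    breakaway: float = 5.0, inertia: float = 1.0,
                    dt: float = 0.001, t_end: float = 2.0
                    ) -> Tuple[Optional[float], float]:
    """PI velocity loop driving a load that is held until |torque| > breakaway.

    All parameters are hypothetical, dimensionless illustration values.
    Returns (breakaway time, peak velocity).
    """
    v = 0.0           # load velocity
    integral = 0.0    # integrated velocity error: the term that winds up
    stuck = True
    t_break: Optional[float] = None
    peak_v = 0.0
    for k in range(int(t_end / dt)):
        error = v_cmd - v
        integral += error * dt
        torque = kp * error + ki * integral
        if stuck:
            if abs(torque) <= breakaway:
                continue  # constraint holds: v stays 0, integral keeps growing
            stuck = False
            t_break = k * dt
        # Once free, the accumulated torque accelerates the load (no friction).
        v += torque / inertia * dt
        peak_v = max(peak_v, v)
    return t_break, peak_v
```

    Comparing simulate_windup() with simulate_windup(breakaway=0.0) shows a higher velocity peak when the integrator has wound up against the constraint before breakaway, which is the qualitative behavior described above.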


    Hazard Cause Hazard control Description Criteria Result Effect on Load

    Hazard Cause: Gear Slippage/Out of Gear/Encoder Failure
    Hazard Control: Drive input/output position detection
    Description: Gear slip detection based on a mismatch between output drum encoder and motor encoder position data
    Criteria: The motor encoder and output encoder differ by more than 1.2 revolutions (in terms of the motor)
    Result: Shutdown command is sent to motor. Brakes commanded to engage. Trick Simulation enters freeze loop. Output gear slippage message to console
    Effect on Load: Depending on the gear ratio, the minimum drop of the test subject is 1.2 rotations of the motor. This distance increases with the delay in receiving position data from the motor controller. The output drum position is measured every 4 msec

    Hazard Cause: Comes Out of Gear
    Hazard Control: Gear indication switch
    Description: Gear ratio selector indication switches show which gear ratio is engaged by the shift fork
    Criteria: Both switches are depressed, neither switch is depressed, or the switch opposite to the expected gear ratio (set when initially shifted) is indicated
    Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output gear indication message to console
    Effect on Load: Trick Simulation monitors switch positions every 0.25 seconds. If the system goes completely out of gear and the load starts to drop, the gear slippage logic is more likely to react first because of its higher sample rate. If the gear is partially engaged, the switch breaks contact while the dog teeth are still fully engaged, before the system is completely out of gear, and the gear indication switch logic commands the system to stop and engage the brakes

    Hazard Cause: Drive moves past motion allowed
    Hazard Control: Virtual Soft Limit
    Description: The virtual soft limit is designed to prevent a test subject from reaching a hard limit
    Criteria: Absolute output encoder information indicates the soft limit position has been reached
    Result: If the output velocity calculation results in a commanded velocity further into the limit, the Trick Simulation sends zero velocity to the motor controller instead. The software outputs a soft limit message to the console
    Effect on Load: This position is initialized during the ARGOS daily checklist. Positions are set to allow full motion in the vertical direction, so this limit will not prevent an impact with the floor. The logic is based on encoder data sampled every 4 msec and outputs the appropriate velocity command on the next one-millisecond control cycle once data is received

    Hazard Cause: Drive moves past motion allowed
    Hazard Control: Virtual Hard Limit
    Description: The virtual hard limit is the first hard limit (located before the physical hard limit switch)
    Criteria: Absolute output encoder information indicates hard limit position has been reached
    Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output hard limit message to console
    Effect on Load: Trick Simulation logic is based on encoder data sampled at 4 millisecond intervals. The simulation freeze and brakes are commanded to engage on the next one millisecond control cycle


    Hazard Cause: Bad input to control system
    Hazard Control: Load Cell Disconnection
    Description: Check for a reasonable load cell force
    Criteria: Trick Simulation detects if the raw load cell force measurement is ever less than -100 lbf (-445 N) or greater than 1000 lbf (4448 N)
    Result: The Trick Simulation will send a shutdown command to the motor controller on the next one-millisecond control cycle. Brakes engage. Trick enters freeze loop. Software will output the load cell disconnect message to console
    Effect on Load: The Trick Simulation logic will enter a freeze loop and send a shutdown command to the motor controller on the next one-millisecond control cycle. The motor controller will ramp to zero velocity from the current velocity and engage the brakes

    Hazard Cause: Bad input to control system
    Hazard Control: Load Cell Disconnection
    Description: Check for a reasonable delta between two consecutive data points
    Criteria: The raw load cell force changes by 125 lbf (556 N) in one millisecond (between data points)
    Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output relevant load cell disconnect message to console
    Effect on Load: The Trick Simulation logic takes two one-millisecond control cycles to detect this fault. On the next one-millisecond control cycle, it enters a freeze loop and sends a shutdown command to the motor controller. The motor controller ramps to zero velocity from the current velocity and engages the brakes

    Hazard Cause: Bad input to control system
    Hazard Control: Load Cell Disconnection
    Description: Check for a reasonable force error over time. The values in this check were empirically developed during human testing with the ARGOS team
    Criteria: A fast check and a slow check: the force error exceeds 100 lbf (445 N) and remains above 100 lbf (445 N) for 300 msec, or the force error exceeds 35 lbf (156 N) and remains above 35 lbf (156 N) for 500 msec. This check does not run if the participant is inside a soft limit
    Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output relevant load cell disconnect message to console
    Effect on Load: When the load cell is disconnected, the filter and analog-to-digital converter output a noisy force between 0 and 20 lbf (89 N). The effect is a low cable tension sent to the control system, and the hoist rapidly rises for 300 msec to 500 msec before entering the Trick Simulation freeze loop and commanding the motor controller to ramp to zero velocity and engage the brakes

    Hazard Cause: Bad input to control system
    Hazard Control: Negative Force
    Description: The filtered force feeds into the proportional term of the controller. A negative force can result in undesirable behavior
    Criteria: The filtered force is less than zero
    Result: Set the filtered force to zero
    Effect on Load: This scenario occurs when the load cell measures impacts (e.g., a foot impact while jumping) that result in impulse measurements. This Trick Simulation logic limits the control system response but is more a stability control than fault detection, as it does not engage the brakes or stop the simulation


    Hazard Cause: Bad input to control system
    Hazard Control: High force error
    Description: To maintain stability during impacts (foot strikes, jumps, etc.), cap max filtered force error (feeds into proportional term)
    Criteria: The filtered force error exceeds 20 lbf (89 N)
    Result: Set the filtered force error to 20 lbf (89 N)
    Effect on Load: Proportional term causes under-damped ringing. Limiting this error reduces the amplitude of persistent oscillation

    Hazard Cause: Control System Failure/Software Exception
    Hazard Control: Node Guard
    Description: Exceptions such as floating point exceptions, memory exceptions, etc. in Trick
    Criteria: Trick has exception handling
    Result: Shutdown command sent to motor. Brakes engage
    Effect on Load: Node guard: S620 motor controller has node guarding and expects a heartbeat within 100 ms. If no heartbeat, instant shutdown

    Hazard Cause: Control System Failure/Computer Failure
    Hazard Control: Node Guard
    Description: Computer shuts down or software fault causes abnormal exit without executing a normal Trick shutdown routine
    Criteria: S620 motor controller has node guarding and expects a heartbeat within 100 ms
    Result: Motor controller throws n04 warning and shuts down. Brakes engage
    Effect on Load: Node guard: S620 motor controller has node guarding and expects a heartbeat within 100 ms. If no heartbeat, instant shutdown

    Hazard Cause: Control System Failure/Communications check. Break/error in CAN network or failure of Trick software CAN
    Hazard Control: Node Guard
    Description: S620 motor controller receives velocity commands via CAN network
    Criteria: S620 has node guarding and expects a heartbeat within 100 ms
    Result: Motor controller throws n04 warning and shuts down. Brakes engage
    Effect on Load: Node guard: S620 motor controller has node guarding and expects a heartbeat within 100 ms. If no heartbeat, instant shutdown

    Hazard Cause: Control System Failure/Motor controller velocity control loop gains incorrect
    Hazard Control: Software input variable sanitization routine
    Description: Trick software checks S620 motor controller gain settings at software start
    Criteria: Check if KP = 0.2 (proportional gain) and Tn = 140 (integral time constant)
    Result: Software will not start if settings are not correct
    Effect on Load: This is pre-operational fault logic that will prevent the system from starting

    Table 4: Fault Detection for the Hazard of Un-commanded Motion
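
    Several of the load cell and gear checks in Table 4 reduce to simple per-cycle comparisons. The following Python sketch restates them using the thresholds from the table: the -100 to 1000 lbf range, the 125 lbf change between consecutive 1 ms samples, the 100 lbf/300 ms and 35 lbf/500 ms persistence checks, the 1.2 motor-revolution gear slip limit, and the KP = 0.2, Tn = 140 start-up check. The class and function names, the reading of "changes by 125 lbf" as "at least 125 lbf", the use of the absolute force error, and resetting the timers inside a soft limit are assumptions; this illustrates the logic and is not the Trick implementation.

```python
import math
from dataclasses import dataclass
from typing import Optional

CYCLE_MS = 1  # Trick control cycle (one millisecond)


@dataclass
class PersistenceCheck:
    """Trips when |error| stays above threshold for duration_ms in a row."""
    threshold_lbf: float
    duration_ms: int
    elapsed_ms: int = 0

    def update(self, abs_error_lbf: float) -> bool:
        if abs_error_lbf > self.threshold_lbf:
            self.elapsed_ms += CYCLE_MS
        else:
            self.elapsed_ms = 0
        return self.elapsed_ms >= self.duration_ms


class LoadCellMonitor:
    def __init__(self) -> None:
        self.prev_raw_lbf: Optional[float] = None
        self.fast = PersistenceCheck(100.0, 300)  # 100 lbf for 300 ms
        self.slow = PersistenceCheck(35.0, 500)   # 35 lbf for 500 ms

    def update(self, raw_lbf: float, force_error_lbf: float,
               in_soft_limit: bool) -> Optional[str]:
        """Run one 1 ms cycle of checks; return a fault name or None."""
        if raw_lbf < -100.0 or raw_lbf > 1000.0:
            return "LOAD_CELL_RANGE"
        if self.prev_raw_lbf is not None and abs(raw_lbf - self.prev_raw_lbf) >= 125.0:
            return "LOAD_CELL_DELTA"
        self.prev_raw_lbf = raw_lbf
        if in_soft_limit:
            # Table 4: the force error check does not run inside a soft limit.
            # Resetting the timers here is an assumption of this sketch.
            self.fast.elapsed_ms = 0
            self.slow.elapsed_ms = 0
            return None
        err = abs(force_error_lbf)
        fast_trip = self.fast.update(err)
        slow_trip = self.slow.update(err)
        return "FORCE_ERROR" if (fast_trip or slow_trip) else None


def gear_slip(motor_revs: float, drum_revs: float, gear_ratio: float,
              limit_motor_revs: float = 1.2) -> bool:
    """Encoder mismatch, expressed in motor revolutions, exceeds the limit."""
    return abs(motor_revs - drum_revs * gear_ratio) > limit_motor_revs


def amplifier_gains_ok(kp: float, tn: float) -> bool:
    """Start-up check that the S620 velocity loop is set to KP=0.2, Tn=140."""
    return math.isclose(kp, 0.2) and math.isclose(tn, 140.0)
```

    Note that the persistence checks only fire after the full duration has elapsed, which is why Table 4 describes the hoist rising for 300 msec to 500 msec after a load cell disconnection before the brakes engage.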


    6.7.7 Test Data Review

    Per the ARGOS overview given by the ARGOS test team, a number of test parameters are logged to a data file every 10 msec for analysis after ARGOS operations are complete. The parameters include:

    Variable                     Units     Description
    Simulation Time              seconds   Time since start of simulation
    Raw Load Cell Force          lbf       Tension in lifting cable
    Filtered Load Cell Force     lbf       Force after nonlinear filter (fed into the control loop)
    Target Offload               lbf       Force the control loop tries to match
    Output Encoder Position      counts    Position of ATM60 absolute encoder on cable drum
    Output Encoder Velocity      RPM       Filtered velocity of absolute encoder on cable drum
    Linear Velocity              in/s      Calculated linear velocity of cable
    Commanded Linear Velocity    in/s      Control loop commanded velocity as linear cable motion
    Kpv                          --        Velocity Gain Variable
    Kpf                          --        Proportional Gain Variable
    Kdf                          --        Derivative Gain Variable

    Table 5: Test Parameters Recorded by the ARGOS Trick Simulation

    Part of the software review was to develop confidence that the recorded parameters from the ARGOS software represented the system response and that all values were within the expected range and capability of the sensing hardware and the software data types.

    The Raw Load Cell Force (lbf) value measures the cable tension between the ARGOS hoist and the load. The load cell is a 1000 lbf full-scale strain gauge sensor with a high-level 10 Vdc full-scale output, digitized through a 16-bit signed CAN bus signal conditioner. The force measurement was within sensor limits throughout the incident and does not appear to saturate at any point during the event. The Output Encoder Position is provided by a SICK-brand ATM60 absolute encoder. The encoder position is an unsigned 32-bit integer and was verified to be well within the rotational limits of the device during the close call operations. The position is initialized during system start-up to zero the start position: the system position is a signed 32-bit integer computed as the unsigned current position minus the unsigned initial position. The output encoder velocity is a 32-bit signed integer in units of revolutions per minute. All of the encoder variables are properly typecast in Trick to prevent overflow. Both the rotational position and the rotational velocity are converted to linear inch units for the data recording file.
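
    The position arithmetic described above (unsigned 32-bit current position minus unsigned 32-bit initial position, held as a signed 32-bit value) can be mirrored in Python, which has unbounded integers and therefore needs the wraparound made explicit. The load cell scaling function is an additional illustration that assumes a linear mapping of +32767 counts to the 1000 lbf full scale; the actual signal conditioner calibration is not given in this report, and both function names are hypothetical.

```python
U32_MOD = 1 << 32
I32_SIGN_BIT = 1 << 31


def signed_position(current_u32: int, initial_u32: int) -> int:
    """Unsigned 32-bit current minus initial, reinterpreted as signed 32-bit."""
    diff = (current_u32 - initial_u32) % U32_MOD  # wraps like C unsigned math
    return diff - U32_MOD if diff >= I32_SIGN_BIT else diff


def load_cell_counts_to_lbf(counts: int, full_scale_lbf: float = 1000.0,
                            full_scale_counts: int = 32767) -> float:
    """Hypothetical linear scaling of a signed 16-bit reading to lbf."""
    if not -32768 <= counts <= 32767:
        raise ValueError("reading outside signed 16-bit range")
    return counts * full_scale_lbf / full_scale_counts
```

    The modular subtraction gives the correct signed displacement even if the unsigned encoder count wraps between the initial and current readings, for example signed_position(3, U32_MOD - 2) returns 5.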

    The Kollmorgen Servostar S620 motor amplifier provides motor position to the Trick Simulation as an unsigned integer, but it is not recorded. Additionally, the S620 amplifier is commanded with an RPM command. For data presentation and recording, this value is converted to linear inches per second to provide units familiar to the ARGOS operator.

    The ARGOS gear selection switches are triggered by the gear selection and send a discrete value to the Trick Simulation through the CAN bus.

    All of the parameters identified for use in the control algorithm and being recorded in the data file were proven to correspond to appropriate programming data-types.


    6.8 Safety and Hazard Analysis

    6.8.1 Safety

    The ARGOS Systems Requirements Document makes no reference to JPR 1700.1, JSC Safety and Health Handbook, or to EA-WI-023, Project Management of Government Furnished Equipment Flight Projects. ARGOS was considered a development project that could operate with flexible adherence to requirements, and this culture was accepted by Safety and Mission Assurance oversight. The primary NASA safety document applied to ARGOS was NASA Standard 8719.9, Standard for Lifting Devices and Equipment. This standard is heavily referenced, and many people viewed ARGOS as a high-tech Critical Lift crane. The standard may have been the best fit, but ARGOS was more than a Critical Lift, and it lifted people. Although this was known at the time, the absence of any standard for a complex human-robotic system, and the resulting restriction to the best-fit standard, kept the team from looking for needed requirements and hazard controls. The velocity at which the system operates when approaching obstacles is well outside the realm of normal lifting operations.

    6.8.2 Hazard Analysis

    SRSD-12-007 Hazard Analysis for Gen 2 ARGOS Facility Testing documented the hazards of the ARGOS suspension system. It evolved from its initial use to document the standalone Gen 1 ARGOS system to the current Gen 2 configuration, which eventually included humans in the loop. The ARGOS team regularly updated that facility HA to reflect changes to the system that introduced new hazards. The status of the completed HA was presented at each Test Readiness Review (TRR).

    The same is true of the hazards unique to wearing the harness and/or a pressure suit, which were documented separately by the Crew and Thermal Systems in SRSD-12-008, Hazard Analysis for ARGOS for Test Participant Providing an Input into Gen 2 ARGOS. This test subject HA focused solely on the physiological hazards of being restrained in the harness and/or pressure suit at various orientations. It did not address ARGOS system performance beyond the harness.

    6.9 Engineering Processes, Roles and Responsibilities

    This section discusses how ARGOS evolved and the engineering processes, roles, and responsibilities observed during the investigation. It is not a technical discussion, but rather observations of the environment, culture, roles, and expectations.

    When ARGOS was initiated, EA-WI-023 was written to cover GFE flight projects; in 2012 it was revised to be much more easily used by projects at all levels. Development, research, or low-TRL projects are typically not projectized, as they are viewed within Engineering as not needing that rigor, being higher risk and undergoing constant change as the hardware is developed. This approach allows rapid development, following a build-a-little, test-a-little philosophy to obtain quick results at low cost. ARGOS specifically was an internal Engineering project with no program or external customer; it was initiated with a small amount of internal funds. It was not categorized as a project or facility, and it did not initially involve test subjects. Safety and Mission Assurance support was included from near the start of ARGOS, with buy-in to the engineering development approach. Throughout the development, the ARGOS team researched the design and selection of the components in great detail, to the very best of their ability. Safety was asked to assist in identifying the right lifting requirements. Beyond Safety, however, there was limited involvement from outside the division and organizations, and the Institutional Review Board (IRB) was considered to provide the oversight and external review for human safety as well as for engineering. Each version, addition, or change to ARGOS over the 6 to 7 years was reviewed at a Test Readiness Review (TRR) Board; in fact, 44 TRRs were found for ARGOS. In addition, 19 ER CCBs were found from 2008 to the present, along with one Engineering Leadership Council topic at the Engineering Directorate during the same period. Within ER, all TRRs for the entire division are chaired by one branch chief, who also heads the branch that developed ARGOS. Beyond the TRRs, the only review held was a PDR at about 50%, at which time some external review was provided; there was no CDR or any other review.

    At some point, ARGOS went from a development effort to testing with humans. The test subjects were varied and diverse, from NASA engineers to retired NASA astronauts and even outside visitors. ARGOS also grew in capability from simulating 1/6 gravity (lunar environment) to microgravity, and from upright human testing to horizontal microgravity testing. All of these changes added risk and should have triggered additional safety concerns.

    On March 15, 2012, there was a similar close call: an unintended drop of a test subject, also 12 to 18 inches (30.5 to 45.7 cm). The root cause of that event was attributed to developmental software that was incorrectly executed for the test instead of the baseline software. In that case, the motor amplifier threw an F32 (Software Failure) fault and the brakes fully engaged to stop the fall.

    From the Board's perspective, there were signs that were missed: the start of humans in the test, the lack of proper outside review, lack of p