did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings...

17
Et 118 599. DOCUMENT RESUME TM 005 083 AUTHOR Waag, Wayne L.; And Others TITLE ASUPT. Automate,d Objective Performanc asureaent System. INSTITUTION Air Force Human Resources Lab., Williams AFB, Ariz. Flying Training D.iv. REPORT- NO APHRLIR-75-3. PUB DATE Mar 75' NOTE 17p.. EDRS PRICE MF-$0.83 HC-$1.67 plus Postage DESCRIPTORS Aircraft Pilots; Criterion Referenced Tests; *Flight Training; *Measurement Techniques; *Performance Tests; Simulators; *Test Construction; Test IDENTIFIERS *Air Force . ABSTRACT To realize its full research potential, a need exists for the development of aniiiiomated objective pilot performance evaluation system for use in the Advanced Simulation in Undergraduate Pilot Training (ASUPT) facility. The present report documents the approach taken for the development of performance measures and also presents data collected from two preliminiry evaluation stuclieS. The results indicated that the objectively derivett measures: (1) correlate highly with instructor ratinO, and (2) did-Criminate between pilots of different experience levels. These,, findings are encouraging and demonstrate the potential of the present approach for generating the needed automated objective pilot performance measurement system. (Author) Or *************************************************** ******************** Douments acquired by ERIC include many informal unpublished,. * * materials not available from other sources. ERIC makes very effort * * to obtain the best copy available. Nevertheless, items of marginal. * * reproducibility are often encountered and this affects the quality * * of the microfiche and hardcopy reproductions ERIC makes available * * via the ERIC Document Reproduction Service (EDES). EDRS is not *' * responsiI.e for the quality of the original document. Reproductionp * * supplied by EDRS are the best that can be made f'om the original. * ***********************************************************************.

Transcript of did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings...

Page 1: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

Et 118 599.

DOCUMENT RESUME

TM 005 083

AUTHOR Waag, Wayne L.; And OthersTITLE ASUPT. Automate,d Objective Performanc asureaent

System.INSTITUTION Air Force Human Resources Lab., Williams AFB, Ariz.

Flying Training D.iv.REPORT- NO APHRLIR-75-3.PUB DATE Mar 75'NOTE 17p..

EDRS PRICE MF-$0.83 HC-$1.67 plus PostageDESCRIPTORS Aircraft Pilots; Criterion Referenced Tests; *Flight

Training; *Measurement Techniques; *PerformanceTests; Simulators; *Test Construction; Test

IDENTIFIERS *Air Force .

ABSTRACTTo realize its full research potential, a need exists

for the development of aniiiiomated objective pilot performanceevaluation system for use in the Advanced Simulation in UndergraduatePilot Training (ASUPT) facility. The present report documents theapproach taken for the development of performance measures and alsopresents data collected from two preliminiry evaluation stuclieS. Theresults indicated that the objectively derivett measures: (1)

correlate highly with instructor ratinO, and (2) did-Criminatebetween pilots of different experience levels. These,, findings areencouraging and demonstrate the potential of the present approach forgenerating the needed automated objective pilot performancemeasurement system. (Author)

Or

*************************************************** ********************Douments acquired by ERIC include many informal unpublished,. *

* materials not available from other sources. ERIC makes very effort ** to obtain the best copy available. Nevertheless, items of marginal. ** reproducibility are often encountered and this affects the quality *

* of the microfiche and hardcopy reproductions ERIC makes available *

* via the ERIC Document Reproduction Service (EDES). EDRS is not *'

* responsiI.e for the quality of the original document. Reproductionp ** supplied by EDRS are the best that can be made f'om the original. ************************************************************************.

Page 2: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

AFHR L -TR -75-3

AIR FORCE

LrCA

US,, MENT OF HEALTH.EOUCAVON A WELFARENATIONAL INSTITUTE OF

EDUCATIONTHIS DOCUMENT HAS BEEN REPRO-DUCE° EXACTLY AS RECEIVED FROMTHE PERSON OR ORGANIZATION ORIGINATING IT POINTS OF VtEW OR OPINIONSSTATED CO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OFEOUCATION POSITION OR POLICY

v

MAN

RE

S0U

SCOPE OF INTEREST NOTICE

The ERIC Fpalny euesansgoed`4 document for °costing

to Etkour judgement. thn docu mustn alto of unwed to the clewing.homes noted to the rght. Index.Int should refierc:PMete spodepomts of vivo..

ASUPT AUTOMATED OBJECTIVE PERFORMANCEMEASUREMENT SYSTEM,

By

Wayne L. WaagEdward E. Eddowes

John H. Fuller, Jr., Cant, USAFRobert R. Fuller, Major, USAF

'FLYING TRAINING DIVISIONWilliams Air Force Base, Arizoni135224

March. 1975Interim Report for Period July 1973 June 1974

Approved for public release; distribution unlimited.

LABORATORY

AIR FORCE SYSTEMSCOMMANDBROOKS AIR FORCE BASE,TEXAS 78235

.

Page 3: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

,

S

NOTICE

_ When US Government drawings, specifications, or other data are usedfor any _purpose other than a definitely related Governmentprocurement operation, the Gtvernment thereby,. incurs noresponsibility nor any obligation whatsoever, and the fact that theGovernment may have formulated, furitished, or in any way suppliedthe said drawings, specifications, or other data is not to be regarded byimplication or otherwise, as, in any manner licensing the holder or anyother person or corporation, or conveying Any rights or permission tomanufacture, use, or sell any patented invention that may in any waybe related thereto.

This interim report was submitted by Flying Training Division, AirForce Human Resources Laboratory, Williams Air Force Base, Arizona85224, under project '1123, with Hq Air Force Human' ResourcesLaboratory (AFSC), Brooks Air Force Base, Texas 78235.

This report has been reviewed and cleated for open publication and/orpublic release by the appropriate Office of Information (01) inaccordance with AFR 190-17 and DODD 5;30.9. There is no objectionto unlimited distribution of this report to the public at large, or byDDC to giesNational Technical Information Service.(NTIS).

This technical report has been reviewed and is approved.

WII4-IAM V. flAGLN, Technical Director -

Flying Training Division

Approved for publication.

HAROLD E. FLFHER, Colonel, USAFCommander

1

()

Page 4: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

Unclassified

4

REPORT DOCUMENTATION PAGE'READ INSTRUCTIONS

BEFORE COMPLETING FORM1. REPORT NUMBER .

AFHRL-TR-75-3

2. GOVT ACCESSION NO.

.

3. RECIPIENT'S CATALOG NUMBER

t. TITLE (and Subeitke) 14

ASUPT AUTWATED OEUECTIVE PERFORMANCEMEASUREMENT SYSTEM

.

5. TYPE OF REPORT a PERIOD COVERED

Interim . 4July 1973 June 1974

6. PERFORMINGORG: REPORT NUMBER

7. AUTHOR(t) .- iWayne L. Waag . 'John H. Fuller, Jr. .Edward E. EddOwes Robert R. Fuller

8. CONTRACT OR GRANT NUMBER(a)

9. PERFORMING ORGANIZATION NAME AND ADDRESS

Flying Training DiviiionAir Force Human Resources LaboratoryWilliams Air Force Base, Arizona 85224

10. PROGRAM ELEMENT. PROJECT, TASKAREA a WORK UNIT NUMBERS.

62703E. 11230108

11. CONTROLLING OFFICE NAME AND ADDRESS -

Hq Air Force Human Resourcei Laboratory (AFSC)Brooks Air Force Base, Texas 78235

.

12. REPORT DATEe

March 197513. NUMBEN OF PAGES

16

n4. MONITORING AGENCY NAME a ADDREIS(if different from Controlling Office)

.

. Lli_

15. SECURITY CLASS. (of this report)

Unclassified.

15a. DECLASSIFICATION/DOWNGRADIN,GSCHEDULE

.. e.

16. DISTRIBUTION STATEMENT (of thi Report)1

0 ..

Approved for public release; distribution unlimited, ic....'/

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

..

. i ,1

18. SUPPLEMENTARY NOTES - ' - .. .....

. I

, .. ,

..

19. KEY WOROS (Continue 44 revere: side if necessary and identify by block number)

pilot petformance measurement ..

proficiency measurement .

pilot evaluation. 4

objective measurement.-....

,...,,

.f

20. ABSTRACT (Continuo on do if necessary ancildintify by block number). '4.,. ..._

To realize its full research potential a need exists for the development of an automated objective pilotperformance evaluation system for use M.. the Advanced Simulation in Undergraduate Pilot Training (ASUf)facility. The present report documents the approach taken for the,development of performance measures and alsoprecepts data collected from two preliminary evaluation studies. The.results indicated that the objectively derivedmeasures. (1) correlate highly , NO instructor ratings, and (2) discriminate between pilots of different experiencelevels. These findings are encouraging and demonstrate the potential of the present approach for generating theneeded automated objective pilot performance measuremeSsystem.

.. .

FORMtinb'to .1 JAN 73 1473 EDITION OF 1 NOV 65 15,013SOLETE

.

4

UnclassifiedSECURITY CLASSIFSCATION DF T.HIS PAGE (When Data Entered)

Page 5: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

ep.

Problem

SUMMARY

4.

The Advanced Simulation in Undergraduate Pilot Training (ASUPT) facility,,ii de ed to be aresearch device capable of providing answers regarding the hardware design and efftive use of flightsimulators. Using state-ofthe -art motion and visual systems, the relationship betweeysimulator fidelity andtraining effectiveness, as well as the applicability of advanced training concepts aje to be investigated. SinceASUPT was designed to be a research simulator, the development of an adequate perfontiance measurement'system becomes-the foundation of the proposed program of research.

Approach

One of the salient characteristics of flying is that it is criterion-directed. The execution of anymaneuver requires that a certain definable objective be met. The degree to which these objectives are metwould appear to represent an adequate description of performance. In other words, a cnterion-referencedapproach to ..objective performance measurement is proposed. Come. quently, for each maneuver, thecriterion objectives must be defined in terms of parameters available within, the simulator., Using this,.approach, a set of performance measures can be generated for each of the maneuvers to be .flown mASUPT.

Results

To evaluate the potential of the proposed approach, two preliminary studies were condoMeasues were generated for seven basic instrument maneuvers of varying levels of difficulty. ts of-different experience levels flew these maneuvers and were evaluated by experienced instruct pilots. TheJesuits indicated that. (1.) instructor pilots were consistent in their stibjective evaluation;y( the objective

,measures correlated highly with the subjective evaluations, and (3) the objective res discnminated4,: between pilots of different experienclevels.

Implications- .

The data suggest the approach taken to 8e a viable line. The baisic assumptions cif the measurementsclizeme were corroborated by the data. The results. (1) slo:est instructor evaluations represent a usefulcriterion for developing objective measures, (2) indicate the objectively derived 'measures possess a highdegree of validity, and (3) provide some insight into the manner in which instructors assign grades.Potentially fruitful areas of further research are discussed.

/ '

f

1

Page 6: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

ACE

This document rep nts a .ftortion of the research program on Project 1123,Flying Training Dev pment, Dr William V. Hagin, Project Scientist; Task 112301,Development cf. rformance Measurement Techniques for Air Force Flying Training, DrWayne" L. g, T Scientist; being carried out by the Air Force Human ResburcesLabors , Flying T dining Division, Williams Air Force Ease, Arizona.

V

2'

1

Page 7: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

tTABLE OF CONTENTS

I. Introduction ,, >

Measurement System RequirementsMeasurement Development for ASUPTPreliminary Evaluation Study IPreliminary Evall4ation;- Study/IIImplicatioits --

-x°

.

..

.

-

. t-

,

'I 0

Page

5

55

69

10.

Referfncss , 1

LIST OF TABLES.

Table . .

1 Comparison of Instructor Judgments at Cockpit and Console t .,

For-Straight and Level..

2 .Correlations of Objective Performance' Measures With Instructor Dialuations,

for Straight and Level.

3 Prediction of Overall Instructor Pilot Ratings For Straight and Level

4 .,Prediction of Smoothneis Ratings For Straightand Level

5 tnter-Rater Correlations for Study II

. 6 Correlations'of Objective Measures with Instructor Ratings for Study II

7 DesCriptive Statigics of OWeetive Measures for Each Rating Category

r 8 Comparison of Objective Measures for Experienced and Jnexperienced Pilots

.

'.,

41'

Page

f4

7

7

, 8

8

9

11

12

13

p

i

3

Page 8: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

,

OMATED OBJECTIVE PERFORMANCEliVASUREMENT SYSTEM,

f

I. INT ODUCTION

research device capable 9f providing answThy Advanced .SiMulation in Unclezcelr Pilot Training (ASUPT) facility is designed to be a

ding both the hardware design and effective use of flightsimulators. Using state-of-the-art motion and visual systems, the relationship between simulator fidelity andthe training effectiveness of these systems, as well as the applicability, of advanced training concepts are tobe investigated. Since ASUPT is designed to be a research simulator, the development of an adequateperformance measurement system betOmes an essential component of the research program. This reportdocuments 'the approach to performance measurement systeni development which has been taken andpresents the results of two blief validation studies.

Measurement System Requirements

The criterion for evaluating a flight simulation device is its training effectiveness. From past evidenceindicating positiv; transfer effects, it is assumed that performance in the simulator will be positively relatedto 'performance in the aircraft. As a result, the thrust of the present effort is to develop measures whichreflect performance in the simulator. In addition, it is possible'to use pilot performance in the simulatorasa criterion against which to investigate alternative simulator hardware configurations and training strategies.

One of the salient characteristics of flying is that it is tnterion-directed. For the execution of anymaneuver or sequence of maneuvers there are definable objectives which must be accomplished. The degreeto which these objectives are met represents an adequate description of performance. In other words, acriterion-referenced approach Pto pilot performance fo the basis Or the present effort. Following such anapproach, it is apparent that the definition of trite on objectives is of foremost importance. Within thecontext of measurement development for ASUPT, e critical question is whether these criterion objectivescan be stated in terms of parameters availabl within the simulator. In other words, can behavioralobjectives be defined in terms of the,state of simulated aircraft and contratnputs of the pilot?

Aside from the requirement to define performance in terms of observable behavioral objectives, thereare other constraints which are applicablejhe first is parsimony in the selection of simulator parameters tobe sampled. Ciliterion objectives are to be defined using as few parameters as possible. Furthermore, theyare to be sampled and analyzed on a real-time basis so that the resulting measurement and feedback areimmediate. Since measurement will be an integral part of training students in ASUPT, it is also 'necessarythat the resulting output be meaningful and easily interpreted by both instructors and students.

In suimnaiy, a criterion-referenced approach to measurement system development is to be pursuedwithin the constraints of the following requirements:

1. Measures Will assess the degree to which' the criterion-objectives are met.

Measures will reflect only the most salient characteristics of performance.

3 Measures will be meaningful and interpretable to the userthe student and instructor pilot.i

. 4 Measures will be generated.on a real-time basis so.tat feedback is immediate.

ent Development for ASUPT

ough the criterion-referenced approach represents the rationale for present developmentalefforts, iksfurther set of assumptions ha4 guided implementation on ASUPT.

1 "i'llfeasurement system development parallels skill acquisition...1114k student pilot. Since studentpilots ac wire flying sIbIls in a hierarchical fashion, measurement development should proceed in a similar'manner.' mply stated, a building block approach is assumed in whit imeasurements for basic flying skillsare the flist to be implemented.

2. 'pze focus of the measurement _system is the ividual flight maneuvers. This level ofmeasurement seams to be most consistent with present flying training syllabuses.

Page 9: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

3, Maneuvers can be conceptualized as integrated sequences of steady states and transition& Thefundamental /light attitudes plus transitions from one attitude to another form the conceptual segments for .

most maneuvers' (Meyer, Laveson, Weissman, 8e, Eddowes, 1974). It is necessary to define behavioralobjectives for each segment since they we likely to differ from one to another.

f4. Maneuver performance can he evaluated by two sets of parametersthose reflecting the state of

the aircraft and those reflecting the control inputs of the pilot. Superior performance is assumed tomanifest three characteristics: (a) accomplishing the criterion objectives as defined by the aircraft stateparameters, (b) avoiding excessive rates and acceleration forces so that the maneuver is executed smoothly,and (c) accomplishing these objectives with the least amount of effort, that is, by minimizing control /inputs.

5. The evaluation of performance for a given parameter involves a comparison of the obtained valuewith some ideal value For parameters reflecting aircraft state, the deviation from the ideal provides anindex of error. Since the ideal is seldom attained, it is more realistic to define acceptable performance interms of an empirically determined tolerance band about the ideal value. For parameters reflectingsmoothness and control inputs, the adapted ideal represents a performance level characteristic of the highlyexperienced pilot.

6. The implementation of the measurement systeni requires four phases of development: (a)definition of criterion objective in terms of a candidate set of simulator parameters, (b) evaluation of theproposed set Of measures for the purpose of validation and 'simplification, (c) specification of criterionperformance by requiring experience instructor ,pilots to fly the Taneuyer in 'question, and (d) thecollection of normative data using students as they progress through the training program.

Preliminary Evaluation Study .I

In keeping with the approach' outlined previously, the first maneuver for which measures wiredeveloped was straight-and-level flight. Since the criterion objectives for thin maneuver arewell-defined--maintain constant altitude, airspeed and headingit was felt that an evaluation of itsmeasurement scheme would provide a good starting place to determine whether the approach held promise.Any performance assessment system, if it is valid, should have .at least two characteristics. First, it shouldbear a positive relationship to expert opinion regarding the quality of performance. Second, it shouldreliably discriminate between subjects having differing levels of experience. The demonstration of suchconstruct validity seemed necessary for the evaluation of the present measurement scheme.

Using the pre-programming capability in ASUPT, a simple test scenario was developed. (1) initializethe simulator to 15,000 feet, 160 knots, heading 090 degreess "(2) unfreeze thee_ ulator and allow 10seconds for the pilot to "settle down," (3) sample selected parameters d (4) freeze thesimulator. The parameters sampled were altitude, airspeed, heading, stick mov nt, throt e mo ment,elevator stick force, pitch rate, pitch acceleration, roil rate, roll accelerations, -v 'cal velocity, an rticalacceleration. Mean deviation and root-mean-square (RMS) deviationswere com uted for altitude, ai peed,heading, and stick force. RMS scores were computed for the remaining parameters.

A seven-point rating form was developed for use by instructor pilots Who were to provide qualitative,evaluations. Five items were to be rated: (1) altitude control, (2) airspeed ,control, (3) heading control, (4)over-all level of performance, and (5) smoothness. For each evaluation, two\raters wefuledone inside the

and d another at the console who only observed the repeater instruments. In ttu manner, an estimateof inter-rater reliability could be computed.

Subjects for the initial evaluation were .12 Air Force employees (Flying Training Division, Air ForceHuman Resources Laboratory) and one student pilot. Piloting experience ranged from zero to 000+hours. Each subject was given a two-minute practice/instruction period followed by five one-minute

wThree instructor Pilots were used as raters and were randomly alternated between the cockpit and console.All missions were flown using full motion and q-seat. Of the total 65 runs, one was lost due to systemfailure.

The means of the cockpit anti console ratings, along with their correlations, are presented in Table 1.The high intercorrelations suggest the instructors to be highly'consistent in theirtatings. It is interesting tonote that the lowest correla n is for smoothness. This is most likely due to availability of motion etIesthe rater inside the cnckpit. In any, case, the high intercorrelatiOris suggest the ratings to be a recriterion against whi ate the objective measures

6, R

Page 10: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

Table 1. Comparison:a Instructor Judgments at Cockpit and Console_For Straight and Level

MeasureMean

CockpitMean

.z' ConsoleInterathCorrelation

Alti ride 'ControlHeading ControlAirspeed ControlOverall RatingSmoothness

4.74584.76274.54244.62714.4574 ------

__---

..----' 4.7031

4.65624.73-44--4.67194.6250

.8722

.9257

.8535,.9109.6297

ze I

The next step was to determine whether the objectively derived measures would reliably predict theinstructor ratings. The adopted criteria were the overall rating and smoothness rating. Table 2 presents the.correlations between the objectively derived measures of performance and these instructor evaluations forboth the ,cockpit and console raters. A glance at these results suggest several things. First, the correlationsbetween the measures and instructor evaluations are rairly consistent for both the cockpit and consoleratings. Second, and most importantly, substantial relationships exist between a number of the objectivelyderived measures and the instructor pilot (IP) subjective evaluations. Third, the measure RMS verticalvelocity predicted both criteria quite well. These findings were highly encouraging-suggesting that theseobjectively derived measures do relate to the Instructor's evaluation of performance.

Table 2. Correlations of Objective Performance MeasuresWith Instructor Evaluations for Straight and Level

Overall Rating Smoothness Rating

Measure Cockpit Console Cockpit

Mean Altitude Error' Mean Airspeed Error

Mean 'Heading ErrorRMS AltitudeRMS AblopeedRMS HeadingMean Stick ForceRMS Stick ForceRMS Stick Movement

CMS Throttle MovementRMS Pitch RateRMS Pitch AccelerationRMS Roll RateRMS Roll AccelerationRMS Vertical VelocityRMS Vertical Acceleration

-.6492.5826

-.5075-.7803-.7690-.6393

.2740-.4434-.0098*-.1643*-.3168

.0781*-.0371*

.. -.1527*-.7737-.4172

.

.r -.720

.6515-.5134- -.8348-.7739-.6498

- .3859-.3101-.0237*-.1855*-.2625

.1277*:..00950706:

-.7560. -.377,0.

-.4645.4365

-.4729-.5750-.5678-.5931

.3717-.3249-.2789-.3320-3019-.2006*-.0365*

.0145*-.7004-.4663

''

Console

. -.4834. .433V---"--

-.525-.6071-.6193-.6941

_7..32366399

-.30287.5036

*

.0931*.0009*

--..75219747.

* Nonsignificant.'

Using a forward selection multiple regression procedure, subsets of variables were selected which werepredictive of the criterion. An iterative procedure was used wherein variables were added to the predictionequation until the increment in explained variance became statistically nonsignificant. At this point, thevariables in the prediction equation were eliminated from the predictor set and the procedure. repeated. In

- this manner multiple sets of predictors were defined. The criteria adopted were the overall rating andsmoothness rating obtained from within the cockpit. thing the equations developed against the cockpitratings, an attempt was made to predict the raings obtained at the console. The results of :these analyses arepresented in Tables 3 and 4.

7

0

Page 11: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

,

RM$ Altitude 9066 .9044Mean Altitude ErrorMean Stidk ForceRMS Throttle Movement

Set 2RMS Vertical Velocity .8839; .8867RMS AirspeedRMS Stick MovmentRMS Pitch Rate

Set 3,RMS Heading .8303 .8178Mean Airspeed ErrorRMS Stick ForceRMS Pitch AccelerationRMS Vertical Acceleration

Set 4Mean Heading Error .5075 .5134

---

Table 3. etion of OVerall Instructor Pilot RatingsFor Straight and Level

Measure ea. Cockpit 4 Console

Set I

Table 4. Prediction of Smoothness RatingsFor Straight and Level

Measure CoCkOlt Console

_---RMS Vertical VelocityRMS Throttle MovementRMS AltitudeRMS Pitch Rate

Set 1

.7926 .800r

Set 2RMS heading .7155Mean Altitude ErrorRMS Pitch Acceleration

RMS AirspeedRMS Vertical Velocity

Mean Heading ErrorMean Airspeed ErrorRMS Stick Movement

Mean Stick ForceRMS Stick Force

Set 3.6718

Set 4. '

.6248

Set 5.5585

.7086

.7053

I.6496

:5277

8

.

Page 12: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

pr.,

1

-, Four supse of variables were selected which were predictive of the, overall ratings. The first subset..,.

consisting of altitude, mean altitude error, mean stick force; and RMS throttle movement yielde a

multiple 1,1 o .9066. The correlation between the dicte4 score (using the equation developed for ttiecockpit rating and the console rating was .9044. Wnilar degrees of correspondence were obtainettforremaining subsets of predictions. For smoothness, five subsets of predictors were identified. Again, high

1,u

egrees of correspondence were obtained between the multiple Rs developed for the cockpit ratings antebsequent correlation between predicted scores arid the console ratings.

k The results of the first study were highly encouraging and seemed to warrant the.. followingconclusions. First, instructor paots are highly consistent in their evaluations of performance. Consequently,it is possible to use such evaluations as one criterion against which to validate objective measures ofperformance. Second, the objective measures of performance developed for straight and level will reliablypredict instructor evaluations. The demonstration of such predictive validity suggested the approach takento be a reasonable one. To further evaluate the proposed syssem, a second study was undertaken.

Preliminary Evaluation Study II

Scenarios -were developed and the performance measurement software written formaneuvers. change of airspeed, constant ai_rw _ed climbs/descents, rate climbs/descents, and the steepSimilar rating forms were developed for each maneuver and the simultaneous cockpit/console evaluatiprocedures followed. Four T-37 instructors were the raterstwo alternating at the console and the olhtwo alternating in the cockpit. Ten subjects were used in the second study, again representing a wideof skills. Four student pilots in 137 training, three T-37 IPs, and three civilians were included. Each su jectflew the following set of maneuvers: three airspeed changes; one constant airspeed climb; one constantairspeed descent, one rate climb; one rate 'descent; and three steep turns. For each climb/descent a level-offto altitude was required.

Inter-rater correlations were computed for each maneuver and are presented in table 5. As indicated,'the data for climbs and descents were pooled. Overall inter-rater'correlations were computed for categorieswhich were rated for all maneuvers. The data indicates substantial agreement among the raters, especiallyfor the overall and smoothness ratings, even though the values were somewhat less than obtained forstraight-and-level. Several possible reasons for the lowered inter-rater correlations should be mentioned.First, the maneuvers in the second study were of increased complexity. Since these maneuvers requireseveral transitions in addition to a steady state condition, the instructor's job of monitoring all the relevantparameters is increased. Likewiie, the performance of transitions from one steady state to another increasesthe number of cues available to the rater within the cockpit. A second-possibility concerns rater bias. It ispossible that different subjective criteria were used in ratings of the T-37 IPs as opposed to students. Anexamination of the ratings of one of the IP's performance records yielded large discrepancies between thecockpit and console ratings. The objective measures appeared to agree with the cockpit ratings in th4t theperformance was quite good. However, according to the console rater, the performance was considered tobe unsatisfactory: Such data strongly suggest the possibility of rater bias.

Table 5. Inter-Rater Correlations for Study II

MeasureChange ofAirspeed'

CAS . Rate, ClImbidtsoint"-- '---Climb/Descent

SteepTurn Overall

Altitude ControlAirspted ControlHeading ControlRate Control'Bank ControlOverallSmoot ess °

.7714

.7763

.6121++

.7741.7311 .

1

.6334

.4818

.8987++

6822.7055

,

.7740

.8078;7534

" .6272+

.8045

.9661

.6935

.+

.8592

.0.7279.6758

.7716,/

.836ty

puted for mar-sieilivcomputed since item not rated 11)11 maneuvers.

4

Page 13: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

-0\

maneuver were compUted and presented in Table 6. In this case, the ratings from the in the cockpit were

; Correlations between each of the objective measures and the overall and smootthess ratings for each

used as the criteria. A perusal of the data Warrants several conclusions. As expected, parameters reflectingperformance of the criterion objectives were most related to the ratings. likewise, the values aboutthe ideal were correlated more highly than mean deviations. The only exception was bank error. Inthis case, an error in the computing software was discovered, thereby invalidating thelesultingmeasure.*,

A forward selection regression analysis was computed in an attempt to develop predietio equationsfor the overall steep turn rating. The steep turn was selected since it represented the most diffi of themaneuvers tested, The initial etlaset of seven variables selected by the procedure yielded a multip Lot".820r This equation was th used to predict the console ratings. The obtained correlation be eenpredicted and observed console ratings'was .9137. Again, multiple subsets were isolated whipredictive of the criterion. Although not verified, it scents likely that similar sets of predicjion equationscould have been developed for the other maneuvers. In any case, it is certainly clear that the objectivemeasures developed are highly predictive of instructor ratings, thereby demonstrating the validity of thepresent approach to measurement.

. A further set of analyses were computed inorder tolerate the objective measures to the instructorevaluatiOns. Performance records for all maneuvers were pooled and placed into groups according to theevaluation of the cockpit instructor. Four groups were defined according to the grade assigned of U, F, G,dr E. Descriptive statistics were then computed for each of the four groups. The results are presented mTable 7. Measures for rate of climb and bank were deleted due to the small number of cases within each ofthe groups. An examination of the data warrants several conclusions. First, it is apparent that there areclearly defined trends for_a number of the objective measures across the four groups. There are clear cutdecreases in root mean square for airspeed, heading, and altitude as a function or the subjective ratings.I.Altewise, there are decreases in the variability across subjects. Consequently, loWered ratings arecharacterized by increasing within.subject error as well as, increasing between-subject variability. Such datareflect the fact that there is one way to execute the maneuver correctly, but many ways to corning errorsand therefore receive a lower evaluation.

.The results also verified the assumption That superior maneuver performance involves theminimization of_control inputs. Performances rated excellent were those which minimized the amount ofstick movements and also minimized stick deviation from the null position. 'Furthermore, superiorPerformance was

,andcharacterizedby minimum amount of control force, or, in other minds, the efficient

use of trim: While stick inputs were found to be related to rated performance, throttle movements werenot. Of the proposed measures of smoothness only pitch rate, vertical velocity and, vertical accelerationwere related to rated performance. Again, the pattern of decreaiing means and variances was found.

As discusted previously, a requirement of the measurement system is that it reliably discrimritkes,between pilots of different experience levels. In.this case, depending on their previous flying experience,subjects 'were plated into one of two categories high versus low experienceAnd descriptive statistics,computed, The results are presented in Table 8.,Itshould be pointedtorth)Me statistics computed for--each objective measure were based on data milting from all the malieuvers in winch that measure wascomputed. For example, the statistics for heading measures were based on all maneuvers except the steepturn. The results indicate that a substantial number of the objective measures will differentiate between thetwogroups. Generally, the inexperienced group was characterized by higher error scores and much greatervariability. For the steady state parameters, the RIMS error scores weremore discriminative than the meanerror scores.

*Implications *The pesults of these two evaluation studies warrant a number of conclusions. First, the data suggest

the approach-taken to be a viable one. The assumptions concerning superior maneuver performance havebeen corroborated by the data. experienced pilots, and likewise those performances`rated excellent, werecharacterized bt (j) meeting the criterion objectives in terms of minirniziAg aircraft state errors, (2)minimizing rates and acbelerationsat least in the pitch and Z axes, and (3) minimizing control inputs. Inother words, superior performance ,involves gettin the job done, doing it smoothly, and with a leastamount of effort_ I

; 10

4

Page 14: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

Mea

sure

/ Ove

rall

Tr

le 6

. Cor

te d

ons

of O

bjec

tive

Mea

sure

s w

ith I

nstr

ucto

r R

atin

gs f

or S

tudy

II

of A

eed

S o

othn

ess

Ove

rall

CA

S C

limb

/Des

cent

Sm

ooth

ness

Ove

rall

Rat

e C

limb/

Des

cent

Sm

ooth

ness

Ove

rall

StS

el, T

urn

Sm

ooth

ness

'

Mea

n A

irsp

eed

Err

or-.

1658

.-.

0636

-.04

70-.

0681

.054

5--

.623

0272

7R

MS

Aiis

peed

-.58

54-.

6856

-.01

64-.

4104

-.07

02_-.

4776

.-.

4951

-.17

76

Moa

n H

eadi

ng-E

rror

/.2

804

.137

7.0

527

.059

4+

' .-.

6112 +.

.

RM

S H

eadi

ng/

-.54

63 .

.-.

4144

-.50

90-3

614

-.66

70-.

7620

-.66

49+

+

Moa

n A

ltitu

de E

rror

-.67

47-.

6386

.213

0.3

374

-.09

46-.

7899

.010

8.5

8305

-.53

20.5

400

-.70

85-.

6512

-.56

90-.

4755

-.68

68-.

7193

Moa

n B

ank

Err

orR

MS

Ban

k+ +

+ ++

2136

*+

+ ++ +

+ +

7.55

4

-.53

75-.

6121

-.03

84*

-.-.

RM

S A

ltitu

de

=.1

542

.09+

59+

+.0

799

-.80

29,

-.31

94-.

3530

2-.8

312

.001

3+

.+

Mea

n R

ate

Err

or+

++

++

+

RM

S R

ate

+M

ean

Stic

k Fo

rce

-.29

01-.

6303

-.3,

475

.176

1

4.R

MS

Stic

k Fo

rce

898

-.14

01.1

021

-.13

43-.

4920

-.01

99

RM

S T

hrot

tle M

ovem

ent

.04

-.02

74.

1.3

325

-7..6

1782

396

.139

5

-.41

45-.

6483

't

RM

S St

iCk

Mov

emen

t-.

07

-...2

823

-.24

16-.

2475

'tR

MS

Fore

-Aft

Stic

k Po

sitio

n-.

2664

\,

-.23

94-.

1641

-.22

95.1

081

-.28

51-.

3122

-.39

69R

MS

Lat

eral

Stic

k Po

sitio

n-,

2467

-.48

75-.

6364

-.60

90-.

6630

-.77

.71

.075

3-.

6315

-.47

63-.

6380

;R

MS

Pitc

h R

ate

' .-.

1)45

-.33

72-,

1161

*

,RM

S V

ertic

al V

eloc

ity'

-.60

69\-

.612

0,

-7..3

2455

894

-7..5

1812

7,37

-.52

05

RM

S Pi

tch

Acc

elht

atie

hR

MS

Rol

l Rat

e,

-.02

03, -

.120

2.1

988

'.0

411

-.04

90

-.11

3408

39,

.001

6-.

5514

.040

6

-.81

45-.

7437

.070

9

-.53

75-.

0385

.050

6R

MS

Rol

l Acc

eler

atio

n-.

2311

\ -.2

214

.337

8-.

0618

.095

3

RM

S V

ertic

al A

ccel

erat

ion

-.31

30-4

.564

5' -

3286

.-.

4762

-.25

82-.

7136

-.69

75-.

4225

-.45

28-.

5116

,-.

4521

,

* M

easu

re in

corr

ectly

com

pute

d du

e to

sof

twar

e er

ror.

+ N

ot c

ompu

ted

for

man

euve

r.

44,

Page 15: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

A.

Tab

le 7

. Des

crip

tive

Stat

istic

s of

Obj

ectiv

e M

easu

rei f

or E

ach

Rat

ing

Cat

egdr

y

rti

er.

Mea

sure

,

T.

.,M

eans

a"S

tand

ard

Dev

iatio

nst

U'

rG

ItU

F0

Mea

n A

irspe

ed6,

2029

2.63

95.8

516

.980

58.

0183

5.37

323.

4874

1.55

45 ,

RM

S A

irsp

eed

12,3

590

6.40

863.

9006

2.26

947.

3095

. 3.6

592

2.08

2013

643

Mea

n H

eadi

ng'

-1.8

377

2.13

37-.

9720

-.09

387.

2208

5.26

463.

0924

1.40

64

RM

S H

eAdi

ng8.

0273

5.08

333.

2715

1.45

354.

5197

' 3.9

316

1.73

67.7

454

,M

ean

Alti

tude

'- 29

1.42

50-4

6.18

48-2

1.21

074.

9002

718.

3638

125.

3819

59.7

469

19.9

138

RM

S A

ltitu

de68

6.96

2413

3.91

5063

.143

327

.561

757

7.05

0582

.527

840

.693

814

.786

2

Mea

n St

ick

Forc

e-2

.237

3-1

.274

1-.

6847

-.71

253.

4299

1.74

421.

9502

1.22

28R

MS

Stic

k Fo

rce

5.27

052.

8326

2.41

51"

-1.6

180

3.69

541.

8237

1.66

161.

2253

RM

S St

ick

Mov

emen

t.3

910

.261

0.1

955

.183

3.1

835

.'.1

507

--:

.097

0.0

841

10.4

.1R

MS

Thr

ottle

Mov

emen

t.6

958

.625

4.7

671

.789

9.2

190

.288

4.4

653

.625

8

'GI

r)R

MS

Fore

-Aft

Stic

k Po

sitio

n3.

2308

2.20

981.

9672

1.77

361.

8311

.844

7.6

526

.520

3R

MS

Lat

eral

Stic

k Po

sitio

n.6

336

.368

7.2

848

.190

3.2

870

4940

.137

2.0

964

RM

S Pi

tch

Rat

e4.

9255

2.01

46-

1.11

461.

1477

4.37

68r

1.63

20,

.862

8,

1.05

57R

MS

Pilc

h A

ccel

erat

ion

.' 29

.749

920

.942

017

.771

120

.121

319

.570

019

.878

§2.

3.41

70,1

9.44

27R

MS

Rol

l Rat

e.4

012

'53

14.5

801

5197

.316

3.2

967.

.337

3.1

062

RM

S R

oll A

ccel

erat

ion,

.580

6.5

286.

.619

9.4

462

.233

3.3

462

.363

7.3

160

RM

S V

ertic

al V

eloc

ity12

05.0

500

137.

6677

123.

8052

168.

8898

1789

.361

022

2.80

5018

4.13

5721

4.38

13R

MS

Ver

tical

Acc

eler

atio

n3.

2379

1555

81.

4965

91.

2012

2.08

41.8

067

.737

9.5

517

Page 16: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

Table 8. Comparison of Objective Measures for ExperiencedAnd Inexperienced Pilots

masureExperience Inexperienced

Mean S.D. Man S.D.

Mean Airspeed 1.4730 s 2.5232 2.3486 6.1876

RMS Airspeed 3.4929 2.0878 6.7856 5.0718Mean Heading -.1843 2.2824 .6084 5.4465RMS Heading 2.1827 1.4071 5.1275 3.6841Mean Altitude, -21.6591 77.8214 -71.5790 316.2510'RMS Altitude' 67,5192 673319 193.8939 322.1645Mean Climb Rate e''', .1610 1.4210 .9480 2.7849RMS Climb Rate 23482 .8100 4.4623 1.5058Mean Bank 1.4191 .8855 4.1372 3.5044Mean Stick Force -.9480 1.9727 -1.0806 2.0091RMS Stick Force 2.4354 1.7274 2.8486 23823RM$ Stick MovementRMS Throttle Movement

, .1847.6668

.1009

.4829.2778

.' _7679.15143872

RMS Fore-Aft Stick Position 2.0347 .6661 4 2:1971 1.1259RMS lateral Stick Position -2; .2517 .1356 3968 .2294Fitch Rate . 1.0769 .9846 ,, .2.3859 2.4821Fitch Acceleration 18.2833 23.9736 22,1778 18.4969Roll Rate .5146 '. -3078 .5749 .3257Roll Acceleration . ..... .5670 3494 .5420 .3432Vertical Velocity 1723421 208.9697 4099243 825.0552Vertical Acceleration 1.5010 .7557 . 1.7047 1.2598

"-

Second, the data indicate that instructor evaluations are a tisdble criterion for future measurementsystem development. -Thez relatively high levels of agreement between the cockpit and console ratings areparticularly encouraging since the availability of cues was radically different for each rater. The consolerater only had access to the repeater instruments while the instructor in the cockpit could observe thestudents' behavior in addition to the flight instruments. Furthermore, kinesthetic cues were also available to

the cockpit rater. The importance of these different cue sources in instructor evaluations is an area which

should be addressed in future research studies.

Third, the objectively derived measures were shown to possess a certain degree of validity. Significant

correlations between these measures and instructor evaluations were obtained. FurthermOre, the measures

were shown to discriminate between, pilots . of differeht experience levels. These results are most, encouraging and convincingly demonstrate the fruitfulness of the present approach.

Fourth, the results provide some insight into the manner in which instructors assign grades. Ahierarchical model seems most consistent with the data. To receive an excellent (E) rating, errors on thecritical parameters must all be low. Ai the quality of the rating decreases.the potential for different errorsincrease. In fact, it is possible to commit error involving one parameter, control the others quite well, andstill receive a low evaluation. This suggests that both the number and degree of error are important. Thiinvestigation of instructors grading strategies is another prime, area for future research.

Fifth, the results provide preliminary data concerning the simplification of the present set ofparameters. As expected, parameters reflecting performance of the criterion objectives yielded the highestvalidity coefficients.- However, measures of mean error were not as effective as root - mean -square errorFurthermore, a number of the proposed measures of smoothness did not produce any significantrelationships. Roll rate, roll acceleration, and pitch acceleration were not related to either instructor ratings

or sperience level. Likewise, 'throttle movements were not found to beimportant.

C;

13

16

Page 17: did-Criminate - ERICdid-Criminate. between pilots of different experience levels. These,, findings are. encouraging and demonstrate the potential of the present approach for generating

Taken as a whole, the results of these prelimpary investigations are encouraging. The demonstratedvalidity of measures, developed for these basic instrument maneuvers, indicate the fruitfulness of thepresent a y. o. h. E currently underway for the development of performanit measures -for -morecomplex maneuvers. It is expected that the resulting measurement system will meet both the research andtraining needs for future studies to be accomplished in ASUPT.

REFERS .90

Meyer, RP., La *son, J1., Weissman, NS., & EddoW6s, E.E. Behavioral taxonomy' ofundergraduate pilottraining tasks and skills: Executive summary. AFHRL-TR-74-33(1). Williams AFB, Ariz.. FlyingTraining Division, Air Force Human Resources LaboWory, December 1974.

. :. 14 .

.17

w.

far