Predicting Time Series Outputs
and Time-to-Failure for an
Aircraft Controller Using
Bayesian Modeling
Yuning He∗
January 27, 2015
Abstract

Safety of unmanned aerial systems (UAS) is paramount, but the large number of dynamically changing controller parameters makes it hard to determine if the system is currently stable, and the time before loss of control if not. We propose a hierarchical statistical model using Treed Gaussian Processes to predict (i) whether a flight will be stable (success) or become unstable (failure), (ii) the time-to-failure if unstable, and (iii) time series outputs for flight variables. We first classify the current flight input into success or failure types, and then use separate models for each class to predict the time-to-failure and time series outputs. As different inputs may cause failures at different times, we have to model variable length output curves. We use a basis representation for curves and learn the mappings from input to basis coefficients. We demonstrate the effectiveness of our prediction methods on a NASA neuro-adaptive flight control system.
1 Introduction
Most unmanned aircraft (UAS) are equipped with elaborate control systems. In particular, autonomous operations require that the flight control system performs reliably even in the presence of failures. For certification, controller stability and robustness must be demonstrated. UAS controllers can range from simple PID control to advanced adaptive controllers. Their behavior is determined by numerous parameters, many of which can be set at design time. More challenging, however, are adaptive control systems that change their internal parameters (e.g., gains, model parameters, modes) dynamically during the flight in response to a changing mission profile, unexpected environment, or damage.
∗[email protected], UARC, NASA Ames Research Center

Figure 1: Histogram of Tfail. Most failures occur before T = 600 time steps. Successfully controlled flights are more common and last for T = 1901 time steps.

There exist numerous architectures for adaptive flight control systems. Regardless of its architecture, however, safety requires that certain robustness and stability properties can be guaranteed for the currently active set of parameters. In most cases, such guarantees cannot be provided a priori, so on-board dynamic monitoring techniques are needed to assess whether, given the current parameter setting, the aircraft (AC) will remain stable and, if not, when the AC will become unstable. Such information is vital for the on-board autonomous decision making system, especially for UAS.
In this paper, we describe a statistical method which enables us to answer two questions during the flight: (i) given the current values of control system parameters, will the system remain stable within the next T time steps?1 And, (ii) if not, at which time Tfail < T will the system become unstable? We also describe a statistical method for efficient prediction of the actual time-series outputs up to T time steps. This more detailed information can be used for optimal and safe autonomous decision making. For example, a damaged rudder might require a specific parameter setting. If it can be predicted that extreme rudder deflections will be needed within the near future, then the current situation can be regarded as unsafe and other mitigation techniques must be considered.
Our statistical framework to address these questions is hierarchical. Most simulation runs are successfully controlled and last for T = 1901 time steps. Most failure runs have a time-to-failure Tfail < 600 time steps. When observing the distribution of simulation runs with a specific Tfail over numerous parameter settings, the histogram of Tfail in Figure 1 suggests the existence of a few easily discernible failure groups. Although the general appearance of the time series is typically quite different for runs with different
1For our case study in this paper, one time step is 1/100s.
https://ntrs.nasa.gov/search.jsp?R=20150018872 2018-07-16T05:20:44+00:00Z
Figure 2: Typical curves for different inputs that result in (left) Tfail ≈ 250 and (right) Tfail ≈ 350 . . . 500. The horizontal axis is time and the vertical axis is one of the simulator outputs (output #8, alpha).
Tfail, Figure 2 shows that the shapes of the time series for runs with Tfail ≈ 250 look relatively similar to each other, as do curves with Tfail ≈ 350 . . . 500 time steps. Although the data in the figure have been obtained from an experimental setup with a specific adaptive control system (see Section 6), similar behavior can often be encountered.
We therefore developed a hierarchical model, where the entire data first undergo a classification with respect to prominent differences in failure time. Then time-to-failure and time series prediction are performed separately for each of the classes.
We illustrate our approach with a case study, where we have used our statistical framework on a high-fidelity simulation of a neuro-adaptive flight control system, the NASA IFCS (Intelligent Flight Control System). It will be described in Section 6. We have examined classification strategies with one, two, and four classes and provide comparative results as well as an analysis of the quality of time-to-failure estimation in terms of missed and false alarms.
2 Methodology & Architecture
We have implemented a toolchain for time-to-failure and time series prediction using a two-stage hierarchical statistical model (Figure 3A). Our architecture consists of a Categorical Treed Gaussian Process (CTGP) statistical model for the mapping from input to success or failure class, and within each given failure class, a distinct Treed Gaussian Process (TGP) model for the relationship of input to time-to-failure (which is the output curve length). The mapping from input to class is learned through a CTGP supervised classification method (Figure 3A).
Figure 3: Two-stage prediction of time to failure (A) and supervised learning (B).

In addition to predicting time-to-failure, we also predict the actual time series outputs for flight variables in both success and failure cases. As different inputs may cause failures at different times, we have to model variable length output curves. We do so by modeling the mappings from input to a fixed number of coefficients in a curve basis representation, using one TGP model for each real-valued coefficient. Using a curve basis allows us to have a fixed size representation of variable length output curves. We hypothesize that, by fitting distinct models for each of the distinct classes, the overall prediction accuracy for time-to-failure and curve basis coefficients will be improved. Our hierarchical model can capture model parameter variation across the success and failure classes.
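The two-stage architecture can be sketched as follows. All names here (`TwoStagePredictor`, the lambda classifier, the per-class model dictionaries) are hypothetical stand-ins for the CTGP classifier and the per-class TGP models, not the paper's implementation:

```python
import numpy as np

class TwoStagePredictor:
    """Stage 1: classify the input into a success/failure class.
    Stage 2: the model for that class predicts the time-to-failure and the
    basis coefficients of the output curve."""
    def __init__(self, classifier, per_class_models):
        self.classifier = classifier              # stand-in for the CTGP
        self.per_class_models = per_class_models  # one stand-in model set per class

    def predict(self, x):
        c = self.classifier(x)                    # predicted class label
        model = self.per_class_models[c]
        t_fail = model["T"](x)                    # time-to-failure (curve length)
        coeffs = model["c"](x)                    # curve basis coefficients
        return c, t_fail, coeffs

# Toy usage: one input feature; class 0 = failure, class 1 = success.
clf = lambda x: 0 if x[0] < 0.5 else 1
models = {
    0: {"T": lambda x: 250,  "c": lambda x: np.array([1.0, -0.5])},
    1: {"T": lambda x: 1901, "c": lambda x: np.array([0.2, 0.1])},
}
predictor = TwoStagePredictor(clf, models)
print(predictor.predict(np.array([0.3]))[:2])  # failure branch: (0, 250)
```

The point of the structure is that the stage-2 models never see inputs from the wrong class, which is what lets the model parameters vary across classes.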
Next we give some background on TGP models and discuss other Gaussian Process (GP)-related options for modeling output curves to provide context for our statistical modeling decisions.
3 Background
3.1 Treed Gaussian Processes The Treed Gaussian Process (TGP) model [9] is more flexible and overcomes the limitations inherent in GP models by subdividing the input space and modeling each region rν of the input space using a different GP. Fitting different, independent models to the data in separate regions of the input space naturally implements a globally non-stationary model. Moreover, dividing up the space results in smaller local covariance matrices that are more quickly inverted and thus provide better computational efficiency. Also, partitions can often offer a natural and data-inspired blocking strategy for latent variable sampling in classification. The partitioning approach in the standard TGP model is based on the Bayesian Classification and Regression Trees model developed by Chipman et al. [5, 6].
The TGP subdivision process is done by recursively partitioning the input space. This recursive partitioning is represented by a binary tree in which nodes correspond to regions of the input space, and children are the resulting regions from making a binary split on the value of a single variable, so that region boundaries are parallel to coordinate axes. The TGP parameters Θ include parameters defining the subdivision process, which ultimately determine the number of regions in the subdivision.
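A minimal sketch of this axis-aligned recursive partitioning, using a hand-built toy tree (the real TGP samples tree structures from a posterior; `Node` and `region_of` are illustrative names, not the tgp package API):

```python
# Each internal node splits on one input dimension, so region boundaries
# are parallel to the coordinate axes; each leaf is one GP region.
class Node:
    def __init__(self, dim=None, thresh=None, left=None, right=None,
                 region_id=None):
        self.dim, self.thresh = dim, thresh
        self.left, self.right = left, right
        self.region_id = region_id      # set only at leaves

def region_of(node, x):
    """Follow binary splits until a leaf (= one GP region) is reached."""
    while node.region_id is None:
        node = node.left if x[node.dim] <= node.thresh else node.right
    return node.region_id

# Toy tree: split on x[0] at 0.5, then the right branch on x[1] at 0.2.
tree = Node(dim=0, thresh=0.5,
            left=Node(region_id=0),
            right=Node(dim=1, thresh=0.2,
                       left=Node(region_id=1),
                       right=Node(region_id=2)))
print(region_of(tree, [0.3, 0.9]))  # → 0
print(region_of(tree, [0.8, 0.1]))  # → 1
```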
3.2 Classification Treed Gaussian Processes It is also useful to be able to predict functions C : Rp → {1, . . . ,M} with a categorical output of M possible classes. Recently, an extension of TGP called CTGP (Classification TGP) [2, 1] was developed to use Treed Gaussian Processes for classification. The CTGP model first extends the GP model from regression to classification by introducing M latent variables Zm, m = 1, . . . ,M, to define class probabilities via a softmax function p(C(x) = m) ∝ exp(−Zm(x)). In CTGP, each class function Zm(x) ∼ TGP(Θm) is modeled using a TGP, and each class may use a different tree.
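The class-probability construction can be illustrated directly. The latent values below are arbitrary toy numbers, and `class_probs` is a hypothetical helper implementing the softmax with the sign convention of the text:

```python
import math

def class_probs(z):
    """Softmax over negated latent values z = [Z_1(x), ..., Z_M(x)],
    i.e. p(C(x) = m) proportional to exp(-Z_m(x))."""
    w = [math.exp(-zi) for zi in z]
    s = sum(w)
    return [wi / s for wi in w]

z = [0.5, 2.0, 1.0]      # toy latent values Z_m(x) at some input x
p = class_probs(z)
print(p.index(max(p)))   # the class with the smallest Z_m wins → 0
```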
3.3 Curve Prediction with GP One approach to predicting output curves instead of scalar-valued outputs is to extend the GP model to functions y : Rp → Rq, where the vector output y represents samples of a curve at q = T time points. In the context of statistical emulation of a simulator computer program, Conti and O’Hagan [7] call this the Multi-output (MO) emulator and provide the statistical model for q-dimensional
Gaussian Processes:
y = f(·) | B, Σ, r ∼ Nq(m(·), c(·, ·) Σ),
m(x) = B^T h(x),
c(x, x′) = exp{−(x − x′)^T R (x − x′)},
where h : Rp → Rm is a vector of m regression functions shared by each of the scalar-valued component functions f1, . . . , fq, B ∈ Rm×q is a matrix of regression coefficients, and R = diag(r) is a diagonal matrix of p positive roughness parameters r = (r1, . . . , rp). As in the 1-dimensional GP, a common choice for defining the mean m(·) is the linear specification h(x) = (1, x)^T, for which m = p + 1.
In addition to the MO emulator, Conti and O’Hagan [7] outline two other possible approaches for multi-output emulation: the Ensemble of single-output (MS) emulators and the Time Input (TI) emulator. In the MS approach, each of the T curve values is predicted independently using T single-output emulators. On the other hand, the TI approach adds the time parameter t to the input x and builds one single-output emulator for y(x, t) : (Rp × R) → R. The MO emulator is the simplest from the computational perspective, with a computational load that is comparable to a single-output GP emulator, in which the bottleneck is n × n matrix inversion for n training inputs S. The MS method uses T single-output GP emulators and thus has a computational burden T times that of the MO method. A naive implementation of the TI emulator would require nT times the computation of the MO emulator, as the training samples are now S × {1, . . . , T}, but the structure of the problem allows the required nT × nT matrix inversions to be done via n × n and T × T matrix inversions. Of course, this is still more computation than required by the MO emulator.
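The reason the nT × nT inversion reduces to n × n and T × T inversions is that the covariance factorizes over inputs and time. Assuming a Kronecker-product covariance K = Kx ⊗ Kt (a simplification used here for illustration, not the exact TI structure), the identity (Kx ⊗ Kt)⁻¹ = Kx⁻¹ ⊗ Kt⁻¹ can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 4, 3
A = rng.standard_normal((n, n))
Kx = A @ A.T + n * np.eye(n)      # SPD n x n input-covariance factor
B = rng.standard_normal((T, T))
Kt = B @ B.T + T * np.eye(T)      # SPD T x T time-covariance factor

K = np.kron(Kx, Kt)               # full nT x nT covariance
inv_direct = np.linalg.inv(K)                                  # O((nT)^3)
inv_factored = np.kron(np.linalg.inv(Kx), np.linalg.inv(Kt))   # O(n^3 + T^3)

print(np.allclose(inv_direct, inv_factored))  # → True
```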
For prediction, the key component is the correlation between outputs over the input domain, as there are only a limited number of training samples available. There are more substantial differences between the MO, MS, and TI methods in terms of their covariance structures, at least for a diagonal covariance matrix R as specified above. For the covariance of outputs ft(x1) and ft(x2) at the same time t, the MS method is most flexible, as it estimates different roughness parameters r for each output time while the MO and TI methods estimate a single r for all times. In terms of the covariance between different outputs ft1(x1) and ft2(x2) at different locations x1 and x2, the MO method is most general because this covariance is c(x1, x2) × Σ(t1, t2) and the matrix entry Σ(t1, t2) is not constrained. The TI approach is less flexible than the MO approach because the covariance between different outputs is the Gaussian process variance times exp(−rT (t1 − t2)2),
essentially replacing the generality of Σ(t1, t2) with an exponentially decreasing squared time difference. The main disadvantage of the MS approach is a lack of correlation structure over time, because it emulates the output curve values at different times independently. For this reason, Conti and O’Hagan [7] dismiss the MS approach for emulation of dynamic computer models.
4 Our Statistical Models
For increased generality in modeling, we do not want to assume stationarity, and thus we do not use the GP-based MO approach described above for predicting output curves. One possibility is to build a non-stationary MO approach by extending TGP as described in [8, 9] from 1-dimensional outputs to multi-dimensional outputs by using multi-dimensional GP models in different regions of the input space. Multi-dimensional outputs are not supported in the current TGP R package [9], so a new implementation would be needed to test this idea. The current TGP method for scalar output is already quite slow for even modest size problems in terms of the dimension of the input space and the number of training examples. Although the computation for multi-dimensional GPs is comparable to that for 1-dimensional GPs in terms of the number of training examples n, there is more computation and space needed in the multi-dimensional case. Thus the TGP-based MO idea does not seem to be a practical one, and we opt for another non-stationary modeling approach.
In our approach to predicting one output variable curve, we represent output curves y ∈ RT in terms of a linearly independent set of D orthogonal curves B ∈ RT×D: y ≐ Bc. We use orthogonal curves which measure different curve characteristics so that the basis coefficients in c = (ci) ∈ RD are as uncorrelated as possible. Then we model the coefficients ci, i = 1, . . . , D, independently using D TGP models. Thus by changing the curve representation, we can use an MS approach without being subject to the criticism of not modeling correlations over time. Now the multiple output values being modeled are not values of the curve at distinct time points but rather the coefficients in a “basis” representation of the curve. In addition to the MS advantages of simplicity and flexibility of modeling correlations of the same coefficient output value over different inputs x, the MS implementation can be parallelized by running each single output emulator, both fitting and prediction, in parallel. In many applications, using D ≪ T orthogonal curves will suffice to accurately represent output curves, and the use of a good basis provides a substantial data reduction. Although we may not need a full orthogonal basis in which D = T, we refer to B as a “basis” or a “reduced basis”. Another advantage of our basis representation approach is that it allows us to have a fixed size output representation D for applications whose output curve length T varies with input x.
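The coefficient extraction is simple when the basis is orthonormal: c = Bᵀy recovers the coefficients and Bc reconstructs the curve from D numbers. A toy sketch using a PCA basis of simulated curves (all data below are synthetic illustrations, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, n = 50, 3, 40
t = np.linspace(0, 1, T)
# Toy training curves: random mixtures of three smooth shapes plus noise,
# stored as columns of a T x n matrix Y.
shapes = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t]).T
Y = shapes @ rng.standard_normal((3, n)) + 0.01 * rng.standard_normal((T, n))

mean = Y.mean(axis=1)
U, s, _ = np.linalg.svd(Y - mean[:, None], full_matrices=False)
B = U[:, :D]                 # orthonormal T x D PCA basis

y = Y[:, 0]                  # one length-T curve ...
c = B.T @ (y - mean)         # ... compressed to D coefficients
y_hat = mean + B @ c         # reconstruction from the D numbers in c
print(np.linalg.norm(y - y_hat) < 0.5)  # small residual → True
```

With D = 3 instead of T = 50 values per curve, the mapping from input to curve reduces to D scalar-valued regression problems.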
In the case of variable length output curves, modeling the output curve y(x) requires modeling both the length T(x) and the T(x) curve values at different times. In our basis approach, we model the output curve y by modeling its coefficients in a chosen orthogonal basis of D curves. Thus we have D + 1 scalar output functions to model: x ↦ T, x ↦ c1, . . ., x ↦ cD. We use TGP for these scalar-valued prediction problems. The statistical model for our MS, basis approach is given by:

y | T, c, B, Vy ∼ N([Bc]1:T, Vy),
T(x) ∼ TGP(ΘT),
ci(x) ∼ TGP(Θi), i = 1, . . . , D,

where [Bc]1:T indicates truncation of the predicted output curve to its correct length T.
5 Our Prediction Method
It is now straightforward to give our full algorithm for emulating variable-length output curves from a simulator. Our strategy for emulation is as follows:

1. Fitting. Find coefficients {ci}, i = 1, . . . , D, to approximate the training output curves y in some basis {bi}, i = 1, . . . , D:

(5.1) y ≐ ∑_{i=1}^{D} ci bi

2. Learning Coefficient Mappings. Learn the D mappings from inputs x to coefficients ci for i = 1, . . . , D.

3. Coefficient Prediction. Predict the coefficients c^new_{i,pred} for a new input x^new for i = 1, . . . , D.

4. Output Curve Prediction. Predict the output curve y^new corresponding to input x^new using the predicted coefficients c^new_{i,pred}:

(5.2) y^new_pred := ∑_{i=1}^{D} c^new_{i,pred} bi
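The four steps above can be sketched end to end. This is a toy illustration under stated assumptions, not the paper's implementation: the TGP regressions are replaced by a 1-nearest-neighbour stand-in, the curves are synthetic, and padding to a common length by repeating the last value follows the procedure described in Section 6.2:

```python
import numpy as np

def pad(y, t_max):
    """Pad a curve to length t_max by repeating its last value."""
    return np.concatenate([y, np.full(t_max - len(y), y[-1])])

def fit_basis(curves, t_max, d):
    """Step 1: PCA basis B (t_max x d) and coefficients C (d x n)."""
    Y = np.stack([pad(y, t_max) for y in curves], axis=1)
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    B = U[:, :d]
    return B, B.T @ Y

def nn_predict(X_train, targets, x):
    """1-NN stand-in for the per-output TGP regressions."""
    i = int(np.argmin(np.linalg.norm(X_train - x, axis=1)))
    return targets[i]

# Toy data: input x[0] controls both the length and amplitude of the curve.
rng = np.random.default_rng(2)
X = rng.uniform(0.2, 1.0, size=(30, 1))
curves = [x[0] * np.sin(np.linspace(0, 6, int(200 * x[0]))) for x in X]
lengths = np.array([len(y) for y in curves])

t_max, d = 200, 5
B, C = fit_basis(curves, t_max, d)                 # step 1: fitting

x_new = np.array([0.5])
t_pred = nn_predict(X, lengths, x_new)             # predicted length T(x)
c_pred = np.array([nn_predict(X, C[i], x_new)      # steps 2-3: coefficients
                   for i in range(d)])
y_pred = (B @ c_pred)[:t_pred]                     # step 4: reconstruct, truncate
print(len(y_pred) == t_pred)                       # → True
```

Swapping the nearest-neighbour helper for a real TGP fit per output (as the paper does) changes only `nn_predict`; the basis machinery is unchanged.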
6 Results
We will demonstrate our approach using a simulation model of the NASA IFCS adaptive control system. This damage-adaptive control system has been developed by NASA and was originally test-flown on a manned
F-15 aircraft [3]. The underlying architecture (Figure 4) is very generic. The output signals of the model-referencing PID controller are subject to control augmentation produced by the neuro-adaptive component (a Sigma-Pi or Adaline neural network). A dynamic inverse converts the required rates into commands for the actuators of the aircraft. The neural network itself is trained online to minimize the error between the desired and the actual aircraft behavior. For details see [4, 3]. Although this system was originally developed for a manned aircraft, it could easily be used to control a UAS. This system is configured using a considerable number of parameters (e.g., gains, NN learning rate), which makes it an interesting testbed for our approach.
In our experiments, we ran the simulator with a multitude of different parameter settings and levels of damage, recording 12 output signals and the elapsed time Tfail if the system went unstable. A maximum simulation time of 19s was used.
Figure 4: The IFCS adaptive control system
6.1 Classification Results Our experimental data set consists of a total of 967 simulation runs. Table 1 summarizes two different classification strategies: dividing the data into two classes (Two Class Problem) and into four classes (Four Class Problem). Each strategy separates the data into categories according to the individual time to failure. In our setup, a time of T = 1901 (the maximum recorded length in the simulation) indicates success (S).
We performed experiments to determine the best classifier for the data. We split the 967 runs into 867 training examples and 100 test examples to evaluate classifier performance. We implemented five different classification methods and compared the results for several classifiers (nearest neighbor, several variants of
   2-class Prob.       runs     4-class Prob.          runs
F  0 < T ≤ 1900         569     F1  0 < T ≤ 180         257
                                F2  180 < T ≤ 280       241
                                F3  280 < T ≤ 1900       71
S  T = 1901             398     S   T = 1901            398

Table 1: A summary of the data classification strategies we used in our experiments.
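The class boundaries in Table 1 amount to a simple threshold rule on the run length T; a direct transcription (function names are ours):

```python
# A success run reaches the full simulation length T = 1901.
def four_class(T):
    """Four Class Problem of Table 1."""
    if T == 1901:
        return "S"
    if T <= 180:
        return "F1"
    if T <= 280:
        return "F2"
    return "F3"          # 280 < T <= 1900

def two_class(T):
    """Two Class Problem of Table 1."""
    return "S" if T == 1901 else "F"

print(four_class(250), two_class(250))  # → F2 F
```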
TGP and CTGP, and SVM). As a whole, the CTGP-based classification methods gave, despite relatively long training times, the lowest classification error rates. Table 2 summarizes our CTGP classification results on the 100 test cases.
2-class           F        S    total
training        507      360      867
test             62       38      100
# errors          3        8       11
error rate     4.8%    21.1%    11.0%

4-class          F1       F2      F3       S    total
training        228      213      66     360      867
test             29       28       5      38      100
# errors          5        9       5       2       21
error rate    17.2%    32.1%    100%    5.3%      21%

Table 2: Overall CTGP performance on the different classification strategies for two and four classes.
The overall CTGP classification error rate for the 4-class case is 21%, making 21 incorrect classifications out of the 100 tests. Although the overall error rate is higher for the 4-class problem than for the 2-class problem, the error rate for the S class is lower. In the 4-class case, CTGP misclassified only 2 of the 38 S tests, for an S error rate of just 5.3%. The S class had more training examples than the other three classes. The failure1 and failure2 error rates were 17.2% and 32.1%, respectively. All 5 test cases in class failure3 were classified incorrectly. Because the number of training data for this class is substantially lower than for the other classes, this result is not surprising and should be disregarded. We expect that the error rate for this failure type would improve with more training samples.
Since our intended application is a safety-critical flight control system, the consequences of misclassification are very different for successes and failures. In general, safety-critical systems require that there be no missed failures (a failure misclassified as a success) and as few false alarms (a success misclassified as a failure) as possible. In Table 3 we break down our misclassification error rates by these two categories. For the Two Class Problem, we not only have a lower overall error rate (11%), but also a much smaller missed failure error rate
Figure 5: ROC curve for prediction of time-to-failure: true positive rate (sensitivity) over false positive rate (1 − specificity); AUC = 0.9154.
(4.8%).
                        2-class Prob.    4-class Prob.
false alarm rate            21.1%             5.3%
missed failure rate          4.8%            30.6%

Table 3: False Alarm and Missed Failure Percentages
The performance of our time-to-failure estimation with respect to missed and false alarms can best be visualized using a Receiver Operating Characteristic (ROC) curve. Figure 5 shows the curve for our 100 test cases in a coordinate system of true positive rate over false positive rate. The graph of a high-performance system rises sharply, meaning we obtain good sensitivity (i.e., we can detect most alarms) while having to accept only a few false alarms. The diagonal corresponds to a random selection. With a curve close to the upper left corner in Figure 5, our prediction results are good.
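A minimal sketch of how such an ROC curve and its AUC are computed from classifier scores; the scores and labels below are toy values, not the paper's data, and `roc_points` is an illustrative helper:

```python
import numpy as np

def roc_points(scores, labels):
    """labels: 1 = failure (positive), 0 = success; a higher score means
    'more likely failure'. Sweeping the decision threshold over the sorted
    scores yields (fpr, tpr) points; AUC is the area under that curve."""
    order = np.argsort(-scores)
    lab = labels[order]
    tpr = np.concatenate([[0.0], np.cumsum(lab) / lab.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - lab) / (1 - lab).sum()])
    # trapezoidal area under the (fpr, tpr) polyline
    auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))
    return fpr, tpr, auc

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 0, 1, 1, 0, 0, 0])
_, _, auc = roc_points(scores, labels)
print(auc)  # → 0.875
```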
In Figure 6, we can see one of the advantages of using our statistical method: we not only can make predictions, but our Bayesian modeling method also returns a confidence interval/error bar on the prediction. Here the independent axis is the test run index, which we ordered by increasing time to failure. The dependent axis is the time to failure T. The true failure times (dark blue smooth center line, dots connected for better visibility) show the ground truth values of T, where the curve is formed by connecting T values with short line segments. TGP is a Bayesian model, which means it can provide samples from the distribution of T |x. The posterior mean predictions T̂ by TGP (circles) are plotted. For perfect predictions, the circles would lie along the dark center curve.
Figure 6: A: Posterior mean Tfail prediction using the TGP model for class F2 (nTrainFail = 505, nTestFail = 62); black = ground truth, blue = TGP. Distribution of prediction errors (T.predicted − T.true) with TGP (solid, mean = circle) and SVM (dashed, mean = square) for the entire data set (B) and for failure runs only (C).
6.2 Output Curve Prediction In Table 4, we show the prediction errors for each of the PCA, Fourier, and wavelet bases (with dimension D = 25) for each of the classification strategies: 4 class, 2 class, and 1 class (i.e., just one model for all types of output curves). Prediction errors are mean values over the 12 output variables. During basis fitting, we padded training vectors y to Tmax by repeating the last element of y.
                 F1      F2      F3       S       F       S     all
µPCA            0.46    0.45    0.78    0.32    0.63    0.42    1.39
σ²PCA           0.33    0.29    0.34    0.15    0.30    0.15    1.08
µFourier        0.56    0.51    0.82    0.41    0.82    0.41    1.60
σ²Fourier       0.36    0.31    0.34    0.15    0.32    0.15    1.00
µwavelet        0.56    0.52    0.85    0.34    0.84    0.34    1.44
σ²wavelet       0.40    0.35    0.34    0.15    0.36    0.15    1.09

Table 4: Output curve prediction errors for different kinds of bases and 4-class, 2-class, and 1-class perfect classification.
The errors reported in Table 4 were computed as follows. Let xi denote the test inputs with true output curves yi ∈ R^T(xi) and predicted output curves y^pred_i ∈ R^T(xi). We use the standard deviation σi of the true output curve yi to standardize errors across different output curves and different output variables. The error ev,c for output variable v and class c with the set of test output curves Sv,c is given by

(6.3) ev,c = ( ∑_{i∈Sv,c} ||y^pred_i − yi||1 / σi ) / ( ∑_{i∈Sv,c} T(xi) ).

The true and predicted output curves yi and y^pred_i are for the given output variable v, but we leave out the dependence on v to simplify the error formula (6.3). The denominator of (6.3) is the total number of output curve values predicted, thus making the reported error an average over all predicted points.
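Formula (6.3) translates directly into code; `class_error` is a hypothetical helper name and the curves below are synthetic:

```python
import numpy as np

def class_error(true_curves, pred_curves):
    """e_{v,c} of (6.3): per-curve L1 errors standardized by the true
    curve's standard deviation, summed over the class's test set and
    divided by the total number of predicted points."""
    num = sum(np.abs(yp - y).sum() / y.std()
              for y, yp in zip(true_curves, pred_curves))
    den = sum(len(y) for y in true_curves)
    return num / den

# Toy check: predictions offset by a constant 0.1 from unit-std curves,
# so the standardized average error is exactly that offset.
rng = np.random.default_rng(3)
true_c = [rng.standard_normal(T) for T in (100, 150)]
true_c = [(y - y.mean()) / y.std() for y in true_c]   # std of each curve = 1
pred_c = [y + 0.1 for y in true_c]
print(round(class_error(true_c, pred_c), 3))  # → 0.1
```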
The PCA basis gives the best performance for each of the classification strategies. This can easily be seen
Figure 7: Output curve prediction errors for different bases (red = PCA, green = Fourier, blue = wavelet) and output variables, shown separately for the classes failure1, failure2, failure3, and success.
in the 4-class strategy in the plots in Figure 7. Some representative predictions using the dimension D = 25 PCA basis for output variable 1 can be seen for the 3 failure classes and the success class in Figure 8. Note that the 4 different classes have differing amounts of training data. Of the 867 training runs, 228 are F1 cases, 213 are F2 cases, 64 are F3 cases, and 360 are S cases. The very small amount of training data for F3 makes this class the most difficult to predict (Figure 8, bottom left).
We obtained curve prediction results using CTGP to determine which output class model is applied. For the 4-class model and F2, we obtained a prediction error of µPCA = 0.49 with σ²PCA = 0.28. With two classes F1, the prediction error was µPCA = 0.32, σ²PCA = 0.15. Comparing with Table 4, we see that the CTGP results for the two-class classification are considerably better than those in Table 4. Examples of predicted curves (for output variable 1) are shown in Figure 9. In these plots, the black curve is the ground truth curve and is plotted for the correct number of time steps T(x) for test input x. The red curve is the predicted curve using the model for the correct class C(x), and it is plotted for the maximum length T^C(x)_max that defines the class C(x). If CTGP incorrectly predicted the class, then the blue curve is the predicted curve using the model for the incorrectly predicted class C_pred(x), and it is plotted for the maximum length T^C_pred(x)_max of that class. In the title for each test plot there is an indication of whether the
Figure 8: D = 25 PCA-based predictions for each class, F1, F2, F3, and S (output variable 1).
CTGP classifier predicted the correct class or not, and the incorrect class prediction is given for classification errors.
The red prediction curve for the correct class is always at least as long as the black ground truth curve, because the true length T(x) must be less than or equal to T^C(x)_max in order for x to be in the class C(x). (Note that this is different from the plots in Figure 8, in which we truncated the predicted curve to the known correct length T(x).) For tests in which CTGP predicts the wrong class C_pred(x), the predicted blue curve may be shorter or longer than the black ground truth curve. In the third example in the upper left of Figure 9, the correct class is F1 with T^F1_max = 180, but the input was incorrectly classified as S with T^S_max = 1901, so the predicted curve is much longer. In the last example in the lower right of Figure 9, the correct class is S with T^S_max = 1901, but the predicted class was F3 with T^F3_max = 600, so the predicted curve is shorter.
In Figure 9, we see more examples of excellent curve predictions using the correct output class predicted by
Figure 9: CTGP-based curve predictions for output variable 1, classes F1 and F2 (black = ground truth, red = prediction from the correct class, blue = prediction from the wrong class for runs that CTGP misclassified).
t
y (y
_err
or_1
)
0 500 1000 1500
−5
05
1020
CORRECT Predicted Class!Run #322: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=213
t
y (y
_err
or_1
)
0 500 1000 1500
−4
02
46
CORRECT Predicted Class!Run #340: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=213
t
y (y
_err
or_1
)
0 500 1000 1500
05
1020
CORRECT Predicted Class!Run #360: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=213
t
y (y
_err
or_1
)
0 500 1000 1500
05
1015
CORRECT Predicted Class!Run #380: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=213
t
y (y
_err
or_1
)
0 500 1000 1500
−15
−5
515
INCORRECT Predicted Class=failure1, Correct=failure2Run #383: output #1 (black), Predicted from correct class (red), wrong class (blue)
t
y (y
_err
or_1
)
0 500 1000 1500
05
1015
20
CORRECT Predicted Class!Run #386: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=213
t
y (y
_err
or_1
)
0 500 1000 1500
05
10
INCORRECT Predicted Class=success, Correct=failure2Run #416: output #1 (black), Predicted from correct class (red), wrong class (blue)
t
y (y
_err
or_1
)
0 500 1000 1500
05
1525
CORRECT Predicted Class!Run #470: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=213
t
y (y
_err
or_1
)
0 500 1000 1500
05
1015
20
CORRECT Predicted Class!Run #494: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=213
t
y (y
_err
or_1
)
0 500 1000 1500
−6
−2
26
CORRECT Predicted Class!Run #546: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=213
t
y (y
_err
or_1
)
0 500 1000 1500
05
1015
CORRECT Predicted Class!Run #582: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=213
t
y (y
_err
or_1
)
Class F3 Class: S
0 500 1000 1500
−6
−2
26
INCORRECT Predicted Class=success, Correct=failure3Run #20: output #1 (black), Predicted from correct class (red), wrong class (blue)
t
y (y
_err
or_1
)
0 500 1000 1500
−4
02
46
INCORRECT Predicted Class=success, Correct=failure3Run #90: output #1 (black), Predicted from correct class (red), wrong class (blue)
t
y (y
_err
or_1
)
0 500 1000 1500
−20
−10
05
INCORRECT Predicted Class=success, Correct=failure3Run #110: output #1 (black), Predicted from correct class (red), wrong class (blue)
t
y (y
_err
or_1
)
0 500 1000 1500
−4
02
46
INCORRECT Predicted Class=success, Correct=failure3Run #356: output #1 (black), Predicted from correct class (red), wrong class (blue)
t
y (y
_err
or_1
)
0 500 1000 1500
−2
24
68
CORRECT Predicted Class!Run #86: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−2
02
46
CORRECT Predicted Class!Run #136: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−2
24
68
CORRECT Predicted Class!Run #156: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−3
−1
13
CORRECT Predicted Class!Run #201: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−2
02
46
CORRECT Predicted Class!Run #519: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−2
02
4
CORRECT Predicted Class!Run #568: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−2
02
4
CORRECT Predicted Class!Run #596: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
05
10
CORRECT Predicted Class!Run #612: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−2
26
10
INCORRECT Predicted Class=failure2, Correct=successRun #775: output #1 (black), Predicted from correct class (red), wrong class (blue)
t
y (y
_err
or_1
)
0 500 1000 1500
−10
05
10
CORRECT Predicted Class!Run #777: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−4
−2
02
4
CORRECT Predicted Class!Run #836: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−2
02
46
CORRECT Predicted Class!Run #873: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−2
02
4
CORRECT Predicted Class!Run #874: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−3
−1
13
CORRECT Predicted Class!Run #880: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−5
05
CORRECT Predicted Class!Run #883: output #1:y_error_1 (black), Predicted (red) [dim=25], nTrain=360
t
y (y
_err
or_1
)
0 500 1000 1500
−6
−2
26
INCORRECT Predicted Class=failure3, Correct=successRun #921: output #1 (black), Predicted from correct class (red), wrong class (blue)
t
y (y
_err
or_1
)
Figure 9: Predicted Output variable 1 curves usingCTGP classification. D = 25 PCA-based predictions:(upper left) class F1, (upper right) class F2, (lower left)class F3, (lower right) class S.
CTGP. We also see something quite interesting in theexamples with incorrect classification: the predictedoutput curve using the model from an incorrect class istypically quite good near the beginning of the simulationrun and often does reasonably well over a significantfraction of the true output curve. As expected, thepredicted curve using the correct class is usually betterthan the one using the incorrect class.
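The two-stage scheme discussed above, classify first and then predict with the class-specific model, can be illustrated with a deliberately simple stand-in. A 1-nearest-neighbour lookup replaces the paper's Treed Gaussian Process models, and all data below are made up; only the pipeline structure reflects the method.

```python
import numpy as np

# Two-stage sketch: stage 1 classifies an input into success/failure
# classes; stage 2 uses the class-specific model to predict
# time-to-failure. Nearest-neighbour lookup is a toy stand-in for TGP.

def nearest(x, X):
    """Index of the training point closest to x."""
    return int(np.argmin(np.linalg.norm(X - x, axis=1)))

def predict(x, X_train, classes, ttf):
    cls = classes[nearest(x, X_train)]            # stage 1: class
    idx = [i for i, c in enumerate(classes) if c == cls]
    Xc = X_train[idx]                             # class-specific data
    ttf_c = [ttf[i] for i in idx]
    return cls, ttf_c[nearest(x, Xc)]             # stage 2: per-class TTF

# Made-up training data: two stable runs and one failure at t = 180.
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
classes = ["S", "S", "F1"]
ttf = [None, None, 180]  # time-to-failure undefined for stable runs

print(predict(np.array([4.5, 5.2]), X, classes, ttf))  # ('F1', 180)
```

Restricting stage 2 to training runs of the predicted class is the point of the design: each per-class model only ever sees curves of a consistent length and failure mode, which is what makes the basis-coefficient regression tractable.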
7 Conclusion
In this paper, we discussed a novel statistical toolchain for efficient prediction of time-to-failure and of the output curves of an adaptive controller. Due to nonlinearity and the large number of parameters in such a controller, it is extremely hard to see which parameter settings yield a stable control system and which ones lead to instability after some time. We predict whether the controller is stable, the time-to-failure if unstable, and the output curves. Reliable information of this kind is extremely important for autonomous decision making on board a UAS. For our prediction technique, we used a two-stage statistical approach based on Treed Gaussian Processes (TGP). This method reduces the overall error in the time-to-failure prediction by an order of magnitude for our adaptive control simulator application.