Functional Data Analysis for Design of Experiments

Functional Data Analysis for Design of Experiments. DATAWorks 2019, April 10th. Tom Donnelly, PhD, CAP; Peter Hersh, PhD; Brady Brady (Systems Engineers); Chris Gotwalt, PhD (Director, R&D Statistics). JMP Division, SAS Institute Inc. Copyright © SAS Institute Inc. All rights reserved.


  • Copyright © SAS Institute Inc. All rights reserved.

    Functional Data Analysis for Design of Experiments

    DATAWorks 2019, April 10th

    Tom Donnelly, PhD, CAP
    Peter Hersh, PhD
    Brady Brady
    Systems Engineers

    Chris Gotwalt, PhD
    Director, R&D Statistics
    JMP Division, SAS Institute Inc.

  • Outline

    • My old Army problem
    • What are Functional Data?
    • What is Functional Data Analysis (FDA)?
    • How do we analyze functional data?
    • How do we analyze Functional Principal Component scores (FPCs) with DOE factors?
    • Simple case study - data from a real process
    • Complex case study - data from a computer simulation
    • Summary


  • First ran into Functional Data 13 years ago at the Army’s Edgewood Chemical Biological Center

    10-factor Agent Transport & Dispersion Simulation

    • Able to model Concentration at a particular time,
    • or Dosage at end of time,
    • but NOT Concentration shape over time
    • Prof. Jeff Wu suggested using Functional Data Analysis (work by his former student, Prof. Ying Hung, Rutgers)


  • Examples of Functional Data

    • Time series data
    • Spectral data
    • Sensor streams
    • Measurements taken over a range
    • Vibration signals
    • Tool wear
    • Radar/sonar signatures
    • Gun barrel degradation
    • Trajectories of flights between cities
    • Video tracking of surgeon hand movement
    • Almost any response in a longitudinal order


  • What is Functional Data Analysis?

    Functional data analysis (FDA) is a branch of statistics that analyses data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework each sample element is considered to be a function.

    [Figure: Traditional Rectangular Data vs. Functional Data]

    The curve is the fundamental unit of observation.

    Functional Data can also be Xs. When one has curves as outputs of a DOE they are usually the Ys.

  • Functional Data Analysis seminal work by James O. Ramsay and Bernard W. Silverman

    [Figure: book covers, dated 2009, 2005, and 2005 (1e 1997)]

  • Functional Data Analysis

    Two ways to use Functional Data Analysis

    1. Functional Response DOE (F-DOE): the goal is to use DOE factors to predict the functional response, i.e. the curve. (Featured method in this presentation.)

    2. Functional Response Machine Learning (F-ML): the goal is to use functional data, i.e. the curve(s), to predict something (e.g. yield of a batch).


  • Functional Data Analysis

    • F-DOE & F-ML use functional principal components analysis (F-PCA)

    • F-PCA factors the data into FPC Scores and Eigenfunctions in a dimension reduction that is closely analogous to classical PCA

    • FPC Scores are scalars that explain function-to-function variation

    • Eigenfunctions explain the time (longitudinal) variation

    • We fit models with the FPC scores, cluster them, graph them, just like any other continuous data…

    • For F-DOE we fit the FPC scores as functions of the DOE factors, using (FPC score) × (Eigenfunction) as intermediate formulas and (Modeled FPC score) × (Eigenfunction) as the final prediction formula
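    The F-PCA factoring described on this slide can be sketched in a few lines. This is a minimal illustration, not the JMP implementation: it assumes every curve is already sampled on the same grid, uses a plain SVD of the centered curves, and works on synthetic data. The function name `fpca` and all variable names are hypothetical.

```python
import numpy as np

def fpca(curves, n_components):
    """Discrete FPCA sketch. curves has shape (n_curves, n_points), shared grid."""
    mean = curves.mean(axis=0)                     # mean function mu
    centered = curves - mean
    # Rows of Vt are orthonormal vectors: the discretized eigenfunctions
    # (unit Euclidean norm on the grid, not unit L2 function norm).
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    eigenfunctions = Vt[:n_components]             # (n_components, n_points)
    scores = centered @ eigenfunctions.T           # one scalar per curve per component
    explained = (s ** 2) / np.sum(s ** 2)          # share of variation per component
    return mean, eigenfunctions, scores, explained[:n_components]

# Synthetic example: 12 curves built from a common shape plus two modes of variation.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
a = rng.normal(size=(12, 1))
b = rng.normal(scale=0.3, size=(12, 1))
curves = np.exp(-3.0 * t) + a * np.sin(np.pi * t) + b * np.cos(2.0 * np.pi * t)

mean, eigfun, scores, explained = fpca(curves, n_components=2)

# Each curve is rebuilt as mean + sum_k (FPC score k) * (Eigenfunction k).
recon = mean + scores @ eigfun
```

    Because the synthetic curves vary in only two ways, two components rebuild them essentially exactly; real data would need as many components as it takes to reach an acceptable share of explained variation.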

  • Responses for DoE

    Where do we find Functional data?

    Sensor data over time

    Spectral Data


  • Spectral Data

    Why should we care about Functional data?

    We could easily summarize this data and make a model on the mean, min, or max.

    With that summarization we are not using a big portion of our data. What if the shape is important?
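    A small illustration of what a scalar summary can miss: the two synthetic curves below are hypothetical, chosen so that they share the same mean, min, and max while having clearly different shapes.

```python
import numpy as np

# Two hypothetical curves sampled on the same grid over one full period.
t = np.linspace(0.0, 2.0 * np.pi, 401)
curve_a = np.sin(t)
curve_b = np.sin(2.0 * t)

def summarize(y):
    """Return the (mean, min, max) scalar summary of one curve."""
    return float(np.mean(y)), float(np.min(y)), float(np.max(y))

summary_a = summarize(curve_a)
summary_b = summarize(curve_b)

# The summaries agree, yet the curves differ substantially point by point.
max_gap = float(np.max(np.abs(curve_a - curve_b)))
```

    A model fitted to any of these three summaries could not tell the two curves apart, even though they differ widely at many points along the grid.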

  • How do we analyze Functional data?


    • Convert a “stream” of data points into a function by fitting Splines or Fourier basis functions.

    • Create Functional Principal Components of the basis function
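    As a rough sketch of the first step, a stream of points can be turned into a function by least-squares fitting of a Fourier basis. The data, the number of harmonics, and the helper name `fourier_design` are all assumptions made for illustration.

```python
import numpy as np

def fourier_design(t, n_harmonics, period):
    """Design matrix: a constant column, then a sin/cos pair per harmonic."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols.append(np.sin(w * t))
        cols.append(np.cos(w * t))
    return np.column_stack(cols)

# Synthetic noisy "stream" of 200 measurements over one period.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
signal = 1.0 + 0.8 * np.sin(2.0 * np.pi * t) + 0.3 * np.cos(4.0 * np.pi * t)
stream = signal + rng.normal(scale=0.05, size=t.size)

# Least-squares basis coefficients; the fitted function is B @ coef.
B = fourier_design(t, n_harmonics=3, period=1.0)
coef, *_ = np.linalg.lstsq(B, stream, rcond=None)
fitted = B @ coef

# How closely the smooth fit recovers the noise-free signal.
rmse = float(np.sqrt(np.mean((fitted - signal) ** 2)))
```

    The 200 raw points are replaced by 7 coefficients that define a smooth function that can be evaluated at any time.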

  • Basis Functions vs. FPCs vs. Modeled FPCs

    • The Basis Functions (Fourier, Splines) are just an intermediary step.

    • We choose combinations of basis/nKnots/knot locations to get a good functional fit to the original data.

    • The basis functions/coefficients themselves are too cumbersome to work with in most cases.

    • Create the corresponding FPCs and that is all you work with afterwards.

    • FPCs are much simpler (lower dimensionality) and easy to work with.

    • Modeled FPCs are created by treating the FPCs as the responses (Ys) for models fitting DOE factors (Xs). This allows us to use the DOE factors to predict the shapes defined by the Basis Functions in our final model.


  • $Y_1(X) = \mu(X)$

  • $Y_1(X) = \mu(X)$

  • $Y_1(X) = \mu(X) + 1.0 \cdot E_1(X)$

  • $Y_1(X) = \mu(X) + 1.0 \cdot E_1(X) + 0.05 \cdot E_2(X)$

  • $Y_1(X) = \mu(X) + 1.0 \cdot E_1(X) + 0.05 \cdot E_2(X) + 0.5 \cdot E_3(X)$

  • $Y_1(X) = \mu(X) + 1.0 \cdot E_1(X) + 0.05 \cdot E_2(X) + 0.5 \cdot E_3(X) + 012 \cdot E_4(X)$
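    The build-up on these slides is one instance of a truncated expansion. Written generally (the indices $i$ and $k$, the truncation level $K$, and the symbol $c_{ik}$ for the FPC score of curve $i$ on eigenfunction $k$ are notation assumed here, not taken from the slides):

```latex
Y_i(X) \approx \mu(X) + \sum_{k=1}^{K} c_{ik} \, E_k(X)
```

    For the curve shown, the first three scores are $c_{11} = 1.0$, $c_{12} = 0.05$, and $c_{13} = 0.5$; each added term refines the shape of the reconstructed curve.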

  • Simple Case Study Based on Real Data

    Using the Functional Principal Components

    FPCs efficiently summarize your functional data in a few components, but how do we use these to help analyze our data?

    [Figure: Milling Profile vs. Time (0 to 20), with the Spec level marked on the vertical axis. Example DoE response.]

    [Figure: mill with Milling Chamber and Holding Vessel. Source: https://www.netzsch.com]

  • Example DoE Response

    Attempting to get a milling process into spec quickly and stay in spec for a long time without having to make adjustments.

    [Figure: Milling Profile vs. Time (0 to 20), with the Spec level marked and curves labeled Ideal and Worst.]

    [Figure: mill with Milling Chamber and Holding Vessel. Source: https://www.netzsch.com]

  • Example DoE Response


    Attempting to get a milling process into spec quickly and stay in spec for a long time.

  • 17 Unique-Trial Definitive Screening Design of Experiments

    Size vs. Time Data for Each Batch/Trial

    [Figure: Size (nm) vs. Time]

  • B-Spline on Initial Data: One Knot, Cubic Order
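    A minimal sketch of a cubic spline fit with one interior knot. Two things here are assumptions: the data are synthetic, and the fit uses a truncated power basis rather than a B-spline basis. The two bases span the same space of cubic splines with one knot, so the fitted curve is the same, but B-splines are the numerically preferred form. The knot location and the helper name `cubic_one_knot_basis` are hypothetical.

```python
import numpy as np

def cubic_one_knot_basis(t, knot):
    """Truncated power basis: 1, t, t^2, t^3, and (t - knot)^3 for t > knot."""
    t = np.asarray(t, dtype=float)
    hinge = np.clip(t - knot, 0.0, None) ** 3
    return np.column_stack([np.ones_like(t), t, t ** 2, t ** 3, hinge])

# Synthetic size-vs-time profile for one batch (values are made up).
rng = np.random.default_rng(2)
time = np.linspace(0.0, 20.0, 41)
size_nm = 80.0 + 100.0 * np.exp(-0.3 * time) + rng.normal(scale=2.0, size=time.size)

knot = 5.0  # hypothetical knot location
B = cubic_one_knot_basis(time, knot)
coef, *_ = np.linalg.lstsq(B, size_nm, rcond=None)
fitted = B @ coef

# Root-mean-square residual of the smooth fit against the raw points.
rmse = float(np.sqrt(np.mean((fitted - size_nm) ** 2)))
```

    One knot with cubic order means each batch's curve is described by only five coefficients, which is why the choice of knot count and location matters for fit quality.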

  • Single Eigenfunction and Associated FPCs for Each Batch

    More different: +79

    Similar: -58 ± 2

    $Y_1(X) = \mu(X) + \mathrm{FPC}_1 \cdot E_1(X)$

  • Model the FPCs as functions of the DOE factors

    • Fit Splines or Fourier Basis functions to the raw longitudinal data

    • Use Functional Principal Components Analysis to create FPC Scores and Eigenfunctions
      • Eigenfunctions explain the longitudinal (e.g. time) variation
      • FPC Scores are scalars that explain the function-to-function variation

    • Model the FPC scores fitting the DOE factors using
      • Generalized (penalized) regression such as LASSO, or
      • Gaussian Process (Kriging)

    • When displaying DOE factor predictions AND shape of response vs. longitudinal variable, the intermediate model is of the form

      $Y_1(X) = \mu(X) + \mathrm{FPC}_1 \cdot E_1(X) + \mathrm{FPC}_2 \cdot E_2(X) + \dots$

      and the final model is of the form

      $Y_1(X) = \mu(X) + (\text{Modeled FPC}_1) \cdot E_1(X) + (\text{Modeled FPC}_2) \cdot E_2(X) + \dots$
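    The workflow above can be sketched end to end on synthetic data. Note the substitution: ordinary least squares stands in for the LASSO or Gaussian Process step named on the slide, purely to keep the example short. The factor settings, curve shapes, and the helper `predict_curve` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_runs, n_points = 17, 60
t = np.linspace(0.0, 20.0, n_points)

# Hypothetical coded DOE factors in [-1, 1] and curves whose shape depends on them.
factors = rng.uniform(-1.0, 1.0, size=(n_runs, 3))
amplitude = 50.0 + 20.0 * factors[:, 0] - 10.0 * factors[:, 1]
curves = 80.0 + amplitude[:, None] * np.exp(-0.3 * t)[None, :]
curves = curves + rng.normal(scale=0.5, size=curves.shape)

# Step 1: FPCA on the gridded curves (one component kept).
mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
E1 = Vt[0]                      # first eigenfunction on the grid
fpc1 = centered @ E1            # FPC 1 score for each run

# Step 2: model the FPC 1 score as a function of the DOE factors
# (plain least squares with an intercept; a stand-in for LASSO or GP).
X = np.column_stack([np.ones(n_runs), factors])
beta, *_ = np.linalg.lstsq(X, fpc1, rcond=None)

def predict_curve(new_factors):
    """Final model: mean + (Modeled FPC 1) * Eigenfunction 1."""
    modeled_fpc1 = np.r_[1.0, new_factors] @ beta
    return mean_curve + modeled_fpc1 * E1

# Intermediate model uses the observed score; final model uses the modeled score.
intermediate = mean_curve + fpc1[0] * E1
final = predict_curve(factors[0])
final_rmse = float(np.sqrt(np.mean((final - curves[0]) ** 2)))
```

    The design choice worth noticing is that only scalar scores are modeled against the factors; the eigenfunctions carry the shape, so a new factor setting yields a whole predicted curve.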

  • Model the FPCs as functions of the DOE factors

    Batch 2899

    "Size/nm FPC 1 Prediction Formula" * "Size/nm Eigenfunction 1" + "Size/nm Mean Formula"

    "Size/nm FPC 1" * "Size/nm Eigenfunction 1" + "Size/nm Mean Formula"

  • Model the FPCs as functions of the DOE factors

    Batch 2899

    [Figure panels: DoE Factors, Functional Data Curve, FPC Score]

  • Model the FPCs as functions of the DOE factors

    Batch 2908

    [Figure panels: DoE Factors, Functional Data Curve, FPC Score]

  • Final Prediction Model

    [Figure panels: DoE Factors, Functional Data Curve]

  • Complex Case Study using Simulation Data

    128-Trial Space-Filling DOE in Six Factors + Time

    128 Computer Simulations Split into 3 Subsets: 90 Training, 30 Validation (Tune), and 8 Test

    [Figure: trials marked by Validation 90-30-8 subset: Train, Validate (Tune), Test]
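    A sketch of a 90/30/8 split of the 128 trials. The slides do not say how trials were assigned to subsets, so the random assignment and the seed used here are assumptions.

```python
import numpy as np

n_trials = 128
rng = np.random.default_rng(4)   # fixed seed so the split is reproducible

# Shuffle trial indices, then cut into the three subsets.
order = rng.permutation(n_trials)
train_idx = order[:90]           # used to fit the models
valid_idx = order[90:120]        # used to tune the models
test_idx = order[120:]           # held out from all fitting and tuning
```

    Splitting whole trials, rather than individual time points, keeps every point of a held-out curve out of the fitting and tuning steps.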

  • 128 Unique-Trial Space-Filling Design of Experiments

    Y vs. Time Data for Each Trial

  • 128 Simulations Split into 3 Subsets: 90 Training, 30 Validation(Tune), and 8 Test

    [Figure: Y vs. Time, in three panels (Train, Validate (Tune), Test); Y and Time both run from 0.00 to 1.00.]

  • Fourier Basis Model on Initial Data

    90 Training, 30 Validation, 8 Test

  • FPCs fit as function of DOE factors using Gaussian Process Model

    Test Trial #16

    Overlay of simulation data on top of Functional Data Curve

    [Figure panels: DoE Factors, Functional Data Curve]

  • FPCs fit as function of DOE factors using Gaussian Process Model

    Test Trial #2

    Overlay of simulation data on top of Functional Data Curve

    [Figure panels: DoE Factors, Functional Data Curve]
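    A bare-bones sketch of fitting one FPC score with a Gaussian Process, written directly in NumPy. It is not the JMP Gaussian Process platform: the squared-exponential kernel, the fixed length scale, the small noise term, and the synthetic scores are all assumptions, and no hyperparameters are tuned.

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale ** 2)

def gp_fit_predict(X_train, y_train, X_new, length_scale=0.5, noise=1e-6):
    """Posterior mean of a GP with constant mean, evaluated at X_new."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train - y_mean)
    return y_mean + rbf_kernel(X_new, X_train, length_scale) @ alpha

# Synthetic stand-in: 90 training trials and 8 test trials in six factors.
rng = np.random.default_rng(5)
X_train = rng.uniform(0.0, 1.0, size=(90, 6))
score_train = np.sin(3.0 * X_train[:, 0]) + X_train[:, 1] ** 2   # made-up FPC score
X_test = rng.uniform(0.0, 1.0, size=(8, 6))

score_pred = gp_fit_predict(X_train, score_train, X_test)

# With a tiny noise term the GP nearly interpolates its own training scores.
score_refit = gp_fit_predict(X_train, score_train, X_train)
```

    Each predicted score would then be multiplied by its eigenfunction and added to the mean function to give the predicted curve for a test trial.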

  • FDA, Neural, & Gaussian Process Model Predictions, All Fit to Same 90-Trial Training Subset, Overlaid on Y vs. Time

    Test Trial #2 data NOT used in any of the models

  • Functional Data Analysis Conclusion


    • Functional data show up in many forms, such as sensor data, spectral data, and almost any response in a longitudinal order

    • These data are often summarized (for example by a mean, min, or max) to allow for analysis. Such a summary does not use all the data that were collected and can miss effects that depend on the shape of the curve.

    • Combined with a Design of Experiments (DoE), FDA lets one develop models that predict curve shape as a function of the design factors.

  • Thank You

    Questions?
