Curve Estimation

Transcript of "Introduction to Curve Estimation", Stefanie Scheid, August 11, 2003 (41 slides).

  • Slide 1/41

    Introduction to Curve Estimation

    [Figure: estimated density of Wilcoxon scores; x-axis: Wilcoxon score, 700 to 1300; y-axis: density, 0.000 to 0.006]

  • Slide 2/41

    Michael E. Tarter & Michael D. Lock

    Model-Free Curve Estimation

    Monographs on Statistics and Applied Probability 56

    Chapman & Hall, 1993.

    Chapters 1-4.

  • Slide 3/41

    Outline

    1. Generalized representation

    2. Short review on Fourier series

    3. Fourier series density estimation

    4. Kernel density estimation

    5. Optimizing density estimates

  • Slide 4/41

    Generalized representation

    Estimation versus Specification

    We are familiar with its theory and application.

    How can we be sure about the underlying distribution?

  • Slide 5/41

    Usual density representation:

    composed of elementary functions

    usually in closed form

    finite and rather small number of personalized parameters

    Generalized representation:

    infinite number of parameters

    usually: representation as an infinite sum of elementary functions:
    Fourier series density estimation and kernel density estimation

  • Slide 6/41

    Complex Fourier series

    f(x) = \sum_{k=-\infty}^{\infty} B_k \exp\{2\pi i k x\},  x \in [0, 1].

    The {B_k} are called Fourier coefficients.

    Why can we represent any function in such a way?

  • Slide 7/41

    Some useful features:

    \varphi_k(x) = \exp\{2\pi i k x\}; the sequence {\varphi_k} is orthonormal, that is

    \int_0^1 \exp\{2\pi i (k - l) x\} \, dx = 1 if k = l, and 0 if k \neq l.
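
    A quick numerical check of this orthonormality relation (a small sketch; the grid size and the function name are my own choices, not from the slides):

      import numpy as np

      def inner_product(k, l, num_points=10_000):
          # Approximate the integral of exp(2*pi*i*(k - l)*x) over [0, 1]
          x = np.linspace(0.0, 1.0, num_points, endpoint=False)
          return np.exp(2j * np.pi * (k - l) * x).mean()   # Riemann sum with step 1/num_points

      print(abs(inner_product(3, 3)))   # ~1.0, since k = l
      print(abs(inner_product(3, 5)))   # ~0.0, since k != l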

  • Slide 8/41

    {\varphi_k} is complete, that is

    \lim_{m \to \infty} \int_0^1 \Big| f(x) - \sum_{k=-m}^{m} B_k \exp\{2\pi i k x\} \Big|^2 dx = 0

    Therefore, we can expand every function f(x), x \in [0, 1], in the space L^2 as a Fourier series.

    The L^2 assumption is that \|f\|^2 = \int |f(x)|^2 \, dx < \infty, which holds for most of the curves we are interested in.

  • Slide 9/41

    Fourier series density estimation

    Given an iid sample {X_j}, j = 1, ..., n, with support on [0, 1] (otherwise rescale).

    Representation of the true density:

    f(x) = \sum_{k=-\infty}^{\infty} B_k \exp\{2\pi i k x\}  with  B_k = \int_0^1 f(x) \exp\{-2\pi i k x\} \, dx

  • Slides 10-11/41

    Estimator:

    \hat f(x) = \sum_{k=-\infty}^{\infty} b_k \hat B_k \exp\{2\pi i k x\}  with  \hat B_k = \frac{1}{n} \sum_{j=1}^{n} \exp\{-2\pi i k X_j\}

    The {b_k} are called multipliers.

    Easy computation: use \exp\{-2\pi i k X_j\} = \cos(2\pi k X_j) - i \sin(2\pi k X_j) and \hat B_{-k} = \hat B_k^* (complex conjugate), with \hat B_0 \equiv 1. Therefore, computation is only needed for positive k.
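
    As a small illustration, the sample coefficients can be computed directly from this formula; the following Python sketch (the function name and arguments are mine, not from the slides) exploits the conjugate symmetry, so only nonnegative k are stored:

      import numpy as np

      def sample_fourier_coefficients(x, m):
          # x: 1-d array of observations rescaled to [0, 1]; m: highest frequency
          # B_hat_k = (1/n) * sum_j exp(-2*pi*i*k*X_j) for k = 0, ..., m
          # (B_hat_{-k} is the complex conjugate of B_hat_k, and B_hat_0 == 1 by construction)
          x = np.asarray(x, dtype=float)
          k = np.arange(m + 1)
          return np.exp(-2j * np.pi * np.outer(k, x)).mean(axis=1)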

  • Slide 12/41

    \hat B_k is an unbiased estimator for B_k.

    However, \hat f is usually biased because the number of terms is either infinite or unknown.

    Another advantage of the sample coefficients {\hat B_k}: the same set leads to a variety of other estimates.

    That's where the multipliers come into play!

  • Slide 13/41

    Fourier multipliers

    Raw density estimator:

    b_k = 1 for |k| \le m, and b_k = 0 for |k| > m, which gives

    \hat f(x) = \sum_{k=-m}^{m} \hat B_k \exp\{2\pi i k x\}

    Evaluate \hat f(x) at equally spaced points x \in [0, 1].
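
    A sketch of the raw estimator on an equally spaced grid, reusing the hypothetical sample_fourier_coefficients helper from above:

      import numpy as np

      def raw_fourier_density(x, m, grid_size=200):
          # Evaluate f_hat(x) = sum_{k=-m}^{m} B_hat_k exp(2*pi*i*k*x) on a grid in [0, 1]
          b_hat = sample_fourier_coefficients(x, m)                # k = 0, ..., m
          grid = np.linspace(0.0, 1.0, grid_size)
          phases = np.exp(2j * np.pi * np.outer(np.arange(1, m + 1), grid))
          # k = 0 term (which equals 1) plus twice the real part of the positive-k terms
          return grid, 1.0 + 2.0 * np.real(b_hat[1:, None] * phases).sum(axis=0)

    Increasing m sharpens the estimate but, as the plots on the following slides show, also lets more sampling noise through.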

  • Slide 14/41

    Estimating the expectation

    \mu = \int_0^1 x f(x) \, dx = \frac{1}{2} + \sum_{k=-m,\, k \neq 0}^{m} \frac{1}{2\pi i k} B_k

    Choose the multipliers b_k = (2\pi i k)^{-1} for 1 \le |k| \le m, and b_k = 0 otherwise; evaluate the resulting estimate at x = 0 and add 1/2.
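
    The same recipe in code form (a sketch reusing the hypothetical sample_fourier_coefficients helper; terms for k and -k are complex conjugates, so their sum is twice the real part):

      import numpy as np

      def fourier_mean_estimate(x, m):
          b_hat = sample_fourier_coefficients(x, m)                # k = 0, ..., m
          k = np.arange(1, m + 1)
          return 0.5 + 2.0 * np.real(b_hat[1:] / (2j * np.pi * k)).sum()

    As a sanity check, for reasonably large m this should be close to the sample mean of the rescaled data, np.mean(x).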

  • Slide 15/41

    Advantages of multipliers

    Examination of various distributional features without recomputing the sample coefficients.

    Optimize the estimation procedure.

    Smoothing of estimated curve vs. higher contrast.

    Some examples . . .

  • Slide 16/41

    Raw Fourier series density estimator with m = 3

    [Figure: estimated density on [0, 1]; y-axis: density, 0.0 to 3.0]

  • Slide 17/41

    Raw Fourier series density estimator with m = 7

    [Figure: estimated density on [0, 1]; y-axis: density, 0.0 to 3.0]

  • Slide 18/41

    Raw Fourier series density estimator with m = 15

    [Figure: estimated density on [0, 1]; y-axis: density, 0.0 to 3.0]

  • Slides 19-20/41

    Kernel density estimation

    Histograms are crude kernel density estimators where the kernel is a block (rectangular shape) somehow positioned over a data point.

    Kernel estimators:

    use various shapes as kernels

    place the center of a kernel right over the data point

    spread the influence of one point with varying kernel width

    contribution from each kernel is summed to overall estimate

  • Slide 21/41

    [Figure: Gaussian kernel density estimate built from ten data points on [0, 15]]

  • Slide 22/41

    Kernel estimator

    \hat f(x) = \frac{1}{nh} \sum_{j=1}^{n} K\left(\frac{x - X_j}{h}\right)

    h is called the bandwidth or smoothing parameter.

    K is the kernel function: nonnegative and symmetric, such that \int K(x) \, dx = 1 and \int x K(x) \, dx = 0.
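
    A direct transcription of this formula into Python (a sketch; the Gaussian kernel is only a default choice, and the bandwidth h is left to the caller):

      import numpy as np

      def kernel_density_estimate(x_eval, data, h, kernel=None):
          # (1/(n*h)) * sum_j K((x - X_j) / h), evaluated at each point of x_eval
          if kernel is None:
              kernel = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
          x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
          data = np.asarray(data, dtype=float)
          u = (x_eval[:, None] - data[None, :]) / h
          return kernel(u).sum(axis=1) / (len(data) * h)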

  • Slide 23/41

    Under mild conditions (h must decrease with increasing n) the kernel estimate converges in probability to the true density.

    The choice of kernel function usually depends on computational criteria.

    The choice of bandwidth is more important (see the literature on Kernel Smoothing).
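
    For a concrete data-dependent starting point, the kernel smoothing literature offers simple rules of thumb; Silverman's rule for a Gaussian kernel is one such choice (my addition, not from the slides):

      import numpy as np

      def silverman_bandwidth(data):
          # h = 0.9 * min(std, IQR / 1.34) * n^(-1/5)
          data = np.asarray(data, dtype=float)
          sigma = data.std(ddof=1)
          iqr = np.subtract(*np.percentile(data, [75, 25]))
          return 0.9 * min(sigma, iqr / 1.34) * len(data) ** (-0.2)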

  • Slide 24/41

    Some kernel functions

    [Figure: three kernel shapes - Gauss, Triangular, and Epanechnikov]

    Epanechnikov kernel:  K(y) = \frac{3}{4\sqrt{5}} \left(1 - \frac{y^2}{5}\right),  |y| \le \sqrt{5}
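
    The three kernels as Python functions (a sketch; each is symmetric and integrates to one, so it satisfies the conditions required of a kernel function above):

      import numpy as np

      def gauss_kernel(y):
          return np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)

      def triangular_kernel(y):
          return np.maximum(1.0 - np.abs(y), 0.0)                  # support [-1, 1]

      def epanechnikov_kernel(y):
          inside = np.abs(y) <= np.sqrt(5.0)
          return np.where(inside, 0.75 * (1.0 - y**2 / 5.0) / np.sqrt(5.0), 0.0)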

  • Slide 25/41

    Duality of Fourier series and kernel methodology

    \hat f(x) = \sum_k b_k \hat B_k \exp\{2\pi i k x\} = \frac{1}{n} \sum_{j=1}^{n} \sum_k b_k \exp\{2\pi i k (x - X_j)\}

    With h = 1, this is a kernel estimator with kernel K(x) = \sum_k b_k \exp\{2\pi i k x\}.

  • Slide 26/41

    The Dirichlet kernel

    The raw density estimator has kernel KD:

    K_D(x) = \sum_{k=-m}^{m} \exp\{2\pi i k x\} = \frac{\sin((2m + 1)\pi x)}{\sin(\pi x)}

    where \lim_{x \to 0} K_D(x) = 2m + 1.
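
    The step omitted between the two equalities is a finite geometric series; written out (my filling-in, using standard manipulations):

      K_D(x) = e^{-2\pi i m x} \sum_{k=0}^{2m} e^{2\pi i k x}
             = e^{-2\pi i m x} \, \frac{e^{2\pi i (2m+1) x} - 1}{e^{2\pi i x} - 1}
             = \frac{e^{\pi i (2m+1) x} - e^{-\pi i (2m+1) x}}{e^{\pi i x} - e^{-\pi i x}}
             = \frac{\sin((2m+1)\pi x)}{\sin(\pi x)}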

  • Slide 27/41

    Dirichlet kernels

    [Figure: Dirichlet kernels with m = 4, m = 8, and m = 12; x in [-0.4, 0.4], y from about -10 to 30]

  • Slide 28/41

    Differences between kernel and Fourier representation

    Fourier estimates are restricted to finite intervals while some kernels are not.

    As the Dirichlet kernel shows, kernel estimates can result in negative values if the kernel function takes on negative values.

  • Slide 29/41

    Kernel: K controls shape, h controls spread of kernel.

    Two-step strategy: select the kernel function and choose a data-dependent smoothing parameter.

    Fourier: m controls both shape and spread.

    Goodness-of-fit can be governed by entire multiplier sequence.

  • Slide 30/41

    Optimizing density estimates

    Optimization with regard to the weighted mean integrated square error (MISE):

    J(\hat f, f, w) = E \int_0^1 \big(\hat f(x) - f(x)\big)^2 w(x) \, dx.

    w(x) is a nonnegative weight function to emphasize estimation over subregions. First consider optimization with w(x) \equiv 1.

  • Slides 31-32/41

    The raw density estimator again

    J(\hat f, f) = 2 \sum_{k=1}^{m} \frac{1}{n} \big(1 - |B_k|^2\big) + 2 \sum_{k=m+1}^{\infty} |B_k|^2

    The first sum is the variance component; the second sum is the bias component.

  • Slide 33/41

    Single term stopping rule

    Estimate J_s = J(\hat f_s, f) - J(\hat f_{s-1}, f), the gain from including the s-th Fourier coefficient. The MISE is decreased if J_s is negative.

    Include terms only if their inclusion results in a negative difference.

    Multiple testing problem!

    Inclusion of higher-order terms results in a rough estimate.

    Suggestion: stop after t successive nonnegative inclusions. The choice of t is data/curve dependent.
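
    One concrete way to estimate the sign of J_s is the classical Kronmal-Tarter-type criterion: with Var(\hat B_k) = (1 - |B_k|^2)/n, the estimated MISE gain of the s-th term is negative exactly when |\hat B_s|^2 > 2/(n + 1). A sketch combining this threshold with the "stop after t misses" suggestion (the threshold is the textbook rule, not spelled out on the slide; helper names are mine):

      def single_term_stopping_rule(x, t=2, max_m=50):
          # Returns the cutoff m for the raw estimator: the last frequency whose inclusion
          # still decreased the estimated MISE, stopping after t successive non-improving terms.
          n = len(x)
          b_hat = sample_fourier_coefficients(x, max_m)
          threshold = 2.0 / (n + 1)
          m, misses = 0, 0
          for s in range(1, max_m + 1):
              if abs(b_hat[s]) ** 2 > threshold:
                  m, misses = s, 0
              else:
                  misses += 1
                  if misses >= t:
                      break
          return m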

  • Slide 34/41

    Other stopping rules

    Different considerations about estimating the MISE lead to various optimization concepts.

    They are by no means generally superior to the single term rule; it depends on the curve features.

  • Slide 35/41

    Multiplier sequences

    So far: raw estimate with bk = 1 or bk = 0.

    Now allow {b_k} to be a sequence tending to zero with increasing k. The concepts again depend on considerations about the MISE.

    The question of an advisable stopping rule remains.
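
    As one simple illustration (my example, not from the slides), a triangular Fejer-type taper b_k = 1 - |k|/(m + 1) shrinks higher-frequency coefficients toward zero and corresponds to a smoother, nonnegative kernel:

      import numpy as np

      def fejer_multipliers(m):
          # b_k = 1 - |k|/(m + 1) for |k| <= m, 0 otherwise (only k >= 0 stored)
          return 1.0 - np.arange(m + 1) / (m + 1.0)

      def tapered_fourier_density(x, m, grid_size=200):
          b_hat = sample_fourier_coefficients(x, m) * fejer_multipliers(m)
          grid = np.linspace(0.0, 1.0, grid_size)
          phases = np.exp(2j * np.pi * np.outer(np.arange(1, m + 1), grid))
          return grid, 1.0 + 2.0 * np.real(b_hat[1:, None] * phases).sum(axis=0)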

  • Slide 36/41

    Two-step strategy with multiplier sequence

    1. Estimate with raw estimator and one of former stopping rules.

    2. Applying a multiplier sequence to the remaining terms will always improve the estimate.

  • Slide 37/41

    Weighted MISE

    J(\hat f, f, w) = E \int_0^1 \big(\hat f(x) - f(x)\big)^2 w(x) \, dx.

    Weight functions w(x) emphasize subregions of the support interval (e.g. left or right tails).

    It turns out that the unweighted MISE leads to high accuracy in regions with high density.

    Weighting will improve the estimate when other regions are of interest.

  • Slides 38-39/41

    Data transformation

    The data needs rescaling to [0, 1]. This is always possible via (X_j - min(X)) / (max(X) - min(X)).

    Next approach: transform the data in a nonlinear manner to emphasize subregions.

    Let G : [a, b] \to [0, 1] be a strictly increasing one-to-one function with g(x) = dG(x)/dx.

    \varphi_k(G(x)) = \exp\{2\pi i k G(x)\} is orthonormal on [a, b] with respect to the weight g(x).

  • Slide 40/41

    Transformation and optimization

    Data transformation with G(x) is equivalent to the weighted MISE with w(x) = 1/g(x).

    The only difference to the unweighted MISE: computation of the Fourier coefficients involves application of G(x).

    Strategy: transform the data, optimize with the unweighted procedures, retransform.

    Most efficient: Transform data to unimodal symmetric distribution.
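
    A sketch of that strategy in code, using the linear rescaling from the previous slide as G (any strictly increasing one-to-one G with derivative g works the same way; raw_fourier_density is the hypothetical helper from above):

      import numpy as np

      def transform_estimate_retransform(data, m, grid_size=200):
          data = np.asarray(data, dtype=float)
          a, b = data.min(), data.max()
          g = 1.0 / (b - a)                                   # derivative of the linear G
          u = (data - a) * g                                  # G(X_j): data rescaled to [0, 1]
          grid_u, f_u = raw_fourier_density(u, m, grid_size)  # optimize with the unweighted procedures
          return a + grid_u / g, f_u * g                      # change of variables: f_X(x) = f_U(G(x)) * g(x)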

  • Slide 41/41

    Application to gene expression data

    Problem: fitting two distributions to one another by removing a minimal number of data points.

    Idea: estimate the two densities in an optimal manner. Remove points until the goodness-of-fit is high with regard to a modified MISE.
