A project of “Imaging Regression, Classification and Clustering,” a sub-WG of the Imaging WG. Active members: Ashish Mahabal, Julian Faraway, Jiayang Sun, Grace Wang, Xiaofeng Wang, Lingsong Zhang

Transcript

Page 1:

A project of “Imaging Regression, Classification and Clustering,” a sub-WG of the Imaging WG

Active members:
• Ashish Mahabal
• Julian Faraway
• Jiayang Sun
• Grace Wang
• Xiaofeng Wang
• Lingsong Zhang

Page 2:

Data sources: Transient, Non-variable, Brighter objects
Data types: Light curves, Statistics, Images

• Objectives
  – Classification, in real time, using minimal data
  – Impact in terms of larger pictures

• Challenges
  – Heterogeneity of data sources
    • CRTS numbers, LSST numbers, minimal overlap (quote DASCH), part of larger set of parameters
  – Large and massive amount of light curves
  – Missing data, measurement errors and irregularly sampled data, …

• Data Sets
  – 3 data sources and 2.67 types (light curves, stats, images)

Presentation Notes
Meeting Notes (5/20/13 17:06)
[1] Larger picture in the next couple of slides
[2] Heterogeneity: CRTS; LSST (fainter than CRTS); DASCH is brighter
[3] Massive: > 500m+ light curves
[4] Missing data; measurement errors are heteroscedastic; in CRTS (4 images in 30 s)
Page 3:

Data

1. Transients from CRTS (3 types)
2. Mostly non-variables: objects at random locations, also used by the Astro group (2 types)
3. Brighter samples of CVs and RR Lyrae: important for connecting datasets (e.g. many brighter CRTS objects will saturate LSST, just like almost all DASCH sources are saturated in CRTS) [some are transients and some are periodic]

Presentation Notes
RR Lyrae are periodic
Page 4:

What is a transient?

Fast transient (flaring dM), CSS080118:112149–131310

Figure panels: 4 individual exposures, separated by 10 min; light curve

One that has a large brightness change (delta-magnitude) within a short timespan (small delta-time)
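
The definition above can be sketched in a few lines. This is only an illustration, assuming a light curve stored as time-sorted lists of times (in days) and magnitudes; the function names `largest_jump` and `looks_transient` and both thresholds are hypothetical and are not values used by CRTS.

```python
def largest_jump(times, mags, max_dt):
    """Largest |delta-magnitude| between any two observations at most max_dt apart.

    times must be sorted in ascending order (days in this example).
    """
    best = 0.0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[j] - times[i] > max_dt:
                break  # times are sorted, so later points are even further away
            best = max(best, abs(mags[j] - mags[i]))
    return best


def looks_transient(times, mags, max_dt=1.0, min_dmag=1.0):
    # Both thresholds are illustrative assumptions, not survey settings.
    return largest_jump(times, mags, max_dt) >= min_dmag


# Hypothetical light curve: quiet for two months, then a 2-magnitude
# brightening (smaller magnitude = brighter) within about 10 minutes.
t = [0.0, 30.0, 60.0, 60.007, 60.014, 60.021]
m = [19.0, 19.1, 19.0, 17.0, 17.4, 18.2]
is_transient = looks_transient(t, m)
```

The pairwise scan is quadratic in the worst case, which is acceptable for a single sparse light curve; a survey-scale pipeline would need something more economical.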

Presentation Notes
The object in the central square is the transient. The other objects in the 4 images do not change in brightness.
Page 5:

Data Characteristics

Classifying (all) transients (in real time) is hard
• Too many ‘ordinary’ transients
  – Finding needles in a haystack
• Too many possible ‘parameters’
  – e.g. colors, positions, flux = # photons

CRTS --> LSST

Presentation Notes
Finding large numbers of transients has become possible only in the last few years. The numbers will ramp up dramatically (e.g. LSST, SKA = Square Kilometre Array). SKA will be a survey at radio wavelengths.
Page 6:

Variability Tree

[Figure: variability tree. Top-level branches: Extrinsic and Intrinsic. Object groups: Asteroids, Stars, AGN. Variability causes: Rotation, Eclipse, Microlensing, Eruptive, Cataclysmic, Pulsation, Secular. Leaves include, among others, eclipsing binaries (EA, EB, EW), planetary transits, RR Lyrae, δ Cepheids, δ Scuti, Miras, dwarf novae, novae, supernovae (SN Ia; SN II, Ib, Ic), and AGN subtypes such as Seyferts, LINERs and blazars (OVV, BL Lac).]

Credit: L. Eyer & N. Mowlavi (03/2009), updated 04/2013

Challenge 1: Characterize/Classify as much with as little data as possible

We concentrate here on lightcurves (time series)

Presentation Notes
Some changes inside the node, extrinsic: Ashish please add …. Color notes + intrinsic vs. extrinsic.
Intrinsic variation is when something happens inside the object. Extrinsic variation is due to some movement (e.g. eclipse or rotation).
AGNs are Active Galactic Nuclei. Blazars are radio-loud (emit more energy at radio wavelengths). Those two categories and supernovae are extragalactic.
The Cataclysmic Variables (CVs) and RR Lyrae that we see are from within our own Galaxy.
Page 7:

Challenge 2: Only a small fraction is rare*

• Current Status:
  – About 1 strong (but mostly ‘ordinary’) transient per 10^6 sources by machine
  – High threshold to pick most dramatic transients (identification by human)
• Future:
  – With LSST, a million transients will be found per night, which is why we need automatic classification algorithms

CRTS statistics as of May 2013: http://nesssi.cacr.caltech.edu/catalina/Stats.html

Tel      All OTs  SNe   CVs  Blazars  Ast/flares  CV/SN  AGN   Other
CSS      3390     1003  676  216      269         436    438   438
MLS      3387     479   69   75       225         600    1597  522
SSS      680      99    251  17       11          108    32    167
SNHunt   186      186   0    0        0           0      0     0
Total    7643     1767  966  308      505         1144   2067  1127

Presentation Notes
updated
Page 8:

A Blazar: a variability-based counterpart of a previously unidentified Fermi source.

Example: Blazar - its type was confirmed by the spectrum graph on the right

Presentation Notes
Blazar lightcurve. High state simply means its bright state. The right figure is the spectrum that confirms the object to be a blazar (mainly blue continuum).
Page 9:

Challenge 3: A Variety of Parameters

• Discovery: magnitudes, delta-magnitudes
• Contextual:
  – Distance to nearest star
  – Magnitude of the star
  – Color of that star
  – Normalized distance to nearest galaxy
  – Distance to nearest radio source
  – Flux of nearest radio source
  – Galactic latitude
• Follow-up
  – Colors (g-r, r-i, i-z etc.)
• Prior classifications (event type)
• Characteristics from light-curve
  – Amplitude
  – Median buffer range percentage
  – Standard deviation
  – Stetson k
  – Flux percentile ratio mid80
  – Prior outburst statistic

Not all parameters are always present leading to swiss-cheese like data sets
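
A minimal sketch of how a few of the light-curve characteristics listed above might be computed while leaving holes for values that cannot be derived. The definitions are assumptions following common usage (amplitude as half the peak-to-peak range, Stetson K from error-scaled residuals about the weighted mean) and may differ in detail from the ones the group used; the function name `lightcurve_features` is hypothetical.

```python
import math
import statistics


def lightcurve_features(mags, errs):
    """Return a dict of features; an entry stays None when it cannot be computed."""
    feats = {"amplitude": None, "std": None, "stetson_k": None}
    n = len(mags)
    if n == 0:
        return feats
    # Half the peak-to-peak range, one common definition of amplitude.
    feats["amplitude"] = 0.5 * (max(mags) - min(mags))
    if n < 2:
        return feats  # the remaining features need at least two points
    feats["std"] = statistics.stdev(mags)
    # Stetson K: sum of |delta| over sqrt(N * sum of delta^2), where delta is
    # the residual about the weighted mean scaled by its measurement error.
    weights = [1.0 / e ** 2 for e in errs]
    wmean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
    scale = math.sqrt(n / (n - 1))
    delta = [scale * (m - wmean) / e for m, e in zip(mags, errs)]
    denom = math.sqrt(sum(d * d for d in delta))
    if denom > 0:
        feats["stetson_k"] = sum(abs(d) for d in delta) / (math.sqrt(n) * denom)
    return feats


# A single detection yields only an amplitude; the other entries stay missing.
single = lightcurve_features([18.2], [0.1])
pair = lightcurve_features([18.0, 19.0], [0.1, 0.1])
```

Returning `None` rather than a placeholder number keeps the holes explicit, so a downstream classifier has to decide how to handle them.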

http://ki-media.blogspot.com/

New lightcurve-based parameters:
• Whole curve measures
• Fitted curve measures
• Residual from fit measures
• Cluster measures
• Other

Page 10:

Challenge 4: Lightcurve demonstrating upper limits

Presentation Notes
Red triangles indicate upper limits (observations done, but no object detected). Such truncated light curves have not yet been studied by our group. It is part of the near-future plan.
Page 11:

Our Approaches

Methods (recall our objective: Classification):
1. Modern EDA before classification on stats, lightcurves in 1-d and high-d (graphical computation, SiZer and PP)
2. Improvement from 4 directions:
   1. Better with new derived statistics
   2. Better classification procedure (single, ensemble)
   3. Better with previously ignored information: ‘semi-supervised’ learning
   4. Better in terms of using less or incremental approach
   Notes: Classification based on derived statistics or entire curve (2-4)
3. Methodology Development

Page 12:

EDA on Non-Variables

Presentation Notes
Talk about variability in sample size and sample spacing. Talk about natural variation even in non-variables. Mention that some cases might even be variables. The gaps are indicative of times when an object is not seen from Earth. The clusters are annual clusters, with the data spanning about 8 years.
Page 13:

EDA on a transient (change is sudden)

Presentation Notes
Blue line is least squares fit. Point out how flares are noticeable in some cases but not so obvious in others.
Page 14:

EDA on a transient with changes that can take a few years

Presentation Notes
Note the aperiodic pattern of variation
Page 15:

EDA on a sub-group: active galactic nuclei, which include blazars

Presentation Notes
Active Galactic Nucleus examples. Note increasing magnitude. Won’t show examples of other types.
Page 16:

Derive new statistics

• How?
  – Fit curves (by FDA, NP, Gaussian process modeling)
    • FDA: registration
    • NP: incorporate known variances - we have a cute method
    • GPM: use the known variances to build the prior
  – Residuals:
    • Variability, outliers/signals, …
  – Others

Page 17:

Modeling the Light Curves

Gaussian Process Regression

Can tweak:
• Smoothness
• Signal variance
• Error variance (unusually, this is known)

Presentation Notes
Talk about general idea of modeling the light curves. GPR is just one approach that allows this. Talk about how the error variance is known in this case (which is unusual in Statistics). Talk about how GPR can use this information directly. Approach needs more work; currently using Lowess to model the curves.
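
To make the role of the known error variance concrete, here is a minimal NumPy sketch of Gaussian process regression in which each observation's reported variance is placed on the diagonal of the covariance matrix. It is not the group's implementation (the notes say Lowess was in use at the time); the squared-exponential kernel, the function name `gp_fit_predict`, and the default parameter values are all assumptions chosen for illustration.

```python
import numpy as np


def gp_fit_predict(t, y, err, t_new, length_scale=20.0, signal_var=1.0):
    """Posterior mean and variance of a GP with a squared-exponential kernel.

    err holds the known per-observation standard errors; err**2 goes on the
    diagonal of the covariance instead of a single fitted noise variance.
    """
    t, y, err, t_new = (np.asarray(a, dtype=float) for a in (t, y, err, t_new))

    def kernel(a, b):
        d = a[:, None] - b[None, :]
        return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

    center = np.average(y, weights=1.0 / err ** 2)  # remove a constant level
    K = kernel(t, t) + np.diag(err ** 2)            # known heteroscedastic noise
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - center))
    K_star = kernel(t_new, t)
    mean = center + K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = signal_var - np.sum(v ** 2, axis=0)       # variance of the latent curve
    return mean, var


# Hypothetical data: the middle point is far off but has a large reported
# error, so the fit should largely ignore it.
t = [0.0, 10.0, 20.0, 30.0, 40.0]
y = [18.0, 18.0, 25.0, 18.0, 18.0]
err = [0.01, 0.01, 5.0, 0.01, 0.01]
mean, var = gp_fit_predict(t, y, err, t_new=[20.0, 500.0], length_scale=10.0)
```

Here `length_scale` plays the role of the smoothness setting and `signal_var` the signal variance, while the error variance is supplied by the data rather than tuned.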
Page 18:

Generation of new summary measures

Modeled curve:
• Fitted: summary measures
• Residuals: summary measures

Clusters of observations (in 30-minute groups of 4): summary measures

Presentation Notes
Talk about how our new measures are based on the modeled curve. Also measures are developed from clusters of measurements in short bursts.
Page 19:

New Summary Statistics

1. Whole curve measures
Median magnitude (mag); mean of absolute differences of successive observed magnitudes; the maximum difference of magnitudes

2. Fitted curve measures
Total variation scaled by number of days of observation; range of fitted curve; maximum derivative in the fitted curve

3. Residual from fit measures
The maximum studentized residual; SD of residuals; skewness of residuals; Shapiro-Wilk statistic of residuals

4. Cluster measures
Fit the means within the groups (up to 4 measurements), then take the logged SD of the residuals from this fit; the max absolute residual from this fit; total variation of curve based on group means scaled by range of observation

5. Other
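
A rough sketch of the whole-curve and cluster measures listed above, using only the Python standard library. Several details are assumptions: the "maximum difference" is read as the largest difference between successive magnitudes, clusters are formed by splitting wherever the gap between successive observations exceeds 30 minutes, and the helper names are hypothetical.

```python
import math
import statistics


def whole_curve_measures(mags):
    diffs = [abs(b - a) for a, b in zip(mags, mags[1:])]
    return {
        "median_mag": statistics.median(mags),
        "mean_abs_diff": statistics.fmean(diffs),
        "max_abs_diff": max(diffs),  # assumed reading of "maximum difference"
    }


def group_by_gap(times, mags, max_gap):
    """Split a time-sorted light curve into clusters of observations whose
    successive gaps are at most max_gap (same units as times)."""
    groups = [[mags[0]]]
    for prev, cur, m in zip(times, times[1:], mags[1:]):
        if cur - prev <= max_gap:
            groups[-1].append(m)
        else:
            groups.append([m])
    return groups


def cluster_measures(times, mags, max_gap=30.0 / 1440.0):
    # times are in days; the default gap of 30 minutes is an assumption.
    groups = group_by_gap(times, mags, max_gap)
    means = [statistics.fmean(g) for g in groups]
    residuals = [m - mu for g, mu in zip(groups, means) for m in g]
    sd = statistics.stdev(residuals)
    total_variation = sum(abs(b - a) for a, b in zip(means, means[1:]))
    return {
        "log_sd_resid": math.log(sd) if sd > 0 else None,
        "max_abs_resid": max(abs(r) for r in residuals),
        "tv_group_means": total_variation / (times[-1] - times[0]),
    }


# Hypothetical light curve: two nights, four exposures 10 minutes apart each.
step = 10.0 / 1440.0
times = [0.0, step, 2 * step, 3 * step, 1.0, 1.0 + step, 1.0 + 2 * step, 1.0 + 3 * step]
mags = [18.0, 18.2, 18.0, 18.2, 19.0, 19.2, 19.0, 19.2]
w = whole_curve_measures(mags)
c = cluster_measures(times, mags)
```

The fitted-curve and residual measures are omitted here because they depend on which curve-fitting method is chosen.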

Page 20:

Page 21:

Page 22:

Page 23:

Page 24:

High dimensional views via modern graphics and PP

Page 25:

Available Data with non-variables and 7 transient types

Random split: Training Set N=2480, Test Set N=1240

Percent correctly classified in the test set:

        Richards  Richards+New
LDA     63        76
Tree    69        74
SVM     76        82

Others: Multinomial logistic DA + New Ensembles

Presentation Notes
Talk about random split of data. Same methods used for each set of measures. Our measures have been added to the Richards measures. Talk about how the same default settings have been used on the classification procedures and that we expect we could optimize these to obtain even better performance.
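
The evaluation protocol on this slide (a random split followed by percent correctly classified on the test set) can be sketched as follows. The helper names and the seed are hypothetical, the labels in the usage example are made up, and the classifiers themselves (LDA, tree, SVM) are deliberately left out.

```python
import random


def random_split(n_total, n_train, seed=0):
    """Return (train_indices, test_indices) from a random permutation."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]


def percent_correct(true_labels, predicted_labels):
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * hits / len(true_labels)


# Sizes from the slide: 2480 training and 1240 test objects.
train_idx, test_idx = random_split(2480 + 1240, 2480)

# Made-up labels, only to show the scoring call.
score = percent_correct(["SN", "CV", "SN", "AGN"], ["SN", "CV", "CV", "AGN"])
```

Using the same split for both feature sets, as the notes describe, keeps the two columns of the table directly comparable.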
Page 26:

VdCC: another approach to incorporate known variances

• Idea:

Page 27:

Whole Curve Comparisons

• PfClust -> PfClassification
• Functional Centroid Method (FCC)
  – Model m(x) and of the whole curves for each class
  – Develop SCB for each m(x)
  – Define a functional distance measure between curves
  – Classify a new curve to one of the existing classes or a new class of curves based on the distances
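
The centroid idea above might look like the following sketch. It assumes every curve has already been fitted and evaluated on a common time grid, uses a root-mean-square difference as a stand-in for the functional distance, and replaces the simultaneous confidence band with a single hypothetical threshold, so it illustrates the classification step only and is not the FCC method itself.

```python
import math


def mean_curve(curves):
    """Point-wise mean of curves sampled on a common grid."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]


def rms_distance(a, b):
    # Root-mean-square difference, used here as a simple functional distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def classify(curve, centroids, new_class_threshold):
    """Assign the nearest class centroid, or flag a new class if none is close."""
    label, dist = min(
        ((name, rms_distance(curve, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= new_class_threshold else "new class"


# Made-up training curves on a 4-point grid.
training = {
    "flat": [[0.0, 0.0, 0.0, 0.0], [0.2, 0.2, 0.2, 0.2]],
    "rising": [[0.0, 1.0, 2.0, 3.0], [0.0, 1.2, 2.2, 3.2]],
}
centroids = {name: mean_curve(curves) for name, curves in training.items()}
label = classify([0.1, 0.0, 0.2, 0.1], centroids, new_class_threshold=0.5)
```

The threshold controls how readily a curve is declared to belong to a new class; a band-based rule would let that tolerance vary along the curve.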

Page 28:

Development of Functional Method

Exploration Step: are they different and separable?

• Directly estimate the (pair-wise) mean difference between classes
• Bootstrap method to estimate the (point-wise) confidence intervals.
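
A minimal sketch of the exploration step, assuming the curves of each class have been evaluated on a common grid. It resamples whole curves within each class and takes point-wise percentiles of the difference in class means; the function name `bootstrap_diff_band`, the number of resamples, and the data are hypothetical.

```python
import random


def mean_curve(curves):
    """Point-wise mean of curves sampled on a common grid."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]


def bootstrap_diff_band(class_a, class_b, n_boot=1000, level=0.95, seed=0):
    """Point-wise percentile band for mean(class_a) - mean(class_b)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        # Resample whole curves with replacement, separately within each class.
        a = rng.choices(class_a, k=len(class_a))
        b = rng.choices(class_b, k=len(class_b))
        draws.append([x - y for x, y in zip(mean_curve(a), mean_curve(b))])
    lo_idx = int((1.0 - level) / 2.0 * n_boot)
    hi_idx = min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot))
    lower, upper = [], []
    for column in zip(*draws):
        ordered = sorted(column)
        lower.append(ordered[lo_idx])
        upper.append(ordered[hi_idx])
    return lower, upper


# Made-up curves on a 2-point grid: the classes differ at the first grid
# point and coincide at the second.
class_a = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1], [1.1, 0.0]]
class_b = [[0.0, 0.0], [0.1, 0.1], [-0.1, -0.1], [0.0, 0.0]]
lower, upper = bootstrap_diff_band(class_a, class_b)
```

Grid points where the band excludes zero are the places where the two classes look separable; because the band is point-wise, it does not control error across the whole curve at once.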

Page 29:

Selective comparison

Page 30:

Selective comparison

Page 31:

Selective comparison

Page 32:

Conclusion

• Developed new derived statistics
• Applied better modeling/classification procedures
  – GRM, …
• Moved on 5 methodology development directions:
  – PfClassification, Functional CuvClass, VdCC, Ensemble, Scale-space Comparison
• Allowed for incremental classification

Our team had a great time working together and expects continuation of research and ties that will contribute to Statistics and sciences beyond this light curve analysis.

A revision of this talk with additional new work will be presented at the JSM ☺