A project of “Imaging Regression, Classification and Clustering,”
a sub-WG of the Imaging WG
Active members:
• Ashish Mahabal
• Julian Faraway
• Jiayang Sun
• Grace Wang
• Xiaofeng Wang
• Lingsong Zhang
[Diagram: data sources (transients, non-variables, brighter objects) and data types (light curves, statistics, images)]
• Objectives
  – Classification, in real time, using minimal data
  – Impact in terms of larger pictures
• Challenges
  – Heterogeneity of data sources
    • CRTS numbers, LSST numbers, minimal overlap (quote DASCH), part of larger set of parameters
  – Massive amounts of light curves
  – Missing data, measurement errors and irregularly sampled data, …
• Data Sets
  – 3 data sources and 2.67 types (light curves, stats, images)
Data
1. Transients from CRTS (3 types)
2. Mostly non-variables: objects at random locations, also used by the Astro group (2 types)
3. Brighter samples of CVs and RR Lyrae, important for connecting datasets (e.g. many brighter CRTS objects will saturate LSST, just like almost all DASCH sources are saturated in CRTS) [some are transients and some are periodic]
What is a transient?
Fast transient (flaring dM), CSS080118:112149–131310
[Figure panels: 4 individual exposures, separated by 10 min; light curve]
One that has a large brightness change (delta-magnitude) within a short timespan (small delta-time)
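The definition above can be sketched as a simple check: find the largest magnitude change between any two observations that lie within a short time window. This is only an illustration. The function names, the 1-day window, the 1-mag threshold and the sample data are all assumptions, not values from the talk.

```python
import numpy as np

def max_brightness_change(times, mags, max_dt):
    """Largest |delta-magnitude| between two observations at most max_dt apart."""
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    dt = np.abs(times[:, None] - times[None, :])   # pairwise time gaps
    dm = np.abs(mags[:, None] - mags[None, :])     # pairwise magnitude changes
    close = (dt > 0) & (dt <= max_dt)              # pairs within the time window
    return float(dm[close].max()) if close.any() else 0.0

def looks_transient(times, mags, max_dt=1.0, min_dmag=1.0):
    """Hypothetical rule: large delta-magnitude within a small delta-time."""
    return max_brightness_change(times, mags, max_dt) >= min_dmag

# Made-up example: 4 exposures separated by 10 minutes (times in days).
t = np.array([0.0, 10.0, 20.0, 30.0]) / (24 * 60)
quiet = np.array([18.0, 18.05, 17.95, 18.0])
flare = np.array([18.0, 17.0, 16.0, 16.5])

print(looks_transient(t, quiet), looks_transient(t, flare))
```

The pairwise comparison is quadratic in the number of observations, which is acceptable for a handful of exposures but would need a windowed scan for long light curves.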
Data Characteristics
Classifying (all) transients (in real time) is hard
• Too many ‘ordinary’ transients
  – Finding needles in a haystack
• Too many possible ‘parameters’
  – e.g. colors, positions, flux = # photons
CRTS --> LSST
[Figure: Variability Tree. Top-level split: Extrinsic vs. Intrinsic; object groups: Asteroids, Stars, AGN; variability mechanisms: Rotation, Eclipse, Microlensing, Eruptive, Cataclysmic, Pulsation, Secular. Credit: L. Eyer & N. Mowlavi (03/2009, updated 04/2013)]
Challenge 1: Characterize/classify as much as possible with as little data as possible
We concentrate here on light curves (time series)
Challenge 2: Only a small fraction is rare*
• Current status:
  – About 1 strong (but mostly ‘ordinary’) transient per 10^6 sources by machine
  – High threshold to pick the most dramatic transients (identification by human)
• Future:
  – With LSST, a million transients will be found per night, which is why we need automatic classification algorithms
CRTS statistics as of May 2013: http://nesssi.cacr.caltech.edu/catalina/Stats.html
Tel      All OTs   SNe    CVs   Blazars   Ast/flares   CV/SN   AGN    Other
CSS      3390      1003   676   216       269          436     438    438
MLS      3387      479    69    75        225          600     1597   522
SSS      680       99     251   17        11           108     32     167
SNHunt   186       186    0     0         0            0       0      0
Total    7643      1767   966   308       505          1144    2067   1127
A Blazar: a variability-based counterpart of a previously unidentified Fermi source.
Example: Blazar - its type was confirmed by the spectrum graph on the right
Challenge 3: A Variety of Parameters
• Discovery: magnitudes, delta-magnitudes
• Contextual:
  – Distance to nearest star
  – Magnitude of the star
  – Color of that star
  – Normalized distance to nearest galaxy
  – Distance to nearest radio source
  – Flux of nearest radio source
  – Galactic latitude
• Follow-up
  – Colors (g-r, r-i, i-z etc.)
• Prior classifications (event type)
• Characteristics from light curve
  – Amplitude
  – Median buffer range percentage
  – Standard deviation
  – Stetson K
  – Flux percentile ratio mid80
  – Prior outburst statistic
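A minimal sketch of how several of the light-curve characteristics listed above might be computed from magnitudes and their known errors. The slides do not spell out the formulas, so the definitions below follow common usage in the light-curve feature literature and should be read as assumptions. The function name and the synthetic data are hypothetical, and the prior outburst statistic is left out.

```python
import numpy as np

def lightcurve_features(mags, errs):
    """Assumed definitions of a few standard light-curve features."""
    mags = np.asarray(mags, dtype=float)
    errs = np.asarray(errs, dtype=float)
    n = mags.size
    median = np.median(mags)
    # Amplitude: half the range of observed magnitudes.
    amplitude = 0.5 * (mags.max() - mags.min())
    # Median buffer range: fraction of points within amplitude/10 of the median.
    median_buffer_range = float(np.mean(np.abs(mags - median) < amplitude / 10.0))
    # Stetson K: built from error-scaled residuals about the weighted mean.
    weights = 1.0 / errs**2
    wmean = np.sum(weights * mags) / np.sum(weights)
    delta = np.sqrt(n / (n - 1.0)) * (mags - wmean) / errs
    stetson_k = float(np.sum(np.abs(delta)) / np.sqrt(n * np.sum(delta**2)))
    # Flux percentile ratio mid80: (90th - 10th) / (95th - 5th) percentile of flux.
    flux = 10.0 ** (-0.4 * mags)
    p5, p10, p90, p95 = np.percentile(flux, [5, 10, 90, 95])
    return {
        "amplitude": float(amplitude),
        "median_buffer_range": median_buffer_range,
        "std": float(mags.std(ddof=1)),
        "stetson_k": stetson_k,
        "flux_percentile_ratio_mid80": float((p90 - p10) / (p95 - p5)),
    }

# Synthetic non-variable source: constant magnitude plus Gaussian noise.
rng = np.random.default_rng(0)
example = lightcurve_features(18.0 + 0.1 * rng.normal(size=500), np.full(500, 0.1))
print(example)
```

For pure Gaussian noise Stetson K sits near 0.8, so a value far from that is one hint that a source is not an ordinary non-variable.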
Not all parameters are always present, leading to Swiss-cheese-like data sets
http://ki-media.blogspot.com/
New lightcurve-based parameters:
• Whole curve measures
• Fitted curve measures
• Residual from fit measures
• Cluster measures
• Other
Challenge 4: Lightcurve demonstrating upper limits
Our Approaches
Methods (recall our objective: classification):
1. Modern EDA before classification on stats, light curves in 1-d and high-d (graphical computation, SiZer and PP)
2. Improvement from 4 directions:
   1. Better with new derived statistics
   2. Better classification procedure (single, ensemble)
   3. Better with previously ignored information: ‘semi-supervised’ learning
   4. Better in terms of using less data or an incremental approach
   Notes: Classification based on derived statistics or entire curve (2-4)
3. Methodology development
EDA on Non-Variables
EDA on a transient (change is sudden)
EDA on a transient with changes that can take a few years
EDA on a sub-group: active galactic nuclei, which include blazars
Derive new statistics
• How?
  – Fit curves (by FDA, NP, Gaussian process modeling)
    • FDA: registration
    • NP: incorporate known variances (we have a cute method)
    • GPM: use the known variances to build the prior
  – Residuals:
    • Variability, outliers/signals, …
  – Others
Modeling the Light Curves
Gaussian Process Regression
Can tweak:
• Smoothness
• Signal variance
• Error variance (unusually, this is known)
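A minimal sketch of Gaussian process regression in which the per-observation error variances are treated as known and added to the diagonal of the covariance matrix. The squared-exponential kernel, the constant mean, the parameter values and the synthetic light curve are all assumptions made for illustration. The slides do not say which kernel or settings were used.

```python
import numpy as np

def rbf_kernel(a, b, signal_var, length_scale):
    """Squared-exponential covariance; length_scale controls smoothness."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(t, y, err_var, t_new, signal_var=1.0, length_scale=1.0):
    """Posterior mean and variance at t_new, given known error variances."""
    t, y, err_var, t_new = (np.asarray(v, dtype=float) for v in (t, y, err_var, t_new))
    center = y.mean()  # constant mean function (an assumption)
    # Known measurement variances enter on the diagonal; they are not estimated.
    K = rbf_kernel(t, t, signal_var, length_scale) + np.diag(err_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - center))
    K_new = rbf_kernel(t_new, t, signal_var, length_scale)
    mean = center + K_new @ alpha
    v = np.linalg.solve(L, K_new.T)
    var = signal_var - np.sum(v**2, axis=0)
    return mean, var

# Synthetic, irregularly sampled curve with heteroscedastic, known errors.
rng = np.random.default_rng(1)
t_obs = np.sort(rng.uniform(0.0, 10.0, size=40))
err_sd = rng.uniform(0.05, 0.15, size=40)
y_obs = np.sin(t_obs) + rng.normal(0.0, err_sd)
t_grid = np.linspace(0.0, 10.0, 200)

fit_mean, fit_var = gp_predict(t_obs, y_obs, err_sd**2, t_grid)
residuals = y_obs - gp_predict(t_obs, y_obs, err_sd**2, t_obs)[0]
```

The three tweakable quantities map to `length_scale` (smoothness), `signal_var` (signal variance) and `err_var` (error variance). The fitted curve and the residuals are the inputs to the fitted-curve and residual summary measures.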
Generation of new summary measures
• Modeled curve: fitted summary measures
• Residuals: summary measures
• Clusters of observations (in 30-minute groups of 4): summary measures
New Summary Statistics
1. Whole curve measures
   Median magnitude (mag); mean of absolute differences of successive observed magnitudes; the maximum difference in magnitudes
2. Fitted curve measures
   Total variation scaled by number of days of observation; range of fitted curve; maximum derivative in the fitted curve
3. Residual from fit measures
   The maximum studentized residual; SD of residuals; skewness of residuals; Shapiro-Wilk statistic of residuals
4. Cluster measures
   Fit the means within the groups (up to 4 measurements), then take the logged SD of the residuals from this fit; the max absolute residual from this fit; total variation of curve based on group means scaled by range of observation
5. Other
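The whole-curve and cluster measures listed above can be sketched as follows. Several choices here are assumptions: the function names are hypothetical, the "maximum difference" is read as the largest change between successive observations, and a new group starts whenever the gap to the previous observation exceeds 30 minutes. The fitted-curve and residual measures would need a fitted model and are not shown.

```python
import numpy as np

def whole_curve_measures(mags):
    """Median magnitude plus mean and max absolute successive differences."""
    mags = np.asarray(mags, dtype=float)
    steps = np.abs(np.diff(mags))
    return {"median_mag": float(np.median(mags)),
            "mean_abs_step": float(steps.mean()),
            "max_abs_step": float(steps.max())}

def cluster_measures(times, mags, max_gap=30.0 / (24 * 60)):
    """Group nearby observations, fit group means, summarize the residuals."""
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    order = np.argsort(times)
    times, mags = times[order], mags[order]
    # A new group starts when the gap to the previous point exceeds max_gap (days).
    group = np.concatenate([[0], np.cumsum(np.diff(times) > max_gap)])
    counts = np.bincount(group)
    means = np.bincount(group, weights=mags) / counts
    resid = mags - means[group]
    dof = mags.size - counts.size  # one mean is fitted per group
    return {"n_groups": int(counts.size),
            "log_resid_sd": float(np.log(np.sqrt(np.sum(resid**2) / dof))),
            "max_abs_resid": float(np.abs(resid).max()),
            "scaled_total_variation":
                float(np.sum(np.abs(np.diff(means))) / (times[-1] - times[0]))}

# Made-up example: 3 nights, 4 exposures per night taken 10 minutes apart.
t = np.repeat([0.0, 1.0, 2.0], 4) + np.tile(np.arange(4) * 10.0 / (24 * 60), 3)
m = np.repeat([18.0, 17.0, 18.0], 4) + np.tile([0.1, -0.1, 0.1, -0.1], 3)
whole = whole_curve_measures(m)
clusters = cluster_measures(t, m)
```

Averaging within each group before measuring variation separates night-to-night changes from scatter within a single visit.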
High dimensional views via modern graphics and PP
Available data with non-variables and 7 transient types
Random split: Training set N=2480, Test set N=1240

Percent correctly classified in the test set:
        Richards   Richards+New
LDA     63         76
Tree    69         74
SVM     76         82

Others: Multinomial logistic DA + New Ensembles
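The evaluation protocol above (a random 2480/1240 split, then percent correctly classified on the test set) can be sketched as below. The data are synthetic stand-ins with two classes, and the classifier is a plain nearest-centroid rule swapped in for brevity. It is not LDA, a tree or an SVM, so the resulting percentage has no relation to the numbers in the table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 3720 objects, 2 classes, 3 summary statistics each.
n_total, n_train = 3720, 2480
labels = rng.integers(0, 2, size=n_total)
features = rng.normal(size=(n_total, 3)) + 2.0 * labels[:, None]

# Random split into training and test sets.
perm = rng.permutation(n_total)
train, test = perm[:n_train], perm[n_train:]

def fit_centroids(X, y):
    """Mean feature vector of each class in the training set."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict(X, classes, centroids):
    """Assign each object to the class with the nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

classes, centroids = fit_centroids(features[train], labels[train])
pred = predict(features[test], classes, centroids)
percent_correct = 100.0 * np.mean(pred == labels[test])
print(f"Percent correctly classified in the test set: {percent_correct:.1f}")
```

Comparing two feature sets, as in the Richards vs. Richards+New columns, would mean running the same split and classifier once per feature matrix.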
VdCC: another approach to incorporate known variances
Whole Curve Comparisons
• Idea:
  • PfClust -> PfClassification
  • Functional Centroid Method (FCC)
    – Model m(x) and of the whole curves for each class
    – Develop SCB for each m(x)
    – Define a functional distance measure between curves
    – Classify a new curve to one of the existing classes or a new class of curves based on the distances
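A minimal sketch of the last two steps above: a functional distance between curves, and assignment of a new curve to the nearest class mean or to a new class. The root-mean-square L2 distance, the fixed threshold standing in for an SCB-based decision, and all names and data are assumptions. The slides do not specify the actual distance or rule.

```python
import numpy as np

def functional_distance(f, g, x):
    """Root-mean-square L2 distance between two curves sampled on grid x."""
    sq = (np.asarray(f, dtype=float) - np.asarray(g, dtype=float)) ** 2
    integral = np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(x))  # trapezoid rule
    return float(np.sqrt(integral / (x[-1] - x[0])))

def classify_curve(curve, class_means, x, new_class_threshold):
    """Nearest class mean curve, or 'new class' if every class is too far."""
    dists = {name: functional_distance(curve, m, x) for name, m in class_means.items()}
    best = min(dists, key=dists.get)
    label = best if dists[best] <= new_class_threshold else "new class"
    return label, dists

# Hypothetical class mean curves m(x) on a common grid.
x = np.linspace(0.0, 1.0, 101)
class_means = {"flat": np.zeros_like(x), "periodic": np.sin(2 * np.pi * x)}

label_known, _ = classify_curve(np.sin(2 * np.pi * x) + 0.05, class_means, x, 0.5)
label_novel, _ = classify_curve(5.0 + x, class_means, x, 0.5)
print(label_known, label_novel)
```

Real light curves are irregularly sampled, so each curve would first have to be fitted or interpolated onto the common grid.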
Development of Functional Method
Exploration Step: are they different and separable?
• Directly estimate the (pair-wise) mean difference between classes
• Bootstrap method to estimate the (point-wise) confidence intervals.
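The exploration step above can be sketched as a percentile bootstrap: resample curves within each class, recompute the difference of mean curves, and take point-wise quantiles. The function name, the number of resamples and the synthetic curves are assumptions. The curves are also taken to be on a common grid already, which real light curves would need a fitting step to achieve.

```python
import numpy as np

def bootstrap_mean_difference(curves_a, curves_b, n_boot=1000, level=0.95, seed=0):
    """Point-wise percentile bootstrap interval for mean(A) - mean(B)."""
    rng = np.random.default_rng(seed)
    na, nb = len(curves_a), len(curves_b)
    diffs = np.empty((n_boot, curves_a.shape[1]))
    for b in range(n_boot):
        ia = rng.integers(0, na, size=na)   # resample curves with replacement
        ib = rng.integers(0, nb, size=nb)
        diffs[b] = curves_a[ia].mean(axis=0) - curves_b[ib].mean(axis=0)
    alpha = 1.0 - level
    lower, upper = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    estimate = curves_a.mean(axis=0) - curves_b.mean(axis=0)
    return estimate, lower, upper

# Synthetic example: 50 curves per class on a 20-point grid, true difference 1.
rng = np.random.default_rng(2)
class_a = 1.0 + 0.5 * rng.normal(size=(50, 20))
class_b = 0.0 + 0.5 * rng.normal(size=(50, 20))
estimate, lower, upper = bootstrap_mean_difference(class_a, class_b)

# Grid points where the interval excludes zero suggest the classes differ there.
separable = (lower > 0) | (upper < 0)
```

These intervals are point-wise, so they do not control the error rate across the whole curve at once.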
Selective comparison
Conclusion
• Developed new derived statistics
• Applied better modeling/classification procedures
  – GRM, …
• Moved on 5 methodology development directions:
  – PfClassification, Functional CuvClass, VdCC, Ensemble, Scale-space Comparison
• Allowed for incremental classification
Our team had a great time working together and expects continued research and ties that will contribute to statistics and the sciences beyond this light curve analysis.
A revision of this talk with additional new work will be presented at the JSM ☺