The VAO is operated by the VAO, LLC. Ashish Mahabal (aam@astro.caltech.edu), Ciro Donalek, Matthew Graham, Ray Plante, George Djorgovski. Data 2 Knowledge study project. VAO-LSST Meeting, NOAO, 24 March 2011.

Page 1: The VAO is operated by the VAO, LLC. Ashish Mahabal (aam@astro.caltech.edu), Ciro Donalek, Matthew Graham, Ray Plante, George Djorgovski.

The VAO is operated by the VAO, LLC.

Ashish Mahabal (aam@astro.caltech.edu)

Ciro Donalek, Matthew Graham

Ray Plante, George Djorgovski

Data 2 Knowledge study project

VAO-LSST Meeting, NOAO, 24 March 2011

Page 2:

March 23, 2011 Ashish Mahabal

Goals

• Feasibility study
• What is out there
• What is needed

• Milestones
• What can be done

Page 3:

Exploration of observable parameter spaces and searches for rare or new types of objects

Djorgovski

Page 4:

Overview – many connections

• Astroinformatics (next meeting in Sep. 2011)
• VOStat and other R/Statistics tools
• Data challenges
• Various sky surveys

Related issues:
• Semantics
• Classification/characterization
• Distributed data
• GPUs

Focus on time domain

Page 5:

Focus on time-domain

Expertise, and it encompasses all aspects of data mining (save one).
Plus, real-time forces us to be fast.

• Portfolio building – growing columns of tables
• Bayesian networks utilizing auxiliary information
• Lightcurve techniques for characterizing objects
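As an illustration of what "lightcurve techniques for characterizing objects" might involve, here is a minimal sketch that reduces one lightcurve to a few descriptive numbers. The slides do not say which features are used; the function name `lightcurve_features` and the four statistics chosen here are assumptions made for illustration only.

```python
from statistics import median


def lightcurve_features(times, mags):
    """Summarize one lightcurve (hypothetical helper, not from the slides)."""
    if len(times) != len(mags) or len(mags) < 2:
        raise ValueError("need at least two (time, magnitude) pairs")

    med = median(mags)
    # Peak-to-peak amplitude in magnitudes.
    amplitude = max(mags) - min(mags)
    # Median absolute deviation: a scatter estimate that resists outliers.
    mad = median(abs(m - med) for m in mags)

    # Largest rate of change between consecutive epochs, in magnitudes
    # per unit time. Pairs with identical timestamps are skipped.
    pairs = sorted(zip(times, mags))
    max_slope = max(
        (abs(m2 - m1) / (t2 - t1)
         for (t1, m1), (t2, m2) in zip(pairs, pairs[1:])
         if t2 > t1),
        default=0.0,
    )
    return {"median": med, "amplitude": amplitude,
            "mad": mad, "max_slope": max_slope}


# Example: a mostly flat source with one sharp brightening at t = 3.
features = lightcurve_features([0, 1, 2, 3, 4],
                               [15.0, 15.1, 14.9, 13.0, 15.0])
```

Each such feature would become one more column in the growing per-object table, which is one way to read the "portfolio building" item above.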

Page 6:

Missing stat and CS tools

Page 7:

Missing stat and CS tools

• Bootstrap aggregating
• Mixture of experts
• Boosting
• Simulated annealing
• Semi-supervised learning
• …

From IVOA KDD User guide for Data Mining (Nick Ball)
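To make the first item on this list concrete, here is a minimal sketch of bootstrap aggregating (bagging). It assumes a single numeric feature and a decision-stump base learner; both are illustrative choices, not something the slides specify, and the names `fit_stump`, `fit_bagged_stumps` and `predict` are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Find the threshold t minimizing training errors for 'x > t -> class 1'."""
    best_t, best_err = None, None
    # -inf is included so that "everything is class 1" is a possible rule.
    for t in [float("-inf")] + sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def fit_bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        # Bootstrap: draw n training indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx],
                                    [ys[i] for i in idx]))
    return thresholds


def predict(thresholds, x):
    """Aggregate by majority vote over all stumps."""
    votes = sum(x > t for t in thresholds)
    return 1 if 2 * votes > len(thresholds) else 0


# Toy data: class 0 at small x, class 1 at large x.
xs = [0.1, 0.2, 0.3, 0.4, 1.6, 1.7, 1.8, 1.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagged_stumps(xs, ys)
label = predict(models, 1.0)
```

The vote averages out the variation among stumps trained on different resamples, which is the point of the technique; a real application would swap in a stronger base learner.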

Page 8:

Science goal: to close the growing gap between the rate at which data are generated and our understanding of them

Data Gathering (e.g., new generation instruments …)

Data Farming:
• Storage/Archiving
• Indexing, Searchability
• Data Fusion, Interoperability, ontologies, etc.

Data Mining (or Knowledge Discovery in Databases):
• Pattern or correlation search
• Clustering analysis, automated classification
• Outlier / anomaly searches
• Hyperdimensional visualization
• Data visualization and understanding

Computer aided understanding, KDD, etc.
New Knowledge

Data storage, Pbytes
Data access >103 access

Scalability: Petaflops, Exaflops
Computing power (multicore)
Algorithm: parallelism
Visualization: N-dimensional
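The "outlier / anomaly searches" item under data mining on this slide can be illustrated with a small sketch. It assumes a single numeric column and uses the modified z-score based on the median absolute deviation, with the conventional 3.5 cutoff. This is one common robust approach, chosen here for illustration; the slides do not name a method, and `mad_outliers` is a hypothetical helper.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No measurable spread, so deviations cannot be scaled.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a
    # Gaussian z-score.
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Example: one measurement far from an otherwise tight cluster.
flagged = mad_outliers([10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8])
```

Because the median and MAD are barely affected by the extreme value itself, the outlier does not hide its own presence the way it can with a mean and standard deviation.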

Page 9:

Currently on the plate

• DAME
• Knime (Konstanz Information Miner)
• Orange (Visual/python)
• Weka (ML/Java)
• Rapidminer (standalone)

Page 10:

Comparison matrix for DM/Viz tools

Accuracy, Scalability, Interpretability, Usability, Robustness, Versatility, Speed, Popularity

Page 11:

Related activities

• Skyalert integration (Graham) – adding data and methods
• Solicitation of examples from community

• WD, Blazars’ example
• Making R more astronomy friendly

• Various datasets
• Differing number of rows, columns
• For supervised/unsupervised classification

• TA on GPUs – incorporate in pipeline

Page 12:

Slide from Budavari

CUDA zone, PyCUDA, …

Page 13:

VAO People working on this

• Ashish Mahabal, Ciro Donalek, Matthew Graham, George Djorgovski (Caltech)

• Ray Plante (NCSA)

• But we are in touch with many others in astro/CS/stats and are relying on many groups, including the LSST transients and informatics working groups