Posted on 22-Jan-2016
2004-06-03 CMU-CS lunch talk, Gerard Lemson
Computational and statistical problems for the Virtual Observatory
With contributions from/thanks to: GAVO team (Wolfgang Voges, Matthias Steinmetz, Harry Enke, Hans-Martin Adorf), Joerg Colberg (NVO@UPitt), Pat Dowler (CVO), Tony Banday (MPA), Class X team
Overview
• Intro to VO
• IVOA standards process
• Some concrete examples, demos
• Scenarios, science cases
• Interesting problems
Intro to VO
• Very large data sets
• Multi-wavelength astronomy made easy
• Federation of distributed archives
• Publication of expert services
• New software developments
• Why contribute?
• Too easy to do bad science?
IVOA standards and specifications
• Collaboration of national VOs
• Develop standards for interoperability:
  – publication (registry)
  – description (dm, ucd)
  – query (dal, voql)
  – data transfer (votable)
  – services (grid/web services)
• Interest groups:
  – architecture
  – applications
  – theory
Babylonian confusion
VO domain model as Esperanto
[Figure: (GA)VO architecture. Components shown: workstation, portal, WSDL/SOAP, SOAP (web) services, registry with registry query interface and resource model, VOQL engine (parsing, splitting, planning), meta-data repository, domain model, data access layer, ADQL/VOTable, mapping services for MPA simulations, ROSAT (RASS fields, BSC, RASS photons) and SDSS (SQL Server), NGSE, VOTable.]
Protocols
• VOTable + UCD
• DM-based XML + XSLT
• SCS/SIAP/SSAP
• ADQL
• VOQL
• SkyNode
• Registry resource model and harvesting interface
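A Simple Cone Search (SCS) request is an HTTP GET on a service's base URL with the position and search radius appended as RA, DEC and SR, all in decimal degrees. The sketch below assumes that convention and only builds the request URLs (no network access). The function names are hypothetical, and a real client would take the base URLs from the registry rather than hard-coding them.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float, radius_deg: float) -> str:
    """Build an SCS request: base URL plus RA, DEC, SR in decimal degrees."""
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must be in [0, 360] degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if radius_deg <= 0.0:
        raise ValueError("SR must be positive")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # A base URL may already end in '?' or '&', or carry its own parameters.
    if base_url.endswith(("?", "&")):
        return base_url + query
    return base_url + ("&" if "?" in base_url else "?") + query


def multi_cone_urls(base_urls, positions, radius_deg):
    """One request per (service, position) pair, i.e. a multi-catalogue multi-cone search."""
    return [cone_search_url(url, ra, dec, radius_deg)
            for url in base_urls for (ra, dec) in positions]
```

The number of requests grows as services × positions, which is why the multi-cone scenario needs a "download manager" rather than single ad hoc queries.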
Data models
• Targeted “small” data models:
  – Quantity
  – Observation
  – Simulation
• Domain model as ontology
• Meta-data repository
• Bindings
• Representations, views, transformations
[Figure: UML domain model. Packages and classes shown include Standards (Unit, AtomicUnit with abbreviation and amount, ComponentUnit with a rational power, CompoundUnit made of one or more component units, Category, CoordinateSystem, EnergyBand, MagnitudeSystem, NamingSystem, Name, ClassificationSystem, ReferenceSystem, Constant, PhysicalConstant), Values (Value, Quantity, AtomicQuantity with amount and unit, CompositeQuantity, ComponentQuantity, Classifier, Identifier), Experiments (Experiment, Result, Measurement, Classification, Identification, ValueAssignment, ConfidenceIndication, Uncertainty, Configuration, Subject, InputData, Image, ObjectList, TimeOrderedData, VisibilityData), Protocols (Protocol, Objective, Variable, ConfigurationDescriptor, InputDataType, AstronomicalObservatory, Simulator, Analysis, Calibration, SourceExtraction, CrossMatching, DataProcessing, Stacking, Query), Products (PhysicalArtifact), Types (AbstractType, Datatype, Representation, Field) and Phenomenology (Phenomenon, NumericPhenomenon, AtomicNumericPhenomenon, BaseNumericPhenomenon, DerivedNumericPhenomenon, DerivedPhenomenonComponent, CategoricalPhenomenon, CompositePhenomenon, DecompositionalPhenomenon, PositionalPhenomenon, Identification, Property, ScientificArtifact, SubjectType, SpatialSubjectType, Substance, TangibleObject).]
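In the Standards and Values packages a quantity pairs a numeric amount with a unit, and a compound unit is built from component units raised to rational powers. A minimal sketch of that structure follows. It assumes that `amount` on an atomic unit is its scale factor to the corresponding SI base unit (the diagram does not say so), and it omits the remaining attributes and classes; the method names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from fractions import Fraction
from typing import Union


@dataclass(frozen=True)
class AtomicUnit:
    abbreviation: str
    amount: float  # assumption: scale factor relative to the SI base unit

    def factor(self) -> float:
        return self.amount

    def symbol(self) -> str:
        return self.abbreviation


@dataclass(frozen=True)
class ComponentUnit:
    unit: Unit
    power: Fraction  # rational power, e.g. Fraction(-1) for "per"


@dataclass(frozen=True)
class CompoundUnit:
    components: tuple[ComponentUnit, ...]

    def factor(self) -> float:
        # Product of each component's scale factor raised to its power.
        return math.prod(float(c.unit.factor()) ** float(c.power) for c in self.components)

    def symbol(self) -> str:
        return " ".join(c.unit.symbol() if c.power == 1 else f"{c.unit.symbol()}^{c.power}"
                        for c in self.components)


Unit = Union[AtomicUnit, CompoundUnit]


@dataclass(frozen=True)
class AtomicQuantity:
    amount: float
    unit: Unit

    def in_base_units(self) -> float:
        return self.amount * self.unit.factor()
```

For example, a velocity of 3 km s^-1 is `AtomicQuantity(3.0, CompoundUnit((ComponentUnit(km, Fraction(1)), ComponentUnit(s, Fraction(-1)))))` with `km = AtomicUnit("km", 1000.0)` and `s = AtomicUnit("s", 1.0)`, which converts to 3000.0 in base units.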
[Figure: simplified view of the domain model showing «key» attributes. Classes shown: Experiments::Experiment (identifier), Experiments::Result, Experiments::Measurement, Experiments::Classification, Experiments::ValueAssignment, Experiments::Subject, Values::Quantity, Values::Classifier, Protocols::Protocol (identifier, documentationURL), Protocols::Objective (identifier), Protocols::Variable (name, isIndependent), Phenomenology::SubjectType (name, description), Phenomenology::Property (name), Phenomenology::Phenomenon (name, description), Products::PhysicalArtifact (locator, description).]
Theory in the VO (with Joerg Colberg)
http://ivoa.net/pub/papers/TheoryInTheVO.pdf
• Spatial query protocols irrelevant
• No object-based federation
• New phenomena/observables
• Different kind of provenance
• Model dependency
• Theoretical archives rather unstructured
• Theory/observational interface
[Figure: Observed vs. Simulated. Thanks to Alexis Finoguenov, Ulrich Briel, Peter Schuecker (MPE); thanks to Volker Springel.]
Some concrete efforts
• NVO (USA): Registry (DIS), ADQL, SkyNode, data mining (UPitt+CMU)
• AstroGrid (UK): grid/web services, work flows
• AVO (ESO, CDS, AstroGrid): Aladin visualization tool, science demos
• CVO (Canada): archive federation
• France VO: GalICS
• GAVO (Germany): data publication (RASS photons), application prototypes, data mining, theory
Scenarios, use cases, results
• Registry based data discovery and retrieval (GAVO, DIS)
• Class X classifier and generalizations
• X-ray cluster analysis using simulations
• Cluster detection by combining SDSS and RASS catalogues (Schuecker et al., astro-ph/0403116)
• Discovery of obscured quasars using VO tools (Padovani et al., astro-ph/0406056)
Typical workflow
[Figure: typical workflow. Steps shown: prepare input sources (upload, query, ...); find potential counterparts in external archives (list upload services, SOAP/HTTP); probabilistic matcher; identify; extract generalized SED; analysis service; analyse result: classify, plot, fit.]
Multi-Catalogue Multi-Cone Search
[Figure: components shown: a "download manager" (probabilistic matcher, VOTable processor); a service registry providing base URLs; Simple Cone Search services #1, #2 and #3, reached over the Internet with one or more SCS queries; an input table on local disk; returned VOTables and matcher data sets on local disk.]
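Before any matching, the VOTable processor has to turn each service's response into rows. A minimal sketch using only the Python standard library follows. It assumes the responses use the TABLEDATA serialization (BINARY and FITS serializations are not handled) and converts only a few FIELD datatypes; the function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Converters for a few common VOTable datatypes; anything else stays a string.
_CONVERTERS = {"double": float, "float": float, "int": int, "long": int, "short": int}


def _local(tag: str) -> str:
    """Strip the XML namespace so the parser does not depend on the VOTable version."""
    return tag.rsplit("}", 1)[-1]


def read_tabledata(xml_text: str) -> list[dict[str, object]]:
    """Return the rows of every TABLE serialized as TABLEDATA, keyed by FIELD name."""
    root = ET.fromstring(xml_text)
    rows: list[dict[str, object]] = []
    for table in (el for el in root.iter() if _local(el.tag) == "TABLE"):
        fields = [f for f in table if _local(f.tag) == "FIELD"]
        names = [f.get("name") for f in fields]
        converters = [_CONVERTERS.get(f.get("datatype", ""), str) for f in fields]
        for tr in (el for el in table.iter() if _local(el.tag) == "TR"):
            cells = [(td.text or "").strip() for td in tr if _local(td.tag) == "TD"]
            # Empty cells become None; others are converted by the FIELD datatype.
            rows.append({name: (conv(cell) if cell else None)
                         for name, conv, cell in zip(names, converters, cells)})
    return rows


def merge_results(responses: dict[str, str]) -> list[dict[str, object]]:
    """Concatenate rows from several cone search services, tagging each with its source."""
    merged = []
    for service, xml_text in responses.items():
        for row in read_tabledata(xml_text):
            merged.append({"service": service, **row})
    return merged
```

Keeping the service name on every row lets a later matching step tell which catalogue each candidate came from.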
ClassX@GAVO
[Figure: components shown: multi-catalogue list upload and classification; multi-catalogue cross-match (XMatch) against SUMSS@VizieR, NVSS@VizieR and USNO@VizieR (HTTP GET, CSV) and the RASS Archive@GAVO (RASS source query, SQL, JDBC result set); a probabilistic matcher (Java objects, Java API call); the ClassX Classifier@HEASARC (HTTP POST); JSP/HTML forms PoseQuery, DisplayQueryResult, DisplayMatchResult, DisplayXMatchResult and DisplayClassXResult.]
Theory/observational interface: X-Ray clusters
Goal: interpret observations of an X-ray cluster using the results of hydro simulations:
1. Extract parameters from the observation (services) that can be queried directly (dm, ucd).
2. Find potentially relevant simulations, i.e. ones “similar” to the observation, by searching the registry for hydro simulations of clusters (registry, voql). This requires simulation results to be published and described in sufficient detail (dm, ucd).
3. Observe the simulations with a “virtual telescope” (application, grid/web services) configured according to the telescope configuration extracted from the observation (dm).
4. Compare the real with the virtual observation (services).
5. For interesting simulations, extract the full simulation result (dal) for further analysis,
6. or analyse the simulation using services (grid services) provided by the archive or another service provider.
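Step 2 needs some working definition of "similar". One simple choice, offered here as an assumption rather than as the method of the talk, is a scaled Euclidean distance over the features extracted in step 1. Simulations that do not publish one of those features are skipped, since they are not described in enough detail to compare. The class and function names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class SimulatedCluster:
    name: str
    features: dict[str, float] = field(default_factory=dict)


def rank_by_similarity(observed: dict[str, float],
                       candidates: list[SimulatedCluster],
                       scales: dict[str, float]) -> list[tuple[str, float]]:
    """Return (name, distance) pairs, closest first, using a scaled Euclidean distance."""
    ranked = []
    for cand in candidates:
        if not all(key in cand.features for key in observed):
            continue  # not described in enough detail to compare
        d2 = sum(((cand.features[key] - value) / scales[key]) ** 2
                 for key, value in observed.items())
        ranked.append((cand.name, math.sqrt(d2)))
    ranked.sort(key=lambda item: item[1])
    return ranked
```

For example, with hypothetical features `observed = {"kT_keV": 5.0, "z": 0.1}` and `scales = {"kT_keV": 1.0, "z": 0.05}`, a simulated cluster at 5.2 keV and z = 0.1 lies at a distance of about 0.2. The choice of scales effectively encodes how much each mismatch matters, so it deserves as much thought as the features themselves.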
Theory/observational interface
[Figure: components and steps shown: X-ray observation; extract queriable features (dm, ucd); retrieve telescope configuration (dm); "find similar simulations": find possibly relevant hydro simulations via the registry service (registry, dm, voql); observe selected simulations virtually with a virtual telescope (application, services); simulation archive; retrieve data (dal); feature extractor; comparator (compare); analysis software (analyse).]
Computational, statistical and astronomical challenges I
Data models
• Data modeling
• Data model transformations, views
• Archive structure
• Database tuning
Querying, matching
• Distributed query algorithms
• Probabilistic matchers, systematic errors, identification of moving sources
• Improve identification using full point process information
• Add physical properties, not just position, to identification
• Complex, frequency-dependent source definition
• Characterization of complex results in "few" parameters for discovery (PCA (after transformation)? 3D->2D?)
• Comparison of real and virtual observations
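The PCA question can be made concrete for a single leading component. The pure-Python sketch below runs power iteration on the sample covariance matrix and then projects each input vector onto that component. It assumes the inputs have already been transformed to comparable scales. Power iteration can fail if the start vector is orthogonal to the leading eigenvector, and it converges slowly if the top two eigenvalues are nearly equal. The function names are hypothetical.

```python
import math


def leading_component(points, iterations=200):
    """Mean and leading principal direction of a list of equal-length vectors."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    d = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # start vector; fails if orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("zero variance along the start vector")
        v = [x / norm for x in w]
    return mean, v


def project(points, mean, component):
    """Reduce each vector to one number: its coordinate along the component."""
    return [sum((p[j] - mean[j]) * component[j] for j in range(len(component)))
            for p in points]
```

Points lying on the line y = 2x, for instance, yield the direction (1, 2)/√5, and each point is then summarized by a single coordinate.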
Usage
• Complex model
• Simplify using view concept
• Example from RDB
• XSLT for translation between domain XSD and application-specific derived schemas
[Figure: view class SextractorGalaxies with attributes _RAJ2000, _DECJ2000, M_APP, classification, image.]
```sql
CREATE VIEW SEXTRACTOR_GALAXIES AS
SELECT S.RA AS _RAJ2000,
       S.DEC AS _DECJ2000,
       -2.5 * LOG10(S.FLUX) AS M_APP,  -- base-10 logarithm; the function name is dialect-dependent
       S.CLASSIFICATION,
       I.STORAGE_URL AS IMAGE
FROM SOURCE S, SOURCE_CATALOGUE SC, IMAGE I, SOURCE_EXTRACTOR SE
WHERE S.CLASS = 'GALAXY'
  AND S.FLUX < 15
  AND S.CATALOGUE_ID = SC.ID
  AND I.ID = SC.IMAGE_ID
  AND SC.EXTRACTED_WITH = SE.ID
  AND SE.IDENTIFIER = 'SExtractor'
```
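To try the view concept without a full archive, the sketch below builds a hypothetical, minimal version of the underlying tables in an in-memory SQLite database and defines the view over them. Assumptions: only the columns the view uses exist; the class filter uses the `CLASSIFICATION` column (the original filters on `S.CLASS`, which may be a separate column); the joins are written with explicit `JOIN` syntax, which is equivalent; and `LOG10` is registered from Python because not every SQLite build provides it. The data rows are invented for illustration.

```python
import math
import sqlite3

SCHEMA = """
CREATE TABLE SOURCE_EXTRACTOR (ID INTEGER PRIMARY KEY, IDENTIFIER TEXT);
CREATE TABLE IMAGE (ID INTEGER PRIMARY KEY, STORAGE_URL TEXT);
CREATE TABLE SOURCE_CATALOGUE (ID INTEGER PRIMARY KEY, IMAGE_ID INTEGER,
                               EXTRACTED_WITH INTEGER);
CREATE TABLE SOURCE (ID INTEGER PRIMARY KEY, CATALOGUE_ID INTEGER,
                     RA REAL, DEC REAL, FLUX REAL, CLASSIFICATION TEXT);

-- The application-specific view hides the joins behind one flat table.
CREATE VIEW SEXTRACTOR_GALAXIES AS
SELECT S.RA AS _RAJ2000, S.DEC AS _DECJ2000,
       -2.5 * LOG10(S.FLUX) AS M_APP,
       S.CLASSIFICATION, I.STORAGE_URL AS IMAGE
FROM SOURCE S
JOIN SOURCE_CATALOGUE SC ON S.CATALOGUE_ID = SC.ID
JOIN IMAGE I ON I.ID = SC.IMAGE_ID
JOIN SOURCE_EXTRACTOR SE ON SC.EXTRACTED_WITH = SE.ID
WHERE S.CLASSIFICATION = 'GALAXY' AND S.FLUX < 15
  AND SE.IDENTIFIER = 'SExtractor';
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Not every SQLite build ships LOG10, so register Python's math.log10.
    conn.create_function("LOG10", 1, math.log10)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO SOURCE_EXTRACTOR VALUES (?, ?)",
                     [(1, "SExtractor"), (2, "OtherExtractor")])
    conn.execute("INSERT INTO IMAGE VALUES (1, 'http://example.org/image1.fits')")
    conn.executemany("INSERT INTO SOURCE_CATALOGUE VALUES (?, ?, ?)",
                     [(1, 1, 1), (2, 1, 2)])
    conn.executemany("INSERT INTO SOURCE VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 10.0, 20.0, 10.0, "GALAXY"),   # kept
        (2, 1, 11.0, 21.0, 100.0, "GALAXY"),  # dropped: FLUX >= 15
        (3, 1, 12.0, 22.0, 1.0, "STAR"),      # dropped: not a galaxy
        (4, 2, 13.0, 23.0, 5.0, "GALAXY"),    # dropped: other extractor
    ])
    return conn


if __name__ == "__main__":
    for row in build_demo_db().execute("SELECT * FROM SEXTRACTOR_GALAXIES"):
        print(row)
```

An application only ever queries `SEXTRACTOR_GALAXIES`; the normalized domain tables behind it can change without the application noticing, which is the point of the view concept.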
Probabilistic cross matching
[Figure: probabilistic cross matching of RASS FSC, USNO and NVSS sources.]
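One common way to rank candidate counterparts by position alone is a likelihood ratio. It takes the probability density of the observed offset if the candidate is the true counterpart (a circular Gaussian with the two positional errors added in quadrature) and divides it by the surface density of unrelated background sources. The sketch below assumes that model and is not necessarily the matcher used in the talk. It ignores magnitudes, other physical properties and proper motion, and the function names are hypothetical.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees, output in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def likelihood_ratio(sep_arcsec, sigma1, sigma2, background_density):
    """Gaussian offset density over a uniform background (sources per arcsec^2)."""
    s2 = sigma1 ** 2 + sigma2 ** 2  # combined circular positional variance, arcsec^2
    f = math.exp(-sep_arcsec ** 2 / (2 * s2)) / (2 * math.pi * s2)
    return f / background_density


def rank_counterparts(source, candidates, background_density, radius_arcsec):
    """source = (ra, dec, sigma); candidates = [(id, ra, dec, sigma)]; highest LR first."""
    ra, dec, sigma = source
    scored = []
    for cid, cra, cdec, csigma in candidates:
        sep = angular_separation_arcsec(ra, dec, cra, cdec)
        if sep <= radius_arcsec:
            lr = likelihood_ratio(sep, sigma, csigma, background_density)
            scored.append((cid, sep, lr))
    return sorted(scored, key=lambda item: item[2], reverse=True)
```

Because the background density enters the denominator, the same offset is more convincing in a sparse catalogue such as NVSS than in a dense optical one such as USNO.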
Computational, statistical and astronomical challenges II
Data mining
• Algorithms for analyzing generic SEDs (classifiers? visualization? incorrect identification?)
• Source extraction using multiple images at very different wavelengths: how to take into account the different physics/images of the same source at different wavelengths?
• Cluster finders using multiple catalogues
• Publish sophisticated statistical analysis algorithms
Implementation
• Efficient implementation of virtual telescopes (parallel, distributed, grid based, data structures)
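At its core, a virtual telescope projects simulation particles onto an image plane. The sketch below shows only that step, using nearest-grid-point binning along the z axis. It assumes each particle already carries a weight (for example an emissivity). It omits emission physics, instrument response, PSF and noise, all of which a real virtual telescope must model. The function name is hypothetical.

```python
def project_particles(particles, weights, extent, npix):
    """Bin (x, y, z) particles into an npix x npix image by projecting along z.

    extent = (xmin, xmax, ymin, ymax); image[row][col] with row along y, col along x.
    """
    xmin, xmax, ymin, ymax = extent
    dx = (xmax - xmin) / npix
    dy = (ymax - ymin) / npix
    image = [[0.0] * npix for _ in range(npix)]
    for (x, y, _z), w in zip(particles, weights):
        if not (xmin <= x < xmax and ymin <= y < ymax):
            continue  # outside the field of view
        # Nearest-grid-point assignment; min() guards against rounding at the edge.
        row = min(int((y - ymin) / dy), npix - 1)
        col = min(int((x - xmin) / dx), npix - 1)
        image[row][col] += w
    return image
```

Each particle is binned independently, so the loop parallelizes by splitting the particle list and summing the partial images. That is one route to the parallel or distributed implementation listed above.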