SpaceWarps: crowd sourcing the discovery of gravitational...

15
SpaceWarps: crowd sourcing the discovery of gravitational lenses Anupreeta More Gravlens Leiden July 12, 2016 Kavli IPMU, U. of Tokyo Phil Marshall (US) Aprajita Verma (UK) SW collaboration spacewarps.org [email protected]

Transcript of SpaceWarps: crowd sourcing the discovery of gravitational...

Page 1: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

SpaceWarps: crowd sourcing the discovery of gravitational lenses

Anupreeta More

Gravlens Leiden

July 12, 2016

Kavli IPMU, U. of Tokyo

Phil Marshall (US) Aprajita Verma (UK) SW collaboration

spacewarps.org [email protected]

Page 2: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Area (deg2)100

CFHTLS (wide)

PS1 (MDS) BCS

HSC KiDS DES

LSST Euclid

Optical imaging surveys

1000 10000

Lenses in large surveys

• We need efficient and systematic means to find gravitational lenses in these large imaging surveys

• Automated searching techniques are useful but suffer from high rate of false positives

• Visual inspection is a crucial step in filtering candidates even for an automated search

• We expect to find ~1000s of lenses in large imaging surveys covering ~1000 sq. deg area

Page 3: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Area (deg2)100

CFHTLS (wide)

PS1 (MDS) BCS

HSC KiDS DES

LSST Euclid

Optical imaging surveys

1000 10000

Lenses in large surveys

• We need efficient and systematic means to find gravitational lenses in these large imaging surveys

• Automated searching techniques are useful but suffer from high rate of false positives

• Visual inspection is a crucial step in filtering candidates even for an automated search

• We expect to find ~1000s of lenses in large imaging surveys covering ~1000 sq. deg area

Page 4: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Area (deg2)100

CFHTLS (wide)

PS1 (MDS) BCS

HSC KiDS DES

LSST Euclid

Optical imaging surveys

1000 10000

Lenses in large surveys

• We need efficient and systematic means to find gravitational lenses in these large imaging surveys

• Automated searching techniques are useful but suffer from high rate of false positives

• Visual inspection is a crucial step in filtering candidates even for an automated search

• We expect to find ~1000s of lenses in large imaging surveys covering ~1000 sq. deg area

Page 5: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps

PIs:

Phil Marshall (US), Aprajita Verma (UK), Anupreeta More (Japan)

• Blind lens search

• Stage 1: Fast inspection; O(105) —> O(103) images

• Stage 2: Careful inspection; O(103) —> O(102) candidates

• Targeted lens search

• Stage 0: Lens finding algorithm produced candidates are used

• Stage 1: Careful inspection; O(104) —> O(103) candidates

spacewarps.org

Page 6: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps Training sample:

Simulationshttps://github.com/anupreeta27/SIMCT

More, et al. (2016)

Real image+Catalogs

sim. lens

Final image

Code

SIMCT

Group Lenses

Lensed Quasars

Page 7: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps Training sample:

Simulationshttps://github.com/anupreeta27/SIMCT

More, et al. (2016)

Real image+Catalogs

sim. lens

Final image

Code

SIMCT

• First citizen science project in Zooniverse that

• includes thorough and convincing simulated training material

• uses the training sample to calibrate volunteer performance

• Essential for training users and keeping them alert !!!

• Important for characterizing the selection function of the resulting lens sample Group Lenses

Lensed Quasars

Page 8: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps Training sample:

Simulationshttps://github.com/anupreeta27/SIMCT

More, et al. (2016)

Real image+Catalogs

sim. lens

Final image

Code

SIMCT

• First citizen science project in Zooniverse that

• includes thorough and convincing simulated training material

• uses the training sample to calibrate volunteer performance

• Essential for training users and keeping them alert !!!

• Important for characterizing the selection function of the resulting lens sample Group Lenses

Lensed Quasars

Page 9: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps Training sample:

Simulationshttps://github.com/anupreeta27/SIMCT

More, et al. (2016)

Real image+Catalogs

sim. lens

Final image

Code

SIMCT

• First citizen science project in Zooniverse that

• includes thorough and convincing simulated training material

• uses the training sample to calibrate volunteer performance

• Essential for training users and keeping them alert !!!

• Important for characterizing the selection function of the resulting lens sample Group Lenses

Lensed Quasars

Page 10: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps

Results from First lens search

Page 11: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps

About 60 new lens candidates (30 candidates are more promising)( http://spacewarps.org/#/projects/CFHTLS/discoveries )

Discoveries

More, et al. (2016)

• 2 million image classifications in first week (10k per hour), 11 million over 6 months

• Collaboration of now over 27,000 citizen scientists

8 More et al.

which are fainter have poor photometric redshift measure-ments. Consequently these galaxies are assigned a wrong lu-minosity and velocity dispersion estimates. This then resultsin simulated lenses which look implausible or unrealistic.For example, the lensed images for some of the failed simu-lations have larger image separation than what you expectfrom the luminosity and/or size of the galaxy. We roughlyexpect mass to follow light so more massive galaxy typicallylooks brighter and/or bigger.

At group-scales, the photometric and redshift estimatesare used when define the group membership. Therefore, er-rors in redshift estimates generate galaxy groups with BGGor member galaxies with dissimilar properties. In some cases,low redshift spiral galaxies have been incorrectly assignedhigh redshift. Spiral galaxies are typically less massive andlow redshift spiral galaxies are unlikely to act as gravita-tional lenses. Hence, some such instances do not appearconvincing or the lensed images do not have the expectedconfiguration or image separations.

We also use a single Sersic component to describe thelight profiles of background galaxies. This is clearly notthe most accurate description for galaxies, especially, star-forming galaxies which form a significant fraction of thelensed galaxy population. Star-forming galaxies have com-plex structures such as star forming knots, spiral arms, barsand disks. The simulated lensed images do not display thesefeatures. This is fine for most cases or mostly for imagestaken from ground based telescopes such as the CFHT butsometimes the shapes of the lensed images can appear sym-metric or perfect, especially, if the images are very bright.

4 TRAINING SAMPLE: DUDS AND FALSEPOSITIVES

Citizen scientists need training not only to identify gravi-tational lenses, but also to reject images which either haveno lenses or have objects which are lens impostors. Hence,in addition to the simulated lenses, we added a sample ofduds and false positives to the training sample. Duds areimages which have been visually inspected by experts andconfirmed to contain no lenses. False positives (FPs) are sys-tems which look like lenses but are not, for example, spiralgalaxies, starforming galaxies, chance alignments of featuresarranged in a lensing configuration and stars.

We selected a sample of 450 duds for the Stage I clas-sification in SpaceWarps and a sample of 500 false posi-tives for the Stage II inspection. The sample of false pos-itives was selected from the candidates which passed theStage I of SpaceWarps. We note that this is the firsttime, we have a systematically compiled sample of visu-ally inspected false positives by the SpaceWarps volunteersand categorized by the science team. We produce addi-tional larger sample of a few thousand FPs by scanningthrough the low probability images after the completionof Stage II. All of these data products are available athttp://spacewarps.org/#/projects/CFHTLS/. Such a sam-ple has tremendous utility for training and testing of variouslens finding algorithms (e.g., Chan et al. 2014).

Figure 4. Distribution of di↵erent types of candidates as a func-

tion of the posterior probability P. The types of the candidatesare the FPs, the new candidates and the known candidates. The

new and known candidates have higher detection rate for higher

values of P, as expected.

5 METHODOLOGY TO PRODUCE THESpaceWarps-CFHTLS LENS SAMPLE

The SpaceWarps works as a single unified system which usesthe method of visual inspection to find gravitational lenses.For the first SpaceWarps lens search, the volunteers wereshown images at two stages. At Stage I, volunteers are re-quested to undertake a rapid inspection to select lens can-diates ranging from possible lenses to almost certain lenses.At Stage II, volunteers carefully inspect the candidates fromStage I and select only promising lens candidates. A dailysnapshot of classifications performed by volunteers are pro-vided to the science team every night. This daily batch isanalyzed by the Space Warps Analysis Pipeline (SWAP).The philosophy and the details of SWAP are described indetail in Paper I, here we briefly summarize how it works.

Each subject (or image cutout) is assigned a prior prob-ability of 2⇥10�4 to contain a lens system. Every volunteer isassigned an agent characterised by a 2⇥2 confusion matrix,which quantifies the volunteer’s ability to correctly classifyan image to contain a lens (P

L

) or to not contain a lens (PD

).The values of the confusion matrix are determined basedon the performance of the volunteer on the training sample,specifically, P

L

(and P

D

) is determined based on the fractionof simulated lenses (and duds) correctly classified. After ev-ery classification, the agent updates the prior probability ofthe classified subject based on the volunteer’s classificationand the confusion matrix, according to the Bayes theorem.Note that the volunteer’s confusion matrix is updated afterthe classification of every training image. The thresholds forthe probabilities to accept (or reject) a subject if it contains(or does not contain) a lens can be chosen in SWAP. Thoseimages which cross these threshold values are retired andare not subsequently shown to the volunteers, so that vol-

c� 2013 RAS, MNRAS 000, 1–22

P(lens)_SW vs Lens Experts

Page 12: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps Performance

12 Marshall et al.

0.0 0.2 0.4 0.6 0.8 1.0

False Positive Rate

0.0

0.2

0.4

0.6

0.8

1.0

Tru

ePos

itiv

eR

ate

Stage 1, online, M0

kk = 0.5

Stage 1, o�ine, M0

kk = 0.5

Stage 1, online, M0

kk = 0.75, No Learning

Stage 1, online, M0

kk = 0.75

0.00 0.01 0.02 0.03 0.04

0.70

0.75

0.80

0.85

0.90

0.95

Figure 6. Receiver operating characteristic curves for the SpaceWarps system, using the CFHTLS training set. Left: Stage 1, right:

Stage 2. Insets show a magnified view of the top lefthand corner of each plot. Linestyles illustrate the software agent properties (initialconfusion matrix elements, learning or not) discussed in the text, as well as the di↵erence between the online and o✏ine analyses. The

Stage 2 sample was defined using the online Stage 1 results (solid blue curve), while the o✏ine analysis was chosen for the final Stage 2

results (dot-dashed orange curve). Lens probability varies along the curves: the stars indicate the point where the value of this quantityis 0.95.

using each agent’s entire history, is making less noisy proba-bility estimates. This could be consistent with other citizenscience/crowdsourcing projects where, in the absence of highquality information about classifiers, sophisticated strategiestend to under-perform naive but simpler ones (Waterhouse2013, although we note that this rule of thumb seems not toextend to simple voting in this case!).

Assuming Poisson statistics for the fluctuations in thenumbers of recovered lenses, the uncertainty in the measuredStage 2 TPR values is around 8%, but the online and o✏inesamples are highly correlated, such that the uncertainty onthe di↵erence between the ROC curves is somewhat less thanthis. Still, a larger validation set is needed to test these algo-rithmic choices more rigorously. At Stage 1, high TPR canbe measured to better than 1%.

Adopting the online Stage 1 analysis, and the o✏ineStage 2 analysis, we show in Figure 7 a plot of the morefamiliar (to astronomers) quantities of completeness versuspurity, again for the two stages. As in Figure 6, the detectionthreshold varies along the curves. Completeness is defined asthe number of correctly detected sims divided by the totalnumber of sims in the training set, while purity is the num-ber of correct detections divided by the total number of de-tections.7 If the training set samples from the same systemsas the test set, then the completeness of the training setwill be equivalent to the completeness of the test set. Thepurity depends on the proportion of sims to duds, and sothe purity of the test set must be approximated by rescalingthe training set to the expected proportion of lens systemsto not-lens systems in the survey. First we compute the ex-pected number of false positives by multiplying the FPRby the expected number of non-lenses in the survey. Then

7 The completeness is equivalent to the TPR and is also known

elsewhere as the “recall.” The purity is also known as the “preci-sion.”

Figure 7. Completeness-estimated purity curves for theSpaceWarps system, using the CFHTLS training set. The curves

are truncated at the high purity end by the “detection” limit

subject probability (subjects in the online Stage 1 analysis weredeemed to be detected at p = 0.95, while at Stage 2 rounding

error led to an upper limit of p = 1.0).

we multiply the TPR by the expected number of lenses inthe survey, to get the expected number of true positives.The sum of the true positives and the false positives givesthe expected sample size; dividing the expected number oftrue positives by this sample size gives the purity. Note thatthe completeness is invariant to this transformation. TheStage 1 curves are truncated by the retirement of subjectsin this phase, which sets the minimum size of this sample.We see from the solid blue curve that over 90% completenesswas reached, albeit in a sample with 30% purity.

Does this performance level vary between the di↵erent

c� 2014 RAS, MNRAS 000, 1–21

12 Marshall et al.

Figure 6. Receiver operating characteristic curves for the SpaceWarps system, using the CFHTLS training set. Left: Stage 1, right:

Stage 2. Insets show a magnified view of the top lefthand corner of each plot. Linestyles illustrate the software agent properties (initialconfusion matrix elements, learning or not) discussed in the text, as well as the di↵erence between the online and o✏ine analyses. The

Stage 2 sample was defined using the online Stage 1 results (solid blue curve), while the o✏ine analysis was chosen for the final Stage 2

results (dot-dashed orange curve). Lens probability varies along the curves: the stars indicate the point where the value of this quantityis 0.95.

using each agent’s entire history, is making less noisy proba-bility estimates. This could be consistent with other citizenscience/crowdsourcing projects where, in the absence of highquality information about classifiers, sophisticated strategiestend to under-perform naive but simpler ones (Waterhouse2013, although we note that this rule of thumb seems not toextend to simple voting in this case!).

Assuming Poisson statistics for the fluctuations in thenumbers of recovered lenses, the uncertainty in the measuredStage 2 TPR values is around 8%, but the online and o✏inesamples are highly correlated, such that the uncertainty onthe di↵erence between the ROC curves is somewhat less thanthis. Still, a larger validation set is needed to test these algo-rithmic choices more rigorously. At Stage 1, high TPR canbe measured to better than 1%.

Adopting the online Stage 1 analysis, and the o✏ineStage 2 analysis, we show in Figure 7 a plot of the morefamiliar (to astronomers) quantities of completeness versuspurity, again for the two stages. As in Figure 6, the detectionthreshold varies along the curves. Completeness is defined asthe number of correctly detected sims divided by the totalnumber of sims in the training set, while purity is the num-ber of correct detections divided by the total number of de-tections.7 If the training set samples from the same systemsas the test set, then the completeness of the training setwill be equivalent to the completeness of the test set. Thepurity depends on the proportion of sims to duds, and sothe purity of the test set must be approximated by rescalingthe training set to the expected proportion of lens systemsto not-lens systems in the survey. First we compute the ex-pected number of false positives by multiplying the FPRby the expected number of non-lenses in the survey. Then

7 The completeness is equivalent to the TPR and is also known

elsewhere as the “recall.” The purity is also known as the “preci-sion.”

0.0 0.2 0.4 0.6 0.8 1.0

Completeness

0.0

0.2

0.4

0.6

0.8

1.0

Pur

ity

Stage 1, online, M0

kk = 0.5

Stage 2, o�ine, M0

kk = 0.5

0.90 0.92 0.94 0.96 0.98 1.00

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Figure 7. Completeness-estimated purity curves for theSpaceWarps system, using the CFHTLS training set. The curves

are truncated at the high purity end by the “detection” limit

subject probability (subjects in the online Stage 1 analysis weredeemed to be detected at p = 0.95, while at Stage 2 rounding

error led to an upper limit of p = 1.0).

we multiply the TPR by the expected number of lenses inthe survey, to get the expected number of true positives.The sum of the true positives and the false positives givesthe expected sample size; dividing the expected number oftrue positives by this sample size gives the purity. Note thatthe completeness is invariant to this transformation. TheStage 1 curves are truncated by the retirement of subjectsin this phase, which sets the minimum size of this sample.We see from the solid blue curve that over 90% completenesswas reached, albeit in a sample with 30% purity.

Does this performance level vary between the di↵erent

c� 2014 RAS, MNRAS 000, 1–21

TPR=#sims detected / #sims FPR=#duds missed / #duds sims: simulated lenses, duds: non-lenses Marshall,Verma, AM et al. (2016)

Page 13: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps Image separation distribution

• Image separation distribution (ISD) can put constraints on the statistical mass distribution of lenses

• Need to understand the selection function to account for any incompleteness

• The slope of the updated ISD (combining SW + old lens samples) is unchanged suggesting previous constraints from ISD (More et al. 2012) at group-scales are robust

More, et al. (2016)

Group-scale lenses

Page 14: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps 2 VICS82

IR+Radio

• Discovered a lensed hyper-luminous IR galaxy at z~2.5, visible in the radio with possible contribution from an AGN

Geach, AM et al. (2015)

• Launched at the BBC Stargazing Live TV program (lasted for 3 days)

• Targeted search VICS82 (opt+NIR): 40,000 candidates pre-selected with algorithms

• Million classifications in the first hour of its announcement

Page 15: SpaceWarps: crowd sourcing the discovery of gravitational ...home.strw.leidenuniv.nl/~gravlens2016/Talks/More1.pdf · SpaceWarps: crowd sourcing the discovery of gravitational lenses

Space Warps

• Space Warps is undergoing an upgrade (interface and analysis pipeline)

• Lens searches are officially happening with

• DES (5000 sq. deg, r=24.1)

• HSC (1400 sq. deg, r=26)

• Possible collaboration with KiDS is being discussed

• To do lens search in your favourite survey - Contact us!

• Stay tuned - coming soon with new lens searches !!

Looking ahead

spacewarps.org [email protected]