High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II...

24
High Data Rate Macromolecular Crystallography at NSLS-II Martin Fuchs (BNL Photon Sciences, NSLS-II) Workshop on real-time data analysis challenges 2019-01-24

Transcript of High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II...

Page 1: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

High Data Rate Macromolecular Crystallography at NSLS-II

Martin Fuchs (BNL Photon Sciences, NSLS-II)Workshop on real-time data analysis challenges

2019-01-24

Page 2: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

• FMX and AMX beamlines

• Micro introduction:Macromolecular crystallography (MX)

• Raster scanning and serial crystallography:High data rate MX

• Data challenges• Rate

• Volume

• Analysis

• Storage

Overview

2

Page 3: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Soft X-Ray Scattering & Spectroscopy23-ID-1: Coherent Soft X-ray Scattering (2015)23-ID-2: Soft X-ray Spectro & Polarization (2015)21-ID: Photoemission-Microscopy Facility (2016)2-ID: Soft Inelastic X-ray Scattering (2017)22-BM: Magneto, Ellips, High-P Infrared (2018) Complex Scattering10-ID: Inelastic X-ray Scattering (2015)11-ID: Coherent Hard X-ray Scattering (2015)11-BM: Complex Materials Scattering (2016)12-ID: Soft Matter Interfaces (2016) Diffraction & In Situ Scattering28-ID-1: X-ray Powder Diffraction (2015)28-ID-2: X-ray Powder Diffraction (2017)4-ID: In-Situ & Resonant X-Ray Studies (2016)27-ID: High Energy X-ray Diffraction (2020)Hard X-Ray Spectroscopy8-ID: Inner Shell Spectroscopy (2016)7-BM: Quick X-ray Absorption and Scat (2017)8-BM: Tender X-ray Absorption Spectros (2016)7-ID-1: Spectroscopy Soft and Tender (2017)7-ID-2: Spectroscopy Soft and Tender (2017)6-BM: Beamline for Mater. Measurement (2017) Imaging & Microscopy3-ID: Hard X-ray Nanoprobe (2015)5-ID: Sub-micron Resolution X-ray Spectro (2015)4-BM: X-ray Fluorescence Microscopy (2017)18-ID: Full-Field X-ray Imaging (2018) Structural Biology17-ID-1: Frontier Macromolec Cryst (2016)17-ID-2: Automated Macromolec Cryst (2016)16-ID: X-ray Scattering for Biology (2016)17-BM: X-ray Footprinting (2016)19-ID: Microdiffraction Beamline (2017)

• 19 Operating/Commissioning

• 10 Under Developmenthttp://www.bnl.gov/ps/nsls2/beamlines/map.php

NSLS-II Suite of Beamlines

Page 4: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

4

Experiment description: Protein Crystallography

Crystallization(cubic Insulin)

Diffraction experiment

Electron density map

Structure (Insulin)

M CL FT W1 W2 W3 W4 E1 E2

R. Bingel-Erlenmeyer

Protein expression,purification

Phasing

Model building, refinement

From protein to structure

Page 5: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

FMX and AMX

AMX

FMX

AMXFMX

5 10 15 20 25 30 keV

5 µm10 µm

1 µm

Two MX beamlineswith overlapping and complementary capabilities

50 µm

Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography, AMX: New opportunities for advanced data collection AIP Conf. Proc. SRI2015, 2016, 1741, 030006

5 – 30 keV

0.4 – 2.5 Å

3.5×1012 ph/s

1 × 1.5 µm2

1 – 20 µm

Eiger 16M

5 – 18 keV

0.7 – 2.5 Å

5×1012 ph/s

7 × 5 µm2

5 – 50 µm

Eiger 9M

Energy range

Wavelength range

Flux at focus at 12.7 keV

Focal spot min (H×V)

Focal spot range

Detector

FMXSpecifications AMXMicrofocus Automation

Page 6: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

MX Beamlines – Beam Size vs Flux Density

▪ Low synchrotron emittance translates into bright beamlines, high dose rate

▪ FMX current specs: At 12.7 keV full beam, the time to Garman limit is 30 ms!

Planned upgrades

Spot size ~ flux

NSLS-II Horizontal emittance: < 1 nm rad (Wang et al., 2016)

smal

ler

Page 7: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Multilayer upgrade for 100× higher flux

7

▪ 1% bandpass Horizontal-bounce Double Multilayer Monochromator for 100× flux increase, alternative to current Si111 HDCM

▪ Why? - Scanning & jet serial crystallography are still flux limited

▪ Integrate HDMM upstream of HDCM (small modifications to HDCM tank)

➢ Scanning & jet serial crystallography at max detector & sample delivery speeds

➢ The increased flux will support measurements in the µs timescale (now ms)

Page 8: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

FMX Experimental Station

Main goniometer

Sample microscope

Sample mounting robot

Sample position

Secondary goniometer

Eiger 16Mdetector

Page 9: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Detector Systems – single photon counting

Eiger 16M Detector at FMXHybrid pixel array detector

▪ Pixel size 75 μm

▪ 18 M pixel

▪ 311 × 328 mm2 area

▪ Frame rate

o 133 Hz (16M px)

o 750 Hz (4M px ROI)

▪ Continuous readout3 µs dead time

Eiger16M32 modules18 M pixels

133 Hz

Eiger 4M ROI8 modules

4.5 M pixels750 Hz

Eiger 9M at AMX▪ Frame rate

o 238 Hz (9M px)

o 750 Hz (4M px ROI)

Page 10: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Data reduction & ligand binding pipelines at AMX/FMX

Software Tuning

• New, fast, more reliable screeningmode for DOZOR spot finder

• aimless and pointless in fast_dp splitoff to designated workstation

• Hooks provided to allow XDS to useeiger2cbf instead of dectris-neggia

• New fast and faster modes providedfor dimple

Scalable Storage and Processing Nodes: sustaining growth

Rapid Data Analysis and Reduction on fast buffer (200 cores / BL).

Data are then written on disks (GPFS flexible policy).

Spot finders: DOZOR, DIALS spotfinder.Data reduction: XDS, dials, fast_dp,

fast_dp_NSLS-IIPipelines: fast_ep, fast_ep_NSLS-II, dimple,

dimple_NSLS-IIWe are also investigating dynamics using MX.

NSLS-II Computing Facility (May 2018)

x40 Gb/s

21 Compute Nodes

~ 750 cores

GPFS Storage / SSDBuffer

860 TB 180 HDs

20 TB/AMX20 TB/FMX

AMX / FMX

EIGER 9/16M(750 Hz 4M ROI)

Fast buffer (NFS)1day

Jakoncic, Bernstein et al., in preparation

Fast & automated to accelerate rate of drug discovery

Page 11: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Fast rastering• From 5 ms to 50 ms per frame • Up to 1 row / sec (accel./decel. time) • Dedicated nodes (DIALS find spot server/client | DOZOR)• Distributed fast_dp for data reduction …

Page 12: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

• Origin: Storage ring room temperature MX

• Renaissance: FEL diffraction before destruction

• Storage ring sample delivery• Jets, fixed targets, membranes, conveyor belts, …

• RT vs cryo-cooling, Rotation vs stills

• Acronyms:

Serial crystallography

Yamamoto 2017 IUCrJ

SFX, SMX, IMISX, SSX, MSS, …

Page 13: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Max. scanning Fq. –Piezo

> 100 Hz

Max speed – Piezo > 15 mm /s

Max speed – Coarse 0.8 mm /s

Resolution – Piezo 10 nm

Repeatability – Piezo 25 nm

Travel range – Piezo (200 µm)3

Travel range – Coarse 4 x 4 mm2

Mirror assembly mounted at sample location for metrology test

Cylinder for runout measurement

Laser interferometer from

• Scan frequency~100× higher than standard goniometer

• 2×higher precision

Customized ultrahigh-speed piezo scanner

Yuan Gao and Weihe Xu in Nanometrology lab

Page 14: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

2 µm step raster scan with 2.5 ms exposure (5 ~ 10 µm Proteinase K crystals)

Raster crystal location

Videos of 2 Hz and 20 Hz raster

over a 180x180 µm2 area (taken

with a calibration tungsten

needle)

2 Hz

20 Hz

1 µm step raster scans with 1.3 to 5 ms exposure (~5 µm Proteinase K crystals)

Raster data collection

Raster for location or for collectionHigh speed rastering

30 rasters for 1 dataset: Shutter-open time 18s!

Page 15: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Hierarchical cluster-analysis

Data processing

Proteinase K structure refined to 2.0 Å resolutionRwork=16.7% and Rfree=21.3% (500Hz dataset)

XDS: Unit Cell Orientation

XDS: Unit Cell Size

Automated clustering using Machine Learning algorithm

• ~200 partial datasets for structure solution

• Equally high data quality for detector frame rates from 200 – 750 Hz

Page 16: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Jet Serial Crystallography

Partner User: Spence Lab (ASU)▪ First successful run in 2018-01

▪ 500 Hz data collection from Proteinase K microcrystals

▪ Flow speed = 300 um/s: Experiment flux limited!

➢Optimal sample delivery for LCP crystallization

➢New jet media and temperature control

➢ N. Zatsepin, U. Weierstall, J. Martin-Garcia, E. Chun, S. Zaare, H. Hu, J. Spence

Page 17: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Jet Serial Crystallography

Partner User: Spence Lab (ASU)▪ First successful run in 2018-01

▪ 500 Hz data collection from Proteinase K microcrystals

▪ Flow speed = 300 um/s: Experiment flux limited!

➢Optimal sample delivery for LCP crystallization

➢New jet media and temperature control

➢ N. Zatsepin, U. Weierstall, J. Martin-Garcia, E. Chun, S. Zaare, H. Hu, J. Spence

Page 18: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Analysis challenges shared for Scanning and Jet SMX

• Large data volumes: The Dectris Eiger 16M detector operates either in full frame mode with frame rates up to 133 Hz, or in the 4M mode up to the frame rate the samples’ diffraction power allows

• Signal to noise: For the smallest crystals, the S/N will be extremely low. Under a certain threshold, indexing will be impossible and the data unusable for merging

Average Experiment Duration

Both methods will continue to collect data, until the completeness goal under given boundary conditions like resolution are met.

Serial crystallography data analysis challenges

18

Page 19: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Average data acquisition rate• 16M frames with sustained rate of 70 Hz: 120 MB/s – 2.1 GB/s

• Max file size 30 MB (dep on compression development, still improving, 20 MB realistic)

• Min file size 1.5 MB (dep on signal)

• 16M frames with burst rate of 133 Hz for max 30 s• Data volume scales with frame rate

• 4M frames with sustained rate of ~400 Hz: 250 MB/s – 3.2 GB/s• Max file size 8 MB

(dep on compression development, still improving, 5 MB realistic)• Min file size 0.5 – 1 MB (dep on signal)

• 4M frames with burst rate of 750 Hz for max 30 s• Data volume scales with frame rate

Max data rates• Current: 5 GB/s

• Future: 5k frame/s detector, 100x flux increase – 50 GB/s

Data Rate

19

Page 20: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Individual event

• One image, streamed at up to 750 images/s

• Can be processed independently

Total data volume / day

• ~100 mounts of 300 s data collection per dayEach mount = ~100 GB : 10 TB

• Assume small frame size of ~300 MB due to low signal levels

• Wide variation of volumes and throughput possible

• Current, standard MX: 10 TB

• Future throughput, serial: 100 TB (1 PB with new detector?)

Total data volume

20

Page 21: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

Time Limitations for Analysis Results

• Processing should provide information about data quality (resolution and completeness) in less than in few minutes to enable optimized strategies for a limited number of mounts.

Current Analysis Pipeline Requirements

• Software development strongly in flux

• Spot finding: Discard empty frames

• Indexing: Statistics on crystal morphology, cluster analysis

• Integration, scaling merging• Rotation data collection: XDS, DIALS• Still frame data collection: CrystFEL, DIALS

• Structure solution pipelines follow standard MX procedures

High Data Rate Analysis Pipeline

21

Page 22: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

• In serial crystallography, a large fraction of the images can be empty: 10-99% without data

• There are methods to extract data from sparse data frames, i.e. simple spot finding will not suffice in some cases to reject frames.Lan et al. ”Solving protein structure from sparse serial microcrystal diffraction data at a storage-ring synchrotron source” IUCrJ, 2018, Vol. 5(5), pp. 548-558. DOI: 10.1107/S205225251800903X

• Short term storage: Fast storage required for parallel data processing

• Long term storage: Main concern is storage space and data transfer to users.

Storage, retention considerations

22

Page 23: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

• Diffraction simulations using MLFSOM and fastBragg (J. Holton, 2015)

• Background vs 1 µm size crystal

• All files are 16 Mpx, or 64 MB in 32 bit and 16 MB in 16 bit and 16 MB in byte-offset

Better compression: Background subtraction?

bzip2 bslz4 NibOff

No water 0.009 0.4 4

With 20 um water 11.4 13.7 12.4

Background (diff) 11.4 13.7 12.4

Background average 2 5.9 4

<BG> subtracted ? 9.7 16.9 15.7

BG subtracted ? 7.7 10 11.7

No waterWith BG BG only Avg BG subtracted

Image size

➢ J. Jakoncic, H. Bernstein (2016)

CatB at FMX

Page 24: High Data Rate Macromolecular Crystallography at NSLS-II · Fuchs, M. R. et al. “NSLS-II biomedical beamlines for micro-crystallography, FMX, and for highly automated crystallography,

• Storage ring experiments can reach• Data rates up to 50 GB/s

• Data volumes – up to 100 TB/day

• The MX experiment is not event triggered

• Data culling currently happens in post processing

• Real time challenge• Determine data completeness and quality

• Real time means seconds and minutes

• Spot finding• Speed-up is challenging

• Bottleneck for culling

Summary

24