Post on 09-Sep-2018

Introduction to data collection at XFELs and serial femtosecond crystallography data analysis

Jose M. Martin Garcia, Assistant Research Scientist

The Biodesign Center for Applied Structural Discovery, Arizona State University

jose.martingarcia@asu.edu

Macromolecular Crystallography Workshop CSIC, Madrid, May 10th, 2017

Acknowledgements

$$$$$.......

Petra Fromme (CASD, ASU)

John Spence (CASD, ASU)

Henry Chapman (CFEL, DESY)

Anton Barty (CFEL, DESY)

Thomas White (CFEL, DESY)

Valerio Mariani (CFEL, DESY)

Nadia Zatsepin (CASD, ASU)

Rick Kirian (CASD, ASU)

Oleksandr Yefanov (CFEL, DESY)

Marius Schmidt, George Phillips

Lois Pollack

Collaborators

Robert Fischetti

Thomas Grant (HWMRI)

Outline

1. Introduction to XFELs and SFX

2. SFX: New challenges

3. Cheetah

4. CrystFEL

1. Introduction to XFELs and SFX

The Concept of a Free Electron Laser

The Linac Coherent Light Source (LCLS)

https://www.youtube.com/watch?v=RG-PYmeq2XE

Inside View: The Undulator Tunnel

Main Entrance to FEH

LCLS FEH

Control Room @ CXI

Downstream View: Hutch @ CXI

Chambers @ CXI

X-ray free electron lasers (XFELs)

World’s most powerful photon sources

Extremely bright X-ray beams

Ultra-short duration

10^10

Beam properties        XFELs                               Synchrotrons
High flux              10^13                               10^13
Peak brilliance        10^33                               10^23
Max. electron energy   6 - 20 GeV                          1 - 7 GeV
Repetition rate        60 - 120 Hz (27,000 Hz @ EU-XFEL)   ~ 10 Hz
Pulse duration         10 - 300 fs                         < 100 ps

Review: Martin-Garcia JM, et al. “Serial Femtosecond Crystallography: A Revolution in Structural Biology”. Archives of Biochemistry and Biophysics, 602, 32-47 (2016)

Serial femtosecond crystallography (SFX)

One diffraction pattern per crystal per X-ray pulse

Nano / micro-crystals are delivered in a serial manner, in random orientations

Diffraction before destruction

Plasma from femtosecond X-rays

X-ray Diffraction Pattern

Photosystem II

Crystallography @ synchrotrons

• Need large crystals
• Crystals are usually frozen
• Crystals are on a goniometer
• Many images per crystal
• Up to 1000 images per data set
• 1-10 crystals per data set
• Full reflections

Crystallography @ XFELs

• Nano / micro-crystals (0.1 - 10 µm)
• Room temperature
• Crystals are in a liquid or highly viscous jet, not on a goniometer
• One image per crystal per pulse
• At least 10,000 images per data set
• Hundreds of thousands of crystals
• Partial reflections

Goals:
• extract kinetics and dynamics
• kinetic mechanism
• rate coefficients
• barriers of activation, and
• molecular structures of reaction intermediates

for
• reversible reactions, and
• irreversible (catalyzed) reactions

from crystallographic data ONLY!

Time-Resolved Crystallography at XFELs

The ideal combination for TR-SFX: small crystals + XFELs

• Data collected in December 2015

• CXI instrument @ LCLS

• Protein: β-lactamase C from M. tuberculosis

• Substrate: Ceftriaxone

• Crystal size 1-3 µm

• Sample delivery: Mixing-jet injector

• Time delay: 2 s

• Photon energy: 9 keV

• Pulse duration: 40 fs

• Repetition rate: 120 Hz

Study of enzymatic reactions

Petra Fromme, John Spence, Uwe Weierstall

Marius Schmidt

George Phillips, Lois Pollack

Kupitz C, et al, 2016. Structural Enzymology using X-ray free Electron Lasers. Structural Dynamics, 4(4):044003

The proof-of-principle

2. SFX: New Challenges

Sample delivery

Detector technology

CSPAD (Cornell-SLAC Pixel Array Detector) @ CXI, LCLS

• 4 quadrants independently movable to change the size of the hole in the center (no beam stop)

• 32 modules tiled to fill 1700 x 1700 pixels with gaps between modules

• 2.3 x 10^6 pixels

• Pixel size 110 µm x 110 µm

• Dynamic range of about 350 photons at 9.4 keV

• 120 Hz frame rate

Philipp, H. T., et al. (2011). Pixel array detector for X-ray free electron laser experiments. Nucl. Instrum. Methods Phys. Res. A, 649, 67–69.

Why do we need so many patterns?

Crystal size

Crystal shape

Crystal orientation

Crystal quality

XFEL beam position

XFEL beam energy spectrum

XFEL beam intensity

Partially recorded reflections (no crystal oscillation, monochromatic beam)

• In SFX every pulse is like a new experiment

• Need 10,000 - 100,000 indexed patterns, each from an individual crystal, for one data set (compared with up to 1,000 images from one crystal at synchrotrons)

The need for new software

LCLS pulse structure (120 Hz; 7,200 pulses/min)

CSPAD detector @ CXI, LCLS: 2.3 x 10^6 pixels

4.5 MB/frame => 2 TB/hour => 120 TB/experiment (5 shifts, 1 shift = 12 h)

New type of data

Large amount of data

New, complicated detectors (hybrid pixel array detectors)

[Figure: LCLS pulse structure, with time-scale labels "100 msec" and "fs"]

Data Handling
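The data-volume figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that every pulse is recorded as one 4.5 MB frame, that 1 TB = 10^6 MB, and that the beam is on for the full five 12-hour shifts. The variable and function names are illustrative.

```python
# Back-of-the-envelope data volume for CSPAD at LCLS (values from the slide).
FRAME_MB = 4.5        # size of one detector frame in MB
REP_RATE_HZ = 120     # LCLS repetition rate
SHIFT_HOURS = 12      # length of one shift
N_SHIFTS = 5          # shifts per experiment


def data_volume_tb(hours, frame_mb=FRAME_MB, rate_hz=REP_RATE_HZ):
    """Data volume in TB (1 TB = 10**6 MB), assuming every pulse is saved."""
    frames = rate_hz * 3600 * hours
    return frames * frame_mb / 1e6


pulses_per_min = REP_RATE_HZ * 60
per_hour = data_volume_tb(1)
per_experiment = data_volume_tb(SHIFT_HOURS * N_SHIFTS)

print(pulses_per_min, "pulses/min")
print(round(per_hour, 2), "TB/hour")
print(round(per_experiment, 1), "TB/experiment")
```

Under these assumptions the result is about 1.9 TB/hour and about 117 TB per experiment, which the slide rounds to 2 TB/hour and 120 TB.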

SFX data analysis pipeline

OnDA

1. Online monitoring
2. Live hit rate and resolution estimate
3. Live saturating pixel tracking

Cheetah

1. Hit finding (data reduction)
2. Background subtraction
3. Clean diffraction patterns and metadata saved as HDF5 or CXI
4. Statistics and preliminary analysis

CrystFEL

1. Indexing
2. Integration
3. Merging
4. Post-refinement

DAQ: raw XTC files containing X-ray pulse parameters, pump laser signal, diagnostics, motor positions, etc.

CSPAD detector

3. Cheetah

http://www.desy.de/~barty/cheetah/Cheetah/Cheetah_GUI.html

What does Cheetah do?

Barty, A. et al. (2014). “Cheetah: software for high-throughput reduction and analysis of serial femtosecond X-ray diffraction data,” J Appl. Cryst., vol. 47, pp. 1118–1131.

2. Rapid feedback

Hit rate, resolution, diffraction quality. Quickly viewing images.

3. Data reduction

Keeps only useful events (i.e., frames with crystal diffraction)

4. Data translation

XTC data is converted to a facility-independent format (HDF5 or CXI)

5. Data organization

Summarises what is in each run; easy to group data by sample; summarises statistics

Cheetah: Background subtraction

1- Running background subtraction

• Uses the many blank frames interleaved between hits to provide an up-to-date estimate of background signal in the data.

• Background = median of each pixel over the blank frames in the data set.

2- Local background subtraction

• Local background subtraction is advisable for samples delivered in a liquid or viscous jet.

• Uses the data from the current frame to estimate the background of the current frame.

• Background = median of all pixel values in a box of side length 2r+1.

• The area of the box is at least twice the area of any potential Bragg peak, and the box contains at least three times the number of pixels in the peak.

[Figures: diffraction pattern after ‘running’ background subtraction; diffraction pattern after ‘local’ background subtraction]
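Both background estimates can be sketched with NumPy. This is an illustration of the two ideas, not Cheetah's actual implementation: the function names, the reflect padding at the detector edges, and the use of a plain list of recent blank frames are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_background_subtract(frame, r):
    """Subtract, at each pixel, the median of the (2r+1) x (2r+1) box around it.

    Only the current frame is used. Edges are handled by reflect padding,
    which is an assumption of this sketch.
    """
    side = 2 * r + 1
    padded = np.pad(frame, r, mode="reflect")
    windows = sliding_window_view(padded, (side, side))
    background = np.median(windows, axis=(-2, -1))
    return frame - background


def running_background_subtract(frame, recent_blanks):
    """Subtract the per-pixel median of a set of recent blank frames."""
    background = np.median(np.stack(recent_blanks), axis=0)
    return frame - background


if __name__ == "__main__":
    # Synthetic frame: flat background of 10 with one bright pixel.
    frame = np.full((11, 11), 10.0)
    frame[5, 5] = 110.0
    cleaned = local_background_subtract(frame, r=2)
    print(cleaned[5, 5], cleaned[0, 0])
```

A box that is much larger than a Bragg peak keeps the peak pixels in the minority, so the median tracks the background rather than the peak. That is the reason for the box-size rule given above.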

Cheetah: Hit finding

1- Identification of possible Bragg peaks.

• Threshold (pixel intensity applied over the entire image)

• Min_Number_pixels

• Max_Number_pixels

• SNR (weak peaks relative to background)

• Peakmask (pixel mask identifying regions to exclude from peak searching)

2- Identification of sample hits.

• npeaks > 15 (minimum number required by CrystFEL)

Hit rates depend on the experiment and sample delivery techniques. Nanocrystal diffraction in solution typically has hit rates of 10–15% (30–40% with a high-viscosity injector), although extremes as low as 1% have been observed for dilute samples.

Current sample delivery techniques are far from achieving the goal of 100% useful data, and thus frame rejection strategies are currently very effective in reducing data volumes.
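The two-stage hit-finding logic above (find candidate peaks, then count them) can be sketched as follows. This is not Cheetah's peak finder: the function names are hypothetical, the SNR test is left out for brevity, and connected pixels are grouped with a simple 4-neighbour flood fill. The parameters mirror the threshold, minimum and maximum pixel count, and peak mask listed on the slide.

```python
import numpy as np


def find_peaks(frame, threshold, min_pixels, max_pixels, mask=None):
    """Return intensity-weighted (row, col) centroids of candidate Bragg peaks.

    mask, if given, is a boolean array that is True where peak searching
    is allowed (an assumed convention for this sketch).
    """
    above = frame > threshold
    if mask is not None:
        above &= mask
    visited = np.zeros(above.shape, dtype=bool)
    rows, cols = frame.shape
    peaks = []
    for r0, c0 in zip(*np.nonzero(above)):
        if visited[r0, c0]:
            continue
        # Flood fill over 4-connected pixels above threshold.
        stack = [(int(r0), int(c0))]
        visited[r0, c0] = True
        pixels = []
        while stack:
            r, c = stack.pop()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and above[rr, cc] and not visited[rr, cc]):
                    visited[rr, cc] = True
                    stack.append((rr, cc))
        # Reject single hot pixels and large blobs (e.g. jet streaks).
        if min_pixels <= len(pixels) <= max_pixels:
            weights = np.array([frame[p] for p in pixels], dtype=float)
            coords = np.array(pixels, dtype=float)
            peaks.append(tuple(np.average(coords, axis=0, weights=weights)))
    return peaks


def is_hit(peaks, min_peaks=15):
    """A frame counts as a hit when it has more than min_peaks peaks."""
    return len(peaks) > min_peaks
```

In practice the threshold and pixel-count limits are tuned per experiment, since they depend on the detector gain, the background level and the crystal size.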

Cheetah ‘quick start’

Cheetah ‘flash start’

Cheetah GUI

Newly collected data (new runs) appear automatically ready to process

Status of data collection

Cheetah GUI

One-click to start the processing of data sets

Cheetah GUI

Status of processing is continually updated

Contents of each run and associated data directory

Cheetah GUI

Cheetah GUI

Hit rate

Resolution

Cheetah GUI

Virtual powder patterns

Hits

Blanks

4. CrystFEL

White, T. A., et al. (2012). "CrystFEL: a software suite for snapshot serial crystallography". J. Appl. Cryst. 45, p335–341.

White, T. A., et al. (2016). “Recent developments in CrystFEL”. J. Appl. Cryst. 49, 680-689.

http://www.desy.de/~twhite/crystfel/index.html

What is CrystFEL?

• Suite of programs for processing serial crystallography data acquired at XFELs (and at synchrotrons too!).

• CrystFEL does: indexing, integrating, merging, scaling, viewing, and hit finding (too!).

• CrystFEL final output files (MTZ files) can be fed into Phenix, CCP4, etc.

• Unlike Cheetah, CrystFEL is run from the command line or from scripts. A CrystFEL GUI is on its way!

Latest version: CrystFEL version 0.6.2

indexamajig

Rapid indexing, integration and data reduction program.

pattern_sim

A diffraction pattern simulation tool.

process_hkl

A tool for merging and scaling intensities from many patterns into a single reflection list, via the Monte Carlo method.

partialator

Full scaling and post-refinement process for accurate merging of data and outlier rejection.

ambigator

A tool for resolving indexing ambiguities.

get_hkl

A tool for manipulating reflection lists, such as performing symmetry expansion.

cell_explorer

A tool for examining the distributions of unit cell parameters.

compare_hkl and check_hkl

Tools for calculating figures of merit, such as completeness and R-factors.

partial_sim

A tool for calculating partial reflection intensities, perhaps for testing the convergence of Monte Carlo merging.

hdfsee

A simple viewer for images stored in HDF5 format.

render_hkl

A tool for rendering slices of reciprocal space in two dimensions.

geoptimiser

A program to refine and optimize detector geometry.

CrystFEL core programs

CrystFEL: Overall pipeline

Flow diagram of diffraction pattern processing in indexamajig

• Two peak search methods: peaks=hdf5 (Cheetah’s output), peaks=zaef (internal algorithm)

• Input files: diffraction patterns (HDF5 or CXI format)

• Output file: “stream” file (long plain text)

• Geometry file (plain text file)

• Unit cell parameters (text file containing the “CRYST1” line from a PDB file)

Indexing methods

$ indexamajig -i |....| --indexing=method1,method2,... |....|

mosflm-raw-nolatt-nocell

• mosflm: Invoke Mosflm. To use this option, 'ipmosflm' must be in your shell's search path.

• raw: Do not check the resulting unit cell against the target cell. This option is useful when you need to determine the unit cell ab initio.

• nolatt: Do not use lattice type information to guide the indexing.

• nocell: Do not use unit cell parameters as prior information for the core indexing algorithm.

This also applies to other indexing methods such as DirAx and XDS.

mosflm-comb-latt-cell

• mosflm: Invoke Mosflm. To use this option, 'ipmosflm' must be in your shell's search path.

• comb: Check linear combinations of the unit cell basis vectors to see if a cell can be produced which looks like your unit cell.

• latt: Use lattice type information to guide the indexing.

• cell: Use unit cell parameters as prior information for the core indexing algorithm.

This also applies to other indexing methods such as DirAx and XDS.

mosflm-axes-latt-cell

• mosflm: Invoke Mosflm. To use this option, 'ipmosflm' must be in your shell's search path.

• axes: Check permutations of the axes for correspondence with your cell, but do not check linear combinations. This is useful to avoid a potential problem when one of the unit cell axis lengths is close to a multiple of one of the others.

• latt: Use lattice type information to guide the indexing.

• cell: Use unit cell parameters as prior information for the core indexing algorithm.

This also applies to other indexing methods such as DirAx and XDS.

mosflm-axes-latt-cell-retry

Same components as mosflm-axes-latt-cell (mosflm, axes, latt, cell), with the retry flag appended.

Indexamajig

1. Create a list of filenames to process:

$ find /reg/d/psdm/cxi/cxin5016/results/slab/jose/cheetah/hdf5/r0052-BlaC/ -name 'cxin5016-r0052-c00.cxi' -print > tutorial.lst

2. Rough estimation of the unit cell parameters:

$ indexamajig -i tutorial.lst -g cxil2316-nz1.geom --peaks=cxi --indexing=mosflm-raw-nolatt-nocell --int-radius=3,4,5 -o tutorial.stream

$ grep "Cell parameters" tutorial.stream

$ grep "centering" tutorial.stream

$ cell_explorer tutorial.stream

The Unit Cell Explorer tool
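As a rough cross-check of what cell_explorer shows graphically, the "Cell parameters" lines that the grep command extracts can be summarised with a short script. This sketch assumes each such line has the form "Cell parameters a b c nm, alpha beta gamma deg"; check this against your own stream file before relying on it. The function name is illustrative and not part of CrystFEL.

```python
import re
import statistics

# Assumed line format: "Cell parameters a b c nm, alpha beta gamma deg"
CELL_RE = re.compile(
    r"^Cell parameters\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+nm,"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+deg"
)


def median_cell(stream_lines):
    """Median a, b, c (nm) and alpha, beta, gamma (deg) over all crystals."""
    rows = []
    for line in stream_lines:
        match = CELL_RE.match(line)
        if match:
            rows.append([float(x) for x in match.groups()])
    if not rows:
        raise ValueError("no 'Cell parameters' lines found")
    # zip(*rows) turns the list of crystals into one column per parameter.
    return [statistics.median(column) for column in zip(*rows)]
```

A typical use would be to open tutorial.stream and pass the file object to median_cell. Medians are used rather than means because mis-indexed patterns tend to produce outlying cells.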

3. Index the patterns using Bravais lattice information only:

$ indexamajig -i tutorial.lst -g cxil2316-nz1.geom --peaks=cxi --indexing=mosflm-raw-latt-nocell -p blaC.cell --int-radius=3,4,5 -o tutorial.stream

4. Index the patterns using actual unit cell parameters:

$ indexamajig -i tutorial.lst -g cxil2316-nz1.geom --peaks=cxi --int-radius=3,4,5 -o tutorial.stream --indexing=mosflm-axes-latt-cell -p blaC.cell --integration=rings-sat --tolerance=2,2,2,1.5

5. Evaluate the quality of indexing:

$ ./check-near-bragg tutorial.stream -g cxil2316-nz1.geom

$ ./check-peak-detection --not-indexed tutorial.stream -g cxil2316-nz1.geom

$ ./check-peak-detection --indexed tutorial.stream -g cxil2316-nz1.geom
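The cell check used in step 4 (the "axes" comparison together with --tolerance=2,2,2,1.5) can be illustrated with a simplified sketch. It reads the four tolerance values as percentages for the three axis lengths and degrees for the angles, and it compares real-space lengths and angles directly under permutations of the axes. Both points are simplifying assumptions, and this is not indexamajig's own comparison code. The function names are hypothetical.

```python
from itertools import permutations


def parse_tolerance(text):
    """Split 'a,b,c,ang' into three length tolerances (%) and one angle tolerance (deg)."""
    a, b, c, ang = (float(x) for x in text.split(","))
    return (a, b, c), ang


def cell_matches(candidate, reference, tolerance="2,2,2,1.5"):
    """Compare two cells given as (a, b, c, alpha, beta, gamma).

    Tries every permutation of the candidate axes (no linear combinations).
    Each angle travels with the axis it is opposite to.
    """
    length_tols, angle_tol = parse_tolerance(tolerance)
    for perm in permutations(range(3)):
        lengths = [candidate[i] for i in perm]
        angles = [candidate[3 + i] for i in perm]
        lengths_ok = all(
            abs(length - ref) <= ref * tol / 100.0
            for length, ref, tol in zip(lengths, reference[:3], length_tols)
        )
        angles_ok = all(
            abs(angle - ref) <= angle_tol
            for angle, ref in zip(angles, reference[3:])
        )
        if lengths_ok and angles_ok:
            return True
    return False
```

Because only permutations are tried, a candidate cell with one axis doubled is rejected instead of being matched through a linear combination.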

CrystFEL: Overall pipeline

1- Process_hkl

Takes a data stream, such as that from indexamajig, and merges the many individual intensities together to form a single list of reflection intensities which are useful for crystallography. Merging is done by the Monte Carlo method, otherwise known as taking the mean of the individual values.

$ process_hkl -i tutorial.stream -o tutorial.hkl -y 2/m --lowres=40 --highres=2.0 --nshells=25

$ process_hkl -i tutorial.stream -o tutorial.hkl1 -y 2/m --lowres=40 --highres=2.0 --nshells=25 --even-only

$ process_hkl -i tutorial.stream -o tutorial.hkl2 -y 2/m --lowres=40 --highres=2.0 --nshells=25 --odd-only

2- Partialator is the alternative to process_hkl.

Merging and scaling the intensities
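The Monte Carlo merge described above amounts to averaging all observations of each reflection. The sketch below shows that idea and the even/odd split used to produce two half data sets. It is a toy illustration, not process_hkl itself: it assumes the Miller indices have already been mapped to the asymmetric unit for the chosen point group, and the function names are hypothetical.

```python
from collections import defaultdict


def merge_mean(observations):
    """Average all intensities observed for each (h, k, l).

    observations: iterable of ((h, k, l), intensity).
    Returns {(h, k, l): (mean_intensity, n_measurements)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hkl, intensity in observations:
        sums[hkl] += intensity
        counts[hkl] += 1
    return {hkl: (sums[hkl] / counts[hkl], counts[hkl]) for hkl in sums}


def merge_even_odd(crystals):
    """Merge even-numbered and odd-numbered crystals separately.

    crystals: list with one list of ((h, k, l), intensity) per crystal.
    """
    even = [obs for i, c in enumerate(crystals) if i % 2 == 0 for obs in c]
    odd = [obs for i, c in enumerate(crystals) if i % 2 == 1 for obs in c]
    return merge_mean(even), merge_mean(odd)
```

Averaging works because each crystal samples a random partiality, size and pulse intensity, so the mean over many crystals converges towards a value proportional to the full intensity. This is also why so many patterns are needed.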

Symmetry Classification for SFX experiments

Reflections quality check

1- Check_hkl

It calculates figures of merit for reflection data, such as completeness and average signal strengths, in resolution shells. check_hkl accepts a single reflection list in CrystFEL's format, and you must also provide a unit cell (in a PDB file or CrystFEL unit cell format).

$ check_hkl tutorial.hkl -y 2/m -p blaC.cell --lowres=40 --highres=3 --nshells=25

2- Compare_hkl

It compares two sets of reflection data and calculates figures of merit such as R-factors or CC1/2. Reflections will be considered equivalent according to your choice of point group. You need to provide a unit cell, as a PDB file or a CrystFEL unit cell file.

$ compare_hkl tutorial.hkl1 tutorial.hkl2 -y 2/m -p blaC.cell --lowres=40 --highres=3 --nshells=25
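The two figures of merit most often quoted for SFX half data sets can be written out directly. The sketch below computes CC1/2 as the Pearson correlation between the two half-set intensities, and Rsplit as sum|I1 - I2| / (sqrt(2) * 0.5 * sum(I1 + I2)), over the reflections common to both halves. It works on whole data sets rather than resolution shells, and the function names are illustrative and not part of compare_hkl.

```python
import math


def _paired(half1, half2):
    """Intensity pairs for reflections present in both half data sets."""
    return [(half1[hkl], half2[hkl]) for hkl in half1 if hkl in half2]


def cc_half(half1, half2):
    """Pearson correlation coefficient between two half data sets."""
    pairs = _paired(half1, half2)
    n = len(pairs)
    mean1 = sum(x for x, _ in pairs) / n
    mean2 = sum(y for _, y in pairs) / n
    cov = sum((x - mean1) * (y - mean2) for x, y in pairs)
    var1 = sum((x - mean1) ** 2 for x, _ in pairs)
    var2 = sum((y - mean2) ** 2 for _, y in pairs)
    return cov / math.sqrt(var1 * var2)


def r_split(half1, half2):
    """Rsplit = sum|I1 - I2| / (sqrt(2) * 0.5 * sum(I1 + I2))."""
    pairs = _paired(half1, half2)
    numerator = sum(abs(x - y) for x, y in pairs)
    denominator = 0.5 * sum(x + y for x, y in pairs)
    return numerator / (math.sqrt(2) * denominator)
```

Here half1 and half2 are dictionaries mapping (h, k, l) to merged intensity, for example the even and odd merges written to tutorial.hkl1 and tutorial.hkl2. The factor sqrt(2) scales the disagreement between two half sets to an estimate for the full data set.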
