Processing serial crystallographic data from XFELs or...

18
22 Computational Crystallography Newsletter (2019). 10, 22–39 ARTICLES Processing serial crystallographic data from XFELs or synchrotrons using the cctbx.xfel GUI Aaron S. Brewster a , Iris D. Young a,b , Artem Lyubimov c , Asmit Bhowmick a , and Nicholas K. Sauter a a Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 b Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158 c Stanford Synchrotron Radiation Laboratory, SLAC National Accelerator Laboratory, Menlo Park, CA 94025 Correspondence email: [email protected], [email protected] Introduction The most important question when conducting a serial crystallographic (SX) experiment is “Have I collected enough data?” SX experiments at X-ray free electron lasers (XFELs) can produce millions of images with hundreds of thousands of diffraction patterns. Each pattern needs to be indexed to determine the crystal orientation matrix and unit cell dimensions, and then integrated to produce intensities for the observed Miller reflections. These images are ‘stills’, meaning they are collected without rotating the crystal in the beam due to short pulse length, usually 10s of femtoseconds long (reviewed in Bergmann et al. (2017)). Similarly, serial crystallography experiments at synchrotrons can produce hundreds of thousands images, also all stills, using fixed-target mounting systems on chips or loops and then raster-scanning through the crystals without rotating the sample in the beam (reviewed in Sierra et al. (2018)). The amount of data produced by either approach can make it difficult to determine whether sufficient data have been collected. While estimates of completeness, multiplicity of measurements, signal vs. noise, cross-correlation statistics, and unit cell isomorphism can all give useful insights as to dataset quality, knowing when a dataset answers a particular scientific question can only come from examining the electron density maps. The specifics generally depend on the type of experiment being performed, but they usually rely on examining difference density in the maps, often comparing two time points, two mixing conditions or other such treatments against each other. Time at an XFEL is scarce, therefore feedback as to sample quality and data completeness as determined from electron density maps needs to be available as fast as possible. Challenges to rapid data processing include ensuring accurate detector calibration, aggregating and visualizing hit rates, crystal quality, automated processing job submission, monitoring the available computing systems, keeping samples and associated metadata organized, and rapidly merging data to create electron density maps. Particularly at XFELs where experimental teams can involve 20+ scientists, results need to be communicated effectively to all parties, including sample preparation teams, beamline scientists, sample injection specialists and data analysts. Finally, processing needs to be automated to allow scientists to spend time studying the data itself rather than focusing on the mechanics of submitting many jobs and monitoring their state. To solve these challenges we have developed the cctbx.xfel GUI (graphical user interface). This program, under active development, allows users to rapidly move through all phases of serial crystallographic data reduction in an organized matter, taking advantage of whatever local computing resources are available. The GUI is open source and part of the cctbx and DIALS software packages (Grosse-Kunstleve et al., 2002, Hattne et al., 2014, Winter et al., 2018). In this article we detail two tutorials for processing XFEL data using the cctbx.xfel GUI, including refining the geometry. In the first tutorial we use data from Nakane et al. (2016a). This is an iodine derivatized bacteriorhodopsin protein sample (HAD13a) collected on the octal- sensor MPCCD detector at SACLA. This dataset is useful for a tutorial since the initial geometry is good enough to index out of the box and it doesn't require further software libraries. In the second tutorial we process a thermolysin dataset collected on the CSPAD detector at LCLS, following the approaches shown in Brewster et al. (2018). We also show here how the GUI could be used to process data from synchrotrons, where the output files are typically stored in directories filled with

Transcript of Processing serial crystallographic data from XFELs or...

Page 1: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

22ComputationalCrystallographyNewsletter(2019).10,22–39

ARTICLES

1

Processing serial crystallographicdata fromXFELsor synchrotronsusingthecctbx.xfelGUI

AaronS.Brewstera,IrisD.Younga,b,ArtemLyubimovc,AsmitBhowmicka,andNicholasK.SauteraaMolecularBiophysicsandIntegratedBioimagingDivision,LawrenceBerkeleyNationalLaboratory,Berkeley,CA94720bDepartmentofBioengineeringandTherapeuticSciences,UniversityofCalifornia,SanFrancisco,CA94158cStanfordSynchrotronRadiationLaboratory,SLACNationalAcceleratorLaboratory,MenloPark,CA94025

Correspondenceemail:[email protected],[email protected]

2

IntroductionThemost important questionwhen conducting aserial crystallographic (SX)experiment is “Have Icollected enough data?” SX experiments at X-rayfreeelectron lasers (XFELs) canproducemillionsof images with hundreds of thousands ofdiffraction patterns. Each pattern needs to beindexed to determine the crystal orientationmatrix and unit cell dimensions, and thenintegratedtoproduceintensitiesfortheobservedMiller reflections. These images are ‘stills’,meaning they are collected without rotating thecrystal in the beam due to short pulse length,usually 10s of femtoseconds long (reviewed inBergmann et al. (2017)). Similarly, serialcrystallography experiments at synchrotrons canproduce hundreds of thousands images, also allstills, using fixed-target mounting systems onchips or loops and then raster-scanning throughthe crystals without rotating the sample in thebeam (reviewed in Sierra et al. (2018)). Theamount of data producedby either approach canmake it difficult to determine whether sufficientdatahavebeencollected.

While estimates of completeness, multiplicity ofmeasurements, signal vs. noise, cross-correlationstatistics, and unit cell isomorphism can all giveuseful insights as to dataset quality, knowingwhen a dataset answers a particular scientificquestion can only come from examining theelectron density maps. The specifics generallydepend on the type of experiment beingperformed, but they usually rely on examiningdifference density in the maps, often comparingtwo time points, twomixing conditions or othersuch treatments against each other. Time at anXFEL is scarce, therefore feedback as to samplequality and data completeness as determinedfromelectrondensitymapsneeds tobeavailableasfastaspossible.

Challenges to rapid data processing includeensuring accurate detector calibration,

3

aggregating and visualizing hit rates, crystalquality, automated processing job submission,monitoring the available computing systems,keeping samples and associated metadataorganized, and rapidly merging data to createelectron density maps. Particularly at XFELswhere experimental teams can involve 20+scientists, results need to be communicatedeffectively to all parties, including samplepreparation teams, beamline scientists, sampleinjection specialists and data analysts. Finally,processing needs to be automated to allowscientists to spend time studying the data itselfrather than focusing on the mechanics ofsubmittingmanyjobsandmonitoringtheirstate.

To solve thesechallengeswehavedeveloped thecctbx.xfel GUI (graphical user interface). Thisprogram, under activedevelopment,allowsusersto rapidly move through all phases of serialcrystallographic data reduction in an organizedmatter, taking advantage of whatever localcomputing resources are available. The GUI isopen source and part of the cctbx and DIALSsoftwarepackages (Grosse-Kunstleve etal.,2002,Hattneetal.,2014,Winteretal.,2018).

In this article we detail two tutorials forprocessing XFEL data using the cctbx.xfel GUI,including refining the geometry. In the firsttutorialwe use data from Nakane et al. (2016a).This is an iodine derivatized bacteriorhodopsinprotein sample (HAD13a) collected on the octal-sensorMPCCDdetector atSACLA. This dataset isuseful for a tutorial since the initial geometry isgoodenoughtoindexoutoftheboxanditdoesn'trequire further software libraries. In the secondtutorial we process a thermolysin datasetcollectedontheCSPADdetectoratLCLS,followingtheapproachesshowninBrewsteretal.(2018).

WealsoshowherehowtheGUIcouldbeusedtoprocessdatafromsynchrotrons,wheretheoutputfiles are typically stored in directories filledwith

Page 2: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

23

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

4

single image files. The tutorials here provide ademo designed to run on a small Linux nodeoutsideofthefacilitiesofinterest,buttheycanbeadapted easily to run at full scale either at thesefacilitiesorothercomputingenvironments.

SXdataprocessingworkflowsWe re-state herewhatwewrote in our previousnewsletterarticle(Brewsteretal.,2016),updatedtobemoregeneralacrossSXexperimentsoutsideofLCLS.Ithasbeenourexperiencethatanalyzingdata collected using serial crystallography (SX)typicallyrequiresthreedistinctprocessingstageslabeled here calibration, discovery, and batch.Calibrationreferstorefiningthegeometryoftheexperiment, but also includes some pre-processingsteps,suchascreatingbadpixelmasks.Using theseinputs, initialparametersarederivedthat describe the experiment, such as detectordistance,anybeamcorrectionparametersneededandsoforth.Duringdiscovery,theuserexaminesindividual diffraction patterns and searches forappropriate parameters for data reduction,including hitfinding parameters if used,spotfinding parameters, target unit celldimensions, crystal symmetry and an optimalmerging strategy. Finally,whenoptimal softwareconfigurationisestablished,theuserentersbatchprocessing mode, endeavoring to maximize theparallel computing options offered and, duringlive experiments, attempting to provideconstructivefeedbacktobeamlineoperatorsinasclose to real-time as possible. After theexperiment, theuserwilloftenneedto reprocessthe runs collected in batch mode. During batchprocessing, the user will continue to refineprocessing parameters as the results areevaluated, perhaps even revising initialexperimental geometry estimates. Thus the threestagesaresomewhat fluidas feedback from laterstagesmaycallforrepeatingearlierstages.

ProcessingatscaleusingMySQLAggregating feedback from SX experiments hasoftenbeendonebysearchingthroughlogfilesorresult files and creating plots and tables. Asexperimentsget large,with thousandstomillionsof images being processed acrossmany datasets,thescaleofthedatacomplicatesthiskindofdatascraping. To solve this problem, we used adatabase system implemented inMySQL to store

5

and retrieve information about every frameprocessed.Thesystemallowsustousestructuredqueries to quickly sort, aggregate and visualizecrystalindexingresultsandintegrationquality.

At LCLS, the facility staff has provided a MySQLserverforgeneralusers.Accessistrivialtoobtainby emailing thestaff.Forother facilitieswehaveprovidedaprogram,cctbx.xfel.ui_serverthatwrapsMySQL and initializes the database from scratch.Users at any facility can run this program, asdescribed below, either locally or on a computercluster. The cctbx.xfel GUI will connect to thisserver and use it to track jobs, processingparametersandsamplequalityforrapidfeedback.Because MySQL is designed as an enterprisesolution for managing large amounts of data, asexperiments expand in scope, the backend formanagingthelargeamountsofmetadatawillscaleaswell.

DataprocessingtutorialsThetwotutorialspresentedheredescribehowtoprocess datasets from SACLA and LCLS.We firstexplainhowto installandconfigure thecctbx.xfelsoftware and how to acquire the tutorial data,then,afterinitialcalibration,wedemonstratehowto use theGUI itself to submit andmonitor jobs,and visualize the processing results. Merging isdescribed, but this section is under activedevelopment and is likely to change after thisarticleispublished.

InstallationThe cctbx.xfel GUI comes with DIALS and Phenixinstallations andwill runnatively after installingMySQL. What follows are directions for astandalone (non-LCLS) installation.After that aredirections for an LCLS installation that includespsana, the package needed to read the LCLS fileformat (XTC) natively. Both procedures assumetheuserisonasinglenodesystem,withoutaccessto the original facility’s computers, thoughqueuing support using multiple nodes for largebatch processing is also described. ThesedirectionsareverifiedtoworkonLinuxCentos7.

Let $WORKING be a new, empty directory. Note,here $WORKINGalways refers to the full path tothatdirectory.Also let$NPROCbe thenumberofprocessorsavailableonyoursystem,forexample,32.

Page 3: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

24

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

6

StandalonebuildsDownloadtheinstallationscript:wgethttps://raw.githubusercontent.com/cctbx/cctbx_project/master/xfel/util/standalone_xfelgui_installer.sh

Runthescript,providingthedestinationfolder:

chmod+xstandalone_xfelgui_installer.sh

./standalone_xfelgui_installer.sh$WORKING

Thescriptwilldownloadthe latestversionofDIALS, install it inthe$WORKINGfolder, installMySQL,andcreateasetup.shscriptthatyoucanusetoputDIALSinyourpath:

source$WORKING/setup.sh

LCLSbuilds

Thecctbx.xfelGUI isavailable forallusers at LCLSat /reg/g/cctbx.However, for thepurposesof thistutorial,weassumenoaccess to the LCLScomputingsystems.To use thepsana libraries,weneed tobuildthesoftwaremanually;wecannotuseapre-builtDIALSbundle.Theinstallationscriptdoesthis.

Downloadthescript:

wgethttps://raw.githubusercontent.com/cctbx/cctbx_project/master/xfel/util/lcls_xfelgui_installer.sh

Runthescript,providingthedestinationfolderandthenumberofprocessors:

chmod+xlcls_xfelgui_installer.sh

./lcls_xfelgui_installer.sh$WORKING$NPROC

ThescriptwilldownloadthelatestversionofDIALS,psana,MySQL,andotherdependencies,buildthesoftware,andcreateasetup.shscriptthatyoucanusetoputDIALSinyourpath:

source$WORKING/setup.sh

FormoreinformationondeveloperbuildsofDIALS,seehttps://dials.github.io/documentation/installation_developer.html

DownloadandpreparetutorialdataThecctbx.xfelGUIrequiresaruntohavefinishedbeingcollectedbeforeprocessingbegins.Thereforeithasmultiplemodesformonitoringfornewdata,dependingonthefacility.LCLShasawebservicethatcanbeusedtoqueryifdataisavailable.SACLAusesCheetahtoprepareHDF5filesfromtherawdata,and indicates it is finished using a status file (Barty et al., 2014, Nakane et al., 2016b). Standalonefacilitiessuchassynchrotronsoftencollectaconstantnumberof filesinasingleraster. Ifallelsefails,timestampsandfilesizescanbemonitored.Wedownloadthedatafromcxi.db(Maia,2012):SACLAtutorialdata

• cd$WORKING;mkdir-pdata/run1;cddata/run1• GetarunfromHAD13afromhere:https://www.cxidb.org/data/43/HAD13a/.Hereweusethe

firstrunwhichhadthefewesthitsandisthesmallestfilesize:wgethttp://portal.nersc.gov/archive/home/projects/cxidb/www/43/HAD13a/run371999-0.h5

• CreateaCheetahstatus.txtfilesotherunwillbeseenbytheGUI:echoStatus=Finished>status.txt

LCLStutorialdataThesedirectionsassumetheuserisnotworkingontheLCLSinteractivenodes.

• cd$WORKING;mkdirlcls;cdlcls

Page 4: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

25

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

7

• LCLSusesa.datfilethatmapsexperimentnamestoexperimentnumber.Createa.datfilewithonlythethermolysinexperimentinit:

o mkdirExpNameDbo echo"280CXIcxi78513">ExpNameDb/experiment-db.dat

• Downloadarunfromcxi.db,entry81(seehttps://www.cxidb.org/id-81.html):o mkdir-pCXI/cxi78513/xtc;cdCXI/cxi78513/xtco wget

http://portal.nersc.gov/archive/home/projects/cxidb/www/81/cxi78513/xtc/e280-r0013-s00-c00.xtc

o Note,thisis81.8GB!Andit’sonly1/5thofrun13!• Buildthesmalldatafileneededtoreadthextcfile:

o mkdirsmalldatao smldata-fe280-r0013-s00-c00.xtc-osmalldata/e280-r0013-s00-c00.smd.xtc

• Download the calibration folder. This includes the CSPAD geometry file refined in Brewster2018 (see also https://github.com/phyy-nx/dials_refinement_brewster2018) and the CSPADdarkpedestalfiles.

o cd$WORKING/lcls/CXI/cxi78513o wget

http://portal.nersc.gov/archive/home/projects/cxidb/www/81/cxi78513/calib.tar.gzo tar-xvfcalib.tar.gz;rmcalib.tar.gz

• Export variables instructing psana where the data are (note you can add these lines to yoursetup.shscriptfromtheinstallationsectionafterthesourcecommands).

export SIT_DATA=$WORKING/lcls export SIT_ROOT=$SIT_DATA export SIT_PSDM_DATA=$SIT_DATA

IftheuserisrunningatLCLSontheirowndata,allofthesestepscanbeskippedaspsanahasdefaultsthatcanfindthedatain/reg/d/psdm,thedata’sdefaultlocation.

StartMySQLTheMySQLserver iswrappedbytheprogramcctbx.xfel.ui_server.Thisprogramtakesasanargumentthedirectoryinwhichthedatabasewillbeinitialized,whichdirectorymustbeemptythefirsttimetheprogramisran.Thefirsttimetheprogramisran,arootpasswordwillberequestedwhichwillbeusedfortherootdatabaseaccountthattheprogramwillcreateandsetup.Thisshouldnotbeyoursystemrootpassword.Subsequently,theprogramcanberunonaclusterorlocally,asneeded.

• cd$WORKING• cctbx.xfel.ui_serverdb.port=3307db.server.basedir=$WORKING/MySQLdb.user=guidemo2019

db.name=guidemo2019o Note,thedb.useranddb.namefieldscreateaMySQLuserandaMySQLdatabasewithin

theMySQL/datafolder.ApasswordcanalsobeprovidedbutitwouldbestoredasunencryptedtextintheGUIsettingsfilesothisisnotrecommended.Hereweleavethepasswordblank.

• Providearootpasswordandwaituntil"Raisingmaxconnections"appears• Backgroundtheprocess(CTRL-Z,thentypebg)• Note,whendoneprocessing,shuttheserverdownusingfgfollowedbyCTRL-C

ThisstepcanbeskippedwhenrunningatLCLSitself,providedthatfacilitystaffhasgrantedaccesstopsdb-user.slac.stanford.edu.

InitialcalibrationandmaskingInthecaseofHAD13a,theinitialgeometryissufficientfor indexing. If itwerenot, theinitialdetectorposition could be determinedusing a powder pattern froma known sample, such as silver behenate

Page 5: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

26

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

8

(AgBeh).Todothis,usingtheaveragingcommandsbelowontheAgBehdataset,rundials.image_viewerand use the Actions: show unit cell tool to determine the new beam center and detector distance byfittingtheoverlaidrings.

Given a good initial geometry, follow these steps to create anuntrustedpixelmaskusing an averageimage.ThisisoptionalbecausefortheHAD13adataset,thebeamstopisasimplecircleinthemiddleofthe imageandit canbemaskedoutusinga low-resolution filterduringprocessing.However,youcanusedials.image_viewertooltocreateacustommaskifneeded.

• cd$WORKING• mkdiraverages;cdaverages• dxtbx.image_average../data/run1/run371999-0.h5-v-n$NPROC• dials.image_viewer*.cbf• Pagetoavg.cbfifit'snotalreadydisplayed• Actions:showmasktool• Thegoalistomakealowresolutionmaskaroundthebeamstop.Ifthiswereasinglepanel

image,thecircletoolwouldwork,butbecauseitismultipanel,eachinnertileneedsitsownmask.Usethepolygontoolfourtimes.Whendone,clicksavemask.Apixels.maskfilewillbecreated.

• Testthemask:dials.image_viewer*.cbfmask=pixels.mask.Clicktheshowmaskbuttoninthesettingsdialog.Themaskedpixelswillturnred.

For theLCLS thermolysindata, the detector has alreadybeen calibrated (seeBrewster etal. (2018)).Follow the instructions at https://github.com/phyy-nx/dials_refinement_brewster2018/wiki/Averages-and-maskingtogenerateanuntrustedpixelmask.

Runandconfigurethecctbx.xfelGUITostartthecctbx.xfelGUI,onthecommandline,run:

• cd$WORKING• cctbx.xfel

At this point several settings dialogswill be used to configure the processing environment. In theseexamples, localprocessingisused(singlenode),butalternativesarediscussedbelow includingmulti-nodeclusters.StandaloneGUI(HAD13aexample)UsethefollowingsettingsfortheHAD13adataset:

• Loginwindow:o Experiment Tag: common. The experiment tag is a way to group processing results

together. We tend to use ‘common’ to indicate a set of processing results that areavailableforallcontributorstoanexperiment,butanystringofcharacterscanbeused.

o Facility:Standaloneo Output:$WORKING/results

• DBCredentialswindow:o DBHostname:127.0.0.1o DBPortnumber:3307o DBname:guidemo2019o DBusername:guidemo2019

• Facilityoptionswindow:o Foldertomonitor:$WORKING/datao Monitorfor:folders

Page 6: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

27

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

9

o Filematchingtemplate:run*.h5o Checkthe‘Filesarecomposite’box

• Advancedsettings:o Multiprocessing:localo Totalnumberofprocessors:$NPROCo Processing back end: cctbx.xfel (standalonemode).Note there are other options here,

includingthepossibilityofrunningotherprograms.Contacttheauthorsifinterested.• Note,theDBhostnamewillvarydependingonwhatsystemtheMySQLdatabaseisrunningon.

Forthistutorial,runningonalocalnode,weusetheIPaddressof‘localhost’.• Tip:theGUIsavesyoursettingsto~/.cctbx.xfel/settings.phil,whichwillbeautomaticallyused

thenexttimeyouruntheGUI.Note,ifrunningtheGUIatSACLAduringanexperiment,setthemultiprocessingsystemtoPBS,thenusethe blX-occupancy queue (where X is the beamline number) and specify the appropriate number ofcores per node (currently 28 cores) and the total number of cores desired. If that number is >28,multiplenodeswillbeusedperjob.MultipleuserscanruntheGUIatthesametimeusingthesameexperimenttaganddatabase.TheywillseethesamesetofprocessingresultsasthesingleMySQLbackendisqueried.However,itisadvisedtoonlymonitorfornewrunsandsubmitjobsfromasingleGUIinstance.LCLSGUI(thermolysinexample)

• Loginwindow:o ExperimentTag:common.o Facility:LCLSo Experiment:cxi78513o Output:$WORKING/results

• DBCredentialswindow:o DBHostname:127.0.0.1o DBPortnumber:3307o DBname:guidemo2019o DBusername:guidemo2019

• Facilityoptionswindow:o LCLSusername:<Leaveblankorgetfromstaff>o LCLSpassword:<Leaveblankorgetfromstaff>o ThesecredentialsarefortheLCLSwebserviceformonitoringfornewruns.Theservice

doesnotusethesamecredentialsasthefacility’sUnixaccounts.Ifthecredentialsarenotprovided,theGUIinsteadlooksforrunsinthextcfolderfortheexperiment.

• Advancedsettings:o Multiprocessing:localo Totalnumberofprocessors:$NPROCo Processingbackend:cctbx.xfel(LCLSmode).

Note these instructions are for running outside of LCLS. If using the LCLS provided MySQL server,specifytheDBhostnameaspsdb-user.slac.stanford.edu, leavetheportblank(itwilldefaultto3306),and specify the DB name and DB user name provided to you by the facility staff. Additionally, formultiprocessing select LSF and then select either the psanaq (offline processing) or a high priorityqueueifprocessingduringanexperiment.

GUItab:RunsTheGUIisorganizedintoaseriesoftabs.Thefirsttab,runs,showstherunsdiscoveredbytheGUI.Clickthe‘Watchfornewruns’buttonandafteramomenttheavailablerunswillappear.Arunrepresentsa

Page 7: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

28

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

10

continuous time of data collection where all theparameters will the same. At a synchrotron, thiswill often representa single raster scan.TheGUIwill continue tomonitor for new data every fewseconds. You can disable the monitoring byclickingthe‘Watchfornewruns’buttonagain.

We found it difficult to keep track of which runnumbers correspond to which sample and thesampleconditions.Wetypically logmetadatainaspreadsheetduringthebeamtime,butwewantedtobeabletosortrunsbythismetadatawithintheGUi. To this end, runs can be tagged withdescriptiveterms,suchas ‘thermolysin’, ‘batch5’,‘timepoint 3’, and so forth. Click ‘Manage tags’ tocreate,rename,andremovetags.Clickaruntotagit,orclick ‘Changetagsonmultipleruns’toworkwithmanytagsatonce.Duringdatacollection,usethe ‘Manage persistent tags’ feature toautomatically tagnewrunswitha tagsetas theyarrive.Usethesetagstogrouprunstogetherusingwords appropriate to your experiment. You canuse these tags in the subsequent plots to quicklyswitchbetweenwhichdataisbeingexamined.

GUItab:TrialsThe list of parameters available to the coreprocessing program dials.stills_process isextensive,butmostofthetimeonlyafewdefaults

11

needtobechanged.Wehavefoundthatduringanexperiment the same data needs to be re-processed several times as it calibrated andexplored, changing geometry, spotfinding andindexing parameters. We group processingattempts that change parameters that generallyrefer to the crystal together into trials. Forexample, trial 0 may be our initial indexing trialbased on the published unit cell, but afterexamining the results, we see the unit cell isslightly different, so we resubmit the jobs into anew trial 1 with a better unit cell estimate. Theoutputfoldersareorganizedbyrunsandtrialstokeepthedataorganized.

Further, we have often found that sequentialgroups of runs tend to have the same set ofdetector-specific parameters, such as geometry.Therefore we create run groups (or run blocks)that identify sets of runs that shouldbesimilarlybehaved. Thus, a trial has sample and crystal-specificparameters,andatrialcomprisesasetofrun groups. Note thatwhile tags and run groupsbothorganizerunsintologicalgroupings,theyareused differently. Tags use user-definedmetadataspecific to the sample and experimentwhile rungroups organize runs into sets with similarprocessing parameters. Use the Trials tab tomanagethesefeatures:

12

Had13Trial0

• Gototrialstabandclicknewtrial.Settings:o Comment:asyoulike,forexample'HAD13a,initialindexing'o Changeresolutioninbottomrightto1.7.Thisresolutionisonlyusedinaggregatingtheresults,

notinindexingorintegratingthedata.o Inthecentralwindow,usetheseparameters:

spotfinder { filter { d_max = 19 min_spot_size = 2 } } indexing { known_symmetry { space_group = C2221 unit_cell = 46.2, 103.0, 128.7, 90, 90, 90 } } prediction.d_max=19

o Note prediction.d_max and spotfinder.filter.d_max can be removed if you created a static lowresolutionmaskabove.

Page 8: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

29

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

13

o Choosing good spotfinding parameters is critical. In this case the gain is close to 1 sospotfinder.threshold.dispersion.gain is not modified (unlike for the CSPAD, see below). TheDIALS tutorials online have further guidance for choosing these parameters usingdials.image_viewer(seehttps://dials.github.io/documentation/tutorials).

o If theunit cell andsymmetry areunknown, the indexingparameters canbe omitted inwhichcase theP1will be used. Clustering tools such as those in Zeldin etal. (2015) andGildea andWinter(2018)canbeusedtodeterminetheunitcellandsymmetry.

• Clickok.• Clicknewblock.Ifyoumadeamask,putthepathtoitinthe"Untrustedpixelmask"box(shouldbe$WORKING/averages/pixels.mask).Clickok.

• ChecktheActiveTrialbox

Additional rungroupscanbecreated asneeded,andcanbeaddedor removed froma trialusing the‘Selectblocks’button.Note,bydefault,asetofprocessingresultswillbecreatedforeachimage.Tosaveon file system usage, these can be combined using output.composite_output=True in the trialparameters.

ThermolysinTrial0

FortheCSPADthermolysindata,usetheaboveprocedurewiththesetrialparameters:

spotfinder { filter.min_spot_size=2 threshold.dispersion.gain=25 threshold.dispersion.global_threshold=100 } indexing { known_symmetry { space_group = P6122 unit_cell = 92.9 92.9 130.4 90 90 120 } refinement_protocol.d_min_start=1.7 }

14

Here,aglobalthresholdof100isusedtoremovenoisy pixels. Note, the LCLS version by defaultwrites CBF versions of each indexed image toassistdebuggingtherawXTCstreams.Tosaveonfile system usage, this can be disabled withdispatch.dump_indexed=False. Also, for LCLSprocessing,composite_output=Trueisthedefault.

For the run group, use CxiDs1.0:Cspad.0 for thedetector address, 580.119 for DetZ, and providethepathtotheuntrustedpixelmaskifyoucreatedone.

GUItab:JobsThe jobs tabdisplaysprocessing job information.A job consists of the output from a singleprocessingattempt, and isassociatedwitha trialnumber,arungroupnumber,andarun.Clickthe‘Auto-submit jobs` button. For local processingmode, each job not yet processed is ran insequence.ForLSF,PBS,orotherqueuingsystems,

15

all jobs not yet processed are submitted to thequeue.Asnewdataarrives,theyareautomaticallysubmitted. Under the hood, the programcxi.mpi_submit is used to submit the jobs (seeBrewsteretal.(2016)).

Processing results are logged to the MySQLdatabase, and are created as DIALS reflectiontables and experiment files in the output folder$WORKING/results, as configured above.The fullsetofcommandsandprocessingparametersusedforthejobarecopiedtothisfolder,injob-specificsubfolders,forarchivalpurposes.

Jobs can be terminated, deleted, and restartedusing the jobs tab. Deleting a job deletes all theresultsfromtheMySQLdatabaseandfromthefilesystemintheresultsfolder.

GUItab:RunStatsThe Run Stats tab displays a variety of hit rate

Page 9: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

30

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

16

information.Selecta trial anda run,and theplotwill be generated. See figure 1 for a thermolysinexample.

Everypartof thehit rateplot isadirect resultofissues encountered during SX experiments. Weneedtoknowavarietyofinformationataglance,such as whether we are hitting the sample withthe beam, are there crystals, are there multiplelattices,canweindexthem(ifnot,why),andwhat

N

str

ong s

pots

indexe

d/n

ot

indexe

d

Time (s)

% indexe

d%

2 latt

ices

% s

olv

ent

Reso

luti

on (

A)

% >

2.5

A

A

B

D

C

Figure1:RunStatstabshowingtwothermolysinruns.Note,thetutorialusesonly1/5thofrun13,whereashereweshowallofrun13plusrun14.Top:XFELGUI.Bottomrow:zoomsofthreesectionsofthemainplot.

16

information.Selecta trial and a run, and theplotwill be generated. See figure 1 for a thermolysinexample.

Everypartof thehit rateplot isadirect resultofissues encountered during SX experiments. Weneedtoknowavarietyof informationataglance,such as whether we are hitting the sample withthe beam, are there crystals, are there multiplelattices,canweindexthem(ifnot,why),andwhat

Page 10: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

31

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

1Thebackgroundringdefault2θisconfiguredforkaptontapescatterastypicallyseeninSXexperimentsusedwithdropontapemethods(seeFuller2017),andassuchthedefaultratioof1.0istoolowforinjectionorrasterexperimentswithoutakaptonsignal.Inthiscasetheratioshouldbesettoahighervalues,suchas1.5inthisexample,wherethethermolysinwas

17

is theirquality.Further,weneed the informationupdatedliveastheexperimentprogressed,butwealso need to be able to quickly compare runs toeach other, even if they were collected daysearlier.Tothisend,thedatabaseisqueriedandhitrates displayed as you select specific runs, auto-plot the last five runs, or plot the entireexperiment(allrunsatonce).

The top panel is split into four horizontal plots,labeled infigure1asA-D.In thehitrateplot(A),every imagehasonedot.Theheightof thedot ishow many reflections were found duringspotfinding on the image. The dot is blue if itindexedandgrayifitdidnot.

In the indexing plot (B), moving averages arecomputed for the solvent hit rate (green, right-handY-axis), theindexingrate (blue, left-handY-Axis) and themultiple lattice rate (magenta, left-hand Y-Axis). These rates are the percentage ofthe total number of shots in themovingwindowthat contained solvent, indexed successfully, orhadmultiplelatticespresent,respectively.

The solvent rate is computed by examining ratioofthewaterringintensitytothebackground.Thisis done by specifying two 2θ values in the rungroup,ahigh valueand a lowvalue.Thedefaultsare 22.8° for the water ring and 12.5° for thebackground ring (note these are wavelengthdependent followingBragg’s law). Radial averagevaluesarecomputedatthesetwopositionsandiftheir ratio is higher than the default of 1.0, thentheshotisconsideredhavingsolvent.1

Thediffractionqualityplot(C)showsonedotperimage inyellow,where theheightof thedot(leftY-axis) is the resolution the image diffracted to,usingameanI/σratiocutoffof1.0bydefault.Theyellowlineisamovingaverageofthepercentageof indexed images that diffract to at least 2.5Å(right Y-axis). This is the high-quality rate. Notethat the resolution estimates are usuallyoptimistic. After scaling and post-refinement,many reflections have reduced intensity due topartiality correction, since most reflections arepartial.ThereforetheusermaywishtoadjusttheI/σratiocutofftosomethingmorestringenttogetamorerealisticresolutionestimate.

18

Thestatistics bar (D) showsper run statistics. Inthecaseofthermolysinrun13,10312shotsoutof41382 images were considered hits, where a hithas at least 40 reflections. 8971 images indexed,6317ofwhichwere high quality. 44.0% of shotshadsolvent,and24.9%ofshotshadcrystals(thisisthehitrate).21.7%ofshotsindexed,and15.3%ofshotsindexedto2.5Å.Thehighqualityrate(notshown) would be (6317/8971) = 70.4% for thisrun.

Most of the parameters listed above, such as I/σratiocutoff,areconfigurableinthelower-lefthandcorneroftheRunStatstab.Thetwoimagedisplayoptions,‘Strongestindexedimages’and‘Strongestimages that didn’t index’ open the DIALS imageviewer and allowusers to lookat their best dataand their most problematic data, respectively.(Note this feature is only available for the LCLSfacility,butisindevelopmentforextantfacilities).

Some tipsonusage.A flat line in thehitratesbar(A) indicates none of the shots contain Braggdiffraction.Thiscouldmeanthereisnodatabutitcould also mean the spot finding thresholds aretoostringent.Amixofblueandgraydotsindicatesmany images are not indexing. The unit cellparameters or geometry might not be welloptimized. Many gray dots consistently higherthanthebluedotscouldindicatemultiplelattices,asindexingtendstofalloffiftoomanycrystalsarehitinashot.

CombiningAandB,ifthesolventrateiszeroandthehitrate is zero, then there isnosolvent in thebeam, indicatingthejet ismissingortherasterismissing, etc. If the solvent rate is high but thenumberofspotsislow,thecrystalsconcentrationistoolow.Ifthesolventrateandnumberofspotsare both high but the indexing rate is low, theindexingparametersorexperimentalgeometryiswrong,or thespotfindingparametersarepickingupnoisyreflections.

The spotfinder should find few reflections onimages considered misses, but poor spotfindingparameters can result in noisy mis-identifiedreflections, especially near a beam stop or othershadowing.Ifthebackgroundnumberofspotsperimage is too high (around 20-40), consider first

Page 11: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

32

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

19

adjusting the spotfinding parameters and if thatisn’t working, settingdispatch.hit_finder.minimum_number_of_reflections to a number slightly higher than thebackground of your spotfinding results. Thisparameter will skip indexing obvious misses,savingprocessingtime.

GUItab:UnitCellTheUnitCelltabusestagsetstodisplayunitcelldistributionsin2D(figure2top)and1D(figure2middleandbottom).Runsmustalreadybetaggedtobedisplayedinthisplot.Selectatrial,thenselectatagortagsandclick‘Addselection’.ThetagsetwillbeshownintheTagsetswindowand

Figure2:Unitcellplotforthermolysin.Note,thisisforallthethermolysindatafrom(Brewsteretal.,2018).

20

canberemovedfromtheplotusingthe‘Removeselection’button.

A tag set is one or more collections of runs alltaggedwithtagsintheset.Thesetcanbeaunionor an intersection, meaning display any imagesfromrunstaggedwithanyofthesetags(union),ordisplayonly images fromrunswithall these tags(intersection).

Forexample,infigure3,Runs11-22aretaggedas‘Group1’,andruns26-29aretaggedas‘Group2’.Inthiscaseitcanbeseenthatgroups1and2donotoverlapeventhoughtheyarebothmeasurementsof the same protein. In Brewster 2018, thegeometryforeachrunisrefinedseparately,which

Page 12: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

33

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

21

helps to correct this pattern by slightly adjustingthe detector distance in a time-dependentrefinement, causing these populations to betteroverlap(notshown).

Geometryrefinementandre-indexingThese instructions, shown here for the HAD13adatabutgenerallyapplicable,aresimilartothosegiven here:http://cci.lbl.gov/xfel/index.php/2017_dials.stills_process. For the CSPAD data, follow thedirections at https://github.com/phyy-nx/dials_refinement_brewster2018/wiki, underthe metrology refinement and metrology

Figure3:Unitcellplotfortwogroupsofrunsforthermolysin.Note,thisisforallthethermolysindatafrom(Brewsteretal.,2018).

22

evaluation sections (see also Brewster et al.(2016)andBrewsteretal.(2018)).

Recall that the DIALS file formats are twofold: aset of experimentalmodels linked together in anexperiment list and a list of reflections in areflectiontable.Anexperimentrepresentsasinglediffractionresultfromasinglecrystal,andforstillimages includes a detector model (with positionand orientation for each panel), a beam model(includingbeamdirectionandwavelength),andacrystalmodel(comprisingtheunitcellandcrystalorientation as well as mosaic parameters. In anexperiment list file, each experiment can refer to

Page 13: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

34

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

23

the same detector, so in the first step of geometry refinement, we create a list of experiments andreflectionssuchthateachexperimentpointstothesamedetectormodel.Wewillthenjointlyrefineallthemodels, such that all reflectionswill inform theposition of the detectorwhile the position of thedetectorwillaffecteachcrystalmodel(Watermanetal.,2016,Brewsteretal.,2018).

AlsonotethattheoctalMPCCDdetectoratSACLAhas8panels.Wegrouptheseintohierarchylevels,where level 0 represents the detector as a whole, and level 1 represents each panel individually. Adetector can have arbitrarily many hierarchy levels (for example, the CSPAD has 4: the detector,quadrants,2x1modulesandeachASIC).

• cd$WORKING;mkdir-pmetrology/t000;cdmetrology/t000• Combineandfilterexperiments:o dials.combine_experiments../../results/run1_run371999-0/000_rg001/out/*indexed.refl../../results/run1_run371999-0/000_rg001/out/*refined.exptreference_from_experiment.detector=0

o This will create combined.refl (a set of indexed reflections from all indexed images) andcombined.expt(asetofcrystallographicmodelsincluding1detectormodel,NbeammodelsandNcrystalmodels,whereNisthenumberofindexedimage.Inthetestforthisarticle,therewere92imagesindexed).

o cctbx.xfel.filter_experiments_by_rmsdcombined.*o Thiswill remove any images that have a particularly badRMSD,whereRMSD is the rootmeansquareddeviationoftheobservationsfromthepredictions. Inthiscase,3 imageswereremovedwiththiscommand.

• Refinethedetectorasablock(level0):o dials.refinerefine_level0.philfiltered.*output.experiments=refined_level0.exptoutput.reflections=refined_level0.refl

oWherethefilerefine_level0.philcontains:refinement { parameterisation { auto_reduction { min_nref_per_parameter = 3 action = fail fix *remove } beam { fix = *all in_spindle_plane out_spindle_plane wavelength } detector { fix_list = Tau } } refinery { engine = SimpleLBFGS LBFGScurvs GaussNewton LevMar *SparseLevMar } reflections { outlier { algorithm = null auto mcd tukey *sauter_poon separate_experiments = False separate_panels = True } } }

o These parameters adjust the refiner for SX experiments as opposed to the defaults, which aregearedtowardstraditionalrotationexperimentsatsynchrotrons.Descriptionoftheparameters:§ auto_reduction: for stills, there are few reflections per shot, and thus few reflections perparameter.Welowerthecutoffto3ofhowmanyreflectionsperparametertousecomparedtothedefaultof5andweallowexperimentswithtoofewreflectionstoberemoved.

Page 14: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

35

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

24

§ We fix the beam so it does not refine. As the beam, the detector distance, and the unit celldimensionsarecorrelated,oneparametermustbeusedasaruler,andweassumethebeamlinefacilityhascorrectlycalibratedthewavelength.

§ Wefixtherotationofthedetectorarounditsnormalandarounditsfastandslowaxes.Theusershould feel encouraged to try other refinement strategies including letting the detector tiltrefine.Formore,seeBrewsteretal.(2018).

§ We use the sparse LevMar refiner which uses sparse algebra to handle the large matrix ofparametersneededforrefinement.Formore,seeBrewster2018.

§ We use the Sauter/Poon approach for outlier rejection (Sauter & Poon, 2010). To determineoutliers,foreachpanelweconsiderallthereflectionsonthatpanelacrossallexperiments.

o In the refinement summary you'll see the RMSD_X, RMSD_Y and RMSD_DeltaPsi will all havedecreased.

• Refinetheindividualpanels(level1):o dials.refinerefine_level1.philrefined_level0.*output.experiments=refined_level1.exptoutput.reflections=refined_level1.refl

oWherethefilerefine_level1.philcontains(differencefromlevel0highlightedinred):refinement { parameterisation { auto_reduction { min_nref_per_parameter = 3 action = fail fix *remove } beam { fix = *all in_spindle_plane out_spindle_plane wavelength } detector { fix_list = Group1Tau1,Tau2,Tau3 hierarchy_level = 1 } } refinery { engine = SimpleLBFGS LBFGScurvs GaussNewton LevMar *SparseLevMar } reflections { outlier { algorithm = null auto mcd tukey *sauter_poon separate_experiments = False separate_panels = True } } }

o Note,hereweallow the panel to tilt around their normal axes (Tau1),but still fix theother tiltaxes.Onepanelhastobefixedtolimitthedegreesoffreedom,henceGroup1Tau1isstillfixed(seeBrewsteretal.(2018)).

o RMSDswillfallevenfurtherafterthisrefinement.• Tovisualizerefinementresults:o cctbx.xfel.detector_residualshierarchy=1plot_max=0.3tag=Filteredfiltered.*&o cctbx.xfel.detector_residualshierarchy=1plot_max=0.3tag=Level0refined_level0.*&o cctbx.xfel.detector_residualshierarchy=1plot_max=0.3tag=Level1refined_level1.*&

Fromthetextoutputofcctbx.xfel.detector_residuals:

Filtered: RMSD (microns) 116.314413579 Overall radial RMSD (microns) 82.32334385 Overall transverse RMSD (microns) 82.1700058634

Page 15: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

36

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

25

Refined level 0: RMSD (microns) 66.0757037266 Overall radial RMSD (microns) 48.6571169644 Overall transverse RMSD (microns) 44.7044023747 Refined level 1: RMSD (microns) 41.3719691901 Overall radial RMSD (microns) 33.8076040347 Overall transverse RMSD (microns) 23.8471328277

Displacementplots fromcctbx.xfel.detector_residuals for the3stagesarein figure4.Toprow: filtered,middle row: refined level 0, bottom row: refined level 1. Each dot is a reflection. The color is thedisplacement between the observed and predicted reflection centroids. Left column: overalldisplacement.Middlecolumn:radialdisplacement(alongvectorfromthereflectiontothebeamcenter).Rightcolumn:transversedisplacement(alongvectornormaltobeamvectorandradialvector).

Figure4:PositionaldisplacementplotsfortheoctalMPCCDdetectoratSACLAbeforeandafterrefinement.

Page 16: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

37

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

26

Howmuchdidthedetectormove?Wecanusethecommand cspad.detector_shifts filtered.exptfiltered.refl refined_level1.exptrefined_level1.refl to evaluate this (note, thelegacynamecspad.detector_shiftswillberenamedat some point). In the ‘Detector shifts summary’table, level 0 moved 116.8 microns in XY and142.5 in Z. Level 1, where each panel movedindividually, moved on average 92.3+/-46.9micronsinXYand-208.6+/-43.5micronsinZ.

From these results we have the followingobservations:

• The cross-hatching in the unrefined datausuallymeansabadbeamcenter.

• ComparingRMSDsbetween theunrefinedandrefineddatasets isnotcompletelyfairduetooutlierrejection.SeeBrewster2018foradiscussion.

• After level 1 refinement, the upper rightpanelgetsmuchbetter.

• During level 0 refinement, the detectormovesinZ+93.3microns,butduringlevel1refinement,thepanelsmovedinZback-208.6 microns. This has been observedbefore (see Brewster et al. (2018) for adiscussion).

• The refined detector will have slightlydifferent Z values for each panel, whichmight not be physically realistic. Analternative refinement approach wouldconstrainall thepanelstomove thesameamount during level 1 refinement byaddingdetector.constraints.parameter=Dist.

To prepare the new metrology for use byindexing:

• mkdirsplit;cdsplit• dials.split_experiments../refined_level1*• cd..• mvsplit/split_00.expt.• rm-rfsplit/

Thistakesthefilerefined_level1.exptandremovesall but one of the experiments, creatingsplit_00.ext.Wusethisinsteadofthefullsetfromthe combined experiments list. We can thenreprocess the HAD13a data using the newmetrology:

27

• Back in the GUI tab click on new trial,

change the resolution to 1.7, change thecomment to 'HAD13a, refinedmetrology',andclickOK

• Clicknewblock,intheextraphilblockaddinput.reference_geometry=$WORKING/metrology/t000/split_00.expt

• Click Active trial and the new jobwill besubmitted

• Using the new metrology increased thenumberofimagesindexedfrom93to114

GUItab:MergingAsstated in the Introduction, themost importantquestion to answer during an SX experiment is“Have I collected enough data?” Therefore,merging data rapidly to create datasets forscientists to examine is vital. In this tab, tag setscanbeselected to list theoutput folders foreachruninthetagset.Thiscanbecopiedintomergingscripts for merging independently from the GUI.Currently, merging directly from the GUI isn’tsupported, but functionality for this will beimplementedbyOct2019.Fordetailsonmergingthe thermolysin data, seehttps://github.com/phyy-nx/dials_refinement_brewster2018/wiki/Merging.

IntegratingtheGUIintonewfacilitiesThecctbx.xfelGUIcanberunstandaloneusingSXdata in most environments as currentlyimplemented. For example, at a synchrotron, theuser may want to monitor a folder where newrasteringrunswillappearasnewdirectoriesfilledwithimages.SimplyconfiguretheGUItomonitorthis folder in the GUI setup screens. Facility-specific features, including areaswhereadditionscanbemadetriviallyarelistedhere:

• DIALS natively supports raw data frommostfacilities.ThisincludesXTC(LCLS),PALHDF5,SACLA Cheetah HDF5, NeXus (Eiger, EuXFELand SwissFEL), and any single-image format,suchasCBFandSMV,currently supportedbythedxtbxdatamodelinglibrarythat ispartofcctbx (Parkhurstetal.,2014).Addingsupportfornewbeamlinesisstraightforward.

• New data monitors. At LCLS this is done byquerying a webservice and at SACLA this isdone by monitoring for a Cheetah status.txt

Page 17: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

38

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

28

file.Otherstandalonemethodsincludelookingfor a certain number of files or watching fortime stamps and/or file sizes to beunchanging. Implementing new monitorsspecifictoafacilityisstraightforwardandcanbedonewithafewlinesofPython.

• Support for clustering systems. In addition tolocal mode, where simple Pythonmultiprocessing is done on the current node,LSF and PBS queuing systems are availablethrough the GUI. cxi.mpi_submit, the programused by the GUI to submit jobs, additionallysupportsSLURMandcustomqueuingsystems,such as the NESRC shifter technology, whichwill bemade available in theGUI in the nearfeature.

Please contact the authors if youneed additionalsupportinanyoftheseareas.

ConclusionsandoutlookThecctbx.xfelGUIhasbeensuccessfullyused liveduringexperimentsatLCLS,SACLAandPAL,andwill be made available at DLS, EuXFEL, andSwissFELinthecomingyear.TheGUIcanandhasbeen used remotely through X-windowsforwardingorvirtualsystemssuchasNoMachineorVNC.

It may be that facilities wish to integratecomponentsandfeaturesofthecctbx.xfelGUIintotheir pipelines without using the GUI itself, forexampletheMySQLdatabaseloggingandtheRun

29

Stats graphics. These programs are all based onPython object-oriented design and havedocumentedAPIsthatcanbemergedintoexistingframeworks. Again, please contact the authors ifthereisinterest.

Finally, as XFELs and synchrotronsmove SX intothe kilohertz and megahertz regimes, dataprocessingsolutionsdesignedfromthegroundupto scale with the experimental size will be vital.The MySQL and multiprocessing approachesdetailedherearedesignedexactlywiththescalingproblems in mind. We hope to work withbeamline scientists and facilities to incorporatethese methods into their systems, ensuring thefast-feedback and monitoring needed duringprecious SX beamtime to enable answering thatmostimportantquestionasfastaspossible,“HaveI collected enough data to answer my scientificquestions?”

AcknowledgementsThis research was supported by NIH grantGM117126fordata-processingmethods.Portionsof this research were carried out at LCLS at theSLACNationalAccelerator Laboratory, supportedby the US DOE, Office of Science, OBES underContract No. DE-AC02-76SF00515. Dataprocessingwasperformed inpartat theNationalEnergy Research Scientific Computing Center,supportedby theDOEOffice of Science, ContractNo.DE-AC02-05CH11231.

30

ReferencesBarty,A.,Kirian,R.A.,Maia,F.R.N.C.,Hantke,M.,Yoon,C.H.,White,T.A.&Chapman,H.(2014).Cheetah:

softwareforhigh-throughputreductionandanalysisofserialfemtosecondX-raydiffractiondata.J.Appl.Crystallogr.47,1118-1131.

Bergmann,U.,Yachandra,V.&Yano,J.(2017).X-RayFreeElectronLasers.London:TheRoyalSocietyofChemistry.

Brewster,A.S.,Waterman,D.G.,Parkhurst,J.M.,Gildea,R.J.,Michels-Clark,T.,Young,I.D.,Bernstein,H.J.,Winter,G.,Evans,G.&Sauter,N.K.(2016).ProcessingXFELdatawithcctbx.xfelandDIALS.ComputationalCrystallographyNewsletter7,32-53.

Brewster,A.S.,Waterman,D.G.,Parkhurst,J.M.,Gildea,R.J.,Young,I.D.,O'Riordan,L.J.,Yano,J.,Winter,G.,Evans,G.&Sauter,N.K.(2018).ImprovingsignalstrengthinserialcrystallographywithDIALSgeometryrefinement.ActaCrystallographicaSectionDStructuralBiology74,877-894.

Gildea,R.J.&Winter,G.(2018).DeterminationofPattersongroupsymmetryfromsparsemulti-crystaldatasetsinthepresenceofanindexingambiguity.ActaCrystallographicaSectionDStructuralBiology74,405-410.

Grosse-Kunstleve,R.W.,Sauter,N.K.,Moriarty,N.W.&Adams,P.D.(2002).TheComputationalCrystallographyToolbox:crystallographicalgorithmsinareusablesoftwareframework.J.Appl.Crystallogr.35,126-136.

Page 18: Processing serial crystallographic data from XFELs or ...cci.lbl.gov/publications/download/CCN_2019_p22_Brewster.pdf · Computational Crystallography Newsletter (2019). 10, 22–39

39

ARTICLES

ComputationalCrystallographyNewsletter(2019).10,22–39

31

Hattne,J.,Echols,N.,Tran,R.,Kern,J.,Gildea,R.J.,Brewster,A.S.,Alonso-Mori,R.,Glockner,C.,Hellmich,J.,Laksmono,H.,Sierra,R.G.,Lassalle-Kaiser,B.,Lampe,A.,Han,G.,Gul,S.,DiFiore,D.,Milathianaki,D.,Fry,A.R.,Miahnahri,A.,White,W.E.,Schafer,D.W.,Seibert,M.M.,Koglin,J.E.,Sokaras,D.,Weng,T.C.,Sellberg,J.,Latimer,M.J.,Glatzel,P.,Zwart,P.H.,Grosse-Kunstleve,R.W.,Bogan,M.J.,Messerschmidt,M.,Williams,G.J.,Boutet,S.,Messinger,J.,Zouni,A.,Yano,J.,Bergmann,U.,Yachandra,V.K.,Adams,P.D.&Sauter,N.K.(2014).AccuratemacromolecularstructuresusingminimalmeasurementsfromX-rayfree-electronlasers.Nat.Methods11,545-548.

Maia,F.R.N.C.(2012).TheCoherentX-rayImagingDataBank.Nat.Methods9,854-855.Nakane,T.,Hanashima,S.,Suzuki,M.,Saiki,H.,Hayashi,T.,Kakinouchi,K.,Sugiyama,S.,Kawatake,S.,Matsuoka,

S.,Matsumori,N.,Nango,E.,Kobayashi,J.,Shimamura,T.,Kimura,K.,Mori,C.,Kunishima,N.,Sugahara,M.,Takakyu,Y.,Inoue,S.,Masuda,T.,Hosaka,T.,Tono,K.,Joti,Y.,Kameshima,T.,Hatsui,T.,Yabashi,M.,Inoue,T.,Nureki,O.,Iwata,S.,Murata,M.&Mizohata,E.(2016a).MembraneproteinstructuredeterminationbySAD,SIR,orSIRASphasinginserialfemtosecondcrystallographyusinganiododetergent.ProceedingsoftheNationalAcademyofSciences113,13039-13044.

Nakane,T.,Joti,Y.,Tono,K.,Yabashi,M.,Nango,E.,Iwata,S.,Ishitani,R.&Nureki,O.(2016b).DataprocessingpipelineforserialfemtosecondcrystallographyatSACLA.J.Appl.Crystallogr.49,1035-1041.

Parkhurst,J.M.,Brewster,A.S.,Fuentes-Montero,L.,Waterman,D.G.,Hattne,J.,Ashton,A.W.,Echols,N.,Evans,G.,Sauter,N.K.&Winter,G.(2014).dxtbx:thediffractionexperimenttoolbox.J.Appl.Crystallogr.47,1459-1465.

Sauter,N.K.&Poon,B.K.(2010).Autoindexingwithoutlierrejectionandidentificationofsuperimposedlattices.J.Appl.Crystallogr.43,611-616.

Sierra,R.G.,Weierstall,U.,Oberthuer,D.,Sugahara,M.,Nango,E.,Iwata,S.&Meents,A.(2018).X-rayFreeElectronLasers,pp.109-184.

Waterman,D.G.,Winter,G.,Gildea,R.J.,Parkhurst,J.M.,Brewster,A.S.,Sauter,N.K.&Evans,G.(2016).Diffraction-geometryrefinementintheDIALSframework.Actacrystallographica.SectionD,Structuralbiology72,558-575.

Winter,G.,Waterman,D.G.,Parkhurst,J.M.,Brewster,A.S.,Gildea,R.J.,Gerstel,M.,Fuentes-Montero,L.,Vollmar,M.,Michels-Clark,T.,Young,I.D.,Sauter,N.K.&Evans,G.(2018).DIALS:implementationandevaluationofanewintegrationpackage.Actacrystallographica.SectionD,Structuralbiology74,85-97.

Zeldin,O.B.,Brewster,A.S.,Hattne,J.,Uervirojnangkoorn,M.,Lyubimov,A.Y.,Zhou,Q.,Zhao,M.,Weis,W.I.,Sauter,N.K.&Brunger,A.T.(2015).DataExplorationToolkitforserialdiffractionexperiments.ActaCrystallogr.DBiol.Crystallogr.71,352-356.