The Revolution in Experimental and Observational...
Transcript of The Revolution in Experimental and Observational...
![Page 1: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/1.jpg)
The Revolution in Experimental and Observational Science
The Convergence of Data-Intensive and Compute-Intensive Infrastructure
Professor Tony Hey
Chief Data Scientist
STFC
![Page 2: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/2.jpg)
The Background
![Page 3: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/3.jpg)
X-Info• The evolution of X-Info and Comp-X for each discipline X• How to codify and represent our knowledge
• Data ingest • Managing a petabyte• Common schema• How to organize it • How to reorganize it• How to share with others
• Query and Vis tools • Building and executing models• Integrating data and Literature • Documenting experiments• Curation and long-term
preservation
The Generic Problems
Experiments &Instruments
Simulations
Literature
Other Archives
facts
facts
facts
facts
Questions
Answers
Slide thanks to Jim Gray
![Page 4: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/4.jpg)
What X-info Needs from Computer Science (not drawn to scale)
Science Data & Questions
Scientists
DatabaseTo store data
ExecuteQueries
Systems
Data MiningAlgorithms
Miners
Question & Answer
Visualization
Tools
Slide thanks to Jim Gray
![Page 5: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/5.jpg)
Thousand years ago – Experimental Science
• Description of natural phenomena
Last few hundred years – Theoretical Science• Newton’s Laws, Maxwell’s Equations…
Last few decades – Computational Science
• Simulation of complex phenomena
Today – Data-Intensive Science
• Scientists overwhelmed with data sets
from many different sources
• Data captured by instruments
• Data generated by simulations
• Data generated by sensor networks
e-Science and the Fourth Paradigm
2
2
2.
3
4
a
cG
a
a
eScience is the set of tools and technologiesto support data federation and collaboration
• For analysis and data mining• For data visualization and exploration• For scholarly communication and dissemination
With thanks to Jim Gray
![Page 6: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/6.jpg)
Artificial Neural Networks
Input Layer Hidden Layer Output Layer
![Page 7: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/7.jpg)
Machine Learning• Neural networks are one example of a
Machine Learning (ML) algorithm
• Deep Neural Networks are now exciting the whole of the IT industry since they enable us to:
• Build computing systems that improve with experience
• Solve extremely hard problems
• Extract more value from Big Data
• Approach human intelligencee.g. natural language processing
• The change in the Word Error Rate (WER) with time for the NIST “Switchboard” data.
• This shows the dramatic improvement made in the last few years using Deep Neural Networks
![Page 8: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/8.jpg)
![Page 9: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/9.jpg)
![Page 10: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/10.jpg)
![Page 11: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/11.jpg)
Data Science and the UKScience and Technology
Facilities Council
![Page 12: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/12.jpg)
UK Science and Technology Facilities Council (STFC)
Daresbury LaboratorySci-Tech Dasresbury CampusWarrington, Cheshire
![Page 13: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/13.jpg)
Big Data and Cognitive Computing:Hartree Centre collaboration with IBM Research
![Page 14: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/14.jpg)
Central Laser FacilityISIS (Spallation
Neutron Source)
Diamond Light Source
LHC Tier 1 computingJASMIN Super-Data-Cluster
Rutherford Appleton Lab and the Harwell Campus
![Page 15: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/15.jpg)
![Page 16: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/16.jpg)
• Assist universities in developing, maintaining and distributing computer programs
• Promoting the best computational methods
• Each focuses on a specific area of research
• Funded by the UK's EPSRC, PPARC and BBSRC Research Councils
Collaborative Computational Projects: The CCP's
![Page 17: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/17.jpg)
![Page 18: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/18.jpg)
![Page 19: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/19.jpg)
The Diamond Synchrotron
![Page 20: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/20.jpg)
Diamond Light Source
![Page 21: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/21.jpg)
Science Examples
Pharmaceutical manufacture & processing
Casting aluminium
Structure of the Histamine H1 receptor
Non-destructive imaging of fossils
![Page 22: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/22.jpg)
• 2007 No detector faster than ~10 MB/sec
• 2009 Pilatus 6M system 60 MB/s
• 2011 25Hz Pilatus 6M 150 MB/s
• 2013 100Hz Pilatus 6M 600 MB/sec
• 2013 ~10 beamlines with 10 GbEdetectors (mainly Pilatus and PCO Edge)
• 2016 Percival detector 6GB/sec
1
10
100
1000
10000
2007 2012
Detector Performance (MB/s)
Data Rates
Thanks to Mark Heron
![Page 23: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/23.jpg)
Thanks to Mark Heron
Cumulative Amount of Data Generated By Diamond
0
1
2
3
4
5
6
Jan-07 Jan-08 Jan-09 Jan-10 Jan-11 Jan-12 Jan-13 Jan-14 Jan-15 Jan-16
Cumulative Amount of Data Generated By Diamond
Data
Siz
e i
n P
B
![Page 24: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/24.jpg)
Nucleous
Cryo-SXT Data
● Noisy data, missing wedge artifacts, missing
boundaries
● Tens to hundreds of organelles per dataset
● Tedious to manually annotate
● Cell types can look different
● Few previous annotations available
● Automated techniques usually fail
Segmentation
Neuronal-like mammalian cell line; single slice
Nucleous
Cytoplasm
Challenges:
Data
● B24: Cryo Transmission X-ray Microscopy beamline at DLS
● Data Collection: Tilt series from ±65° with 0.5° step size
● Reconstructed volumes up to 1000x1000x600 voxels
● Voxel resolution: ~40nm currently
● Total depth: up to 10μm
● GOAL: Study structure and morphological changes of whole cells
3D Volume Data
Segmentation of Cryo-soft X-ray Tomography (Cryo-SXT) data
Computer VisionLaboratory
B24 beamlineData Analysis Software Group
![Page 25: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/25.jpg)
Data Preprocessing
Raw Slice Gaussian Filter Total Variation
Data Representation
SuperVoxels (SV) SV Boundaries
SuperVoxels: ● Groups of similar and adjacent voxels in 3D● Preserve volume boundaries● Reduce noise when representing data● Reduce problem complexity several orders of magnitude● Use Local clustering in {xyz + λ * intensity} space
Nucleous
Workflow
Data Preprocessing
Data Representation
Feature Extraction
User’s Manual Segmentations
Classification
Refinement
![Page 26: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/26.jpg)
Data Representation
Voxel Grid Supervoxel Graph
946 x 946 x 200 = 180M voxels 180M / (10x10x10) = 180K supervoxels
Initial Grid with uniformlysampled seeds
Local k-means in a small window around seeds
Nucleous
Workflow
Data Preprocessing
Data Representation
Feature Extraction
User’s Manual Segmentations
Classification
Refinement
![Page 27: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/27.jpg)
Nucleous
Workflow
Data Preprocessing
Data Representation
Feature Extraction
User’s Manual Segmentations
Classification
Feature ExtractionFeatures are extracted from voxels to represent their appearance:
● Intensity-based filters (Gaussian Convolutions)
● Textural filters (eigenvalues of Hessian and Structure Tensor)
User Annotation + Machine Learning
Refinement
User AnnotationsPredictions Refinement
Using a few user annotations along the volume as an input:
● A machine learning classifier (i.e. Random Forest) is trained to
discriminate between different classes (i.e. Nucleus and Cytoplasm)
and predict the class of each SuperVoxel in the volume.
● A Markov Random Field (MRF) is then used to refine the [email protected]
![Page 28: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/28.jpg)
SuRVoS Workbench(Su)per-(R)egion (Vo)lume (S)egmentation
Coming soon: https://github.com/DiamondLightSource/SuRVoS
Imanol Luengo <[email protected]>, Michele C. Darrow, Matthew C. Spink, Ying Sun, Wei Dai, Cynthia Y. He, Wah Chiu, Elizabeth Duke, Mark Basham,
Andrew P. French, Alun W. Ashton
![Page 29: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/29.jpg)
The ISIS Neutron and Muon Facility
![Page 30: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/30.jpg)
ISIS
![Page 31: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/31.jpg)
• 30 neutron instruments
• 3 muon instruments
• 1400 individual users per year making 3000 visits
• 800 experiments per year resulting in 450 publications
• Diverse science
• Fundamental condensed matter physics
• Functional materials e.g. multiferroics, spintronics
• Chemical spectroscopy e.g. catalysis and hydrogen storage
• Engineering e.g. stress and fatigue in power plants and
transportation
• Solvents in industry
• Structure of pharmaceutical compounds, biological
membranes
ISIS
![Page 32: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/32.jpg)
Peak Assignment in Inelastic Neutron Scattering
• Vibrational motion of atoms crucial for many properties of a material -e.g., how well it conducts electricity or heat
• Peaks in INS spectrum correspond to specific atomic vibrations
• Peak assignment: what specific vibrational motions of atoms give rise to specific peaks ?
INS Spectrum of crystalline benzene
S. Parker and S. Mukhopadhyay (ISIS)
![Page 33: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/33.jpg)
Modelling & Simulation forINS Peak Assignment Calculated INS Spectrum of crystalline benzene
• INS spectra can be computed for a given atomic structure
• Calculations allow us to see what specific vibrational motion of atoms occur, and at what frequency
L. Liborio
![Page 34: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/34.jpg)
Materials Workbench
K. Dymkowski
![Page 35: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/35.jpg)
The Central Laser Facility
![Page 36: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/36.jpg)
• National imaging facility with peer-reviewed, funded access
• Located in Research Complex at Harwell
• Cluster of microscopes and lasers and expert end-to-end multidisciplinary support
• Operations and some development funded by STFC
• Key developments funded through external grant – BBSRC, MRC
OCTOPUS Facility in the CLF
With thanks to Dan Rolfe
![Page 37: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/37.jpg)
Example: EGFR cell signalling in cancer• Driven OCTOPUS single molecule
developments
• User in plant cell imaging now catching up in scale of challenge
• Part of a PhD project:
• 1 experimental technique
• 50 experimental conditions
• 30 datasets for each condition
• 1000 single molecule tracks for each condition
• Multiple properties & events of interest in each track
• Comparison of just one property…
With thanks to Dan Rolfe
![Page 38: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/38.jpg)
![Page 39: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/39.jpg)
Large scale comparisons
With thanks to Dan Rolfe
![Page 40: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/40.jpg)
Multidimensional single molecule tracking
• Automated registration & tracking in multiple channels
• Computer vision
• Bayesian feature detection from astronomical galaxy detection
• Instrumental metadata from acquisition
• Flexible specification of many instrument configurations
Rolfe et al 2011, Euro Biophys J, 2011With thanks to Dan Rolfe
![Page 41: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/41.jpg)
The JASMIN Environmental ScienceSuper Data Cluster
![Page 42: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/42.jpg)
![Page 43: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/43.jpg)
![Page 44: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/44.jpg)
Large data sets: satellite observations
![Page 45: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/45.jpg)
![Page 46: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/46.jpg)
Why JASMIN?
• Urgency to provide better environmental predictions
• Need for higher-resolution models
• HPC to perform the computation
• Huge increase in observational capability/capacity
But…
• Massive storage requirement: observational data transfer, storage, processing
• Massive raw data output from prediction models
• Huge requirement to process raw model output into usable predictions (post-processing)
Hence JASMIN…
ARCHER supercomputer (EPSRC/NERC)
JAMSIN (STFC/Stephen Kill)
![Page 47: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/47.jpg)
JASMIN infrastructurePart data store, part HPC cluster, part private cloud…
![Page 48: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/48.jpg)
JC2-LSW1 JC2-LSW1 JC2-LSW1JC2-LSW1 JC2-LSW1 JC2-LSW1 JC2-LSW1 JC2-LSW1 JC2-LSW1JC2-LSW1 JC2-LSW1 JC2-LSW1
48 * 16 = 768 10GbE Non-blocking16 x 12 x 40GbE = 192 40GbE ports
S1036 = 32 x 40GbE
JC2-LSW1JC2-LSW1
JC2-SP1 JC2-SP1 JC2-SP1 JC2-SP1 JC2-SP1 JC2-SP1
16 x MSX1024B-1BFS48x10GBE + 12 40 GbE
16 x 12 40GbE = 192 Ports / 32 = 6Total 192 40 GbE Cable
1,104 x 10GbE Ports CLOS L3 ECMP OSPF
• ~1,200 Ports expansion
• Max 36 leaf switches :1,728 Ports @ 10GbE
• Non-Blocking, Zero Contention (48x10Gb = 12x 40Gb uplinks)
• Low Latency (250nS L3 / per switch/router) 7-10uS MPI
954 Routes
954 Routes
Non-blocking, low latency, CLOS Tree Network
![Page 49: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/49.jpg)
JASMIN “Science DMZ” Architecture
Supercomputer CenterSimple Science DMZ
http://fasterdata.es.net/science-dmz-architecture
![Page 50: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/50.jpg)
The UK Met Office UPSCALE campaign
10
01
00
10
00
01
11
0101
5 TB
per
day
Data conversion
& compression
2.5
TB JASMINData transfer
HERMIT @ HLRS
Automation controller
Clear data from HPC once successfully transferred and
data validated
![Page 51: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/51.jpg)
Example Data Analysis
• Tropical cyclone tracking has become routine; 50 years of N512 data can be processed in 50 jobs in one day
• Eddy vectors; analysis we would not attempt on a server/workstation (total of 3 months of processor time and ~40 GB memory needed) completed in 24 hours in 1,600 batch jobs
• JASMIN/LOTUS combination has clearly demonstrated the value of cluster computing to data processing and analysis.
M Roberts et al: Journal of Climate 28 (2), 574-596
![Page 52: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/52.jpg)
The Ada Lovelace Center
![Page 53: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/53.jpg)
The Experimental Data Challenge?• Data rates are increasing, facilities science more data intensive
• Handling and processing data has become a bottleneck to produce science
• Need to compare with complex models and simulations to interpret the data
• Computing provision at home-institution highly variable• Consistent access to HTC/HPC to process and interpret experimental data
• Computational algorithms more specialised
• More users without the facilities science background
Need access to data, compute and software services• Allow more timely processing of data
• Use of HPC routine not “tour de force”
• Generate more and better science
![Page 54: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/54.jpg)
Ada Lovelace Centre
The ALC will significantly enhance our capability to support the Facilities’ science programme:
• Theme 1: Increases capacity in advanced software development for data analysis and interpretation
• Theme 2: Develop new generation of scientific data experts and scientific software engineers who can interact with science domain experts
• Theme 3: Provide significant compute infrastructure for managing, analysing and simulating the data generated by the facilities and for designing next generation Big-Science experiments
Focus is the science drivers and computational needs of Facilities
![Page 55: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/55.jpg)
ALC Pathfinder: Tomographic Reconstruction• Support in-experiment and post-
experiment tomographic reconstruction
• Round-trip the data to HPC CPU/GPU clusters in experiment time
• Tomographic image reconstruction toolbox with different algorithms
• High throughput image reconstruction framework – time scheduled
• Visualisation on the beamline or remote • An integral component of IMAT’s in-experiment
data analysis capability through the ISIS Mantidsoftware suite
• Goal is to maximise the science from data collected on facility instruments
Image
Processing
Experiment
Visualisation
STFC Scientific Computing: Erica Yang, Srikanth Nagella, Martin Turner, Derek Ross
STFC ISIS: Winfried Kockelmann, Genoveva Burca, Federico Montesino Pouzols
DLS: Mark Basham
IMAT
SCARF
MANTID
![Page 56: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/56.jpg)
ALC Pathfinder: CCP4-DAaaS
SCD Cloud
Network
Storage
Servers
Virtualisation
OS
Run-time
Data
Applications-
-
Saa
S
-
-
Paa
S -
-
IaaS
NFSSCARF
CCP4
DLS
CCP4 – Macro-Crystallography suite• proteins, viruses and nucleic acids• determine macromolecular
structures by X-ray crystallography• Used by DLS users
• But need post-experimental access
Data Analysis as a Service- Remote access to data and
compute via SCD Cloud - CCP4 s/w maintained on Cloud
via VM packaging and distribution (CVMFS)
- User Portal provides access to right data and compute and workflows
Frazer Barnsley, Shirley Crompton, CCP4, et al
![Page 57: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/57.jpg)
The ALC - Towards a “Super-facility”?
“A network of connected facilities, software and expertiseto enable new modes of discovery”
Katie Antypas, Inder Monga, Lawrence Berkeley National Laboratory
Infrastructure + Software + Expertise
With Common Interfaces and Transparent Access
Data
Catalogue
Petabyte
Data storage
Parallel
File system
HPC
CPU+GPU
VisualisationData
Catalogue
Petabyte
Data storageParallel
File systemHPC
CPU+GPUVisualisationSoftware
Data
Acquisition
![Page 58: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/58.jpg)
New Opportunities: Reproducible Science
• Traceable science• Preservation• Provenance• Publishing
• A tool for the user• Tracking progress
• ‘RARE’ research• Robust• Accountable• Reproducible• Explainable
ALC can build in support for reproducible science
SXD Single crystal
diffractometer
ISIS Beamtime Application:SXD Round: 2010 1
RB1010274 Reversible B-H Bond Activation at Cationic Rh(III) Centres: Structural Characterization of Key Hydrogen-containing Intermediates
PI: Aldridge Dr S University of OxfordDepartment of Chemistry
Data DOIs
Perform research & gather data
Analyse collected data
Manage & curate
research data
Publish results
Crystallise research idea
Seek & gain
funding
![Page 59: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/59.jpg)
Jim Gray’s Vision: All Scientific Data Online
• Many disciplines overlap and use data from other sciences.
• Internet can unify all literature and data
• Go from literature to computation to data back to literature.
• Information at your fingertips –For everyone, everywhere
• Increase Scientific Information Velocity
• Huge increase in Science Productivity
(From Jim Gray’s last talk)
Literature
Derived and recombined data
Raw Data
![Page 60: The Revolution in Experimental and Observational Scienceescience-2016.idies.jhu.edu/wp-content/uploads/2016/11/Hey-slides.… · The Revolution in Experimental and Observational Science](https://reader034.fdocuments.in/reader034/viewer/2022051904/5ff5aae05582c4189a0f5f44/html5/thumbnails/60.jpg)
Acknowledgements:
With thanks to Mark Basham, David Corney, Jonathan Churchill, Imanol Luengo, Barbara Montanari, Brian
Matthews and Dan Rolfe