IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)
![Page 2: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/2.jpg)
Themes
• How is the brain wired?
• How did the Universe start?
![Page 3: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/3.jpg)
How is the brain wired?
The Connectome Project
![Page 4: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/4.jpg)
Connectome Team
• Harvard Center for Brain Science
– Jeff Lichtman & Clay Reid
• Microsoft Research / UW
– Michael Cohen
• Kitware Inc.
– Will Schroeder, Charles Law, Rusty Blue
• VRVis Vienna
– Markus Hadwiger, Johanna Beyer
• IIC
– Amelio Vazquez, Eric Miller (Tufts)
– Won-Ki Seung, Hanspeter Pfister
![Page 5: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/5.jpg)
The Scientific Challenge
composite from Roe et al. 1989, Sutton and Brunso-Bechtold 1991
![Page 6: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/6.jpg)
Confocal Microscopy: Brainbow
Adapted from OlympusConfocal.com
![Page 7: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/7.jpg)
Electron Microscopy: ATLUM
![Page 8: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/8.jpg)
Serial Sectioning
Section i, i = 1, …, N
Adapted from http://parasol.tamu.edu (Texas A&M University)
[Figure axes: x, y, z]
![Page 9: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/9.jpg)
40,000 x 40,000 pixels, 1.6 GB
120 x 120 µm (3 nm/pixel)
Shown here 40x undersampled
6 15mu EM big view
![Page 10: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/10.jpg)
5 8mu rlp
![Page 11: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/11.jpg)
4 3mu rlp
![Page 12: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/12.jpg)
3 1mu rlp
![Page 13: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/13.jpg)
2 300 nm rlp
![Page 14: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/14.jpg)
The Data Challenge
• 1 mm³ ≈ mouse thalamus ≈ 1 petabyte
• 1 cm³ ≈ mouse brain ≈ 1 exabyte
• 1000 cm³ ≈ human brain ≈ 1 zettabyte
All of Google’s world-wide storage today ≈ 1 exabyte
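The orders of magnitude above can be checked with a back-of-envelope calculation. The sketch below uses the 3 nm/pixel lateral resolution from the earlier slide; the 1 byte per voxel and the 30 nm section thickness are assumptions made here for illustration and are not stated in the slides.

```python
# Back-of-envelope storage estimate for serial-section EM data.
NM_PER_PIXEL = 3        # lateral resolution quoted in the slides
BYTES_PER_VOXEL = 1     # assumption: 8-bit grayscale
SECTION_NM = 30         # assumption: section thickness, not given in the slides


def image_bytes(side_pixels):
    """Size of one square image with the given side length in pixels."""
    return side_pixels * side_pixels * BYTES_PER_VOXEL


def volume_bytes(side_mm):
    """Size of a cube of tissue with the given side length in millimetres."""
    side_nm = side_mm * 1_000_000
    pixels_per_side = side_nm / NM_PER_PIXEL
    sections = side_nm / SECTION_NM
    return pixels_per_side ** 2 * sections * BYTES_PER_VOXEL


tile = image_bytes(40_000)      # 1.6e9 bytes, i.e. the 1.6 GB image
cubic_mm = volume_bytes(1)      # a few times 1e15 bytes: petabyte scale
cubic_cm = volume_bytes(10)     # 1000x larger: exabyte scale
```

Each factor of 10 in side length multiplies the volume by 1000, which is why the three bullets step from petabyte to exabyte to zettabyte.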
![Page 15: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/15.jpg)
Addressing the Data Challenge
• Multi-Scale Imaging
• Hierarchical Data Representation
• Distributed Heterogeneous Computing
• Visualization
• Segmentation
• Analysis
![Page 16: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/16.jpg)
![Page 17: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/17.jpg)
MARKUS HADWIGER, VRVIS RESEARCH CENTER, VIENNA, AUSTRIA
Direct Volume Rendering
![Page 18: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/18.jpg)
Ray Casting
• Image-order ray shooting
• Interpolate
• Assign color & opacity
• Composite
• Simple to implement
• Very flexible (adaptive sampling, …)
• Correct perspective
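The three per-sample steps (interpolate, assign color and opacity, composite) can be sketched for a single ray. This is a minimal illustration, not the renderer shown in the lecture: the volume is reduced to a 1D density profile along the ray, and `classify` is a hypothetical stand-in for a transfer function.

```python
def lerp_sample(densities, t):
    """Interpolate: linear interpolation of the density profile at index t."""
    i = min(int(t), len(densities) - 2)
    f = t - i
    return (1.0 - f) * densities[i] + f * densities[i + 1]


def classify(density):
    """Assign color and opacity (hypothetical mapping, for illustration only)."""
    return density, 0.5 * density


def cast_ray(densities, step=0.5):
    """Composite: front-to-back accumulation of color and opacity."""
    color, alpha = 0.0, 0.0
    t = 0.0
    while t <= len(densities) - 1:
        c, a = classify(lerp_sample(densities, t))
        color += (1.0 - alpha) * a * c   # weight by remaining transparency
        alpha += (1.0 - alpha) * a
        t += step
    return color, alpha
```

In an image-order renderer, one such loop runs per pixel, which is what makes the method map naturally onto one GPU thread per ray.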
![Page 19: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/19.jpg)
Transfer Functions
• Mapping of density to optical properties
• Simplest: color table with opacity over density
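The simplest case, a color table with opacity over density, might look like the following sketch. The four table entries are invented for illustration; a real table would typically have 256 or more entries and be edited interactively.

```python
# Hypothetical table of (r, g, b, opacity) entries spanning density 0..1.
TABLE = [
    (0.0, 0.0, 0.0, 0.0),
    (0.2, 0.1, 0.0, 0.1),
    (0.9, 0.6, 0.3, 0.5),
    (1.0, 1.0, 1.0, 1.0),
]


def transfer(density, table=TABLE):
    """Map a density in [0, 1] to (r, g, b, opacity) by linear table lookup."""
    d = min(max(density, 0.0), 1.0)          # clamp out-of-range densities
    pos = d * (len(table) - 1)
    i = min(int(pos), len(table) - 2)
    f = pos - i
    return tuple((1.0 - f) * a + f * b for a, b in zip(table[i], table[i + 1]))
```

On a GPU the same lookup is usually a filtered fetch from a 1D texture, so the interpolation comes for free.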
![Page 20: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/20.jpg)
Connectome: EM Data
![Page 21: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/21.jpg)
Single-Pass Ray Casting
• Enabled by conditional loops
• Substitute multiple passes with single loop and early loop exit
• Volume rendering example in NVIDIA CUDA SDK (procedural ray setup)
![Page 22: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/22.jpg)
Basic Ray Setup / Termination
• Two main approaches:
– Procedural ray/box intersection [Röttger et al., 2003], [Green, 2004]
– Rasterize bounding box [Krüger and Westermann, 2003]
![Page 23: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/23.jpg)
Procedural Ray Setup / Term.
• Procedural ray / box intersection
• Everything handled in fragment shader
• Ray given by camera position and volume entry position
• Exit criterion needed
• Pro: simple and self-contained
• Con: full load on fragment shader
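A procedural ray/box intersection is commonly written with the slab method, sketched below. The slides do not say which formulation is used, so treat this as one plausible version; the function name and argument layout are chosen here for illustration.

```python
def intersect_box(origin, direction, box_min, box_max):
    """Return (t_enter, t_exit) along the ray, or None if the box is missed."""
    t_near, t_far = float("-inf"), float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    if t_near > t_far or t_far < 0.0:
        return None
    # Clamp the entry point to the ray origin when the camera is inside the box.
    return max(t_near, 0.0), t_far
```

The returned exit parameter doubles as the exit criterion mentioned above: the ray-cast loop stops once its parameter passes `t_exit`.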
![Page 24: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/24.jpg)
"Image-Based" Ray Setup / Term.
• Rasterize bounding box front faces and back faces
• Ray start positions: front faces
• Direction vectors: back faces − front faces
• Independent of projection (orthogonal/perspective)
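The per-pixel subtraction can be sketched on the CPU as follows. Here `front` and `back` stand for the two rasterized images, each holding a volume-space position per pixel, with `None` marking pixels the bounding box does not cover; that encoding is an assumption made for this example.

```python
import math


def ray_setup(front, back):
    """Build (start, unit_direction, length) per pixel from the two images."""
    rays = []
    for f, b in zip(front, back):
        if f is None or b is None:
            rays.append(None)            # pixel not covered by the box
            continue
        d = tuple(bc - fc for fc, bc in zip(f, b))   # back faces - front faces
        length = math.sqrt(sum(c * c for c in d))
        if length == 0.0:
            rays.append(None)            # degenerate ray, nothing to traverse
            continue
        rays.append((f, tuple(c / length for c in d), length))
    return rays
```

Because the positions come out of the rasterizer already projected, the same subtraction works unchanged for orthogonal and perspective views. The vector length gives the termination distance for the ray.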
![Page 25: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/25.jpg)
Kernel
• Image-based ray setup
– Ray start image
– Direction image
• Ray-cast loop
– Sample volume
– Accumulate color and opacity
• Terminate
• Store output
![Page 26: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/26.jpg)
Standard Ray Casting Optim. (1)
Early ray termination
• Isosurfaces: stop when surface hit
• Direct volume rendering: stop when opacity >= threshold
• Several possibilities
• Current GPUs: early loop exit works well
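For direct volume rendering, early ray termination is a single conditional exit inside the compositing loop. In this sketch the samples are already classified (color, opacity) pairs, and the 0.95 threshold is an arbitrary choice for illustration.

```python
def composite_with_early_exit(samples, threshold=0.95):
    """Front-to-back compositing that stops once the ray is nearly opaque."""
    color, alpha, used = 0.0, 0.0, 0
    for c, a in samples:
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        used += 1
        if alpha >= threshold:
            break                        # later samples would barely contribute
    return color, alpha, used
```

With ten samples of opacity 0.5, the loop stops after five, because the accumulated opacity reaches 0.96875 at that point.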
![Page 27: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/27.jpg)
Standard Ray Casting Optim. (2)
Empty space skipping
• Skip transparent samples
• Depends on transfer function
• Start casting close to first hit
• Several possibilities
– Per-sample check of opacity (expensive)
– Hierarchical data store (e.g., octree with stack-less traversal [Gobbetti et al., 2008])
• These are image-order: what about object-order?
![Page 28: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/28.jpg)
Object-Order Empty Space Skip. (1)
• Modify initial rasterization step
[Figure panels: rasterize bounding box; rasterize "tight" bounding geometry]
![Page 29: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/29.jpg)
Object-Order Empty Space Skip. (2)
• Store min-max values of volume blocks
• Cull blocks against transfer function or isovalue
• Rasterize front and back faces of active blocks
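The first two steps can be sketched as below. For brevity the volume is a flat list of integer densities in 0..255 split into fixed-length blocks, and the transfer function is reduced to a 256-entry opacity table; both are simplifications assumed for this example.

```python
def block_min_max(volume, block):
    """Store the (min, max) density of each block of the volume."""
    return [(min(volume[i:i + block]), max(volume[i:i + block]))
            for i in range(0, len(volume), block)]


def active_for_isovalue(min_max, iso):
    """A block can contain the isosurface only if iso lies in its range."""
    return [i for i, (lo, hi) in enumerate(min_max) if lo <= iso <= hi]


def active_for_transfer_function(min_max, opacity):
    """A block is visible if any density in its range has non-zero opacity."""
    return [i for i, (lo, hi) in enumerate(min_max)
            if any(opacity[d] > 0.0 for d in range(lo, hi + 1))]
```

Only the blocks returned by these functions would have their front and back faces rasterized, so rays start and end close to visible data. The min-max pairs depend only on the volume, while the culling step has to be rerun whenever the transfer function or isovalue changes.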
![Page 30: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/30.jpg)
Connectome: Fluorescence Data
![Page 31: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/31.jpg)
Connectome: Implicit Surfaces
![Page 32: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/32.jpg)
Addressing the Data Challenge
• Multi-Scale Imaging
• Hierarchical Data Representation
• Distributed Heterogeneous Computing
• Visualization
• Segmentation
• Analysis
![Page 33: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/33.jpg)
Active Ribbons
Active Ribbon: A set of two non-intersecting and coupled Active Contours
Active Contour: Deformable closed curve that can be used to segment objects in an image
[Figure labels: Inner Active Contour, Outer Active Contour, Active Ribbon]
![Page 34: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/34.jpg)
Results (Matlab)
![Page 35: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/35.jpg)
Axon Segmentation
![Page 36: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/36.jpg)
Interactive Analysis
![Page 37: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/37.jpg)
How did the Universe start?
The MWA Project
Kevin Dale, Richard Edgar, Daniel Mitchell, Randall Wayth, Lincoln Greenhill, and Hanspeter Pfister
![Page 38: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/38.jpg)
MWA CfA / IIC Team
• Harvard Center for Astrophysics / Smithsonian Astrophysical Observatory
– Lincoln Greenhill
– Daniel Mitchell
– Randall Wayth
– Stephen Ord
• IIC / SEAS
– Richard Edgar
– Kevin Dale, Hanspeter Pfister
![Page 39: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/39.jpg)
The Scientific Goals
• Epoch of Reionisation (EOR)
• Heliospheric and Ionospheric
• Transient detection
• Pulsars, Surveys, Interstellar Medium, Galactic Magnetic Field, …
[Figure labels: ionized, neutral (H), ionized, The “Gap”]
![Page 40: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/40.jpg)
The Murchison Widefield Array (MWA)
• Located in the remote Australian outback
• Extremely wide fields of view for radio astronomy in the 80-300 MHz band
• 512 tiles, each a 4x4 array of dipoles, scattered over ~ 1.5 km
• Data center for real-time processing co-located with the array
http://www.haystack.mit.edu/ast/arrays/mwa/index.html
![Page 41: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/41.jpg)
© Murchison Wide-field Array Project (MIT/Harvard/Smithsonian/ANU/Curtin U./U.Melb./UWA/CSIRO)
![Page 42: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/42.jpg)
![Page 43: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/43.jpg)
![Page 44: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/44.jpg)
Ionospheric offsets
Ungridded visibilities with bright sources peeled
Imaging
Calibration
![Page 45: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/45.jpg)
[Pipeline diagram labels: Vector Rotation, Gridding, Averaging, FFT, Mapping, Science; v. parallel computation; entangled Calibration Loop]
• 16 GB/s, 0.5s cadence
• (1) GB/s, 8s cadence
The Data Rate Challenge
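The two data rates on this slide are consistent with simple time averaging, if the parenthesized figure is read as roughly 1 GB/s. The sketch below works through that arithmetic; the function names are chosen here for illustration.

```python
def averaged_rate(rate_gb_s, cadence_in_s, cadence_out_s):
    """Output rate when samples at cadence_in_s are averaged to cadence_out_s."""
    return rate_gb_s * cadence_in_s / cadence_out_s


def gb_per_step(rate_gb_s, cadence_s):
    """Amount of data that arrives in one time step at the given rate."""
    return rate_gb_s * cadence_s


out_rate = averaged_rate(16.0, 0.5, 8.0)   # 16 inputs per output: 1.0 GB/s
in_step = gb_per_step(16.0, 0.5)           # 8.0 GB per 0.5 s input step
out_step = gb_per_step(out_rate, 8.0)      # 8.0 GB per 8 s output step
```

Averaging 0.5 s samples into 8 s samples combines 16 inputs per output, so the rate drops by a factor of 16 while the size of one time step stays the same.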
![Page 46: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/46.jpg)
Implementation
• Hardware
– 2.4 GHz dual-core AMD Opteron, 4 GB RAM
– NVIDIA Quadro FX 5600
• Software
– AMD Core Math Library (ACML)
– NVIDIA CUDA (CUBLAS, CUFFT)
– OpenGL
![Page 47: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/47.jpg)
Single-GPU Speedup
[Bar chart: GPU speedup over CPU, axis from 0 to 70, for RotateAndAccumulateVisibilities, MeasureIonosphericOffset, MeasureTileResponse, ReRotateVisibilities, PeelTileResponse, UnpeelTileResponse, Gridding *, and Imaging. Groups: Calibration Loop, Image Formation. Annotation: Mostly OpenGL]
![Page 48: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/48.jpg)
Example Results
GPU Reference
• Noisy images from test data
![Page 49: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/49.jpg)
Scaling to a Cluster
• 1000 frequency channels, 65 sources every 8 seconds, and 1600² output image
• 20-40 frequencies / GPU
• 32-64 GPUs, i.e., 16 Tesla S1070s
• Need MPI for internal data transfer
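One way to spread the frequency channels over GPUs is a static, contiguous split, sketched below. The slides do not describe the actual assignment scheme, so the function and its even-split policy are assumptions for illustration.

```python
def partition_channels(n_channels, n_gpus):
    """Assign contiguous channel ranges to GPUs, sizes differing by at most one."""
    base, extra = divmod(n_channels, n_gpus)
    ranges, start = [], 0
    for g in range(n_gpus):
        count = base + (1 if g < extra else 0)   # first `extra` GPUs get one more
        ranges.append(range(start, start + count))
        start += count
    return ranges


parts_32 = partition_channels(1000, 32)   # 31 or 32 channels per GPU
parts_64 = partition_channels(1000, 64)   # 15 or 16 channels per GPU
```

A Tesla S1070 houses four GPUs, so 16 units give the 64 GPUs at the top of the range. In an MPI setting, each rank would typically own one GPU and one of these channel ranges.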
![Page 50: IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)](https://reader033.fdocuments.in/reader033/viewer/2022042813/547bc195b379597b2b8b4e08/html5/thumbnails/50.jpg)
Conclusions
• GPUs enable high-throughput scientific computing
• Performance gains of 10-100x
• CUDA makes life easier (but not perfect)
• Rasterization / OpenGL still useful
• Need CUDA MPI for clusters