Bolt: Building A Distributed ndarray

40
Bolt: Building a distributed ndarray Jason Wittenbach Janelia Research Campus (HHMI) Freeman Lab

Transcript of Bolt: Building A Distributed ndarray

Bolt: Building a distributed ndarray

Jason Wittenbach Janelia Research Campus (HHMI) Freeman Lab

t, (x, y, z)

time ~ 104

space ~ 107

~ 1011 elements

~ 1 TB

(x, y, t)

(x, y, z, t)

(x, y, z, c, t)

(n, k)

time

time

time

x

y

neuroscienceastronomy

geospatial

climate science

• a distributed ndarray• built on PySpark• conforms to NumPy API

data.mean(axis=0)

data[2, 4:10]

data.T

(t, x, y, z) (x, y, z, t)

(v, w, x, y, z)

(v,w) yx z

(v,w,x)

(v, w, x, y, z)

yz

(v, w, x, y, z)

(v)

indexing slicingapply-along-axis

indexing

transpose

slicingapply-along-axis

reshape

indexing

transpose

map

slicingapply-along-axis

reshape

reduce filter

chunking padding

indexingfiltermap

(u,v) x

y

apply-along-axis

reduceByKey

(u,v) x

ymap

transpose

(u,v) x

y

map

transpose

(u,v) x

y

map

transpose

(u,v) x

y

shuffle

(t x,y,z) (x,y,z t)

(t x, y, z)

(t)x y

z

(t x, y, z)

(t)x y

z

(t x, y, z)

(t,chunk) (t,chunk)

(t,chunk)

chunking

transpose+ = shuffle

optimization

???

thanks join us!

Jeremy FreemanNicholas SofroniewAndrew Osheroff

Freeman Lab

Ken CarlileRobert Lines

Janelia ScientificComputing

bolt-project thunder-project

GitHub

@jsonWittenbach

Twitter