Stream Execution with Clojure and Fork/join
![Page 1: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/1.jpg)
Stream Execution with Clojure and Fork/Join
Alex Miller - @puredanger
Revelytix - http://revelytix.com
![Page 2: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/2.jpg)
Contents
• Query execution - the problem
• Plan representation - plans in our program
• Processing components - building blocks
• Processing execution - executing plans
![Page 3: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/3.jpg)
Query Execution
![Page 4: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/4.jpg)
Relational Data & Queries

SELECT NAME
FROM PERSON
WHERE AGE > 20

NAME  AGE
Joe   30
![Page 5: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/5.jpg)
RDF
"Resource Description Framework" - a fine-grained graph representation of data

[Figure: graph node http://data/Joe with edge http://demo/age to 30 and edge http://demo/name to "Joe"]

Subject          Predicate         Object
http://data/Joe  http://demo/age   30
http://data/Joe  http://demo/name  "Joe"
![Page 6: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/6.jpg)
SPARQL queries
SPARQL is a query language for RDF

PREFIX demo: <http://demo/>
SELECT ?name
WHERE {
  ?person demo:age ?age .
  ?person demo:name ?name .
  FILTER (?age > 20)
}

• Each line such as ?person demo:age ?age is a "triple pattern"
• The two triple patterns form a natural join on ?person
![Page 7: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/7.jpg)
Relational-to-RDF
• W3C R2RML mappings define how to virtually map a relational db into RDF

[Figure: the PERSON row (NAME Joe, AGE 30) shown next to the RDF graph for http://data/Joe (http://demo/age 30, http://demo/name "Joe"), with the two equivalent queries below]

PREFIX demo: <http://demo/>
SELECT ?name
WHERE {
  ?person demo:age ?age .
  ?person demo:name ?name .
  FILTER (?age > 20)
}

SELECT NAME
FROM PERSON
WHERE AGE > 20
![Page 8: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/8.jpg)
Enterprise federation
• Model domain at enterprise level
• Map into data sources
• Federate across the enterprise (and beyond)
[Figure: an enterprise-level SPARQL layer over three SPARQL endpoints and three SQL sources]
![Page 9: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/9.jpg)
Query pipeline
• How does a query engine work?
[Figure: SQL -> Parse -> AST -> Plan -> Plan -> Resolve -> Plan -> Optimize -> Plan -> Process -> Results!, with Metadata as an additional input]
![Page 10: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/10.jpg)
Trees!
![Page 11: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/11.jpg)
Plan Representation
![Page 12: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/12.jpg)
SQL query plans

SELECT Name, DeptName
FROM Person, Dept
WHERE Person.DeptID = Dept.DeptID AND Age > 20

[Figure: Person (Name, Age, DeptID) and Dept (DeptID, DeptName) -> join (DeptID) -> filter (Age > 20) -> project (Name, DeptName)]
![Page 13: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/13.jpg)
SPARQL query plans

SELECT ?Name
WHERE {
  ?Person :Name ?Name .
  ?Person :Age ?Age .
  FILTER (?Age > 20)
}

[Figure: TP1 { ?Person :Age ?Age } and TP2 { ?Person :Name ?Name } -> join (?Person) -> filter (?Age > 20) -> project (?Name)]
![Page 14: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/14.jpg)
Common model
Streams of tuples flowing through a network of processing nodes
[Figure: a network of five connected nodes]
![Page 15: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/15.jpg)
What kind of nodes?
• Tuple generators (leaves)
  – In SQL: a table or view
  – In SPARQL: a triple pattern
• Combinations (multiple children)
  – Join
  – Union
• Transformations
  – Filter
  – Duplicate removal
  – Sort
  – Grouping
  – Project
  – Slice (limit / offset)
  – etc.
![Page 16: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/16.jpg)
Representation
Tree data structure with nodes and attributes

Java classes (attributes):
• PlanNode (childNodes)
• TableNode (Table)
• JoinNode (joinType, joinCriteria)
• FilterNode (criteria)
• ProjectNode (projectExpressions)
• SliceNode (limit, offset)
![Page 17: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/17.jpg)
s-expressions
Tree data structure with nodes and attributes

(* (+ 2 3) (- 6 5))
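As a small illustration (not from the slides), the expression above can be held as quoted data in Clojure and inspected as a tree before it is evaluated. The names `expr`, `root`, `children`, and `result` are made up for this sketch.

```clojure
;; The slide's expression, quoted so it stays data (a nested list) instead of running.
(def expr '(* (+ 2 3) (- 6 5)))

;; The first element is the operator at the root; the rest are the child subtrees.
(def root (first expr))
(def children (rest expr))

;; Evaluating the data runs it as code: (* 5 1).
(def result (eval expr))
```

The same property is what lets a query plan be built, inspected, and rewritten as ordinary lists.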
![Page 18: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/18.jpg)
List representation
Tree data structure with nodes and attributes

(project+ [Name DeptName]
  (filter+ (> Age 20)
    (join+
      (table+ Empl [Name Age DeptID])
      (table+ Dept [DeptID DeptName]))))
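A minimal sketch of why this representation is convenient: because the plan is plain nested data, standard sequence functions can walk it. The helpers `plan-nodes` and `tables` are hypothetical names for this example, not part of the system described in the talk.

```clojure
;; The plan from the slide, quoted so that it is plain nested data.
(def plan
  '(project+ [Name DeptName]
     (filter+ (> Age 20)
       (join+
         (table+ Empl [Name Age DeptID])
         (table+ Dept [DeptID DeptName])))))

(defn plan-nodes
  "Every list in the plan (including criteria expressions), depth-first from the root."
  [plan]
  (filter seq? (tree-seq seq? seq plan)))

(defn tables
  "Names of the tables found at the leaves of the plan."
  [plan]
  (for [node (plan-nodes plan)
        :when (= 'table+ (first node))]
    (second node)))
```

Calling `(tables plan)` on the plan above yields the two table names, `Empl` and `Dept`, in the order they appear.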
![Page 19: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/19.jpg)
Query optimization
Example - pushing criteria down

(project+ [Name DeptName]
  (filter+ (> Age 20)
    (join+
      (project+ [Name Age DeptID]
        (bind+ [Age (- (now) Birth)]
          (table+ Empl [Name Birth DeptID])))
      (table+ Dept [DeptID DeptName]))))
![Page 20: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/20.jpg)
Query optimization
Example - rewritten

(project+ [Name DeptName]
  (join+
    (project+ [Name DeptID]
      (filter+ (> (- (now) Birth) 20)
        (table+ Empl [Name Birth DeptID])))
    (table+ Dept [DeptID DeptName])))
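The rewrite can be sketched as an ordinary function over the plan data. This is a simplified illustration, not the optimizer from the talk: it only moves a `filter+` below a `join+` when one side supplies every column the predicate uses, and it does not handle `bind+` inlining as the slide example does. The helpers `columns`, `pred-columns`, and `push-filter` are hypothetical.

```clojure
(require '[clojure.set :as set])

(defn columns
  "Columns produced by a plan node. Handles only the node types in this sketch;
   any other operator throws."
  [[op & args]]
  (case op
    table+   (set (second args))                    ; (table+ Name [cols])
    project+ (set (first args))                     ; (project+ [cols] child)
    filter+  (columns (second args))                ; (filter+ pred child)
    join+    (apply set/union (map columns args)))) ; (join+ left right)

(defn pred-columns
  "Symbols used as arguments (not in function position) in a predicate."
  [form]
  (cond
    (symbol? form) #{form}
    (seq? form)    (apply set/union #{} (map pred-columns (rest form)))
    :else          #{}))

(defn push-filter
  "Rewrite (filter+ pred (join+ l r)) so the filter runs below the join
   when one side supplies every column the predicate needs."
  [plan]
  (if (and (seq? plan)
           (= 'filter+ (first plan))
           (seq? (nth plan 2))
           (= 'join+ (first (nth plan 2))))
    (let [[_ pred [_ left right]] plan
          needed (pred-columns pred)]
      (cond
        (set/subset? needed (columns left))
        (list 'join+ (list 'filter+ pred left) right)

        (set/subset? needed (columns right))
        (list 'join+ left (list 'filter+ pred right))

        :else plan))
    plan))
```

Filtering before the join means fewer tuples reach the join, which is the point of pushing criteria down.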
![Page 21: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/21.jpg)
Hash join conversion
[Figure: a join+ node over a left tree and a right tree is converted into a let+ node that binds hashes to first+ of preduce+ hash-tuples over the left tree, then applies mapcat tuple-matches over the right tree]
Hash join conversion

(join+ _left _right)

is converted to

(let+ [hashes (first+
                (preduce+ (hash-tuple join-vars
                                      {}
                                      #(merge-with concat %1 %2))
                          _left))]
  (mapcat (fn [tuple] (tuple-matches hashes join-vars tuple))
          _right))
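The same idea in plain, sequential Clojure, as a sketch only. The names `hash-tuples` and `tuple-matches` appear on the slides, but their bodies here are assumptions, `hash-join` is a made-up wrapper, and the parallel `preduce+` step is replaced by a simple `group-by`.

```clojure
(defn hash-tuples
  "Index tuples by their values for join-vars: {key-map [tuple ...]}."
  [join-vars tuples]
  (group-by #(select-keys % join-vars) tuples))

(defn tuple-matches
  "All merged tuples produced by joining `tuple` against the hashed side."
  [hashes join-vars tuple]
  (for [match (get hashes (select-keys tuple join-vars))]
    (merge match tuple)))

(defn hash-join
  "Hash the left side once, then stream the right side past the hash table."
  [join-vars left right]
  (let [hashes (hash-tuples join-vars left)]
    (mapcat #(tuple-matches hashes join-vars %) right)))
```

Only the left input has to be fully consumed before results start to flow; the right input can be processed tuple by tuple.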
![Page 23: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/23.jpg)
Processing trees
• Compile abstract nodes into more concrete stream operations:
  – map+, mapcat+, filter+
  – first+, mux+
  – let+, let-stream+
  – pmap+, pmapcat+, pfilter+, preduce+
  – number+, reorder+, rechunk+
  – pmap-chunk+, preduce-chunk+
![Page 24: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/24.jpg)
Summary
• SPARQL and SQL query plans have essentially the same underlying algebra
• Model is a tree of nodes where tuples flow from leaves to the root
• A natural representation of this tree in Clojure is as a tree of s-expressions, just like our code
• We can manipulate this tree to provide
  – Optimizations
  – Differing levels of abstraction
![Page 25: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/25.jpg)
Processing Components
![Page 26: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/26.jpg)
Pipes
Pipes are streams of data
[Figure: Producer -> Pipe -> Consumer]

Producer side:
(enqueue pipe item)
(enqueue-all pipe items)
(close pipe)
(error pipe exception)

Consumer side:
(dequeue pipe item)
(dequeue-all pipe items)
(closed? pipe)
(error? pipe)
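A minimal sketch of such a pipe, assuming an atom that holds a persistent queue and a closed flag. This is not the talk's implementation: error handling is left out, and `dequeue` here takes only the pipe and returns the item, which differs from the signature shown on the slide.

```clojure
(defn make-pipe
  "A pipe is an atom holding a FIFO queue plus a closed flag."
  []
  (atom {:queue clojure.lang.PersistentQueue/EMPTY :closed false}))

(defn enqueue [pipe item]
  (swap! pipe update :queue conj item)
  pipe)

(defn enqueue-all [pipe items]
  (swap! pipe update :queue into items)
  pipe)

(defn dequeue
  "Remove and return the oldest item, or nil when the pipe is empty."
  [pipe]
  (let [[old _] (swap-vals! pipe update :queue pop)]
    (peek (:queue old))))

(defn close [pipe]
  (swap! pipe assoc :closed true)
  pipe)

(defn closed? [pipe]
  (:closed @pipe))
```

Because the queue is an immutable value swapped atomically, producers and consumers on different threads never observe a half-updated pipe.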
![Page 32: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/32.jpg)
Pipe callbacks
Events on the pipe trigger callbacks which are executed on the caller's thread

1. (add-callback pipe callback-fn)
2. (enqueue pipe "foo")
3. (callback-fn "foo") ;; during enqueue
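The three steps can be sketched with a deliberately tiny pipe, assuming callbacks are stored alongside the items in one atom. The structure is an assumption for illustration; only the function names `add-callback` and `enqueue` come from the slide.

```clojure
(defn make-pipe []
  (atom {:items [] :callbacks []}))

(defn add-callback [pipe callback-fn]
  (swap! pipe update :callbacks conj callback-fn)
  pipe)

(defn enqueue [pipe item]
  (let [{:keys [callbacks]} (swap! pipe update :items conj item)]
    ;; Callbacks run here, synchronously, on whichever thread called enqueue.
    (doseq [cb callbacks]
      (cb item))
    pipe))

;; Usage: record the item and the thread the callback ran on.
(def seen (atom []))
(def pipe (make-pipe))
(add-callback pipe (fn [item] (swap! seen conj [item (Thread/currentThread)])))
(enqueue pipe "foo")
```

After the call to `enqueue` returns, `seen` already holds the entry for "foo", recorded on the enqueuing thread.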
![Page 33: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/33.jpg)
Pipes
Pipes are thread-safe functional data structures
![Page 35: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/35.jpg)
Batched tuples
• To a pipe, data is just data. We actually pass data in batches through the pipe for efficiency.

[{:Name "Alex" :Eyes "Blue"}
 {:Name "Jeff" :Eyes "Brown"}
 {:Name "Eric" :Eyes "Hazel"}
 {:Name "Joe"  :Eyes "Blue"}
 {:Name "Lisa" :Eyes "Blue"}
 {:Name "Glen" :Eyes "Brown"}]
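A sketch of what batching might look like with core functions. The batch size of 4 and the helper `filter-batch` are arbitrary choices for the example, not values from the talk.

```clojure
(def tuples
  [{:Name "Alex" :Eyes "Blue"}  {:Name "Jeff" :Eyes "Brown"}
   {:Name "Eric" :Eyes "Hazel"} {:Name "Joe"  :Eyes "Blue"}
   {:Name "Lisa" :Eyes "Blue"}  {:Name "Glen" :Eyes "Brown"}])

;; Split the stream into vectors of at most 4 tuples; the last batch may be short.
(def batches (map vec (partition-all 4 tuples)))

(defn filter-batch
  "Apply a predicate to one batch, producing a (possibly smaller) batch."
  [pred batch]
  (filterv pred batch))

;; Each batch is processed as a unit, so per-item overhead is paid once per batch.
(def blue-eyed
  (map #(filter-batch (fn [t] (= "Blue" (:Eyes t))) %) batches))
```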
![Page 36: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/36.jpg)
Pipe multiplexer
Compose multiple pipes into one
![Page 37: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/37.jpg)
Pipe tee
Send output to multiple destinations
![Page 38: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/38.jpg)
Nodes
• Nodes transform tuples from the input pipe and put results on the output pipe.
[Figure: Input Pipe -> fn -> Output Pipe]

Node
• input-pipe
• output-pipe
• task-fn
• state
• concurrency
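One way to picture a node is as a record with the five fields listed above. This is a sketch under assumptions: the pipes are bare atoms holding queues, `task-fn` takes the state and one batch, and `make-node` and `run-once!` are hypothetical helpers. The `concurrency` field is carried along but not enforced here.

```clojure
(defrecord Node [input-pipe output-pipe task-fn state concurrency])

(defn make-node
  [task-fn & {:keys [state concurrency] :or {state {} concurrency 1}}]
  (->Node (atom clojure.lang.PersistentQueue/EMPTY)
          (atom clojure.lang.PersistentQueue/EMPTY)
          task-fn
          (atom state)
          concurrency))

(defn run-once!
  "Take one batch from the input pipe, apply task-fn to it, and put the result
   on the output pipe. Returns false when there was nothing to do."
  [{:keys [input-pipe output-pipe task-fn state]}]
  (let [[old _] (swap-vals! input-pipe pop)]
    (if-let [batch (peek old)]
      (do (swap! output-pipe conj (task-fn state batch))
          true)
      false)))
```

A filter node, for instance, would use a `task-fn` that keeps only the tuples in the batch that satisfy its criteria.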
![Page 39: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/39.jpg)
Processing Trees
• Tree of nodes and pipes
[Figure: six fn nodes connected by pipes, with an arrow labeled "Data flow"]
![Page 40: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/40.jpg)
SPARQL query example

SELECT ?Name
WHERE {
  ?Person :Name ?Name .
  ?Person :Age ?Age .
  FILTER (?Age > 20)
}

[Figure: TP1 { ?Person :Age ?Age } and TP2 { ?Person :Name ?Name } -> join (?Person) -> filter (?Age > 20) -> project (?Name)]

(project+ [?Name]
  (filter+ (> ?Age 20)
    (join+ [?Person]
      (triple+ [?Person :Name ?Name])
      (triple+ [?Person :Age ?Age]))))
![Page 41: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/41.jpg)
Processing tree
[Figure: TP1 { ?Person :Age ?Age } and TP2 { ?Person :Name ?Name } feed a let+ node (hashes bound to first+ of preduce+ hash-tuples, then mapcat tuple-matches), followed by filter (?Age > 20) and project (?Name)]
Mapping to nodes
• An obvious mapping to nodes and pipes
[Figure: fn nodes labeled triple pattern, preduce+, first+, let+, filter+, and project+, connected by pipes]
Mapping to nodes
• Choosing between compilation and evaluation
[Figure: fn nodes labeled triple pattern, preduce+, first+, and let+, with the filter (?Age > 20) and project (?Name) steps grouped under a single eval]
Compile vs eval
• We can evaluate our expressions
  – Directly on streams of Clojure data using Clojure
  – Indirectly via pipes and nodes (more on that next)
• Final step before processing makes decision
  – Plan nodes that combine data are real nodes
  – Plan nodes that allow parallelism (p*) are real nodes
  – Most other plan nodes can be merged into single eval
  – Many leaf nodes actually rolled up, sent to a database
  – Lots more work to do on where these splits occur
![Page 45: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/45.jpg)
Processing Execution
![Page 46: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/46.jpg)
Execution requirements
• Parallelism
  – Across plans
  – Across nodes in a plan
  – Within a parallelizable node in a plan
• Memory management
  – Allow arbitrary intermediate result sets w/o OOME
• Ops
  – Cancellation
  – Timeouts
  – Monitoring
![Page 47: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/47.jpg)
Event-driven processing
• Dedicated I/O thread pools stream data into plan
[Figure: I/O threads feeding a tree of fn nodes that run on compute threads]
![Page 48: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/48.jpg)
Task creation
1. Callback fires when data added to input pipe
2. Callback takes the fn associated with the node and bundles it into a task
3. Task is scheduled with the compute thread pool
[Figure: callback attached to the input pipe of a Node containing fn]
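The three steps above can be sketched against the JDK's `java.util.concurrent.ForkJoinPool`. The names `compute-pool`, `schedule-task`, and `make-callback` are hypothetical, and a real node would pull its data from the input pipe instead of receiving it as an argument.

```clojure
(import '(java.util.concurrent ForkJoinPool))

;; A shared compute pool; by default it sizes itself to the available processors.
(def ^ForkJoinPool compute-pool (ForkJoinPool.))

(defn schedule-task
  "Steps 2 and 3: bundle the node's fn and its data into a task and hand the
   task to the compute pool. Returns a future-like ForkJoinTask."
  [node-fn data]
  (let [^Callable task (fn [] (node-fn data))]
    (.submit compute-pool task)))

(defn make-callback
  "Step 1: the function a pipe would call when data arrives. It does no work
   itself; it only schedules the work."
  [node-fn]
  (fn [data]
    (schedule-task node-fn data)))
```

Since the callback only schedules, the thread that enqueued the data returns quickly; dereferencing the returned task (for example with `deref`) waits for the result.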
![Page 49: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/49.jpg)
Fork/join vs Executors
• Fork/join thread pool vs classic Executors
  – Optimized for finer-grained tasks
  – Optimized for larger numbers of tasks
  – Optimized for more cores
  – Works well on tasks with dependencies
  – No contention on a single queue
  – Work stealing for load balancing
![Page 50: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/50.jpg)
Task execution
1. Pull next chunk from input pipe
2. Execute task function with access to node's state
3. Optionally, output one or more chunks to output pipe - this triggers the upstream callback
4. If data still available, schedule a new task, simulating a new callback on the current node
![Page 51: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/51.jpg)
Concurrency
• Delicate balance between Clojure refs and STM and Java concurrency primitives
• Clojure refs - managed by STM
  – Input pipe
  – Output pipe
  – Node state
• Java concurrency
  – Semaphore - "permits" to limit tasks per node
  – Per-node scheduling lock
• Key integration constraint
  – Clojure transactions can fail and retry!
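A sketch of why the retry constraint matters, under assumptions: the pipes are refs holding queues, `permits` is a `java.util.concurrent.Semaphore` with an arbitrary limit of 2, and `process-one!` is a hypothetical function. A `dosync` body may run more than once, so the semaphore is acquired and released outside the transaction and only ref operations (plus a pure `task-fn`) happen inside it.

```clojure
(import '(java.util.concurrent Semaphore))

(def input-pipe  (ref clojure.lang.PersistentQueue/EMPTY))
(def output-pipe (ref clojure.lang.PersistentQueue/EMPTY))

;; At most 2 concurrent tasks for this hypothetical node.
(def ^Semaphore permits (Semaphore. 2))

(defn process-one!
  "Move one item from input to output through task-fn. Returns the item
   processed, or nil if no permit or no input was available."
  [task-fn]
  (when (.tryAcquire permits)            ; Java side effect: outside the transaction
    (try
      (dosync                            ; may retry, so only ref operations in here
        (when-let [item (peek @input-pipe)]
          (alter input-pipe pop)
          (alter output-pipe conj (task-fn item))
          item))
      (finally
        (.release permits)))))           ; released exactly once, after commit or failure
```

Had the acquire or release been placed inside the `dosync`, a retried transaction could take or return a permit more than once.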
![Page 52: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/52.jpg)
Concurrency mechanisms
[Flowchart with three regions: process-input, run-task, and close-output.
Legend: blue outline = Java lock; all = under Java semaphore; green outline = Clojure transaction; blue shading = Clojure atom.
Steps shown: acquire semaphore; dequeue input (message is Data, Close, or empty); create task; invoke task; result message (Data, Close, or empty); enqueue data on output pipe; release 1 semaphore; set closed = true; input closed?; closes output?; closed && !closed_done; acquire all semaphores; run-task w/ nil msg; set closed_done = true; close output-pipe; release all semaphores.]
![Page 53: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/53.jpg)
Memory management
• Pipes are all on the heap
• How do we avoid OutOfMemory?
![Page 54: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/54.jpg)
Buffered pipes
• When heap space is low, store pipe data on disk
• Data is serialized / deserialized to/from disk
• Memory-mapped files are used to improve I/O
[Figure: fn nodes connected by pipes, with one pipe's data written to disk as bytes]
![Page 55: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/55.jpg)
Memory monitoring
• JMX memory beans
  – To detect when memory is tight -> writing to disk
    • Use memory pool threshold notifications
  – To detect when memory is ok -> write to memory
    • Use polling (no notification on decrease)
• Composite pipes
  – Build a logical pipe out of many segments
  – As memory conditions go up and down, each segment is written to the fastest place. We never move data.
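A sketch of the JMX side using the standard `java.lang.management` beans. The function names are hypothetical, and registering a notification listener is omitted; this version only sets a usage threshold and polls it.

```clojure
(import '(java.lang.management ManagementFactory MemoryPoolMXBean MemoryType))

(defn heap-pools
  "Heap memory pools that support a usage threshold."
  []
  (filter (fn [^MemoryPoolMXBean pool]
            (and (= MemoryType/HEAP (.getType pool))
                 (.isUsageThresholdSupported pool)))
          (ManagementFactory/getMemoryPoolMXBeans)))

(defn set-threshold!
  "Ask the JVM to flag a pool once its usage passes `fraction` of its maximum."
  [^MemoryPoolMXBean pool fraction]
  (let [max-bytes (.getMax (.getUsage pool))]
    (when (pos? max-bytes)               ; max is -1 when it is undefined
      (.setUsageThreshold pool (long (* fraction max-bytes))))))

(defn memory-tight?
  "Polling check: has any monitored heap pool exceeded its threshold?"
  []
  (boolean (some (fn [^MemoryPoolMXBean pool]
                   (.isUsageThresholdExceeded pool))
                 (heap-pools))))
```

A pipe segment writer could call `memory-tight?` when it opens a new segment to decide between memory and disk; which pools qualify depends on the JVM and garbage collector in use.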
![Page 56: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/56.jpg)
Cancellation
• Pool keeps track of which nodes belong to which plan
• All nodes check for cancellation during execution
• Cancellation can be caused by:
  – Error during execution
  – User intervention from admin UI
  – Timeout from query settings
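A minimal sketch of cooperative cancellation, assuming each plan carries a single atom that records the first cancellation reason. `make-plan`, `cancel!`, and `run-task` are hypothetical names, not the talk's API.

```clojure
(defn make-plan []
  {:cancelled (atom nil)})

(defn cancel!
  "Record a cancellation reason (:error, :user, :timeout, ...). Only the first
   reason is kept; returns true if this call did the cancelling."
  [plan reason]
  (compare-and-set! (:cancelled plan) nil reason))

(defn run-task
  "Check the plan's cancellation flag before doing any work for a node."
  [plan task-fn batch]
  (if-let [reason @(:cancelled plan)]
    (throw (ex-info "Plan cancelled" {:reason reason}))
    (task-fn batch)))
```

Because every task checks the shared flag, an error, an admin action, or a timeout can all stop a plan through the same path.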
![Page 57: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/57.jpg)
Summary
• Data flow architecture
  – Event-driven by arrival of data
  – Compute threads never block
  – Fork/join to handle scheduling of work
• Clojure as abstraction tool
  – Expression tree lets us express plans concisely
  – Also lets us manipulate them with tools in Clojure
  – Lines of code
    • Fork/join pool, nodes, pipes - 1200
    • Buffer, serialization, memory monitor - 970
    • Processor, compiler, eval - 1900
• Open source? Hmmmmmmmmmmm…….
![Page 58: Stream Execution with Clojure and Fork/join](https://reader035.fdocuments.in/reader035/viewer/2022062312/555c2574d8b42a09438b4c29/html5/thumbnails/58.jpg)
Thanks...
Alex Miller
@puredanger
Revelytix, Inc.