Post on 01-Nov-2014
Reducers
A library and model for collection processing in Clojure
...in 20 mins or less
Leonardo Borges
@leonardo_borges
http://www.leonardoborges.com
http://www.thoughtworks.com
Thursday, 30 August 12
Reducers huh? Here’s the gist
You get parallel versions of reduce, map and filter
Ta-da! I’m done!
and well under my 20 min limit :)
Alright, alright I’m kidding
How do reducers make parallelism possible?
• JVM’s Fork/Join framework
• Reduction Transformers
Before we start - this is bleeding edge stuff
Java requirements
• Fork/Join framework
  • Java 7 [1] or
  • Java 6 + the JSR166 jar [2]
Clojure requirements
• 1.5.0-* (this is still MASTER on github [3] as of 30/08/2012)
[1] - http://jdk7.java.net/
[2] - http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166.jar
[3] - https://github.com/clojure/clojure
The Fork/Join Framework
• Based on divide and conquer
• Work stealing algorithm
• Uses deques - double ended queues
• Progressively divides the workload into tasks, up to a threshold
• Once a worker has finished one task, it pops another one from its deque
• After at least two tasks have finished, results can be combined/joined
• Idle workers can steal tasks from the deques of workers which fall behind
Text is boring
Fork/Join algorithm - simplified view
[Animated diagram: two workers, Worker 1 and Worker 2, each with its own deque]
• Workload is put in “deques”
• ...and progressively halved
• ...up to a configured threshold
• Results of finished tasks are joined pairwise (Combine)
• Idle workers can “steal” items from other workers
• Combining continues until the final result
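The divide, fork and combine steps in the diagram can be sketched directly against the JVM classes. This is only an illustration of the idea, not how clojure.core.reducers is implemented; the names fj-task, fj-sum and parallel-sum, and the threshold of 512, are assumptions made for the example.

```clojure
(import '(java.util.concurrent ForkJoinPool ForkJoinTask))

(defn fj-task
  "Wraps a no-arg fn as a ForkJoinTask (hypothetical helper).
  The Callable hint selects the right ForkJoinTask/adapt overload."
  ^ForkJoinTask [^Callable f]
  (ForkJoinTask/adapt f))

(defn fj-sum
  "Sums the slice [lo, hi) of vector v. Slices larger than threshold
  are halved: the left half is forked, the right half is computed in
  the current worker, and the two partial sums are combined."
  [v lo hi threshold]
  (if (<= (- hi lo) threshold)
    (reduce + 0 (subvec v lo hi))                     ; small enough: plain reduce
    (let [mid   (quot (+ lo hi) 2)
          left  (.fork (fj-task #(fj-sum v lo mid threshold)))
          right (fj-sum v mid hi threshold)]
      (+ (.join left) right))))                       ; combine step

(defn parallel-sum
  "Runs fj-sum over the whole vector inside a fresh ForkJoinPool."
  [v]
  (let [pool (ForkJoinPool.)]
    (try
      (.invoke pool (fj-task #(fj-sum v 0 (count v) 512)))
      (finally (.shutdown pool)))))

(parallel-sum (vec (range 1 10001))) ;; 50005000
```

The forked left half sits in the current worker's deque, which is where an idle worker could steal it from.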
Let’s talk about Reducers
Motivations
• Performance
  • via less allocation
  • via parallelism (leverage Fork/Join)
Issues
• Lists and Seqs are sequential
• map / filter implies order
A closer look at what map does
• Recursion
• Order
• Laziness (not shown)
• Consumes List
• Builds List
;; a naive map implementation
(defn map [f coll]
  (if (seq coll)
    (cons (f (first coll)) (map f (rest coll)))
    '()))
Oh, and it also applies the function to each item before putting the result into the new list
This is what mapping means!
Reduction Transformers
• Idea is to build map / filter on top of reduce to break from sequentiality
• map / filter then builds nothing and consumes nothing
• It changes what reduce means to the collection by transforming the reducing functions
What map is really all about
(defn mapping [f]
  (fn [f1]
    (fn [result input]
      (f1 result (f input)))))
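The same trick works for filter. The sketch below repeats mapping so it runs on its own and adds a filtering transformer in the same style. The name filtering follows the "Anatomy of a Reducer" post listed in the resources, but treat this version as an illustrative assumption rather than library code.

```clojure
(defn mapping [f]
  (fn [f1]
    (fn [result input]
      (f1 result (f input)))))

(defn filtering [pred]
  (fn [f1]
    (fn [result input]
      ;; only pass the input on to f1 when it satisfies pred
      (if (pred input)
        (f1 result input)
        result))))

;; sum only the even numbers
(reduce ((filtering even?) +) 0 [1 2 3 4]) ;; 6

;; transformers compose with comp: keep the evens, then increment them
(reduce ((comp (filtering even?) (mapping inc)) conj) [] [1 2 3 4]) ;; [3 5]
```

Neither transformer builds or consumes a collection; each one only wraps the reducing function it is given.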
But wait! If map doesn’t consume the list any longer, who does?
• reduce does!
• Since Clojure 1.4 reduce lets the collection reduce itself (through the CollReduce protocol; fold does the same through the CollFold protocol)
• Think of what this means for tree-like structures such as vectors
• This is key to leveraging the Fork/Join framework
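To make "the collection reduces itself" concrete, here is a small sketch of a custom reducible. The countdown function is hypothetical. It extends clojure.core.protocols/CollReduce with reify, and for brevity it ignores early termination via reduced.

```clojure
(require '[clojure.core.protocols :as p])

(defn countdown
  "Returns a reducible that yields n, n-1, ..., 1 without ever
  building a list. Hypothetical example type."
  [n]
  (reify p/CollReduce
    ;; no initial value: seed with (f), as the reducers library does
    (coll-reduce [this f]
      (p/coll-reduce this f (f)))
    ;; the collection itself decides how to walk its elements
    (coll-reduce [_ f init]
      (loop [acc init, i n]
        (if (pos? i)
          (recur (f acc i) (dec i))
          acc)))))

(reduce + 0 (countdown 4))     ;; 10
(reduce conj [] (countdown 3)) ;; [3 2 1]
```

Because reduce only talks to the protocol, a tree-shaped collection is free to implement the walk in whatever way suits its structure.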
Now we can use mapping to create reducing functions
(reduce ((mapping inc) +) 0 [1 2 3 4]) ;; 14
;; ((mapping inc) +) behaves like:
(fn [result input] (+ result (inc input)))
(reduce ((mapping inc) conj) [] [1 2 3 4]) ;; [2 3 4 5]
;; ((mapping inc) conj) behaves like:
(fn [result input] (conj result (inc input)))
But it feels awkward to use it in this form
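One way to remove the awkwardness is to pair a collection with a transformer and let the pair take part in reduce. The sketch below follows the general shape described in the "Anatomy of a Reducer" post; reducer, rmap and rfilter are illustrative names here, and the real clojure.core.reducers versions do more (for example, they also support fold).

```clojure
(require '[clojure.core.protocols :as p])

(defn mapping [f]
  (fn [f1] (fn [result input] (f1 result (f input)))))

(defn filtering [pred]
  (fn [f1] (fn [result input] (if (pred input) (f1 result input) result))))

(defn reducer
  "Wraps coll so that reducing it with f1 really reduces coll
  with the transformed function (xf f1)."
  [coll xf]
  (reify p/CollReduce
    (coll-reduce [this f1]
      (p/coll-reduce this f1 (f1)))
    (coll-reduce [_ f1 init]
      (p/coll-reduce coll (xf f1) init))))

(defn rmap [f coll] (reducer coll (mapping f)))
(defn rfilter [pred coll] (reducer coll (filtering pred)))

(reduce + 0 (rmap inc [1 2 3 4]))                       ;; 14
(reduce conj [] (rfilter even? (rmap inc [1 2 3 4])))   ;; [2 4]
```

The calls now read like ordinary map and filter, yet no intermediate collection is built between the steps.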
What do we have so far?
• Performance has been improved due to fewer allocations
• No intermediary lists need to be built (see Haskell’s StreamFusion [4])
• However reduce is still sequential
[4] - http://bit.ly/streamFusion
Enter fold
• Takes the sequentiality out of foldl, foldr and reduce
• Potentially parallel (falls back to standard reduce otherwise)
• Reduce/Combine strategy (think Fork/Join Framework)
• Segments the collection
• Runs multiple reduces in parallel
• Uses a combining function to join/reduce results
(defn fold [combinef reducef coll] ...)
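A small sketch of the three-argument form, with the combining and reducing functions kept separate so their roles are visible. The count-evens function and the input size are assumptions made for the example.

```clojure
(require '[clojure.core.reducers :as r])

(defn count-evens
  "Counts the even numbers in coll using fold."
  [coll]
  (r/fold
    +                                     ; combinef: (+) seeds each segment with 0,
                                          ;           (+ a b) joins two segment counts
    (fn [n x] (if (even? x) (inc n) n))   ; reducef: runs sequentially inside a segment
    coll))

(count-evens (vec (range 100000))) ;; 50000

;; a lazy seq cannot be segmented, so fold falls back to a plain reduce
(count-evens (range 100000))       ;; 50000
```

fold also accepts a leading partition size, as in (r/fold 1024 combinef reducef coll); without it the library uses 512.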
The combining function is a monoid
• An associative binary function with an identity element
• All the following functions are equivalent monoids
+
(+ 2 3) ; 5
(+) ; 0
(defn my-+
  ([] 0)
  ([a b] (+ a b)))
(my-+ 2 3) ; 5
(my-+) ; 0
(require '[clojure.core.reducers :as r])
(def my-+ (r/monoid + (fn [] 0)))
(my-+ 2 3) ; 5
(my-+) ; 0
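Monoids are not limited to numbers. As a sketch, pairing into with vector gives a combining function whose identity is the empty vector, which lets fold collect results into a vector. The evens-in function is a hypothetical example.

```clojure
(require '[clojure.core.reducers :as r])

;; (vec-monoid)     => []        identity element
;; (vec-monoid a b) => a ++ b    associative combine
(def vec-monoid (r/monoid into vector))

(defn evens-in
  "Collects the even numbers of coll into a vector using fold."
  [coll]
  (r/fold vec-monoid conj (r/filter even? coll)))

(vec-monoid)             ;; []
(vec-monoid [1 2] [3 4]) ;; [1 2 3 4]
(evens-in (vec (range 10))) ;; [0 2 4 6 8]
```

Since fold combines segment results in their original order, element order is preserved here even when the segments are reduced in parallel.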
fold by examples
;; all examples assume the reducers library is available as r
(ns reducers-playground.core
  (:require [clojure.core.reducers :as r]))
fold by examples: increment all even positive integers up to 10 million and add them all up
;; these were taken from Rich’s reducers talk
(def my-vector (into [] (range 10000000)))
(time (reduce + (map inc (filter even? my-vector)))) ;; 500 msecs
(time (reduce + (r/map inc (r/filter even? my-vector)))) ;; 260 msecs
(time (r/fold + (r/map inc (r/filter even? my-vector)))) ;; 130 msecs
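Before comparing timings it is worth checking that the three pipelines agree. The sketch below does that on a smaller vector; the names small-vector, seq-sum, reducer-sum and fold-sum are assumptions for the example, and no timings are claimed for it.

```clojure
(require '[clojure.core.reducers :as r])

(def small-vector (into [] (range 100000)))

(defn seq-sum [v]      ; lazy seqs: builds intermediate sequences
  (reduce + (map inc (filter even? v))))

(defn reducer-sum [v]  ; reducers + reduce: no intermediate collections, still sequential
  (reduce + (r/map inc (r/filter even? v))))

(defn fold-sum [v]     ; reducers + fold: segments the vector and reduces in parallel
  (r/fold + (r/map inc (r/filter even? v))))

(= (seq-sum small-vector)
   (reducer-sum small-vector)
   (fold-sum small-vector)) ;; true
```

All three return 2500000000 for this input: the 50000 even numbers become the first 50000 odd numbers, whose sum is 50000 squared.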
fold by examples: standard word count
(def wiki-dump (slurp "subset-wiki-dump50")) ; 50 MB
(defn count-words [text]
  (reduce
    (fn [memo word]
      (assoc memo word (inc (get memo word 0))))
    {}
    (map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))
(time (count-words wiki-dump)) ;; 45 secs
fold by examples: parallel word count
(def wiki-dump (slurp "subset-wiki-dump50")) ; 50 MB
(defn p-count-words [text]
  (r/fold
    ;; combining fn
    (r/monoid
      (partial merge-with +) ; called to merge the partial results computed at the leaves
      hash-map)              ; called with no arguments to provide a seed value
    (fn [memo word]
      (assoc memo word (inc (get memo word 0))))
    (r/map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))
(time (p-count-words wiki-dump)) ;; 30 secs
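The wiki dump is not available here, so the following is a self-contained variant of the parallel word count that runs on an in-memory string. The sample text and the tokenize helper are assumptions for the example; the fold itself has the same shape as on the slide.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

(defn tokenize
  "Splits text into a vector of words (a vector, so fold can segment it)."
  [text]
  (vec (re-seq #"\w+" text)))

(defn p-count-words [text]
  (r/fold
    (r/monoid (partial merge-with +) hash-map)   ; merge per-segment maps; seed is {}
    (fn [memo word]
      (assoc memo word (inc (get memo word 0)))) ; count one word within a segment
    (r/map str/lower-case (tokenize text))))

;; 4000 words, enough for fold to split the vector into several segments
(def sample-text (apply str (repeat 1000 "The quick the lazy ")))

(p-count-words sample-text) ;; {"the" 2000, "quick" 1000, "lazy" 1000}
```

Each segment produces its own small map, and merge-with + adds the counts for words that appear in more than one segment.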
fold by examples: Load 100k records into PostgreSQL
;; BufferedReader and FileReader come from java.io
(import '(java.io BufferedReader FileReader))
(def records (into [] (line-seq (BufferedReader. (FileReader. "dump.txt")))))
(time
  (doseq [record records]
    (let [tokens (clojure.string/split record #"\t")]
      (insert users/users
        (values {:account-id (nth tokens 0) ...})))))
;; 90 secs
fold by examples: Load 100k records into PostgreSQL in parallel
(time
  (r/fold +
    (r/map
      (fn [record]
        (let [tokens (clojure.string/split record #"\t")]
          (do (insert users/users
                (values {:account-id (nth tokens 0) ...}))
              1)))
      records)))
;; 50 secs
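The pattern here is a side effect per record that returns 1, with fold adding up the ones. The sketch below shows that pattern without a database: insert-record! is a hypothetical stand-in that records ids in an atom, and the generated sample records are assumptions for the example.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; stand-in for the database table: a thread-safe set of inserted ids
(def inserted (atom #{}))

(defn insert-record!
  "Hypothetical replacement for the real insert. Returns 1 per row
  so that fold can sum up how many rows were written."
  [row]
  (swap! inserted conj (:account-id row))
  1)

(defn load-records
  "Parses each tab-separated record and inserts it, potentially in
  parallel. Returns the number of records processed."
  [records]
  (r/fold +
    (r/map (fn [record]
             (let [tokens (str/split record #"\t")]
               (insert-record! {:account-id (nth tokens 0)})))
           records)))

;; usage: (load-records ["acc-1\talice" "acc-2\tbob"]) ;; 2
```

The inserts may run on different threads and in any order, so this only suits work where ordering does not matter and the target (here an atom, on the slide a database) is safe for concurrent writes.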
When to use it
• Exploring decision trees
• Image processing
• As a building block for bigger, distributed systems such as Datomic and Cascalog (maybe around parallel aggregators)
• Basically any list intensive program
But the tools are available to anyone so be creative!
Resources
• The Anatomy of a Reducer - http://bit.ly/anatomyReducers
• Rich’s announcement post on Reducers - http://bit.ly/reducersANN
• Rich Hickey - Reducers - EuroClojure 2012 - http://bit.ly/reducersVideo (this presentation was heavily inspired by this video)
• The Source on github - http://bit.ly/reducersCore
Thanks!
Questions?
Leonardo Borges
@leonardo_borges
http://www.leonardoborges.com
http://www.thoughtworks.com