Scala user-group-19.03.2014


Intro to Clojure collections & collection abstractions


Collections in Clojure

Jan Herich

2014-03-19 Mon

Jan Herich Collections in Clojure 2014-03-19 Mon 1 / 23

Outline

1 Basic Clojure collection types

2 Persistent characteristics of Clojure collections

3 Sequence abstraction and laziness

4 Reducers - better performance and parallelism


Basic Clojure collection types

Lists

- The list data structure is implemented as an ordinary singly linked list
- Lists are special because they are used to compose Clojure programs
- Unquoted lists are treated as function calls by the Clojure environment

    ;; list literal representation
    '(1 2 :id (3 4) "name")
    ;; unquoted list interpreted
    ;; as function call
    (= (+ 1 2) 3)
    ;; get the first element
    (= (peek '(1 2 3)) 1)
    ;; new list from old one
    (= (pop '(1 2 3)) '(2 3))
    (= (conj '(3 2 1) 4)
       '(4 3 2 1))


Sets

- Sets are collections of unique elements
- Like every collection in Clojure, sets can be heterogeneous
- Fast membership test

    ;; set literal representation
    #{1 :id :type "name"}
    ;; testing membership
    (= true (contains? #{1 2} 2))
    ;; new set from old one
    (= (disj #{1 2 3} 2) #{1 3})
    (= (conj #{1 3} 2) #{1 2 3})


Maps

- Maps are a basic construct for holding structured information
- The default implementation uses the well-known hash-map mechanism
- Fast look-up

    ;; map literal representation
    {:id 1 :name "John"}
    ;; optional comma delimiters
    {:id 1, :name "John"}
    ;; lookup
    (= (get {:id 1 :name "John"} :id)
       1)
    ;; new map from old one
    (= (dissoc {:id 1 :name "John"}
               :name)
       {:id 1})
    (= (assoc {:id 1} :name "John")
       {:id 1 :name "John"})


Vectors

- A vector is the right structure for ordered data where random look-up is necessary
- Fast look-up by index
- Maintains ordering of elements

    ;; vector literal representation
    [1 2 3 4 5]
    ;; lookup by zero-based index
    (= (get [1 2 3] 2) 3)
    ;; new vector from old one
    (= (subvec [1 2 3 4 5] 2)
       [3 4 5])
    (= (conj [1 2 3] 4)
       [1 2 3 4])
    (= (assoc [1 3] 0 2) [2 3])


Persistent characteristics of Clojure collections

Non-destructive updates

- All Clojure persistent collections support functional, non-destructive updates instead of in-place mutation of data
- To guarantee that updates with such semantics are fast and memory efficient, simple defensive copying clearly won't work
- Luckily, there is a technique called structural sharing, which can help us
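As a minimal sketch of what "non-destructive" means in practice (the var names here are illustrative and not from the talk), every update returns a new collection and leaves the original value untouched:

```clojure
;; "Updating" a map yields new maps; the original is unchanged.
(def original {:id 1 :name "John"})
(def renamed  (assoc original :name "Jane"))
(def stripped (dissoc original :name))

original ;; => {:id 1, :name "John"}
renamed  ;; => {:id 1, :name "Jane"}
stripped ;; => {:id 1}

;; The same holds for vectors.
(def v1 [1 2 3])
(def v2 (conj v1 4))

v1 ;; => [1 2 3]
v2 ;; => [1 2 3 4]
```

Because old versions stay valid, they can be handed to other code without defensive copies.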


Example of structural sharing

[Figure: example of structural sharing, showing the collection before the update and after the update]
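Structural sharing is easiest to observe on lists, where adding to the front reuses the whole existing list as the tail. The following is only a small sketch with illustrative names; vectors and maps share internal tree nodes instead, which cannot be inspected this directly. `identical?` checks reference identity, not just equality.

```clojure
(def tail     (list 2 3 4))
(def extended (conj tail 1))   ;; (1 2 3 4)
(def other    (conj tail :x))  ;; (:x 2 3 4)

;; The new lists do not copy the old one; they point at it.
(identical? (rest extended) tail)         ;; => true
(identical? (rest extended) (rest other)) ;; => true

;; The original is unaffected by either "update".
tail ;; => (2 3 4)
```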


Sequence abstraction and laziness

Sequence as a powerful abstraction for collections

- A sequence is a logical list, a persistent and immutable view of the collection
- All core Clojure collections provide sequence implementations
- Most core Clojure transformation functions for manipulating collections, like filter or map, are defined in terms of sequences
- This is very handy when composing collection transformations
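A short sketch of these points: the same sequence functions work on every core collection type, because each collection can present itself as a sequence. A sorted set is used here only so that the element order is predictable.

```clojure
;; Each collection yields a sequence view of its elements.
(seq [1 2 3])            ;; => (1 2 3)
(seq (list 1 2 3))       ;; => (1 2 3)
(seq {:id 1})            ;; => ([:id 1])  map entries are key/value pairs
(seq (sorted-set 3 1 2)) ;; => (1 2 3)
(seq [])                 ;; => nil        empty collections have no seq

;; Sequence functions therefore accept any of them.
(map inc [1 2 3])                ;; => (2 3 4)
(filter odd? (sorted-set 1 2 3)) ;; => (1 3)
(map key {:id 1})                ;; => (:id)
```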


Sequences explained

You can call seq on any Clojure collection, which yields a sequence implementation appropriate to the collection. This implementation provides the following basic guarantees (which are defined in terms of the ISeq interface under the hood):

    ;; Returns the first item in the collection. Calls seq
    ;; on its argument. If coll is nil, returns nil
    (first coll)
    ;; Returns a sequence of the items after the first.
    ;; Calls seq on its argument. If there are no more items,
    ;; returns a logical sequence for which seq returns nil
    (rest coll)
    ;; Returns a new seq where item is the first element
    ;; and seq is the rest
    (cons item seq)
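Applied to concrete values, the three operations above behave as follows. This is a small illustrative sketch; the inputs are arbitrary.

```clojure
(first [10 20 30]) ;; => 10
(first nil)        ;; => nil
(first [])         ;; => nil

(rest [10 20 30])  ;; => (20 30)
(rest [10])        ;; => ()   an empty sequence, not nil
(seq (rest [10]))  ;; => nil  seq turns "empty" into nil

(cons 0 [1 2])     ;; => (0 1 2)
(cons 0 nil)       ;; => (0)
```

The `(seq (rest coll))` idiom is what the recursive definitions in this talk rely on to detect the end of a collection.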


How Clojure leverages sequences

As already mentioned, many Clojure functions are defined in terms of sequences. For example, have a look at this greatly simplified map implementation:

    (defn map [f coll]
      (when-let [s (seq coll)]
        (cons (f (first s)) (map f (rest s)))))

This enables the map function to operate on any collection which satisfies the sequence interface, because map calls seq on its second (coll) argument. Notice that map returns a sequence as well, with the consequence that functions operating on sequences can be easily composed together.
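To try the simplified definition without shadowing clojure.core/map, it can be given a different name. `simple-map` is a hypothetical name chosen for this sketch; the body is the one from the slide.

```clojure
(defn simple-map [f coll]
  (when-let [s (seq coll)]
    (cons (f (first s)) (simple-map f (rest s)))))

;; Works on any collection that can produce a seq.
(simple-map inc [1 2 3])      ;; => (2 3 4)
(simple-map inc (list 1 2 3)) ;; => (2 3 4)
(simple-map key {:id 1})      ;; => (:id)
(simple-map inc nil)          ;; => nil

;; The result is a sequence, so calls compose.
(simple-map str (simple-map inc [1 2])) ;; => ("2" "3")
```

Unlike the real map, this version is eager and recursive, so it is only suitable for small inputs.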


Composing collection transformations

    ;; filter countries, calculate densities and sort them
    (->> '({:code "SK" :area 49035 :population 5415949}
           {:code "CZ" :area 78866 :population 10513209}
           {:code "AT" :area 83855 :population 8414638}
           {:code "HU" :area 93030 :population 9908798})
         (filter (fn [country]
                   (> (get country :area) 80000)))
         (map (fn [country]
                (assoc country :density
                       (double (/ (get country :population)
                                  (get country :area))))))
         (sort-by (fn [country]
                    (get country :density))))


Laziness

- As it turns out, it's very easy to express infinite sequences, just by defining some recursive relations between sequence elements
- Clojure gives us many functions for infinite sequences, such as iterate

    ;; infinite stream of ascending numbers from zero
    (iterate inc 0)
    ;; to avoid blocking the consuming thread, use take
    (take 10 (iterate inc 0))

- To be able to express such infinite sequences, we need to express laziness
- In fact, most Clojure core sequence functions (for example map) are defined as lazy, so they can consume and produce lazy sequences
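One way to see the laziness is to count how often the mapping function actually runs. This is only a sketch: `calls` and `noisy-triple` are illustrative names, and an unchunked source (iterate) is used on purpose, since map over a vector realizes elements in chunks rather than one at a time.

```clojure
(def calls (atom 0))

(defn noisy-triple [x]
  (swap! calls inc) ; record every invocation
  (* 3 x))

;; Building the pipeline over an infinite sequence computes nothing yet.
(def tripled (map noisy-triple (iterate inc 0)))
(def calls-before @calls) ;; => 0

;; Only the consumed elements are ever computed.
(def first-three (doall (take 3 tripled))) ;; => (0 3 6)
(def calls-after @calls)  ;; => 3
```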


How to express laziness in Clojure

    ;; define Fibonacci numbers as a lazy sequence with
    ;; the help of the lazy-seq macro
    (defn fib [a b]
      (cons a (lazy-seq (fib b (+ a b)))))
    ;; consume the first ten numbers from the sequence
    (take 10 (fib 0 1))
    ;; map is lazy as well
    (take 10 (map (fn [x] (* 3 x)) (fib 0 1)))


Reducers - better performance and parallelism

Reducers, or another useful collection abstraction

Why another abstraction if we already have sequences?

1 Laziness is great when we need it, but not always
2 A sequence is fundamentally serial
3 Those two points are problems if we want a high-performing solution which can easily exploit parallelism

- Therefore, we need to find some new notion of a collection, an even simpler one than the sequence abstraction
- The new, minimalist notion of a collection is something which is reducible


How is reducible defined

It's important to understand the reduce function:

    ;; this is a simplified definition of reduce
    (defn reduce [f init coll]
      (if-let [s (seq coll)]
        (reduce f (f init (first s)) (rest s))
        init))

    ;; this is how we call reduce with a reducing function
    (reduce (fn [accumulator item]
              (* accumulator item))
            1
            '(1 2 3 4 5 6 7))

A reducible is something which can reduce itself, and we are not interested in the actual mechanism.
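To make "something which can reduce itself" concrete, here is a hedged sketch of a value that is reducible without being a sequence. It implements the `clojure.core.protocols/CollReduce` protocol, which the real reduce dispatches to. `upto` is a hypothetical helper invented for this example; it ignores early termination via `reduced`, and its no-init arity simply starts from `(f)`, which is a simplification.

```clojure
(require '[clojure.core.protocols :as p])

;; A reducible over the numbers 1..n that holds no elements at all.
(defn upto [n]
  (reify p/CollReduce
    (coll-reduce [this f]
      (p/coll-reduce this f (f)))
    (coll-reduce [_ f init]
      (loop [acc init, i 1]
        (if (> i n)
          acc
          (recur (f acc i) (inc i)))))))

(reduce * 1 (upto 7)) ;; => 5040
(reduce + 0 (upto 4)) ;; => 10
```

The caller only sees reduce; how the elements are produced is entirely up to the collection.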


Digging deeper into reducers

Reducers are about the transformation of reducing functions:

    ;; new simplified definition of map
    (defn mapping [f]
      (fn [f1]
        (fn [accumulator item]
          (f1 accumulator (f item)))))

- The reducers library offers alternatives to the sequence functions, defined similarly to mapping above: as higher-order functions which transform the reducing step to include the logic of mapping, filtering, etc.
- What's particularly nice is that those functions consist only of the core logic of their operations


Applying reducers

If we keep the definition of mapping from the previous slide, our code would look a little strange:

    ;; our sequence-based code
    (reduce + 0 (map (fn [x] (* x 3)) '(1 2 3)))
    ;; and the equivalent reducers-based code
    (reduce ((mapping (fn [x] (* x 3))) +) 0 '(1 2 3))

Luckily, we are in LISP land, so the reducers library handles such details with the help of macros, and we work with functions which have the same shape as before:

    ;; require the reducers library
    (require '[clojure.core.reducers :as r])
    ;; use it
    (reduce + 0 (r/map (fn [x] (* x 3)) '(1 2 3)))
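A slightly longer sketch of the library in use, chaining two transformations. The input data is arbitrary; `r/filter` and `r/map` are the library counterparts of filter and map, and `into` works with their results because it is built on reduce.

```clojure
(require '[clojure.core.reducers :as r])

;; Keep the even numbers, triple them, and sum the result.
(reduce + 0
        (r/map (fn [x] (* x 3))
               (r/filter even? [1 2 3 4])))
;; => 18

;; Collect into a vector instead of summing.
(into []
      (r/map inc
             (r/filter even? [1 2 3 4])))
;; => [3 5]
```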


What we gain and what we lose

- Reducers are generally faster and more memory efficient than their sequence-based counterparts, especially when several transformations are chained (as in the country example under "Composing collection transformations"), because no intermediate sequences are produced
- This is because composing reducer functions merely creates a recipe for a future reduction; no work is done until reduce is called
- We lose laziness in the process, so we can't write this expression with reducers anymore:

    (take 10 (r/map (fn [x] (* 3 x)) (fib 0 1)))

  (this fails with a runtime error once the result is consumed, because unlike the normal map, r/map doesn't return a sequence)
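The failure, and one possible way around it, can be sketched as follows. The workaround uses the library's own `r/take`, which stops the reduction early instead of relying on laziness; this goes slightly beyond what the slide states. `fib` is repeated from the earlier slide, and the other var names are illustrative.

```clojure
(require '[clojure.core.reducers :as r])

(defn fib [a b]
  (cons a (lazy-seq (fib b (+ a b)))))

;; A recipe for a reduction, not a sequence.
(def tripled (r/map (fn [x] (* 3 x)) (fib 0 1)))

;; The sequence function take cannot consume it.
(def outcome
  (try
    (doall (take 10 tripled))
    (catch Exception _ :not-a-sequence)))
;; => :not-a-sequence

;; r/take is itself a reducer and terminates the reduction early.
(def first-ten (into [] (r/take 10 tripled)))
;; => [0 3 3 6 9 15 24 39 63 102]
```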


Enter parallelism

- With reducers, core collection operations are freed from laziness and representation, but we are stuck with the reduce function, which is serial as well
- But we can parallelize the reduction by using independent sub-reductions and combining their results
- There is a function which does just that: fold
- fold takes a combining function, a reducing function and a collection, and returns the result of combining the results of reducing sub-segments of the collection, potentially in parallel


Fold example

    (require '[clojure.core.reducers :as r])
    ;; we use the same combine and reduce function
    (r/fold + + [1 2 3 4 5 6])
    ;; when this is the case, it's enough to supply
    ;; just the reducing function and fold will use it
    ;; to combine the sub-reductions
    (r/fold + [1 2 3 4 5 6])
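When the combining and reducing steps differ, both functions are passed to fold. The sketch below counts occurrences in a vector; the data and the names `items`, `count-step` and `merge-counts` are made up for illustration. The combining function must also return an identity value when called with no arguments, which `r/monoid` arranges here.

```clojure
(require '[clojure.core.reducers :as r])

(def items (vec (take 9000 (cycle [:a :b :c]))))

;; Reducing step: add one item to a map of counts.
(defn count-step [counts item]
  (update-in counts [item] (fnil inc 0)))

;; Combining step: merge two partial count maps; (merge-counts) yields {}.
(def merge-counts
  (r/monoid (partial merge-with +) hash-map))

(r/fold merge-counts count-step items)
;; => {:a 3000, :b 3000, :c 3000}

;; fold also works on reducer pipelines built over a vector.
(r/fold + (r/map inc (vec (range 1000))))
;; => 500500
```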


Conclusion

- Fold will take advantage of collections which are amenable to parallel subdivision; ideal candidates are trees, such as Clojure vectors and maps
- Parallel implementations of fold for those collections are based upon the Java Fork/Join framework
- If the underlying collection is not suited for parallel subdivision (as is the case with sequences), fold just devolves into reduce
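A brief sketch of the last point: the same fold call gives the same answer on a vector and on a list, and only the execution strategy differs. The input size is arbitrary and no timing claims are made here.

```clojure
(require '[clojure.core.reducers :as r])

(def numbers (range 1 10001))

;; A vector can be split into sub-ranges and folded in parallel.
(def from-vector (r/fold + (vec numbers)))

;; A list cannot be subdivided, so fold falls back to a serial reduce.
(def from-list (r/fold + (apply list numbers)))

from-vector ;; => 50005000
from-list   ;; => 50005000
```

In practice this means it can pay off to put data into a vector before folding over it.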


The End

Thank you for your attention. I hope this presentation sparked your interest in Clojure, in which case, visit www.clojure.org and learn more!
