Scala Parallel Collections
Aleksandar Prokopec, EPFL
Scala collections

    for {
      s <- surnames
      n <- names
      if s endsWith n
    } yield (n, s)

    1040 ms

Scala parallel collections

    for {
      s <- surnames.par
      n <- names.par
      if s endsWith n
    } yield (n, s)

    2 cores: 575 ms
    4 cores: 305 ms
for comprehensions: nested parallelized bulk operations

    surnames.par.flatMap { s =>
      names.par
        .filter(n => s endsWith n)
        .map(n => (n, s))
    }
Nested parallelism: parallel within parallel composition

    surnames.par.flatMap { s =>
      surnameToCollection(s) // may invoke parallel ops
    }
Nested parallelism: going recursive

Recursive algorithms:

    def vowel(c: Char): Boolean = ...

    def gen(n: Int, acc: Seq[String]): Seq[String] =
      if (n == 0) acc
      else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
        if (s.length == 0) s + c
        else if (vowel(s.last) && !vowel(c)) s + c
        else if (!vowel(s.last) && vowel(c)) s + c
        else s

    gen(5, Array(""))

    1545 ms
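A runnable sketch of gen follows; the vowel predicate is elided with "..." on the slide, so the definition below is an assumption:

```scala
// Assumed vowel predicate (the slide elides it with "...")
def vowel(c: Char): Boolean = "aeiou".indexOf(c) >= 0

// Same recursive structure as on the slide, over plain Seq
def gen(n: Int, acc: Seq[String]): Seq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s

val words = gen(2, Seq(""))
```

Each recursion level expands every string by one of 26 characters, so gen(n, Seq("")) yields 26^n results (with repetitions where no vowel/consonant alternation happened).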
Nested parallelism: going recursive

    def vowel(c: Char): Boolean = ...

    def gen(n: Int, acc: ParSeq[String]): ParSeq[String] =
      if (n == 0) acc
      else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
        if (s.length == 0) s + c
        else if (vowel(s.last) && !vowel(c)) s + c
        else if (!vowel(s.last) && vowel(c)) s + c
        else s

    gen(5, ParArray(""))

    1 core:  1575 ms
    2 cores:  809 ms
    4 cores:  530 ms
So, I just use par and I’m home free?
How to think parallel
Character count: use case for foldLeft

    val txt: String = ...
    txt.foldLeft(0) {
      case (a, ' ') => a
      case (a, c)   => a + 1
    }

Going strictly left to right is not parallelizable. But it is not really necessary, either: split the text into chunks (say "ABC" and "DEF"), count each chunk independently with _ + 1, then combine the partial counts (3 and 3) with _ + _ to get 6.
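The split-and-sum picture can be checked directly in plain Scala (a sketch; countChars mirrors the foldLeft above):

```scala
// Sequential character count, as in the foldLeft on the slide
def countChars(s: String): Int =
  s.foldLeft(0) {
    case (a, ' ') => a
    case (a, _)   => a + 1
  }

val txt = "ABC DEF"
val (left, right) = txt.splitAt(3)
// counting the chunks independently and adding gives the same result
val total = countChars(left) + countChars(right)
```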
Character count: fold not applicable

    txt.fold(0) {
      case (a, ' ') => a
      case (a, c)   => a + 1
    }

fold uses a single operator both to fold in elements and to combine partial results, so it must have type (A, A) => A. Here we would need both (Int, Char) => Int for the elements and (Int, Int) => Int for the partial counts, so fold is not applicable.
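One way around the typing problem, shown here as a sketch (not from the slides): map each character to 0 or 1 first, so that fold's single operator becomes the homogeneous (Int, Int) => Int:

```scala
val txt = "ABC DEF"
// ' ' contributes 0, any other character 1; now fold's operator is (Int, Int) => Int
val count = txt.map(c => if (c == ' ') 0 else 1).fold(0)(_ + _)
```

This works on the parallel version too, but it materializes an intermediate collection; aggregate avoids that.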
Character count: use case for aggregate

    txt.aggregate(0)({
      case (a, ' ') => a
      case (a, c)   => a + 1
    }, _ + _)

aggregate takes two operators. The first, of type (Int, Char) => Int, folds elements into the accumulator within a chunk. The second, here _ + _ of type (Int, Int) => Int, combines the partial aggregations produced for different chunks.
Word count: another use case for foldLeft

    txt.foldLeft((0, true)) {
      case ((wc, _), ' ')   => (wc, true)
      case ((wc, true), x)  => (wc + 1, false)
      case ((wc, false), x) => (wc, false)
    }

The accumulator starts at (0, true): 0 words so far, and "the last character was a space". On "Folding me softly.": a space records that the last seen character is a space; a non-space after a space starts a new word (wc + 1); a non-space after a non-space changes nothing.
Word count: in parallel

Split "Folding me softly." between two processors: P1 takes "Folding me " and P2 takes "softly.". P1 counts wc = 2 words ending in a space (rs = 1); P2 counts wc = 1 word with no space on the left (ls = 0). Since P1 ends in a space, the counts simply add: wc = 3.

Word count: must assume arbitrary partitions

The split may also fall mid-word: P1 gets "Foldin" (wc = 1, rs = 0) and P2 gets "g me softly." (wc = 3, ls = 0). Now "Foldin" and "g" are two halves of the same word, so the merged count is wc = 1 + 3 - 1 = 3.
Word count: initial aggregation

    txt.par.aggregate((0, 0, 0))

The accumulator is a triple (ls, wc, rs): the number of spaces on the left, the number of words, and the number of spaces on the right. The initial value (0, 0, 0) describes the empty string "".
Word count: aggregation of aggregations

    ...}, {
      case ((0, 0, 0), res) => res
      case (res, (0, 0, 0)) => res
      case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
      case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
    }

An empty side ("") passes the other result through. If the left chunk ends in a non-space and the right chunk starts with a non-space (e.g. "Folding m" + "e softly."), the word spanning the split was counted twice, so one word is subtracted. Otherwise (e.g. "Folding me" + " softly.") the word counts simply add.
Word count: aggregating an element

    txt.par.aggregate((0, 0, 0))({
      case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
      case ((ls, 0, _), c)     => (ls, 1, 0)
      case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
      case ((ls, wc, 0), c)    => (ls, wc, 0)
      case ((ls, wc, rs), c)   => (ls, wc + 1, 0)
    }, ...)

Case by case:
0 words and a space (" "): one more space, counted on both sides.
0 words and a non-space (" m"): the first word, no spaces on the right side.
Nonzero words and a space (" me "): one more space on the right side.
Nonzero words, last non-space, current non-space (" me sof"): no change.
Nonzero words, last space, current non-space (" me s"): one more word.
Word count: in parallel

    txt.par.aggregate((0, 0, 0))({
      case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
      case ((ls, 0, _), c)     => (ls, 1, 0)
      case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
      case ((ls, wc, 0), c)    => (ls, wc, 0)
      case ((ls, wc, rs), c)   => (ls, wc + 1, 0)
    }, {
      case ((0, 0, 0), res) => res
      case (res, (0, 0, 0)) => res
      case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
      case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
    })
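The key property is that the combine operator gives the same word count no matter where the text is split. A plain-Scala sketch of the two operators lets us check this for every split point:

```scala
// Plain-Scala sketch of the two aggregate operators, used to check that
// the merge gives the right word count for every possible split point.
type S = (Int, Int, Int) // (spaces on the left, words, spaces on the right)

def seqOp(acc: S, c: Char): S = (acc, c) match {
  case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), _)     => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), _)    => (ls, wc, 0)
  case ((ls, wc, _), _)    => (ls, wc + 1, 0)
}

def combOp(l: S, r: S): S = (l, r) match {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
}

def count(s: String): S = s.foldLeft((0, 0, 0): S)(seqOp)

val txt = "Folding me softly."
// every split point must agree with the sequential count
val allSplitsAgree = (0 to txt.length).forall { i =>
  val (l, r) = txt.splitAt(i)
  combOp(count(l), count(r))._2 == count(txt)._2
}
```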
Word count: using parallel strings?

The same aggregate call should run on a parallel string, but can a String really be parallel?
Word count: strings are not really parallelizable

    scala> (txt: String).par
    collection.parallel.ParSeq[Char] = ParArray(...)

Calling par on a String produces a different internal representation, a ParArray, which means copying the string contents into an array.
Conversions: going parallel

    // `par` is efficient for...
    mutable.{Array, ArrayBuffer, ArraySeq}
    mutable.{HashMap, HashSet}
    immutable.{Vector, Range}
    immutable.{HashMap, HashSet}

Most other collections construct a new parallel collection!
Conversions: going parallel

    sequential                      parallel
    Array, ArrayBuffer, ArraySeq    mutable.ParArray
    mutable.HashMap                 mutable.ParHashMap
    mutable.HashSet                 mutable.ParHashSet
    immutable.Vector                immutable.ParVector
    immutable.Range                 immutable.ParRange
    immutable.HashMap               immutable.ParHashMap
    immutable.HashSet               immutable.ParHashSet
Conversions: going back

    // `seq` is always efficient
    ParArray(1, 2, 3).seq
    List(1, 2, 3, 4).seq
    ParHashMap(1 -> 2, 3 -> 4).seq
    "abcd".seq

    // `par` may not be...
    "abcd".par
Custom collections

Custom collection:

    class ParString(val str: String)
    extends parallel.immutable.ParSeq[Char] {
      def apply(i: Int) = str.charAt(i)
      def length = str.length
      def seq = new WrappedString(str)
      def splitter = new ParStringSplitter(0, str.length)
    }
Custom collection: splitter definition

Splitters are iterators that can be duplicated, know how many elements remain, and can be split:

    class ParStringSplitter(var i: Int, len: Int)
    extends Splitter[Char] {
      def hasNext = i < len
      def next = {
        val r = str.charAt(i)
        i += 1
        r
      }
      def dup = new ParStringSplitter(i, len)
      def remaining = len - i
      def psplit(sizes: Int*): Seq[ParStringSplitter] = {
        val splitted = new ArrayBuffer[ParStringSplitter]
        for (sz <- sizes) {
          val next = (i + sz) min len
          splitted += new ParStringSplitter(i, next)
          i = next
        }
        splitted
      }
    }
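A stand-alone sketch of the same idea, outside the parallel-collections framework (class and method names here are illustrative, not the real API):

```scala
// Simplified stand-alone model of a string splitter; names are illustrative,
// not the real parallel-collections API.
class StringSplitter(val str: String, var i: Int, val len: Int) {
  def hasNext: Boolean = i < len
  def next(): Char = { val r = str.charAt(i); i += 1; r }
  def remaining: Int = len - i
  def dup = new StringSplitter(str, i, len)
  // split the remaining range in two; psplit generalizes this to arbitrary sizes
  def split: Seq[StringSplitter] = {
    val mid = i + remaining / 2
    Seq(new StringSplitter(str, i, mid), new StringSplitter(str, mid, len))
  }
}

def drain(sp: StringSplitter): String = {
  val sb = new StringBuilder
  while (sp.hasNext) sb += sp.next()
  sb.toString
}

val parts = new StringSplitter("parallel", 0, 8).split
val halves = parts.map(drain)
```

Splitting never copies the string; each splitter is just a pair of indices into the shared underlying string.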
Word count: now with parallel strings

    new ParString(txt).aggregate((0, 0, 0))({
      case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
      case ((ls, 0, _), c)     => (ls, 1, 0)
      case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
      case ((ls, wc, 0), c)    => (ls, wc, 0)
      case ((ls, wc, rs), c)   => (ls, wc + 1, 0)
    }, {
      case ((0, 0, 0), res) => res
      case (res, (0, 0, 0)) => res
      case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
      case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
    })
Word count: performance

Sequential foldLeft:

    txt.foldLeft((0, true)) {
      case ((wc, _), ' ')   => (wc, true)
      case ((wc, true), x)  => (wc + 1, false)
      case ((wc, false), x) => (wc, false)
    }

    100 ms

Parallel aggregate over ParString (as above):

    cores:  1       2      4
    time:   137 ms  70 ms  35 ms
Hierarchy

    GenTraversable
      GenIterable
        GenSeq

The sequential traits Traversable, Iterable and Seq and the parallel traits ParIterable and ParSeq all extend these general counterparts: Seq and ParSeq both extend GenSeq.
Hierarchy

    def nonEmpty(sq: Seq[String]) = {
      val res = new mutable.ArrayBuffer[String]()
      for (s <- sq) {
        if (s.nonEmpty) res += s
      }
      res
    }

Changing the parameter type to ParSeq[String] makes the loop body run in parallel: side-effects! ArrayBuffer is not synchronized! ParSeq is not a subtype of Seq, so the change of signature is explicit and deliberate.
Hierarchy

Written against GenSeq, the method accepts both sequential and parallel sequences, so the side effect must be synchronized:

    def nonEmpty(sq: GenSeq[String]) = {
      val res = new mutable.ArrayBuffer[String]()
      for (s <- sq) {
        if (s.nonEmpty) res.synchronized { res += s }
      }
      res
    }
Accessors vs. transformers: some methods need more than just splitters

Accessors: foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, ...

Transformers: map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, ...

Transformers return collections! Sequential collections build them with builders; parallel collections build them with combiners.
Builders: building a sequential collection

Filtering the even elements out of 1 2 3 4 5 6 7: a ListBuilder starts from Nil, the kept elements 2, 4 and 6 are appended one at a time with +=, and result produces the final list.
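That picture in code, using the standard library's builder API:

```scala
// A Builder is obtained from the companion, fed with +=, finished with result()
val b = List.newBuilder[Int]
for (x <- List(1, 2, 3, 4, 5, 6, 7) if x % 2 == 0)
  b += x
val evens = b.result()
```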
How to build parallel?
Combiners: building parallel collections

    trait Combiner[-Elem, +To]
    extends Builder[Elem, To] {
      def combine[N <: Elem, NewTo >: To]
        (other: Combiner[N, NewTo]): Combiner[N, NewTo]
    }

Each worker fills its own combiner; combine merges two combiners into one. It should be efficient: O(log n) worst case. But how to implement combine?
Parallel arrays

Two workers filter the chunks 1, 2, 3, 4 / 5, 6, 7, 8 and 3, 1, 8, 0 / 2, 2, 1, 9 into partial results 2, 4 / 6, 8 and 8, 0 / 2, 2. Merging partial results is cheap (the chunks are just linked together); once the total size is known, the final array 2 4 6 8 8 0 2 2 is allocated and each chunk is copied into its position.
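The allocate-and-copy step can be sketched in a few lines (the chunk values are taken from the example above):

```scala
// Two workers each produced a result chunk; the sizes are known up front,
// so the final array is allocated once and each chunk copied into place.
val chunk1 = Array(2, 4, 6, 8)
val chunk2 = Array(8, 0, 2, 2)
val result = new Array[Int](chunk1.length + chunk2.length)
Array.copy(chunk1, 0, result, 0, chunk1.length)
Array.copy(chunk2, 0, result, chunk1.length, chunk2.length)
```

In the real implementation the copies into disjoint regions can themselves run in parallel.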
Parallel hash tables

A ParHashMap contains the keys 0 1 2 4 5 7 8 9. Calling e.g. filter splits the work between combiners: one ParHashCombiner collects 0 1 4, the other 5 7 9. How to merge the two combiners' partial tables without rehashing every element?

Parallel hash tables: buckets!

Each ParHashCombiner keeps its elements pre-sorted into buckets according to the low bits of their hash codes: 0 = 0000₂, 1 = 0001₂, 4 = 0100₂, ...
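A minimal model of the bucket trick (illustrative, not the real ParHashCombiner): keys go into 2^k buckets by their low bits, and two combiners merge by concatenating matching buckets:

```scala
val kBits = 2
def bucketOf(x: Int): Int = x & ((1 << kBits) - 1)

// distribute keys into 2^kBits buckets by their low bits
def toBuckets(xs: Seq[Int]): Vector[List[Int]] =
  xs.foldLeft(Vector.fill(1 << kBits)(List.empty[Int])) { (bs, x) =>
    bs.updated(bucketOf(x), x :: bs(bucketOf(x)))
  }

// merging two combiners: concatenate matching buckets, no rehashing
def combine(l: Vector[List[Int]], r: Vector[List[Int]]): Vector[List[Int]] =
  l.zip(r).map { case (a, b) => a ::: b }

val merged = combine(toBuckets(Seq(0, 1, 4)), toBuckets(Seq(5, 7, 9)))
```

Because a key's bucket depends only on its own bits, merged buckets are still correctly placed, so the final table can be filled bucket by bucket.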
Parallel hash tables: combine

To combine, the matching buckets of the two ParHashCombiners are simply linked together: no copying! When result is invoked, each bucket occupies a known region of the final table, so the buckets can be written into the resulting ParHashMap in parallel.
Custom combiners: for methods returning custom collections

    new ParString(txt).filter(_ != ' ')

What is the return type here? With ParString as defined so far, it creates a ParVector! To make transformer methods return a ParString, mix in ParSeqLike with the right type parameters and override newCombiner:

    class ParString(val str: String)
    extends immutable.ParSeq[Char]
       with ParSeqLike[Char, ParString, WrappedString] {
      def apply(i: Int) = str.charAt(i)
      ...
      protected[this] override def newCombiner: Combiner[Char, ParString] =
        new ParStringCombiner
    }
Custom combiners: ParStringCombiner

    class ParStringCombiner
    extends Combiner[Char, ParString] {
      var size = 0
      val chunks = ArrayBuffer(new StringBuilder)
      var lastc = chunks.last

      def +=(elem: Char) = {
        lastc += elem
        size += 1
        this
      }

The combiner keeps a size counter, a sequence of chunks (StringBuilders), and a reference lastc to the last chunk; += appends to lastc and increments size.
Custom combiners: combining

    ...
    def combine[U <: Char, NewTo >: ParString]
      (other: Combiner[U, NewTo]) = other match {
      case psc: ParStringCombiner =>
        size += psc.size
        chunks ++= psc.chunks
        lastc = chunks.last
        this
    }

combine just concatenates the two combiners' chunk lists; no characters are copied.
Custom combiners: result

    ...
    def result = {
      val rsb = new StringBuilder
      for (sb <- chunks) rsb.append(sb)
      new ParString(rsb.toString)
    }
    ...

result appends every chunk into a single StringBuilder; this is the only place where characters are copied.
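The whole combiner can be modelled stand-alone (dropping the Combiner trait and its type parameters; a sketch, not the framework code):

```scala
import scala.collection.mutable.ArrayBuffer

// Stand-alone model of ParStringCombiner: chunks of StringBuilders,
// += appends to the last chunk, combine concatenates the chunk lists.
class StringCombiner {
  var size = 0
  val chunks = ArrayBuffer(new StringBuilder)
  var lastc = chunks.last

  def +=(elem: Char): this.type = {
    lastc += elem
    size += 1
    this
  }

  // cheap merge: only the chunk lists are concatenated, no characters move
  def combine(that: StringCombiner): this.type = {
    size += that.size
    chunks ++= that.chunks
    lastc = chunks.last
    this
  }

  // the single place where characters are copied
  def result: String = {
    val rsb = new StringBuilder
    for (sb <- chunks) rsb.append(sb)
    rsb.toString
  }
}

val left = new StringCombiner
"Folding".foreach(left += _)
val right = new StringCombiner
" me".foreach(right += _)
val merged = left.combine(right)
```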
Custom combiners: for methods expecting implicit builder factories

    // only for big boys
    ... with GenericParTemplate[T, ParColl] ...

    object ParColl extends ParFactory[ParColl] {
      implicit def canCombineFrom[T] =
        new GenericCanCombineFrom[T]
      ...
Custom combiners: performance measurement

    txt.filter(_ != ' ')                    106 ms
    new ParString(txt).filter(_ != ' ')
      1 core:  125 ms
      2 cores:  81 ms
      4 cores:  56 ms

The speedup curve (time vs. number of processors) flattens out because def result is not parallelized.
Custom combiners: tricky!

• two-step evaluation: parallelize the result method in combiners
• efficient merge operation: binomial heaps, ropes, etc.
• concurrent data structures: non-blocking scalable insertion operation (we're working on this)

Future work: coming up

• concurrent data structures
• more efficient vectors
• custom task pools
• user defined scheduling
• parallel bulk in-place modifications
Thank you!
Examples at: git://github.com/axel22/sd.git