Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

69
Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin

Transcript of Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Page 1: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Scala Collections PerformanceGS.com/EngineeringMarch, 2015Craig Motlin

Page 2: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Who am I?• Engineer at Goldman Sachs• Technical lead for GS Collections

– A replacement for the Java Collections Framework

Page 3: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Goals• Scala programs ought to perform as well as

Java programs, but Scala is a little slower

Page 4: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

GoalsFrom RedBlackTree.scala/* * Forcing direct fields access using the @inline annotation * helps speed up various operations (especially * smallest/greatest and update/delete). * * Unfortunately the direct field access is not guaranteed to * work (but works on the current implementation of the Scala * compiler). * * An alternative is to implement the these classes using plain * old Java code... */

Page 5: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Goals• Scala programs ought to perform as well as

Java programs, but Scala is a little slower• Highlight a few performance problems that

matter to me• Suggest fixes, borrowing ideas and technology

from GS Collections

Page 6: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

GoalsGS Collections and Scala’s Collections are similar• Mutable and immutable interfaces with common

parent• Similar iteration patterns at hierarchy root Traversable and RichIterable

• Lazy evaluation (view, asLazy)• Parallel-lazy evaluation (par, asParallel)

Page 7: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

AgendaGS Collections and Scala’s Collections are different• Persistent data structures• Hash tables• Primitive specialization• Fork-join

Page 8: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures

Page 9: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Mutable collections are similar in GS

Collections and Scala – though we’ll look at some differences

• Immutable collections are quite different• Scala’s immutable collections are persistent• GS Collections’ immutable collections are not

Page 10: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data StructuresWikipedia:

In computing, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified.

Page 11: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data StructuresWikipedia:

In computing, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified.

Operations yield new updated structures when

they are “modified”

Page 12: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Typical examples: List and Sorted Set

Page 13: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Typical examples: List and Sorted Set• “Add” by constructing O(log n) new nodes

from the leaf to a new root.

• Scala sorted set binary tree• GSC immutable sorted set covered later

Page 14: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data StructuresWikipedia:

Page 15: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Typical examples: List and Sorted Set• “Prepend” by constructing a new head in O(1)

time. LIFO structure.

• Scala immutable list linked list or stack• GSC immutable list array backed

Page 16: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Extremely important in purely functional

languages • All collections are immutable• Must have good runtime complexity• No one seems to miss them in GS Collections

Page 17: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Proposal: Mutable, Persistent Immutable, and

Plain Immutable• Mutable: Same as always• Persistent: Use when you want structural

sharing• (Plain) Immutable: Use when you’re done

mutating or when the data never mutates

Page 18: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data StructuresPlain old immutable data structures• Not persistent (no structural sharing)• “Adding” is much slower: O(n)• Speed benefits for everything else• Huge memory benefits

Page 19: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Immutable array-backed list• Immutable trimmed array• No nodes 1/6 memory• Array backed cache locality, should

parallelize well

Page 20: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures

Measured with Java 8 64-bit and compressed oops

050000

100000

150000

200000

250000

300000

350000

400000

450000

500000

550000

600000

650000

700000

750000

800000

850000

900000

950000

10000000

5000000

10000000

15000000

20000000

25000000

30000000

Lists: Size in MB by number of elements

com.gs.collections.api.list.ImmutableList scala.collection.immutable.List scala.collection.mutable.ArrayBuffer

Size (num elements)

Size

(MB)

Page 21: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data StructuresPerformance assumptions:• Iterating through an array should be faster

than iterating through a linked list• Linked lists won’t parallelize well with .par• No surprises in the results – so we’ll skipgithub.com/goldmansachs/gs-collections/tree/master/jmh-testsgithub.com/goldmansachs/gs-collections/tree/master/memory-tests

Page 22: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Immutable array-backed sorted set• Immutable trimmed, sorted array• No nodes ~1/8 memory• Array backed cache locality

Page 23: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures

050000

100000

150000

200000

250000

300000

350000

400000

450000

500000

550000

600000

650000

700000

750000

800000

850000

900000

950000

10000000

50000001000000015000000200000002500000030000000350000004000000045000000

Sorted Sets: Size in MB by number of elements

com.gs.collections.api.set.sorted.ImmutableSortedSet com.gs.collections.impl.set.sorted.mutable.TreeSortedSetscala.collection.immutable.TreeSet scala.collection.mutable.TreeSet

Size (num elements)

Size

(MB)

Page 24: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Assumptions about contains:• May be faster when it’s a binary search in an

array (good cache locality)• Will be about the same in mutable /

immutable tree (all nodes have same structure)

Page 25: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures

Immutable Mutable0

1

2

3

4

5

6

Sorted Set contains() throughput (higher is better)

GSC Scala

Mill

ion

ops/

s

Page 26: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Iteration

Page 27: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structuresval scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE)

def serial_immutable_scala(): Unit ={ val count: Int = this.scalaImmutable .view .filter(each => each % 10000 != 0) .map(String.valueOf) .map(Integer.valueOf) .count(each => (each + 1) % 10000 != 0) if (count != 999800) { throw new AssertionError }}

Page 28: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structuresval scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE)

def serial_immutable_scala(): Unit ={ val count: Int = this.scalaImmutable .view .filter(each => each % 10000 != 0) .map(String.valueOf) .map(Integer.valueOf) .count(each => (each + 1) % 10000 != 0) if (count != 999800) { throw new AssertionError }}

Lazy evaluation so we don’t create

intermediate collections

Page 29: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structuresval scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE)

def serial_immutable_scala(): Unit ={ val count: Int = this.scalaImmutable .view .filter(each => each % 10000 != 0) .map(String.valueOf) .map(Integer.valueOf) .count(each => (each + 1) % 10000 != 0) if (count != 999800) { throw new AssertionError }}

Lazy evaluation so we don’t create

intermediate collections

Terminating in .toList or .toBuffer would be

dominated by collection creation

Page 30: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structuresprivate final ImmutableSortedSet<Integer> gscImmutable = SortedSets.immutable.withAll(Interval.zeroTo(SIZE));@Benchmarkpublic void serial_immutable_gsc(){ int count = this.gscImmutable .asLazy() .select(each -> each % 10_000 != 0) .collect(String::valueOf) .collect(Integer::valueOf) .count(each -> (each + 1) % 10_000 != 0); if (count != 999_800) { throw new AssertionError(); }}

Page 31: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Assumption: Iterating over an array is

somewhat faster

Page 32: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures

Serial / Immutable Serial / Mutable0

2

4

6

8

10

12

14

Sorted Set Iteration throughput (higher is better)

GSC Scala

Mill

ion

ops/

s

Page 33: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data StructuresNext up, parallel-lazy evaluationthis.scalaImmutable.view this.scalaImmutable.par

this.gscImmutable.asLazy() this.gscImmutable.asParallel(this.executorService, BATCH_SIZE)

Page 34: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Assumption: Scala’s tree should parallelize

moderately well• Assumption: GSC’s array should parallelize

very well

Page 35: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures

Measured on an 8 core Linux VMIntel Xeon E5-2697 v28x

Serial / Immutable Serial / Mutable Parallel / Immutable0

10

20

30

40

50

60

70

80

90

100

Sorted Set Iteration throughput (higher is better)

GSC Scala

Mill

ion

ops/

s

Page 36: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Scala’s immutable.TreeSet doesn’t override .par,

so parallel is slower than serial• Some tree operations (like filter) are hard to

parallelize• TreeSet.count() should be easy to parallelize using

fork/join with some work• New assumption: Mutable trees won’t parallelize well

either

Page 37: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures

Measured on an 8 core Linux VMIntel Xeon E5-2697 v28x

Serial / Immutable Serial / Mutable Parallel / Immutable Parallel / Mutable0

10

20

30

40

50

60

70

80

90

100

Sorted Set Iteration throughput (higher is better)

GSC Scala

Mill

ion

ops/

s

Page 38: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• GS Collections follow up:

– TreeSortedSet delegates to java.util.TreeSet. Write our own tree and override .asParallel()

• Scala follow up:– Override .par

Page 39: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Persistent Data Structures• Proposal: Mutable, Persistent, and Immutable• All three are useful in different cases• How common is it to share structure, really?

Page 40: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash Tables

Page 41: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash Tables• Scala’s immutable.HashMap is a hash array

mapped trie (persistent structure)• Wikipedia: “achieves almost hash table-like

speed while using memory much more economically”

• Let’s see

Page 42: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash Tables• Scala’s mutable.HashMap is backed by an array

of Entry pairs• java.util.HashMap.Entry caches the hashcode• UnifiedMap<K, V> is backed by Object[],

flattened, alternating K and V• ImmutableUnifiedMap is backed by UnifiedMap

Page 43: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash Tables

050000

100000

150000

200000

250000

300000

350000

400000

450000

500000

550000

600000

650000

700000

750000

800000

850000

900000

950000

10000000

1000000020000000300000004000000050000000600000007000000080000000

Maps: Size in MB by number of elements

com.gs.collections.api.map.ImmutableMap com.gs.collections.impl.map.mutable.UnifiedMapjava.util.HashMap scala.collection.immutable.HashMapscala.collection.mutable.HashMap

Size (num elements)

Size

(MB)

Page 44: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash Tables@Benchmarkpublic void get(){ int localSize = this.size; String[] localElements = this.elements; Map<String, String> localScalaMap = this.scalaMap;

for (int i = 0; i < localSize; i++) { if (!localScalaMap.get(localElements[i]).isDefined()) { throw new AssertionError(i); } }}

Page 45: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash Tables

0 250000

500000

750000

1000000

1250000

1500000

1750000

2000000

2250000

2500000

2750000

3000000

3250000

3500000

3750000

4000000

4250000

4500000

4750000

5000000

5250000

5500000

5750000

6000000

6250000

6500000

6750000

7000000

7250000

7500000

7750000

8000000

8250000

8500000

8750000

9000000

9250000

9500000

9750000

10000000

0

5000000

10000000

15000000

20000000

25000000

30000000

35000000

Map get() throughput by map size (higher is better)

GscImmutableMapGetTest.get GscMutableMapGetTest.get ScalaImmutableMapGetTest.get ScalaMutableMapGetTest.get

Size (num elements)

Mill

ion

ops/

s

Page 46: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash Tables

0

500000

1000000

1500000

2000000

2500000

3000000

3500000

4000000

4500000

5000000

5500000

6000000

6500000

7000000

7500000

8000000

8500000

9000000

9500000

100000000

5000000

10000000

15000000

20000000

25000000

30000000

Map put() throughtput by map size (higher is better)

com.gs.collections.impl.map.mutable.UnifiedMap scala.collection.immutable.Mapscala.collection.mutable.HashMap

Size (num elements)

Mill

ions

ops

/s

Page 47: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash Tables@Benchmarkpublic mutable.HashMap<String, String> mutableScalaPut(){ int localSize = this.size; String[] localElements = this.elements;

mutable.HashMap<String, String> map = new PresizableHashMap<>(localSize);

for (int i = 0; i < localSize; i++) { map.put(localElements[i], "dummy"); } return map;}

Page 48: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash TablesHashMap ought to have a constructor for pre-sizing

class PresizableHashMap[K, V](val _initialSize: Int) extends scala.collection.mutable.HashMap[K, V]{ private def initialCapacity = if (_initialSize == 0) 1 else smallestPowerOfTwoGreaterThan( (_initialSize.toLong * 1000 / _loadFactor).asInstanceOf[Int])

private def smallestPowerOfTwoGreaterThan(n: Int): Int = if (n > 1) Integer.highestOneBit(n - 1) << 1 else 1

table = new Array(initialCapacity) threshold = ((initialCapacity.toLong * _loadFactor) / 1000).toInt}

Page 49: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash Tables• scala.collection.mutable.HashSet is

backed by an array using open-addressing• java.util.HashSet is implemented by

delegating to a HashMap.• UnifiedSet is backed by Object[], either

elements or arrays of collisions• ImmutableUnifiedSet is backed by UnifiedSet

Page 50: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Hash Tables

050000

100000

150000

200000

250000

300000

350000

400000

450000

500000

550000

600000

650000

700000

750000

800000

850000

900000

950000

10000000

50000001000000015000000200000002500000030000000350000004000000045000000

Sets: Size in MB by number of elements

com.gs.collections.api.set.ImmutableSet com.gs.collections.impl.set.mutable.UnifiedSet java.util.HashSetscala.collection.immutable.HashSet scala.collection.mutable.HashSet

Size (num elements)

Size

(MB)

Page 51: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Primitive Specialization

Page 52: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Primitive Specialization• Boxing is expensive

– Reference + Header + alignment• Scala has specialization, but none of the collections

are specialized• If you cannot afford wrappers you can:

– Use primitive arrays (filter, map, etc. will auto-box)– Use a Java Collections library

Page 53: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Primitive Specialization

050000

100000

150000

200000

250000

300000

350000

400000

450000

500000

550000

600000

650000

700000

750000

800000

850000

900000

950000

10000000

5000000

10000000

15000000

20000000

25000000

30000000

35000000

Sets: Size in MB by number of elements

com.gs.collections.impl.set.mutable.primitive.IntHashSet com.gs.collections.impl.set.mutable.UnifiedSetscala.collection.mutable.HashSet

Size (num elements)

Size

(MB)

Page 54: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Primitive Specialization• Proposal: Primitive lists, sets, and maps in

Scala• Not Traversable – outside the collections

hierarchy• Fix Specialization on Functions (lambdas)

– Want for-comprehensions to work well

Page 55: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Fork/Join

Page 56: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Parallel / Lazy / Scalaval list = this.integers .par .filter(each => each % 10000 != 0) .map(String.valueOf) .map(Integer.valueOf) .filter(each => (each + 1) % 10000 != 0) .toBuffer

Assert.assertEquals(999800, list.size)

Page 57: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Parallel / Lazy / GSCMutableList<Integer> list = this.integersGSC .asParallel(this.executorService, BATCH_SIZE) .select(each -> each % 10_000 != 0) .collect(String::valueOf) .collect(Integer::valueOf) .select(each -> (each + 1) % 10_000 != 0) .toList();

Verify.assertSize(999_800, list);

Page 58: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Parallel / Lazy / JDKList<Integer> list = this.integersJDK .parallelStream() .filter(each -> each % 10_000 != 0) .map(String::valueOf) .map(Integer::valueOf) .filter(each -> (each + 1) % 10_000 != 0) .collect(Collectors.toList());

Verify.assertSize(999_800, list);

Page 59: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Stacked computation ops/s (higher is better)

Serial Lazy Parallel Lazy Parallel hand-coded0

10

20

30

40

50

60

Java 8 GS Collections Scala8x

Page 60: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Fork-Join Merge• Intermediate results are merged in a tree• Merging is O(n log n) work and garbage

Page 61: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Fork-Join Merge• Amount of work done by last thread is O(n)

Page 62: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Parallel / Lazy / GSCMutableList<Integer> list = this.integersGSC .asParallel(this.executorService, BATCH_SIZE) .select(each -> each % 10_000 != 0) .collect(String::valueOf) .collect(Integer::valueOf) .select(each -> (each + 1) % 10_000 != 0) .toList();

Verify.assertSize(999_800, list);

ParallelIterable.toList() returns a

CompositeFastList, a List with O(1) implementation

of addAll()

Page 63: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Parallel / Lazy / GSCpublic final class CompositeFastList<E> { private final FastList<FastList<E>> lists = FastList.newList();

public boolean addAll(Collection<? extends E> collection) { FastList<E> collectionToAdd = collection instanceof FastList ? (FastList<E>) collection : new FastList<E>(collection); this.lists.add(collectionToAdd); return true; } ...

}

Page 64: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

CompositeFastList Merge• Merging is O(1) work per batch

CFL

Page 65: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Fork/Join• Fork-join is general purpose but always requires

merge work• We can get better performance through specialized

data structures meant for combining

Page 66: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

SummaryGS Collections and Scala’s Collections are different• Persistent data structures• Hash tables• Primitive specialization• Fork-join

Page 67: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

More Information@motlin@GoldmanSachs

stackoverflow.com/questions/tagged/gs-collections

github.com/goldmansachs/gs-collections/tree/master/jmh-testsgithub.com/goldmansachs/gs-collections/tree/master/memory-testsgithub.com/goldmansachs/gs-collections-kata

infoq.com/presentations/java-streams-scala-parallel-collections

Page 68: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Learn more at GS.com/Engineering

© 2015 Goldman Sachs. This presentation should not be relied upon or considered investment advice. Goldman Sachs does not warrant or guarantee to anyone the accuracy, completeness or efficacy of this presentation, and recipients should not rely on it except at their own risk. This presentation may not be forwarded or disclosed except with this disclaimer intact.

Page 69: Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Q&A