Why Functional? Why Scala?
Transcript of a talk by neville-li
Jul 2014
WHY FUNCTIONAL? WHY SCALA?
Neville Li (@sinisa_lyh)
MONOID! Actually it's a semigroup, monoid just sounds more interesting :)
A Little Teaser
Crunch: CombineFns are used to represent the associative operations...
PGroupedTable<K,V>::combineValues(CombineFn<K,V> combineFn, CombineFn<K,V> reduceFn)
Scalding: reduce with fn which must be associative and commutative
KeyedList[K, T]::reduce(fn: (T, T) => T)
Spark: Merge the values for each key using an associative reduce function
PairRDDFunctions[K, V]::reduceByKey(fn: (V, V) => V)
All of them work on both mapper and reducer side
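A sketch of why associativity is what makes map-side combining safe (hypothetical partitioning helper, not any of the above frameworks' actual APIs): reducing each partition locally and then combining the partial results must give the same answer as one global reduce.

```scala
object AssociativityDemo {
  // Simulate map-side combining: reduce each partition locally,
  // then reduce the partial results on the "reducer" side.
  def reduceByPartitions[T](partitions: List[List[T]])(fn: (T, T) => T): T =
    partitions.map(_.reduce(fn)).reduce(fn)

  def main(args: Array[String]): Unit = {
    val data = List(List(1, 2, 3), List(4, 5), List(6))
    val partial = reduceByPartitions(data)(_ + _) // associative: safe to pre-combine
    val global  = data.flatten.reduce(_ + _)      // single global reduce
    assert(partial == global)                     // both are 21
    println(partial)
  }
}
```

With a non-associative fn (e.g. subtraction) the two results would differ, which is exactly why these APIs demand associativity.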
MY STORY
Before
Mostly Python/C++ (and PHP...)
No Java experience at all
Started using Scala early 2013
Now
Discovery's* Java backend/riemann guy
The Scalding/Spark/Storm guy
Contributor to Spark, chill, cascading.avro
* Spotify's machine learning and recommendation team
WHY THIS TALK?
Not a tutorial
Discovery's experience
Why FP matters
Why Scala matters
Common misconceptions
WHAT WE ALREADY USE
Kafka
Scalding
Spark / MLlib
Stratosphere
Storm / Riemann (Clojure)
WHAT WE WANT TO INVESTIGATE
Summingbird (Scala for Storm + Hadoop)
Spark Streaming
Shark / Spark SQL
GraphX (Spark)
BIDMach (ML with GPU)
DISCOVERY
Mid 2013: 100+ Python jobs
10+ hires since (half since new year)
Few with Java experience, none with Scala
As of May 2014: ~100 Scalding jobs & 90 tests
More uncommitted ad-hoc jobs
12+ committers, 4+ using Spark
DISCOVERY
rec-sys-scalding.git
DISCOVERY
GUESS HOW MANY JOBS WERE WRITTEN BY YOURS TRULY?
3
WHY FUNCTIONAL
Immutable data
Copy and transform
Not mutate in place
HDFS with M/R jobs
Storm tuples, Riemann streams
WHY FUNCTIONAL
Higher order functions
Expressions, not statements
Focus on problem solving
Not solving programming problems
WHY FUNCTIONAL
Word count in Python

from collections import defaultdict

lyrics = ["We all live in Amerika", "Amerika ist wunderbar"]
wc = defaultdict(int)
for l in lyrics:
    for w in l.split():
        wc[w] += 1
Screen too small for the Java version
WHY FUNCTIONAL
Map and reduce are key concepts in FP

val lyrics = List("We all live in Amerika", "Amerika ist wunderbar")
lyrics.flatMap(_.split(" "))                  // map
  .groupBy(identity)                          // shuffle
  .map { case (k, g) => (k, g.size) }         // reduce
(def lyrics ["We all live in Amerika" "Amerika ist wunderbar"])
(->> lyrics
     (mapcat #(clojure.string/split % #"\s"))
     (group-by identity)
     (map (fn [[k g]] [k (count g)])))
import Control.Arrow
import Data.List
let lyrics = ["We all live in Amerika", "Amerika ist wunderbar"]
map words >>> concat >>> sort >>> group >>> map (\x -> (head x, length x)) $ lyrics
WHY FUNCTIONAL
Linear equation in ALS matrix factorization

x_u = (YᵀY + Yᵀ(Cᵘ − I)Y + λI)⁻¹ Yᵀ Cᵘ p(u)
vectors.map { case (id, vec) => (id, vec * vec.T) }  // YtY
  .map(_._2).reduce(_ + _)

ratings.keyBy(fixedKey).join(outerProducts)          // YtCuIY
  .map { case (_, (r, op)) => (solveKey(r), op * (r.rating * alpha)) }
  .reduceByKey(_ + _)

ratings.keyBy(fixedKey).join(vectors)                // YtCupu
  .map { case (_, (r, vec)) =>
    val Cui = r.rating * alpha + 1
    val pui = if (Cui > 0.0) 1.0 else 0.0
    (solveKey(r), vec * (Cui * pui))
  }.reduceByKey(_ + _)
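A minimal plain-Scala sketch of the YᵀY term above, with vectors as Array[Double] instead of the Breeze-style `vec * vec.T` used in the job (all names here are illustrative, not from the actual codebase):

```scala
object YtYDemo {
  // Outer product of a vector with itself: y * yᵀ as a 2-D array.
  def outer(v: Array[Double]): Array[Array[Double]] =
    v.map(vi => v.map(vj => vi * vj))

  // Element-wise matrix sum: the `reduce(_ + _)` step over outer products.
  def add(a: Array[Array[Double]], b: Array[Array[Double]]): Array[Array[Double]] =
    a.zip(b).map { case (ra, rb) => ra.zip(rb).map { case (x, y) => x + y } }

  // YᵀY = sum over rows y of Y of the outer product y * yᵀ.
  def ytY(rows: Seq[Array[Double]]): Array[Array[Double]] =
    rows.map(outer).reduce(add)

  def main(args: Array[String]): Unit = {
    val y = Seq(Array(1.0, 2.0), Array(3.0, 4.0))
    val m = ytY(y) // ((1+9, 2+12), (2+12, 4+16)) = ((10, 14), (14, 20))
    assert(m(0)(0) == 10.0 && m(0)(1) == 14.0 && m(1)(1) == 20.0)
  }
}
```

Since outer products are summed element-wise, addition is associative and commutative, which is why the distributed `reduce`/`reduceByKey` calls above are valid.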
WHY SCALA
JVM - libraries and tools
Pythonesque syntax
Static typing with inference
Transition from imperative to FP
WHY SCALA
Performance vs. agility
http://nicholassterling.wordpress.com/2012/11/16/scala-performance/
WHY SCALA
Type inference

class ComplexDecorationService {
  public List<ListenableFuture<Map<String, Metadata>>>
      lookupMetadata(List<String> keys) { /* ... */ }
}

val data = service.lookupMetadata(keys)

type DF = List[ListenableFuture[Map[String, Metadata]]]
def process(data: DF) = { /* ... */ }
WHY SCALA
Higher order functions

List<Integer> list = Lists.newArrayList(1, 2, 3);
Lists.transform(list, new Function<Integer, Integer>() {
  @Override
  public Integer apply(Integer input) { return input + 1; }
});

val list = List(1, 2, 3)
list.map(_ + 1)  // List(2, 3, 4)
And then imagine having to chain or nest such functions
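To make the chaining point concrete (a toy pipeline, not from the slides): in Scala a multi-step transformation stays one readable expression, where pre-lambda Java would need one anonymous Function class per step.

```scala
object ChainingDemo {
  def main(args: Array[String]): Unit = {
    // Each step below would be a separate anonymous class in old Java.
    val result = List(5, 3, 1, 4, 2)
      .map(_ * 10)      // List(50, 30, 10, 40, 20)
      .filter(_ > 15)   // List(50, 30, 40, 20)
      .sortBy(identity) // List(20, 30, 40, 50)
    assert(result == List(20, 30, 40, 50))
    println(result)
  }
}
```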
WHY SCALA
Collections API

val l = List(1, 2, 3, 4, 5)
l.map(_ + 1)     // List(2, 3, 4, 5, 6)
l.filter(_ > 3)  // List(4, 5)

l.zip(List("a", "b", "c")).toMap  // Map(1 -> a, 2 -> b, 3 -> c)
l.partition(_ % 2 == 0)           // (List(2, 4),List(1, 3, 5))
List(l, l.map(_ * 2)).flatten     // List(1, 2, 3, 4, 5, 2, 4, 6, 8, 10)

l.reduce(_ + _)     // 15
l.fold(100)(_ + _)  // 115

"We all live in Amerika".split(" ").groupBy(_.size)
// Map(2 -> Array(We, in), 4 -> Array(live),
//     7 -> Array(Amerika), 3 -> Array(all))
WHY SCALA
Scalding field-based word count

TextLine(path)
  .flatMap('line -> 'word) { line: String => line.split("""\W+""") }
  .groupBy('word) { _.size }

Scalding type-safe word count

TextLine(path).read.toTypedPipe[String](Fields.ALL)
  .flatMap(_.split("""\W+"""))
  .groupBy(identity).size

Scrunch word count

read(from.textFile(file))
  .flatMap(_.split("""\W+"""))
  .count
WHY SCALA
Summingbird word count

source
  .flatMap { line: String => line.split("""\W+""").map((_, 1)) }
  .sumByKey(store)

Spark word count

sc.textFile(path)
  .flatMap(_.split("""\W+"""))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

Stratosphere word count

TextFile(textInput)
  .flatMap(_.split("""\W+"""))
  .map(word => (word, 1))
  .groupBy(_._1)
  .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }
WHY SCALA
Many patterns also common in Java
Java 8 lambdas and streams
Guava, Crunch, etc.
Optional, Predicate
Collection transformations
ListenableFuture and transform
parallelDo, DoFn, MapFn, CombineFn
COMMON MISCONCEPTIONS
It's complex
True for language features
Not from the user's perspective
We only use 20% of the features
Not more than needed in Java
COMMON MISCONCEPTIONS
It's slow
No slower than Python
Depends on how pure the FP is
Trade-off with productivity
Drop down to Java or native libraries
COMMON MISCONCEPTIONS
I don't want to learn a new language
How about flatMap, reduce, fold, etc.?
Unnecessary overhead interfacing with Python or Java
You've used monoids, monads, or higher order functions already
THE END
THANK YOU