Why Functional? Why Scala?
Transcript of a talk by neville-li
Jul 2014
WHY FUNCTIONAL? WHY SCALA?
Neville Li (@sinisa_lyh)
MONOID! Actually it's a semigroup, monoid just sounds more interesting :)
A Little Teaser
Crunch: CombineFns are used to represent the associative operations...
PGroupedTable<K,V>::combineValues(CombineFn<K,V> combineFn, CombineFn<K,V> reduceFn)
Scalding: reduce with fn which must be associative and commutative
KeyedList[K, T]::reduce(fn: (T, T) => T)
Spark: Merge the values for each key using an associative reduce function
PairRDDFunctions[K, V]::reduceByKey(fn: (V, V) => V)
All of them work on both mapper and reducer side
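A sketch of why associativity is what makes map-side combining safe (hypothetical partitioning helper, not any of the above frameworks' actual APIs): reducing each partition locally and then combining the partial results must give the same answer as one global reduce.

```scala
object AssociativityDemo {
  // Simulate map-side combining: reduce each partition locally,
  // then reduce the partial results on the "reducer" side.
  def reduceByPartitions[T](partitions: List[List[T]])(fn: (T, T) => T): T =
    partitions.map(_.reduce(fn)).reduce(fn)

  def main(args: Array[String]): Unit = {
    val data = List(List(1, 2, 3), List(4, 5), List(6))
    val partial = reduceByPartitions(data)(_ + _) // associative: safe to pre-combine
    val global  = data.flatten.reduce(_ + _)      // single global reduce
    assert(partial == global)                     // both are 21
    println(partial)
  }
}
```

With a non-associative fn (e.g. subtraction) the two results would differ, which is exactly why these APIs demand associativity.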
MY STORY
Before
Mostly Python/C++ (and PHP...)
No Java experience at all
Started using Scala early 2013
Now
Discovery's* Java backend/riemann guy
The Scalding/Spark/Storm guy
Contributor to Spark, chill, cascading.avro
* Spotify's machine learning and recommendation team
WHY THIS TALK?
Not a tutorial
Discovery's experience
Why FP matters
Why Scala matters
Common misconceptions
WHAT WE ALREADY USE
Kafka
Scalding
Spark / MLlib
Stratosphere
Storm / Riemann (Clojure)
WHAT WE WANT TO INVESTIGATE
Summingbird (Scala for Storm + Hadoop)
Spark Streaming
Shark / Spark SQL
GraphX (Spark)
BIDMach (ML with GPU)
DISCOVERY
Mid 2013: 100+ Python jobs
10+ hires since (half since new year)
Few with Java experience, none with Scala
As of May 2014: ~100 Scalding jobs & 90 tests
More uncommitted ad-hoc jobs
12+ committers, 4+ using Spark
DISCOVERY
rec-sys-scalding.git
DISCOVERY
GUESS HOW MANY JOBS WERE WRITTEN BY YOURS TRULY?
3
WHY FUNCTIONAL
Immutable data
Copy and transform
Not mutate in place
HDFS with M/R jobs
Storm tuples, Riemann streams
WHY FUNCTIONAL
Higher order functions
Expressions, not statements
Focus on problem solving
Not solving programming problems
WHY FUNCTIONAL
Word count in Python

from collections import defaultdict

lyrics = ["We all live in Amerika", "Amerika ist wunderbar"]
wc = defaultdict(int)
for l in lyrics:
    for w in l.split():
        wc[w] += 1
Screen too small for the Java version
WHY FUNCTIONAL
Map and reduce are key concepts in FP

val lyrics = List("We all live in Amerika", "Amerika ist wunderbar")
lyrics.flatMap(_.split(" "))                  // map
  .groupBy(identity)                          // shuffle
  .map { case (k, g) => (k, g.size) }         // reduce
(def lyrics ["We all live in Amerika" "Amerika ist wunderbar"])
(->> lyrics
     (mapcat #(clojure.string/split % #"\s"))
     (group-by identity)
     (map (fn [[k g]] [k (count g)])))
import Control.Arrow
import Data.List
let lyrics = ["We all live in Amerika", "Amerika ist wunderbar"]
map words >>> concat >>> sort >>> group >>> map (\x -> (head x, length x)) $ lyrics
WHY FUNCTIONAL
Linear equation in ALS matrix factorization

x_u = (YᵀY + Yᵀ(Cᵘ − I)Y + λI)⁻¹ Yᵀ Cᵘ p(u)
vectors.map { case (id, vec) => (id, vec * vec.T) }  // YtY
  .map(_._2).reduce(_ + _)

ratings.keyBy(fixedKey).join(outerProducts)          // YtCuIY
  .map { case (_, (r, op)) => (solveKey(r), op * (r.rating * alpha)) }
  .reduceByKey(_ + _)

ratings.keyBy(fixedKey).join(vectors)                // YtCupu
  .map { case (_, (r, vec)) =>
    val Cui = r.rating * alpha + 1
    val pui = if (Cui > 0.0) 1.0 else 0.0
    (solveKey(r), vec * (Cui * pui))
  }.reduceByKey(_ + _)
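A minimal plain-Scala sketch of the YᵀY term above, with vectors as Array[Double] instead of the Breeze-style `vec * vec.T` used in the job (all names here are illustrative, not from the actual codebase):

```scala
object YtYDemo {
  // Outer product of a vector with itself: y * yᵀ as a 2-D array.
  def outer(v: Array[Double]): Array[Array[Double]] =
    v.map(vi => v.map(vj => vi * vj))

  // Element-wise matrix sum: the `reduce(_ + _)` step over outer products.
  def add(a: Array[Array[Double]], b: Array[Array[Double]]): Array[Array[Double]] =
    a.zip(b).map { case (ra, rb) => ra.zip(rb).map { case (x, y) => x + y } }

  // YᵀY = sum over rows y of Y of the outer product y * yᵀ.
  def ytY(rows: Seq[Array[Double]]): Array[Array[Double]] =
    rows.map(outer).reduce(add)

  def main(args: Array[String]): Unit = {
    val y = Seq(Array(1.0, 2.0), Array(3.0, 4.0))
    val m = ytY(y) // ((1+9, 2+12), (2+12, 4+16)) = ((10, 14), (14, 20))
    assert(m(0)(0) == 10.0 && m(0)(1) == 14.0 && m(1)(1) == 20.0)
  }
}
```

Since outer products are summed element-wise, addition is associative and commutative, which is why the distributed `reduce`/`reduceByKey` calls above are valid.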
WHY SCALA
JVM - libraries and tools
Pythonesque syntax
Static typing with inference
Transition from imperative to FP
WHY SCALA
Performance vs. agility
http://nicholassterling.wordpress.com/2012/11/16/scala-performance/
WHY SCALA
Type inference

class ComplexDecorationService {
  public List<ListenableFuture<Map<String, Metadata>>>
      lookupMetadata(List<String> keys) { /* ... */ }
}

val data = service.lookupMetadata(keys)

type DF = List[ListenableFuture[Map[String, Metadata]]]
def process(data: DF) = { /* ... */ }
WHY SCALA
Higher order functions

List<Integer> list = Lists.newArrayList(1, 2, 3);
Lists.transform(list, new Function<Integer, Integer>() {
  @Override
  public Integer apply(Integer input) { return input + 1; }
});

val list = List(1, 2, 3)
list.map(_ + 1)  // List(2, 3, 4)
And then imagine having to chain or nest such functions
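To make the chaining point concrete (a toy pipeline, not from the slides): in Scala a multi-step transformation stays one readable expression, where pre-lambda Java would need one anonymous Function class per step.

```scala
object ChainingDemo {
  def main(args: Array[String]): Unit = {
    // Each step below would be a separate anonymous class in old Java.
    val result = List(5, 3, 1, 4, 2)
      .map(_ * 10)      // List(50, 30, 10, 40, 20)
      .filter(_ > 15)   // List(50, 30, 40, 20)
      .sortBy(identity) // List(20, 30, 40, 50)
    assert(result == List(20, 30, 40, 50))
    println(result)
  }
}
```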
WHY SCALA
Collections API

val l = List(1, 2, 3, 4, 5)
l.map(_ + 1)     // List(2, 3, 4, 5, 6)
l.filter(_ > 3)  // List(4, 5)

l.zip(List("a", "b", "c")).toMap  // Map(1 -> a, 2 -> b, 3 -> c)
l.partition(_ % 2 == 0)           // (List(2, 4),List(1, 3, 5))
List(l, l.map(_ * 2)).flatten     // List(1, 2, 3, 4, 5, 2, 4, 6, 8, 10)

l.reduce(_ + _)     // 15
l.fold(100)(_ + _)  // 115

"We all live in Amerika".split(" ").groupBy(_.size)
// Map(2 -> Array(We, in), 4 -> Array(live),
//     7 -> Array(Amerika), 3 -> Array(all))
WHY SCALA
Scalding field-based word count

TextLine(path)
  .flatMap('line -> 'word) { line: String => line.split("""\W+""") }
  .groupBy('word) { _.size }

Scalding type-safe word count

TextLine(path).read.toTypedPipe[String](Fields.ALL)
  .flatMap(_.split("""\W+"""))
  .groupBy(identity).size

Scrunch word count

read(from.textFile(file))
  .flatMap(_.split("""\W+"""))
  .count
WHY SCALA
Summingbird word count

source
  .flatMap { line: String => line.split("""\W+""").map((_, 1)) }
  .sumByKey(store)

Spark word count

sc.textFile(path)
  .flatMap(_.split("""\W+"""))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

Stratosphere word count

TextFile(textInput)
  .flatMap(_.split("""\W+"""))
  .map(word => (word, 1))
  .groupBy(_._1)
  .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }
WHY SCALA
Many patterns also common in Java
Java 8 lambdas and streams
Guava, Crunch, etc.
Optional, Predicate
Collection transformations
ListenableFuture and transform
parallelDo, DoFn, MapFn, CombineFn
COMMON MISCONCEPTIONS
It's complex
True for language features
Not from the user's perspective
We only use 20% of the features
Not more than needed in Java
COMMON MISCONCEPTIONS
It's slow
No slower than Python
Depends on how pure the FP is
Trade-off with productivity
Drop down to Java or native libraries
COMMON MISCONCEPTIONS
I don't want to learn a new language
How about flatMap, reduce, fold, etc.?
Unnecessary overhead interfacing with Python or Java
You've used monoids, monads, or higher order functions already
THE END
THANK YOU