Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and...
![Page 1: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/1.jpg)
Spark as the Gateway Drug To Typed Functional Programming
Jeff Smith Rohan Aletty x.ai
![Page 2: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/2.jpg)
Real World AI
• Scale is increasing
• Complexity is increasing
• Human brain size is constant
![Page 3: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/3.jpg)
System Complexity
[Diagram labels: Data Ingest, Annotation Routing, Response Generation, Knowledge Base, and repeated Annotation Services boxes, each paired with Models]
![Page 4: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/4.jpg)
Problem Complexity
![Page 5: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/5.jpg)
Complex Intelligence
![Page 6: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/6.jpg)
Datanauts
![Page 7: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/7.jpg)
Tools
![Page 8: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/8.jpg)
Scala
• Bleeding edge
• Real world
![Page 9: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/9.jpg)
Spark
• Incredibly powerful
• Easy to use
![Page 10: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/10.jpg)
Typed Functional Programming
• Powerful abstractions
• Tough learning curve
![Page 11: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/11.jpg)
Functions
![Page 12: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/12.jpg)
Methods
• Collection of statements
• Might have side effects
• On an object
![Page 13: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/13.jpg)
Methods

    public class Dataset {
        private List<Double> observations;
        private Double average;

        public Dataset(List<Double> inputData) {
            observations = inputData;
        }
    }
![Page 14: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/14.jpg)
Methods

    public class Dataset {
        public double getAverage() {
            Double runningSum = 0.0;
            for (Double observation : observations) {
                runningSum += observation;
            }
            average = runningSum / observations.size();
            return average;
        }
    }
![Page 15: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/15.jpg)
Methods

    public class Dataset {
        public void setObservations(List<Double> inputData) {
            observations = inputData;
        }
    }
![Page 16: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/16.jpg)
Methods

    import java.util.List;

    public class Dataset {
        private List<Double> observations;
        private Double average;

        public Dataset(List<Double> inputData) {
            observations = inputData;
        }

        // Side effect: computing the average also mutates the `average` field.
        public double getAverage() {
            Double runningSum = 0.0;
            for (Double observation : observations) {
                runningSum += observation;
            }
            average = runningSum / observations.size();
            return average;
        }

        // Side effect: replaces the data the object was built with.
        public void setObservations(List<Double> inputData) {
            observations = inputData;
        }
    }
![Page 17: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/17.jpg)
Functions
• Collection of expressions
• Returns a value
• Are objects (in Scala)
• Can be in-lined
![Page 18: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/18.jpg)
Functions in Scala

    val inputData = List(1.0, 2.0, 3.0)
![Page 19: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/19.jpg)
Functions in Scala

    // The `=` matters: without it Scala treats the body as a procedure returning Unit.
    def average(observations: List[Double]): Double = {
      observations.sum / observations.size
    }

    average(inputData)
![Page 20: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/20.jpg)
Functions in Scala

    def add(x: Double, y: Double) = {
      x + y
    }

    val sum = inputData.foldLeft(0.0)(add)
    val average = sum / inputData.size
![Page 21: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/21.jpg)
Functions in Scala

    val sum = inputData.foldLeft(0.0)(add)
    val average = sum / inputData.size

    inputData.foldLeft(0.0)(_ + _) / inputData.size
![Page 22: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/22.jpg)
Functions in Spark

    inputData.foldLeft(0.0)(_ + _) / inputData.size

    val observations = sc.parallelize(inputData)
    observations.fold(0.0)(_ + _) / observations.count()
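One difference between `foldLeft` on a list and `fold` on an RDD is worth a sketch: Spark documents that the zero value is applied once per partition and once more when the partition results are merged, so it should be neutral for the operation. The block below imitates that with plain Scala collections and no Spark dependency; `FoldSketch`, `partitions`, and `partitionedFold` are hypothetical names, and the two-element chunking is an arbitrary assumption.

```scala
object FoldSketch {
  val inputData: List[Double] = List(1.0, 2.0, 3.0)

  // Toy stand-in for an RDD's partitions: split the list into chunks of two.
  val partitions: List[List[Double]] = inputData.grouped(2).toList

  // Imitates the documented behaviour of RDD.fold: fold each partition
  // starting from `zero`, then fold the per-partition results from `zero` again.
  def partitionedFold(zero: Double)(op: (Double, Double) => Double): Double =
    partitions.map(_.foldLeft(zero)(op)).foldLeft(zero)(op)

  val sequentialSum: Double = inputData.foldLeft(0.0)(_ + _)
  val partitionedSum: Double = partitionedFold(0.0)(_ + _)

  // A non-neutral zero is counted once per partition plus once in the final merge.
  val skewedSum: Double = partitionedFold(1.0)(_ + _)
}
```

With a neutral zero the two sums agree; `skewedSum` shows how a zero of `1.0` leaks into the result once for each of the two chunks and once in the merge.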
![Page 23: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/23.jpg)
Immutability
![Page 24: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/24.jpg)
Mutation
• Changing an object
![Page 25: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/25.jpg)
Mutation

    visits = {"Church": 2, "Backus": 1, "McCarthy": 4}
    old_value = visits["Backus"]
    visits["Backus"] = old_value + 1
![Page 26: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/26.jpg)
Immutability
• Never changing objects
![Page 27: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/27.jpg)
Immutability in Scala

    val visits = Map("Church" -> 2, "Backus" -> 1, "McCarthy" -> 4)
    val updatedVisits = visits.updated("Backus", 2)
![Page 28: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/28.jpg)
Immutability in Spark

    val manyVisits = sc.parallelize(visits.toSeq)
    val additionalVisit = sc.parallelize(Seq(("Backus", 1)))
    val updatedVisits = manyVisits.union(additionalVisit)
      .aggregateByKey(0)(_ + _, _ + _)
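The union-then-aggregate pattern can be tried without a cluster. The sketch below is an approximation using plain Scala sequences in place of RDDs: concatenation stands in for `union`, and `groupBy` plus a sum stands in for `aggregateByKey`. `ImmutabilitySketch` is a hypothetical name.

```scala
object ImmutabilitySketch {
  val visits: Map[String, Int] = Map("Church" -> 2, "Backus" -> 1, "McCarthy" -> 4)

  // Stand-ins for the two RDDs: plain sequences of (name, count) pairs.
  val manyVisits: Seq[(String, Int)] = visits.toSeq
  val additionalVisit: Seq[(String, Int)] = Seq(("Backus", 1))

  // "union" then "aggregate by key": concatenate, group by name, sum the counts.
  val updatedVisits: Map[String, Int] =
    (manyVisits ++ additionalVisit)
      .groupBy { case (name, _) => name }
      .map { case (name, pairs) => name -> pairs.map(_._2).sum }
}
```

Nothing is modified in place: `visits` still holds the old count for "Backus", and `updatedVisits` is a new value built from it.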
![Page 29: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/29.jpg)
Recap
![Page 30: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/30.jpg)
Concepts
• Higher-order functions
• Anonymous functions
• Purity of functions
![Page 31: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/31.jpg)
Concepts
• Currying
• Referential transparency
• Closures
• Resilient Distributed Datasets
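Several of the recap terms fit in a few lines of plain Scala. This is an illustrative sketch only; `ConceptsSketch`, `applyTwice`, `scale`, `twice`, and `shift` are hypothetical names that do not come from the talk.

```scala
object ConceptsSketch {
  // Higher-order function: takes another function as an argument.
  def applyTwice(f: Double => Double)(x: Double): Double = f(f(x))

  // Curried function: parameters are supplied one list at a time.
  def scale(factor: Double)(x: Double): Double = factor * x

  // Supplying only the first parameter list yields a new function.
  val twice: Double => Double = scale(2.0)

  // Closure: the anonymous function captures `offset` from its environment.
  val offset: Double = 10.0
  val shift: Double => Double = x => x + offset

  // Pure functions are referentially transparent: each call below could be
  // replaced by its result without changing the program.
  val quadrupled: Double = applyTwice(twice)(3.0)
  val shifted: Double = applyTwice(shift)(1.0)
}
```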
![Page 32: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/32.jpg)
Lazy Evaluation
![Page 33: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/33.jpg)
Functional Programming: Lazy Evaluation
• Delaying evaluation of an expression until a value is needed
• Two major advantages of lazy evaluation
  • Deferring computation allows the program to evaluate only what is necessary
  • The evaluation scheme can be changed to be more efficient
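Deferred evaluation is easy to observe in plain Scala. The sketch below uses an `Iterator` and a `lazy val` and records which elements were actually computed; the mutable log exists only to make the evaluation order visible. `LazinessSketch`, `expensiveSquare`, and `config` are hypothetical names.

```scala
object LazinessSketch {
  import scala.collection.mutable.ListBuffer

  // Records which inputs were actually evaluated.
  val evaluated: ListBuffer[Int] = ListBuffer.empty[Int]

  def expensiveSquare(n: Int): Int = {
    evaluated += n
    n * n
  }

  // Iterator.map is lazy: the source is infinite, yet only the three
  // requested elements are ever squared.
  val firstThree: List[Int] =
    Iterator.from(1).map(expensiveSquare).take(3).toList

  // A lazy val is evaluated on first access and cached afterwards.
  var initialisations: Int = 0
  lazy val config: String = { initialisations += 1; "loaded" }
}
```

Reading `config` twice runs its initialiser once, and never reading it runs the initialiser zero times.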
![Page 34: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/34.jpg)
Spark: Lazy Evaluation
• All transformations are lazy
  • Their existence is simply recorded in the Spark computation DAG
• Example DAGs
![Page 35: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/35.jpg)
Spark: Lazy Evaluation

    val rdd1 = sc.parallelize(...)
    val rdd2 = rdd1.map(...)
    val rdd3 = rdd1.map(...)
    val rdd4 = rdd1.map(...)
    rdd3.take(5)
![Page 36: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/36.jpg)
Spark: Learning Laziness
• Advantage 1 (deferred computation)
  • Follows directly from evaluating only the parts of the DAG that are necessary when executing an action
• Advantage 2 (optimized evaluation scheme)
  • Follows directly from pipelining within Spark stages, which makes execution more efficient
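Both advantages can be imitated with a toy model. The sketch below is not how Spark is implemented; `MiniRDD` is a hypothetical class that stores a recipe for producing an iterator, so transformations only wrap the recipe and an action such as `take` finally runs it. The log shows that only the branch the action depends on is evaluated.

```scala
object LazyDagSketch {
  import scala.collection.mutable.ListBuffer

  // Log of which branch touched which element.
  val log: ListBuffer[String] = ListBuffer.empty[String]

  // Toy stand-in for an RDD: a recipe that yields a fresh iterator on demand.
  final class MiniRDD[A](compute: () => Iterator[A]) {
    // Transformations only wrap the recipe; no element is touched yet.
    def map[B](f: A => B): MiniRDD[B] = new MiniRDD[B](() => compute().map(f))
    def filter(p: A => Boolean): MiniRDD[A] = new MiniRDD[A](() => compute().filter(p))
    // Actions run the recipe, pulling only as many elements as needed.
    def take(n: Int): List[A] = compute().take(n).toList
  }

  val rdd1: MiniRDD[Int] = new MiniRDD[Int](() => (1 to 100).iterator)
  val rdd2: MiniRDD[Int] = rdd1.map { x => log += s"rdd2:$x"; x + 1 }
  val rdd3: MiniRDD[Int] = rdd1.map { x => log += s"rdd3:$x"; x * 2 }
  val rdd4: MiniRDD[Int] = rdd1.map { x => log += s"rdd4:$x"; x - 1 }

  // Only rdd3's function runs, and only for the five elements requested.
  val firstFive: List[Int] = rdd3.take(5)

  // Chained steps are applied element by element, with no intermediate
  // collection between the map and the filter.
  val pipelined: List[Int] = rdd1.map(_ * 2).filter(_ % 3 == 0).take(2)
}
```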
![Page 37: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/37.jpg)
Types
![Page 38: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/38.jpg)
Functional Programming: Type Systems
• Mechanism for defining algebraic data types (ADTs), which are useful for program structure
  • i.e. “let’s group this data together and brand it a new type”
• Compile-time guarantees of program correctness
  • e.g. “no, you cannot add Foo to Bar”
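The “cannot add Foo to Bar” guarantee can be sketched with two small product types. `Meters`, `Seconds`, and `add` are hypothetical names chosen for illustration; the rejected call is left as a comment because it would not compile.

```scala
object TypeSketch {
  // Product types: each groups data together and brands it a new type.
  final case class Meters(value: Double)
  final case class Seconds(value: Double)

  // Only Meters can be added to Meters.
  def add(a: Meters, b: Meters): Meters = Meters(a.value + b.value)

  val total: Meters = add(Meters(1.5), Meters(2.5))

  // add(Meters(1.5), Seconds(2.5))   // rejected at compile time: type mismatch
}
```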
![Page 39: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/39.jpg)
Spark: Types
• RDDs (typed), Datasets (typed), DataFrames (untyped)
• Types enforce a schema on a dataset, which helps prevent unexpected behavior
![Page 40: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/40.jpg)
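Spark: Types

    // .as[Person] needs the implicit encoders from `import spark.implicits._`
    // age is a Long because Spark infers JSON integers as 64-bit values
    case class Person(name: String, age: Long)

    val peopleDS = spark.read.json(path).as[Person]

    // typed grouping is groupByKey in Spark 2.x; groupBy expects Columns
    val ageGroupedDs = peopleDS.groupByKey(_.age)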
![Page 41: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/41.jpg)
Spark: Learning Types
• Spark through Scala also allows learning of pattern matching
  • ADTs as both product types and union types
• Allows us to reason about code more easily
• Gives us compile-time safety
![Page 42: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/42.jpg)
Spark: Learning Types

    trait Person { def name: String }
    case class Student(name: String, grade: String) extends Person
    case class Professional(name: String, job: String) extends Person

    val personRDD: RDD[Person] = sc.parallelize(…)

    // working with both union and product types
    val mappedRDD: RDD[String] = personRDD.map {
      case Student(name, grade) => grade
      case Professional(name, job) => job
    }
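The same pattern match can be run without Spark by swapping the RDD for a `List`. In this sketch the trait is marked `sealed`, which is an addition to the slide's code: it limits subtypes to one file so the compiler can warn when a match misses a case. The sample people are hypothetical data standing in for the elided input.

```scala
object PersonSketch {
  // Union type: a Person is either a Student or a Professional.
  sealed trait Person { def name: String }

  // Product types: each case class groups its fields together.
  final case class Student(name: String, grade: String) extends Person
  final case class Professional(name: String, job: String) extends Person

  // Hypothetical sample data.
  val people: List[Person] = List(
    Student("Ada", "A"),
    Professional("Grace", "Engineer")
  )

  // Pattern matching takes each case apart by its shape.
  val mapped: List[String] = people.map {
    case Student(_, grade)    => grade
    case Professional(_, job) => job
  }
}
```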
![Page 43: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/43.jpg)
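Spark: Learning Types

    val rdd1: RDD[Person] = sc.parallelize(...)
    val rdd2: RDD[String] = rdd1.map("name: " + _)      // Compiles, but concatenates the whole Person, not its name
    val rdd3: RDD[String] = rdd2.map("name: " + _.name) // Compilation error! rdd2 holds Strings, which have no name member
    val rdd4: RDD[String] = rdd1.map("name: " + _.name) // It works!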
![Page 44: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/44.jpg)
Monads
![Page 45: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/45.jpg)
Functional Programming: Monads
• In category theory:
  • “a monad in X is just a monoid in the category of endofunctors of X”
• In functional programming, refers to a container that can:
  • Inject a value into the container
  • Perform operations on values, returning a container with new values
  • Flatten nested containers into a single container
![Page 46: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/46.jpg)
Scala: Monads!

    trait Monad[M[_]] {
      // constructs a Monad instance from the given value, e.g. List(1)
      def apply[T](v: T): M[T]

      // effectively lets you transform values within a Monad
      def bind[T, U](m: M[T])(fn: (T) => M[U]): M[U]
    }
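To make the trait concrete, here is a sketch of two instances built on the standard library's `flatMap`. The instance names `optionMonad` and `listMonad`, and the derived `map`, are illustrative additions rather than part of the talk. On Scala 2.12 the compiler may emit a feature warning about higher-kinded types; later versions need nothing extra.

```scala
object MonadSketch {
  trait Monad[M[_]] {
    // Injects a value into the container.
    def apply[T](v: T): M[T]
    // Transforms values and flattens the nested result.
    def bind[T, U](m: M[T])(fn: T => M[U]): M[U]
    // map falls out of the two primitives above.
    def map[T, U](m: M[T])(fn: T => U): M[U] = bind[T, U](m)(t => apply(fn(t)))
  }

  val optionMonad: Monad[Option] = new Monad[Option] {
    def apply[T](v: T): Option[T] = Some(v)
    def bind[T, U](m: Option[T])(fn: T => Option[U]): Option[U] = m.flatMap(fn)
  }

  val listMonad: Monad[List] = new Monad[List] {
    def apply[T](v: T): List[T] = List(v)
    def bind[T, U](m: List[T])(fn: T => List[U]): List[U] = m.flatMap(fn)
  }

  val halved: Option[Int] =
    optionMonad.bind[Int, Int](Option(10))(n => if (n % 2 == 0) Some(n / 2) else None)
  val expanded: List[Int] = listMonad.bind[Int, Int](List(1, 2))(n => List(n, n * 10))
  val incremented: List[Int] = listMonad.map[Int, Int](List(1, 2))(_ + 1)
}
```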
![Page 47: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/47.jpg)
Scala: Monads!
• Many monads in Scala
  • List, Set, Option, etc.
• Powerful line of thinking
  • Helps code comprehension
  • Reduces error checking logic (pattern matching!)
  • Can build further transformations: map(), filter(), foreach(), etc.
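The point about reduced error checking can be sketched with `Option`: a for-comprehension chains lookups that may fail, and a single pattern match at the end handles the missing case. `OptionSketch` and `combinedVisits` are hypothetical names, and the visit counts reuse the earlier example's data as an assumption.

```scala
object OptionSketch {
  val visits: Map[String, Int] = Map("Church" -> 2, "Backus" -> 1)

  // Map#get returns an Option, so a missing key is a value, not an exception.
  // The for-comprehension stops at the first None without explicit checks.
  def combinedVisits(a: String, b: String): Option[Int] =
    for {
      x <- visits.get(a)
      y <- visits.get(b)
    } yield x + y

  val both: Option[Int] = combinedVisits("Church", "Backus")
  val missing: Option[Int] = combinedVisits("Church", "McCarthy")

  // One pattern match at the edge replaces scattered null checks.
  val message: String = missing match {
    case Some(n) => s"total visits: $n"
    case None    => "unknown visitor"
  }
}
```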
![Page 48: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/48.jpg)
Spark: Learning Monads?
• We have many “computation builders” (RDDs, Datasets, DataFrames)
  • Containers on which transformations can be applied
• Similar to monads, though not identical
  • No unit function to wrap constituent values
  • Cannot lift all types into the flatMap function unconstrained
![Page 49: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/49.jpg)
For Later
![Page 50: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/50.jpg)
Conclusions
• Spark introduces all types of devs to Scala
• Scala helps people learn typed functional programming
• Typed functional programming improves Spark development
![Page 51: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/51.jpg)
x.ai @xdotai New York, New York
![Page 52: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/52.jpg)
Use the code ctwsparks17 for 40% off!
![Page 53: Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East talk by Jeff Smith and Rohan Aletty](https://reader031.fdocuments.in/reader031/viewer/2022022001/58ad44101a28ab8b598b5bf3/html5/thumbnails/53.jpg)
Thank You