Spark @ Virdata - BigData.be meetup, July 09, 2014
Gerard Maas - Data Processing Team Lead
[email protected] | @maasg
@bout me
@maasg
Virdata: A cloud platform for the Internet of Things
Big Data Developers - Virdata, Internet of Things #virdata
Virdata - 2 COMPONENTS: A CLOUD & A LIBRARY
Cloud:
★ Elastic and scalable cutting-edge technologies
★ APIs for different types of information/data consumption
★ Cloud agnostic through self-built monitoring tools
★ Running on both public & private cloud infrastructure
★ Bi-directional messaging
★ High performance brokers architecture
Library:
★ Lightweight and portable library
★ Multiple programming languages
★ Supports multiple transport protocols
★ Available for all HW and OS
★ Supports any type of data in any format/syntax
★ Payload is compressed and encrypted
Scala @ Virdata
Databricks Keynote - Spark Summit 2014
Batch Streaming
HDFS Cassandra
Spark: RDD Transformation
INPUT DATA: HDFS (text / Sequence File), read with .textFile
RDD -> RDD: MAP, FLATMAP, GROUP, FILTER, join, ...
OUTPUT (SAVE): HDFS (text / Sequence File), Cassandra
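The batch flow above (input, a chain of RDD transformations, save) could be sketched roughly as follows. This is a minimal word-count illustration, not the code used at Virdata: the HDFS paths and the application name are hypothetical, and the master is assumed to be supplied externally through spark.master.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BatchWordCount {
  def main(args: Array[String]): Unit = {
    // Hypothetical locations, for illustration only.
    val inputPath  = "hdfs:///data/input.txt"
    val outputPath = "hdfs:///data/word-counts"

    // The master URL is assumed to come from spark.master (e.g. via spark-submit).
    val conf = new SparkConf().setAppName("batch-word-count")
    val sc   = new SparkContext(conf)
    try {
      val lines = sc.textFile(inputPath)        // INPUT DATA -> RDD[String]
      val counts = lines
        .flatMap(_.split("\\s+"))               // FLATMAP: line -> words
        .filter(_.nonEmpty)                     // FILTER: drop empty tokens
        .map(word => (word, 1))                 // MAP: word -> (word, 1)
        .reduceByKey(_ + _)                     // GROUP: sum the counts per word
      counts.saveAsTextFile(outputPath)         // SAVE: write the result to HDFS
    } finally {
      sc.stop()
    }
  }
}
```

On Spark releases older than 1.3, reduceByKey additionally needs `import org.apache.spark.SparkContext._` to bring the pair-RDD operations into scope.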
Spark: RDD Transformation - DStream
INPUT STREAM: Kafka
DStream (a sequence of RDDs): MAP, FLATMAP, GROUP, FILTER, ...
OUTPUT: Cassandra, Web Sockets, ...
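The streaming flow can be sketched in the same style. The slide uses Kafka as the input stream; to keep the sketch free of extra connector dependencies it reads from a text socket instead, which is a stand-in and not what the slide shows. The host, port, batch interval and application name are all hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-word-count")
    // Every 5-second batch of input becomes one RDD of the DStream.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Stand-in for the Kafka input stream: a text socket on a hypothetical host/port.
    val lines = ssc.socketTextStream("localhost", 9999)

    // The same transformations as in batch mode, applied to each RDD of the stream.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // OUTPUT: foreachRDD hands over the RDD of each batch; here we only print a sample.
    counts.foreachRDD { rdd =>
      rdd.take(10).foreach { case (word, n) => println(s"$word: $n") }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In a real deployment the println would be replaced by a write to the chosen sink, such as Cassandra or a web socket. On Spark releases older than 1.3, `import org.apache.spark.streaming.StreamingContext._` is also needed for reduceByKey on a DStream.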
HDFS
Worker
Worker
Worker
Memory, CPUs (and don’t forget to throw some disks in the mix)
Network
Spark Deployment Options
Local: spark.master=local[8]
Standalone Cluster: spark.master=spark://host:port
Using a Cluster Manager: spark.master=mesos://host:port
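As a small illustration of the three options, the master URL can also be set programmatically on a SparkConf. The sketch below only builds the configuration and prints the resulting spark.master value; it does not start a context. The application name is hypothetical and host:port is left as the placeholder from the slide.

```scala
import org.apache.spark.SparkConf

object MasterSettings {
  // Builds a SparkConf for a given master URL; the app name is a hypothetical placeholder.
  def confFor(master: String): SparkConf =
    new SparkConf().setAppName("deployment-example").setMaster(master)

  def main(args: Array[String]): Unit = {
    val masters = Seq(
      "local[8]",           // Local: 8 worker threads inside one JVM
      "spark://host:port",  // Standalone cluster (host:port is a placeholder)
      "mesos://host:port"   // Mesos as the cluster manager (placeholder as well)
    )
    // setMaster stores its argument under the spark.master key.
    masters.foreach(m => println(confFor(m).get("spark.master")))
  }
}
```

Leaving the master out of the code and passing it at launch time keeps the same job deployable on all three setups.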
Apache Mesos
“Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers.”
http://mesos.apache.org/
Why Mesos?
Think in terms of Resources, not Machines
How Mesos Works
Frameworks
- Scheduler
- Executor
Master (M), with ZooKeeper (ZK)
Slaves
- run tasks
How Mesos Works - resource offers
Diagram steps (offer | labels shown next to it):
1. H1, 4CPU, 2GB
2. H1, 4CPU, 2GB | 2C, 2G | 2C, 4G
3. H1, 2CPU, 2GB | 2C, 2G | 2C, 4G
4. 2C, 4G
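The offer mechanism can be illustrated with a toy model in plain Scala. This is not the Mesos API: the types Resources, Offer and Task and the function accept are hypothetical names made up for this sketch, and reading the 2C, 2G and 2C, 4G labels as task requirements is an assumption. The idea shown is only that a framework scheduler receives an offer and launches the tasks that fit into it, while the rest keep waiting for a later offer.

```scala
// Toy model of a resource offer; not the Mesos API.
final case class Resources(cpus: Int, memGb: Int) {
  // True when this requirement fits inside the given available resources.
  def fitsIn(available: Resources): Boolean =
    cpus <= available.cpus && memGb <= available.memGb
  def -(other: Resources): Resources =
    Resources(cpus - other.cpus, memGb - other.memGb)
}
final case class Offer(host: String, available: Resources)
final case class Task(name: String, needs: Resources)

object OfferDemo {
  /** Greedily launches the pending tasks that fit the offer.
    * Returns (launched, stillWaiting).
    */
  def accept(offer: Offer, pending: List[Task]): (List[Task], List[Task]) = {
    val (launched, waiting, _) =
      pending.foldLeft((List.empty[Task], List.empty[Task], offer.available)) {
        case ((done, wait, left), task) =>
          if (task.needs.fitsIn(left)) (done :+ task, wait, left - task.needs)
          else (done, wait :+ task, left)
      }
    (launched, waiting)
  }

  def main(args: Array[String]): Unit = {
    val offer   = Offer("H1", Resources(cpus = 4, memGb = 2))
    val pending = List(Task("a", Resources(2, 2)), Task("b", Resources(2, 4)))
    val (launched, waiting) = accept(offer, pending)
    println(s"launched=${launched.map(_.name)} waiting=${waiting.map(_.name)}")
  }
}
```

With the numbers from the slide, the 2C, 2G task fits the H1 offer and the 2C, 4G task has to wait for an offer with more memory.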
The Mesos Paper… Where Spark Started
https://www.usenix.org/legacy/event/nsdi11/tech/full_papers/Hindman.pdf
Marathon: “Keep your services running”
https://github.com/mesosphere/marathon
Spark Job Server: “Spark as a Service”
Job Server
val sc = new spark.SparkContext(conf)
https://github.com/ooyala/spark-jobserver
object Job extends SparkJob {
  def runJob(...): Any
  def validate(...): SparkJobValidation
}
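A concrete job might look like the sketch below, modelled on the word-count example that shipped with spark-jobserver around that time. The parameter lists (a SparkContext plus a Typesafe Config) and the SparkJobValid / SparkJobInvalid results are assumed from that version of the project and have changed in later releases; the job name and the input.string config key are illustrative.

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}
import scala.util.Try

object WordCountJob extends SparkJob {
  // Called before the job runs: reject the request when the input parameter is missing.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    Try(config.getString("input.string"))
      .map(_ => SparkJobValid)
      .getOrElse(SparkJobInvalid("No input.string config param"))

  // The job itself: runs on the SparkContext owned by the job server.
  // The returned value is what the job server sends back to the caller.
  override def runJob(sc: SparkContext, config: Config): Any = {
    val words = sc.parallelize(config.getString("input.string").split(" ").toSeq)
    words.map((_, 1)).reduceByKey(_ + _).collect().toMap
  }
}
```

The job does not create its own SparkContext; the job server creates and manages it, so several jobs can share one context.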
HTTP: /jars, /contexts, /jobs
“The Datacenter as the computer” - Luiz Barroso
HDFS: FileSystem
Mesos: Kernel
Marathon: Init.d
What about System Monitor?
Ganglia
How we put it all together at Virdata (Live Demo)
Questions?
@virdata_iot | @maasg
Thank you
virdata.com | [email protected]