Hugfr SPARK & RIAK -20160114_hug_france

Posted on 22-Jan-2018



SPARK & RIAK: INTRODUCTION TO THE SPARK-RIAK-CONNECTOR

LATERALTHOUGHTS

Me, Myself & I

Associate at LateralThoughts.com

Scala, Java, Python Developer

Data Engineer @ Axa & Carrefour

Apache Spark Trainer with Databricks


And the Other One …

Director Sales @ Basho Technologies

(Basho make Riak)

Formerly at MySQL France

Co-Founder MariaDB

Funny Accent

Quick Introduction …

2011: Creators of Riak

Riak KV: NoSQL key-value database

Riak S2: Large Object Storage

2015: New Products

Basho Data Platform: Integrated NoSQL databases, caching, in-memory analytics, and search

Riak TS: NoSQL Time Series database

120+ employees

Global Offices: Seattle (HQ), Washington DC, London, Paris, Tokyo

300+ Enterprise customers, 1/3 of the Fortune 50

PRIORITIZED NEEDS

High Availability - Critical Data

High Scale – Heavy Reads & Writes

Geo Locality – Multiple Data Centers

Operational Simplicity – Resources Don’t Scale as Clusters

Data Accuracy – Write Conflict Options

RIAK S2 USE CASES

Large Object Store

Content Distribution

Web & Cloud Services

Active Archives

RIAK KV USE CASES

User Data

Session Data

Profile Data

Real-time Data

Log Data

RIAK TS USE CASES

IoT/Devices

Financial/Economic

Scientific Observations

Log Data

The Evolution of NoSQL

Unstructured Data Platforms

Multi-Model Solutions

Point Solutions

Basho Data Platform …

ABOUT SPARK & RIAK

Spark & Riak

Disclaimer: the following presentation uses:

Spark v1.5.2

Spark-Riak-Connector v1.1.0

Pre-Requisites

To use the Spark Riak Connector, as of now, you need to build it yourself:

Clone https://github.com/basho/spark-riak-connector

`git checkout v1.1.0`

`mvn clean install`
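Put together, the build steps above might look like this in a shell session (assumes git, Maven 3, and a JDK are installed):

```shell
# Fetch the connector sources and switch to the v1.1.0 tag.
git clone https://github.com/basho/spark-riak-connector
cd spark-riak-connector
git checkout v1.1.0

# Build and install the artifact into the local Maven repository,
# so your Spark project can depend on it.
mvn clean install
```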

Bootstrapped project

Reading from Riak

Connect to a Riak KV Cluster from Spark

Query it:

Full Scan

Using Keys

Using secondary indexes (2i)

Connecting to Riak

Loading data from Riak

riakBucket[V](bucketName: String): RiakRDD[V]

riakBucket[V](bucketName: String, bucketType: String): RiakRDD[V]

riakBucket[K, V](bucketName: String, convert: (Location, RiakObject) => (K, V)): RiakRDD[(K, V)]

On your Spark Context, you can use:

add a query, otherwise…

Find all:

Find by key(s):

Implicits that will give you the riak* methods
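A minimal sketch tying these pieces together: configure the connection, import the implicits, then read with a full scan, by keys, or by secondary index. The bucket name, host, and index name below are illustrative; the `riakBucket`, `queryAll`, `queryBucketKeys`, and `query2iRange` methods are assumed from connector v1.1.0.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Brings the riak* methods onto the SparkContext via implicits.
import com.basho.riak.spark._

val conf = new SparkConf()
  .setAppName("riak-read-demo")
  // Host and protobuf port of a Riak KV node; adjust to your cluster.
  .set("spark.riak.connection.host", "127.0.0.1:8087")
val sc = new SparkContext(conf)

// Full scan: no query added, so the whole bucket is read.
val all = sc.riakBucket[String]("my-bucket").queryAll()

// By key(s): only the listed keys are fetched.
val some = sc.riakBucket[String]("my-bucket").queryBucketKeys("key-1", "key-2")

// By secondary index (2i): range query on a numeric index.
val ranged = sc.riakBucket[String]("my-bucket").query2iRange("creationNo", 1L, 100L)
```

Note that without a query, a full scan touches every node in the cluster, so prefer keys or 2i queries for anything beyond small buckets.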

Reading from Riak

Using case classes

Using Secondary Indexes
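As a sketch, reading directly into a case class could look like this, given a SparkContext `sc` with the connector implicits imported. The `UserData` class and the `creationNo` index are hypothetical, and deserialization of stored JSON values into the case class is assumed connector behavior.

```scala
import com.basho.riak.spark._

// Hypothetical domain class; the connector maps stored JSON fields onto it.
case class UserData(timestamp: String, user_id: Int)

// Load bucket values as UserData instances, restricted by a 2i range query.
val users = sc.riakBucket[UserData]("users")
  .query2iRange("creationNo", 1L, 100L)
```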

Basic I/O

Mapping Objects - Buckets

Adding fields during save
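Writing back is symmetrical to reading; a minimal sketch, again given a SparkContext `sc` with the implicits imported (the bucket name is illustrative, and `saveToRiak` is assumed to be the connector's write method in v1.1.0):

```scala
import com.basho.riak.spark._

// Save simple key/value pairs into a bucket; each pair becomes
// one Riak object keyed by the first element.
val rdd = sc.parallelize(Seq("one" -> 1, "two" -> 2))
rdd.saveToRiak("numbers")
```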

Spark Riak Connector - Roadmap

Better Integration with Riak TS

Enhanced DataFrames - based on Riak TS Schema APIs

Server-side aggregations and grouping - using TS SQL commands

Speed

Data Locality (partition RDDs according to replication in the cluster) - launch Spark executors on the same nodes where the data resides.

Better mapping from vnodes to Spark workers using coverage plan

Better support for Riak data types (CRDT) and Search queries

Today this requires using the Java Riak client APIs

Spark Streaming

Provide example and sample integration with Apache Kafka

Improve reliability using Riak for checkpoints and WAL

Add examples and documentation for Python support


Thank you

@ogirardot

o.girardot@lateral-thoughts.com

https://github.com/ogirardot/spark-riak-example

https://speakerdeck.com/ogirardot/spark-and-riak-introduction-to-the-spark-riak-connector

@mcarney23

michael.carney@basho.com

fr.basho.com