Ruby, Rails, NoSQL and Big Data


Page 1: Ruby, rails, no sql and big data

John Repko -- Pikasoft LLC

Pictures at an Exhibition: Ruby, Rails, NoSQL and Big Data

John Repko

Page 2: Ruby, rails, no sql and big data

Agenda

The Goal: Exploring Big Data with NoSQL and Ruby on Rails

Just Two Solutions – Here’s How We Get There

• Key-Value Data Stores
  – Redis
  – Riak
• Document Data Stores
  – MongoDB
  – Cassandra
• Graph Data Stores
  – Neo4J
• MapReduce
  – Through Hadoop
  – Through Riak / MongoDB
  – Through Elastic MapReduce

Page 3: Ruby, rails, no sql and big data

So How Did We Get to Big Data Anyway?

Big Data Is Not Just About “Big” Data … It’s About FAST Data! (http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html)

Source: http://www.startribune.com/sports/164830346.html
Source: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg

Page 4: Ruby, rails, no sql and big data

Why is Everyone Diving into Big Data?

There Are Big Data Breakthroughs Everywhere…

• “Watson” Wins on Jeopardy
  – Beat the best Jeopardy players of all time
• Google Wins the Search Market
  – Massively parallel web searches with results back in a tenth of a second
• Progressive’s Instant “Overnight” Rate Quotes
  – Progressive creates an insurance quote for every car and truck in the US – every night

Source: https://newshour.s3.amazonaws.com/photos/2011/02/16/kayjay_1_blog_main_horizontal.jpg

Page 5: Ruby, rails, no sql and big data

Exploring Big Data

Big Data frequently provides solutions to a common set of problems

These appear to be “10 Problems” but are really only “2 Problems”

Source: http://www.slideshare.net/cloudera/20100806-cloudera-10-hadoopable-problems-webinar-4931616

Page 6: Ruby, rails, no sql and big data

Exploring Big Data

The variety of Big Data wins in the press falls into just two solution patterns

• Foresight – We are presented with a pattern. What has the outcome been when we’ve seen similar patterns in the past?
• Hindsight – We are presented with an outcome. What pattern of events anticipated the outcome in the past?

You Don’t Need Dozens Of Solution Approaches For Big Data – Just Two

Page 7: Ruby, rails, no sql and big data

Exploring Big Data

In this light, let’s take a look at the “10 Hadoop-able Problems” of Big Data

Summary – 10 Common Hadoop-able Problems*

Foresight / Hindsight

1. Modeling True Risk
   • What past patterns led to success or default?
2. Customer Churn Analysis
   • What do customer churn patterns predict about our products and markets?
3. Recommendation Engine
   • We have search terms – what have the results been from similar searches in the past?
4. Ad Targeting
   • We have profile information – what offers have led to sales for similar profiles in the past?
5. PoS Transaction Analysis
   • We have your purchase history – what deals might we offer in the future?

Page 8: Ruby, rails, no sql and big data

Exploring Big Data

Summary – 10 Common Hadoop-able Problems

Foresight / Hindsight

6. Analyzing Data Logs to Forecast Events
   • We have your logs – what pattern of events has anticipated failures before?
7. Threat Analysis
   • We have a specific event – what results have we seen from similar threats in the past?
8. Trade Surveillance
   • Does this parcel raise any alarms, based on our history of past parcel-tracking?
9. Search Quality
   • We have a set of search terms – what have similar searches succeeded in finding in the past?
10. Data “Sandbox”
   • We have your data, possibly unstructured data. What patterns in that data might we bring to your attention now?

These two solution types apply generally to the Hadoop-able problems

Page 9: Ruby, rails, no sql and big data

Key Big Data Analytics Solution Patterns

The Big Data Platform Provides Rich Analytics Tools

1. Predictive Modeling
2. Data Visualization
3. Cluster Partitioning
4. Collaborative Filtering
5. Outlier Analysis
6. A/B Testing
7. Markov Chains
8. Bloom Filters

Page 10: Ruby, rails, no sql and big data

Exploring Big Data

With Just Two Standard Solution Models We Can Solve Most Big Data Problems

The Key Is To Shape Big Data Into A Standard Platform Onto Which We Can Apply These Analytics Tools…

“It is not the technology that creates a competitive edge, but the management process that exploits technology.” ~ Shaping the Future - Peter Keen (1991)

Page 11: Ruby, rails, no sql and big data

Agenda

The Goal: Exploring Big Data

Just Two Solutions – Here’s How We Get There

• Key-Value Data Stores
  – Redis
  – Riak
• Document Data Stores
  – MongoDB
  – Cassandra
• Graph Data Stores
  – Neo4J
• MapReduce
  – Through Hadoop
  – Through Riak / MongoDB
  – Through Elastic MapReduce

Page 12: Ruby, rails, no sql and big data

The Core Development Platform

Core Platform: Ubuntu 12.04 + AWS

• Clean install of 12.04 and all latest updates
• sudo apt-get update
• sudo apt-get upgrade
• sudo apt-get dist-upgrade
• sudo apt-get install build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison subversion
• sudo apt-get install libcurl3 libcurl3-gnutls libcurl4-openssl-dev
• bash -s stable < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer)
• source ~/.bashrc
• gem update --system (Latest version currently installed)
• rvm ruby-1.9.2-p290@rails31 --create --default
• sudo apt-get install nodejs
• gem install rake
• gem install rails -v=3.1.3

Page 13: Ruby, rails, no sql and big data

Agenda

The Goal: Exploring Big Data

Just Two Solutions – Here’s How We Get There

• Key-Value Data Stores
  – Redis
  – Riak
• Document Data Stores
  – MongoDB
  – Cassandra
• Graph Data Stores
  – Neo4J
• MapReduce
  – Through Hadoop
  – Through Riak
  – Through Elastic MapReduce

Page 14: Ruby, rails, no sql and big data

Redis

• Example:
  – http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html
• Backing Articles:
  – http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/
• Code:
  – http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html

Play with this online at: http://jkr-blog.dyndns.org:3001/mini_urls

The good news is, we've already got our base image, and adding a new Redis data store and example app to it only took about an hour. As before, you can play with the URL shortener at Redis URL Shortener, and you can download and play with the code for the application at: Redis URL Shortener Source Code.

Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
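The key-value idea behind a URL shortener can be sketched in a few lines of Ruby. This is an illustration only, not the code from the linked post: `MiniUrlStore`, `FakeKeyValueStore`, and the `mini_url:` key prefix are hypothetical names, and the fake store is an assumption that stands in for a Redis client so the sketch runs without a server. It mimics three commands (`get`, `set`, `incr`) that the redis-rb client also offers.

```ruby
# In-memory stand-in for a key-value server (an assumption so the sketch
# runs without Redis installed). It mimics three commands: get, set, incr.
class FakeKeyValueStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value.to_s
    "OK"
  end

  # Real Redis increments atomically; this is a plain integer bump.
  def incr(key)
    @data[key] = (@data[key].to_i + 1).to_s
    @data[key].to_i
  end
end

# Hypothetical shortener: one counter key plus one key per short code.
class MiniUrlStore
  def initialize(store)
    @store = store
  end

  # Store the long URL under a new base-36 code and return the code.
  def shorten(long_url)
    code = @store.incr("mini_url:next_id").to_s(36)
    @store.set("mini_url:#{code}", long_url)
    code
  end

  # Look the long URL up again; returns nil for an unknown code.
  def expand(code)
    @store.get("mini_url:#{code}")
  end
end

shortener = MiniUrlStore.new(FakeKeyValueStore.new)
code = shortener.shorten("http://www.pikasoft.com/journal/")
puts "#{code} -> #{shortener.expand(code)}"
```

Because the shortener only needs `get`, `set`, and `incr`, a real client object answering those three methods could presumably be passed in place of the fake store.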

Page 15: Ruby, rails, no sql and big data

Riak

• Example:
  – http://www.pikasoft.com/journal/2012/1/15/you-only-live-twice-basho-and-riak.html
• Backing Articles:
  – http://jit.nuance9.com/2010/07/ruby-192-rails-3-riak-and-ripple.html
  – http://jbbarth.com/archives/2011/4/23/basic_usage_of_riak_in/
• Code:

Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Page 16: Ruby, rails, no sql and big data

Agenda

The Goal: Exploring Big Data

Just Two Solutions – Here’s How We Get There

• Key-Value Data Stores
  – Redis
  – Riak
• Document Data Stores
  – MongoDB
  – Cassandra
• Graph Data Stores
  – Neo4J
• MapReduce
  – Through Hadoop
  – Through Riak / MongoDB
  – Through Elastic MapReduce

Page 17: Ruby, rails, no sql and big data

MongoDB

• Example:
  – http://www.pikasoft.com/journal/2010/7/31/nosql-on-the-cloud-our-first-application.html
• Backing Articles:
  – http://www.mongodb.org/display/DOCS/Building+for+Linux
• Code:
  – http://www.pikasoft.com/journal/2010/8/16/why-our-little-nosql-app-matters.html

Play with this online at: http://jkr-code.dyndns.org:3000/notes

So let's sum up -- after a handful of posts and a small but still sorrowful amount of command-line and Rails code, we've managed to accomplish the following "Hello World" tasks in NoSQL on the cloud:

• Created a cloud account
• Got our first app created, and saw it in a browser on the web
• Loaded up real development environments (Ruby/Rails we added, Java we got for free)
• Added a stronger app server (thin >> webrick) and a stronger web server (nginx >> almost anything)
• Added our first NoSQL data store (MongoDB) and mapping software to simulate ActiveRecord in NoSQL
• Created a little NoSQL app to show all this, and made it visible through a dynamic DNS address: Rails Mongo Notes Example

Just to wrap the little app up: I updated John Nunemaker's MongoMapper demo app to work with Rails 3 and the cloud, and if you like you can take a look at the code for it here: Rails Mongo Code.

Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Page 18: Ruby, rails, no sql and big data

Cassandra

• Example:
  – http://www.pikasoft.com/journal/2011/2/14/casi-casi-cassandra.html
• Backing Articles:
  – http://www.25hoursaday.com/weblog/2008/05/23/SomeThoughtsOnTwittersAvailabilityProblems.aspx
• Code:

Here's what the code for that broadcast might look like:

    # Tweeter
    class Tweeter < ActiveRecord::Base
      has_many :followers
    end

    class Follower < ActiveRecord::Base
      belongs_to :tweeter
    end

All fine so far -- that's the twittery world we all live in. I can send out my breathless message of what I had for breakfast, and then Twitter picks it up and broadcasts the message from me (and all the messages from the other tweeters):

    # One query loads every tweeter ...
    @tweeters = Tweeter.all
    @tweeters.each do |tweeter|
      # ... and one more query per tweeter loads that tweeter's followers.
      @followers = tweeter.followers
      @followers.each do |follower|
        tweeter.broadcast_to :recipient => follower
      end
    end

So here we're going to do a query for each of the X tweeters, and for them we'll do another query for each of their Y followers.

Code smell! Fail Whale!!!

Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
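To put numbers on that code smell, here is a small plain-Ruby simulation that counts round trips. It is a sketch under stated assumptions, not Twitter's or the author's code: `QueryCounter`, its methods, and the sample names are hypothetical, and each method call simply stands for one database query. The "grouped" read illustrates one possible alternative, a single read of data already stored per tweeter.

```ruby
# Counts simulated queries; each method call stands for one round trip.
class QueryCounter
  attr_reader :count

  def initialize(followers_by_tweeter)
    @followers_by_tweeter = followers_by_tweeter
    @count = 0
  end

  # One query that returns every tweeter.
  def all_tweeters
    @count += 1
    @followers_by_tweeter.keys
  end

  # One query per tweeter to fetch that tweeter's followers.
  def followers_of(tweeter)
    @count += 1
    @followers_by_tweeter.fetch(tweeter)
  end

  # One query that returns followers already grouped by tweeter.
  def all_followers_grouped
    @count += 1
    @followers_by_tweeter
  end
end

# Hypothetical sample data: three tweeters and their followers.
data = { "ann" => %w[bob cy], "dee" => %w[eli fay gus], "hal" => [] }

# Nested loop, as in the slide: 1 query for tweeters + 1 per tweeter.
nested = QueryCounter.new(data)
deliveries = 0
nested.all_tweeters.each do |tweeter|
  nested.followers_of(tweeter).each { |_follower| deliveries += 1 }
end

# Grouped read: the same deliveries from a single query.
grouped = QueryCounter.new(data)
grouped_deliveries = 0
grouped.all_followers_grouped.each_value do |followers|
  grouped_deliveries += followers.size
end

puts "nested loop:  #{nested.count} queries, #{deliveries} deliveries"
puts "grouped read: #{grouped.count} query, #{grouped_deliveries} deliveries"
```

With X tweeters the nested version issues 1 + X queries before any broadcast happens, which is the growth the slide is warning about.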

Page 19: Ruby, rails, no sql and big data

Agenda

Exploring Big Data

Just Two Solutions – Here’s How We Get There

• Key-Value Data Stores
  – Redis
  – Riak
• Document Data Stores
  – MongoDB
  – Cassandra
• Graph Data Stores
  – Neo4J
• MapReduce
  – Through Hadoop
  – Through Riak / MongoDB
  – Through Elastic MapReduce

Page 20: Ruby, rails, no sql and big data

Neo4J

• Example:
  – http://www.pikasoft.com/journal/2011/1/21/graph-databases-and-star-wars.html
• Backing Articles:
  – http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/
• Code

Six Degrees of Kevin Bacon

Play with this online at: http://jkr-blog.dyndns.org:9292/

Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
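The "Six Degrees of Kevin Bacon" question is a shortest-path query, which is the kind of traversal a graph store is built for. The sketch below shows the underlying idea with a breadth-first search over an in-memory Ruby hash. It does not use the Neo4J API, and the `CO_STARS` graph, the placeholder actor names, and `degrees_of_separation` are all hypothetical.

```ruby
# Hypothetical co-star graph: each actor maps to actors they appeared with.
# "Actor A" .. "Actor E" are placeholders, not real filmography data.
CO_STARS = {
  "Kevin Bacon" => ["Actor A", "Actor B"],
  "Actor A"     => ["Kevin Bacon", "Actor C"],
  "Actor B"     => ["Kevin Bacon"],
  "Actor C"     => ["Actor A", "Actor D"],
  "Actor D"     => ["Actor C"],
  "Actor E"     => []
}.freeze

# Breadth-first search: returns the number of hops from start to goal,
# or nil when the two are not connected.
def degrees_of_separation(graph, start, goal)
  return 0 if start == goal

  distance = { start => 0 }
  queue = [start]
  until queue.empty?
    current = queue.shift
    graph.fetch(current, []).each do |neighbor|
      next if distance.key?(neighbor)

      distance[neighbor] = distance[current] + 1
      return distance[neighbor] if neighbor == goal

      queue << neighbor
    end
  end
  nil
end

hops = degrees_of_separation(CO_STARS, "Actor D", "Kevin Bacon")
puts "Actor D is #{hops} hops from Kevin Bacon"
```

Breadth-first order matters here: it visits actors one hop away before those two hops away, so the first time the goal is reached the hop count is the shortest one.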

Page 21: Ruby, rails, no sql and big data

Agenda

Exploring Big Data

Just Two Solutions – Here’s How We Get There

• Key-Value Data Stores
  – Redis
  – Riak
• Document Data Stores
  – MongoDB
  – Cassandra
• Graph Data Stores
  – Neo4J
• MapReduce
  – Through Hadoop
  – Through Riak
  – Through Elastic MapReduce

Page 22: Ruby, rails, no sql and big data

MapReduce via Hadoop, Thrift and AWS

• Example:
  – http://www.pikasoft.com/journal/2011/1/9/nosql-next-up-hadoop-and-cloudera.html
• Backing Articles:
  – http://www.joelonsoftware.com/items/2006/08/01.html
• Code: Map, Reduce
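The Map and Reduce steps named on this slide can be illustrated with a word count in plain Ruby. This is a single-process sketch of the pattern only. It does not call Hadoop, Thrift, or AWS, and the function names (`map_line`, `shuffle`, `reduce_counts`) and sample lines are hypothetical.

```ruby
# Map phase: turn one line of text into [word, 1] pairs.
def map_line(line)
  line.downcase.scan(/[a-z0-9']+/).map { |word| [word, 1] }
end

# Shuffle phase: group the emitted values by key.
def shuffle(pairs)
  empty_groups = Hash.new { |hash, key| hash[key] = [] }
  pairs.each_with_object(empty_groups) do |(word, count), groups|
    groups[word] << count
  end
end

# Reduce phase: collapse each key's values into a single total.
def reduce_counts(groups)
  groups.map { |word, counts| [word, counts.reduce(0, :+)] }.to_h
end

lines = ["big data is fast data", "fast data wins"]
pairs = lines.flat_map { |line| map_line(line) }
totals = reduce_counts(shuffle(pairs))
puts totals.sort.map { |word, total| "#{word}=#{total}" }.join(" ")
```

In a cluster the map calls would run on many machines and the shuffle would move pairs between them; the three-phase shape stays the same.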

Page 23: Ruby, rails, no sql and big data

MapReduce via Riak / MongoDB

• Example:
  – http://www.control-alt-del.org/2011/09/14/fun-with-bloom-filters-using-riak-mapreduce/
  – http://verboselogging.com/2010/03/22/super-mongodb-mapreduce-max-out
• Backing Articles:
  – MapReduce on Riak
    • http://wiki.basho.com/MapReduce.html
    • http://stackoverflow.com/questions/2123004/mapreduce-with-riak
    • http://www.readwriteweb.com/hack/2011/06/riak-pipe-rethinks-its-mapreduce.php
    • http://www.quora.com/What-are-the-advantages-and-limitations-of-MapReduce-backed-by-distributed-key-value-stores-HBase-Cassandra-Riak
  – MapReduce on MongoDB
    • http://dllhell.net/2010/07/17/on-mapreduce-in-mongodb/
    • http://www.mongodb.org/display/DOCS/MapReduce
    • http://jonathanhui.com/mongodb-mapreduce
    • http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/

Source: http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
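One of the examples for this slide pairs Riak MapReduce with Bloom filters, which also appear in the list of analytics patterns earlier in the deck. As a rough illustration of the data structure itself, here is a minimal in-memory Bloom filter in Ruby. It is not taken from the linked article and does not touch Riak or MongoDB; the class, its default sizes, and the salted SHA-256 hashing scheme are assumptions chosen for brevity.

```ruby
require "digest/sha2"

# Minimal Bloom filter: k hash positions per item over an m-slot bit array.
# It can report false positives but never false negatives.
class BloomFilter
  def initialize(size: 1024, hashes: 3)
    @size = size
    @hashes = hashes
    @bits = Array.new(size, false)
  end

  # Set every position derived from the item.
  def add(item)
    positions(item).each { |index| @bits[index] = true }
    self
  end

  # true means "possibly present"; false means "definitely absent".
  def include?(item)
    positions(item).all? { |index| @bits[index] }
  end

  private

  # Derive k positions by salting the item with the hash number (a simple
  # scheme chosen for this sketch, not a tuned hash family).
  def positions(item)
    (0...@hashes).map do |seed|
      Digest::SHA256.hexdigest("#{seed}:#{item}").to_i(16) % @size
    end
  end
end

seen = BloomFilter.new
seen.add("http://jkr-blog.dyndns.org:3001/mini_urls")
puts seen.include?("http://jkr-blog.dyndns.org:3001/mini_urls")
puts seen.include?("http://jkr-code.dyndns.org:3000/notes")
```

The first check always reports true because the item was added; the second is very likely, though not guaranteed, to report false, which is the trade-off a Bloom filter makes in exchange for its small memory footprint.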

Page 24: Ruby, rails, no sql and big data

Elastic MapReduce

• Example:
  – http://www.commoncrawl.org/mapreduce-for-the-masses/
• Backing Articles:
  – http://www.commoncrawl.org/mapreduce-for-the-masses/
• Code:

Page 25: Ruby, rails, no sql and big data

Summary

This Is Only The Beginning. With A Standard Platform We’ll See Richer Big Data Discoveries Become Routine

The Solution Tools (Slide 9) Become Straightforward if We Run Them on a Standard Architecture

“One man’s noise is another man’s data.” ~ Bill Stensrud - InstantEncore

Page 26: Ruby, rails, no sql and big data

Contacts

• John Repko: [email protected]

http://pikasoft.s3.amazonaws.com/Pictures_at_an_Exhibition.pptx