Ruby, Rails, NoSQL and Big Data
John Repko -- Pikasoft LLC
Pictures at an Exhibition: Ruby, Rails, NoSQL and Big Data
The Goal: Exploring Big Data with NoSQL and Ruby on Rails
Just Two Solutions – Here’s How We Get There
• Key-Value Data Stores
– Redis
– Riak
• Document Data Stores
– MongoDB
– Cassandra
• Graph Data Stores
– Neo4J
• MapReduce
– Through Hadoop
– Through Riak / MongoDB
– Through Elastic MapReduce
2
Agenda
Big Data Is Not Just About “Big” Data … It’s About FAST Data! (http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html)
3
Source: http://www.startribune.com/sports/164830346.html
Source: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg
So How Did We Get to Big Data Anyway?
There Are Big Data Breakthroughs Everywhere…
• “Watson” Wins on Jeopardy
– Beat the best Jeopardy players of all time
• Google Wins the Search Market
– Massively parallel web searches with results back in a tenth of a second
• Progressive’s Instant “Overnight” Rate Quotes
– Progressive creates an insurance quote for every car and truck in the US, every night
4
Source: https://newshour.s3.amazonaws.com/photos/2011/02/16/kayjay_1_blog_main_horizontal.jpg
Why is Everyone Diving into Big Data?
Source: http://www.slideshare.net/cloudera/20100806-cloudera-10-hadoopable-problems-webinar-4931616
These appear to be “10 Problems” but are really only “2 Problems”
Big Data frequently provides solutions to a common set of problems
5
Exploring Big Data
• Foresight
– We are presented with a pattern. What has the outcome been when we’ve seen similar patterns in the past?
• Hindsight
– We are presented with an outcome. What pattern of events anticipated that outcome in the past?
The variety of Big Data wins in the press falls into just two solution patterns
You Don’t Need Dozens Of Solution Approaches For Big Data – Just Two
6
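As a rough illustration of the two patterns, the sketch below treats history as a list of (pattern, outcome) records and answers both questions by counting. The `HISTORY` records, the event names, and the `foresight` / `hindsight` helpers are hypothetical and are not taken from the talk.

```ruby
# Hypothetical history of observed (pattern, outcome) pairs.
HISTORY = [
  { pattern: [:late_payment, :high_balance], outcome: :default },
  { pattern: [:late_payment, :high_balance], outcome: :default },
  { pattern: [:late_payment, :high_balance], outcome: :repaid },
  { pattern: [:on_time, :low_balance],       outcome: :repaid }
].freeze

# Foresight: given a pattern, count the outcomes that followed it in the past.
def foresight(history, pattern)
  history.select { |rec| rec[:pattern] == pattern }
         .each_with_object(Hash.new(0)) { |rec, counts| counts[rec[:outcome]] += 1 }
end

# Hindsight: given an outcome, count the patterns that preceded it in the past.
def hindsight(history, outcome)
  history.select { |rec| rec[:outcome] == outcome }
         .each_with_object(Hash.new(0)) { |rec, counts| counts[rec[:pattern]] += 1 }
end
```

Calling `foresight(HISTORY, [:late_payment, :high_balance])` tallies what happened after that pattern, while `hindsight(HISTORY, :repaid)` tallies which patterns led up to a repayment. At Big Data scale the same two questions would be asked of a data store rather than an in-memory array.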
Exploring Big Data
Exploring Big Data
1. Modeling True Risk
• What past patterns led to success or default?
2. Customer Churn Analysis
• What do customer churn patterns predict about our products and markets?
3. Recommendation Engine
• We have search terms – what have the results been from similar searches in the past?
4. Ad Targeting
• We have profile information – what offers have led to sales for similar profiles in the past?
5. PoS Transaction Analysis
• We have your purchase history – what deals might we offer in the future?
Summary – 10 Common Hadoop-able Problems*
Foresight Hindsight
In this light, let’s take a look at the “10 Hadoop-able Problems” of Big Data
7
Exploring Big Data
Summary – 10 Common Hadoop-able Problems
6. Analyzing Data Logs to Forecast Events
• We have your logs – what patterns of events have anticipated failures in the past?
7. Threat Analysis
• We have a specific event – what results have we seen from similar threats in the past?
8. Trade Surveillance
• Does this parcel raise any alarms, based on our history of past parcel-tracking?
9. Search Quality
• We have a set of search terms – what have similar searches succeeded in finding in the past?
10. Data “Sandbox”
• We have your data, possibly unstructured data. What patterns in that data might we bring to your attention now?
These two solution types apply generally to the Hadoop-able problems
Foresight Hindsight
8
The Big Data Platform Provides Us with Rich Analytics Tools
Key Big Data Analytics Solution Patterns
1. Predictive Modeling
2. Data Visualization
3. Cluster Partitioning
4. Collaborative Filtering
5. Outlier Analysis
6. A/B Testing
7. Markov Chains
8. Bloom Filters
9
With Just Two Standard Solution Models We Can Solve Most Big Data Problems
The Key Is To Shape Big Data Into A Standard Platform Onto Which We Can Apply These Analytics Tools…
“It is not the technology that creates a competitive edge, but the management process that exploits technology.” ~ Peter Keen, Shaping the Future (1991)
10
Exploring Big Data
• Clean install of Ubuntu 12.04 with all the latest updates
• sudo apt-get update
• sudo apt-get upgrade
• sudo apt-get dist-upgrade
• sudo apt-get install build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison subversion
• sudo apt-get install libcurl3 libcurl3-gnutls libcurl4-openssl-dev
• bash -s stable < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer)
• source ~/.bashrc
• gem update --system (Latest version currently installed)
• rvm ruby-1.9.2-p290@rails31 --create --default
• sudo apt-get install nodejs
• gem install rake
• gem install rails -v=3.1.3
Core Platform: Ubuntu 12.04 + AWS
12
The Core Development Platform
• Example:
– http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html
• Backing Articles:
– http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/
• Code:
– http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html
Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Play with this online at: http://jkr-blog.dyndns.org:3001/mini_urls
14
The good news is that we've already got our base image, and adding a new Redis data store and example app to it took only about an hour. As before, you can play with the URL shortener at Redis URL Shortener, and you can download and play with the code for the application at Redis URL Shortener Source Code.
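To show why a key-value store suits a URL shortener, here is a minimal sketch of the core logic. It is not the code of the linked application: `TinyStore` is a hypothetical in-memory stand-in whose `set`, `get` and `incr` methods mimic the Redis SET, GET and INCR commands, and the `UrlShortener` class and key names are assumptions made for illustration.

```ruby
# In-memory stand-in for a key-value store (an assumption for this sketch;
# a real app would talk to Redis through a client library instead).
class TinyStore
  def initialize
    @data = {}
  end

  # Store a value under a key.
  def set(key, value)
    @data[key] = value
  end

  # Fetch the value for a key, or nil if the key is missing.
  def get(key)
    @data[key]
  end

  # Counter: add 1 to the value at key (starting from 0) and return it.
  def incr(key)
    @data[key] = @data.fetch(key, 0) + 1
  end
end

class UrlShortener
  def initialize(store)
    @store = store
  end

  # Allocate the next id, encode it in base 36, and store code -> URL.
  def shorten(url)
    code = @store.incr("next_id").to_s(36)
    @store.set("url:#{code}", url)
    code
  end

  # Resolve a short code back to the original URL (nil if unknown).
  def expand(code)
    @store.get("url:#{code}")
  end
end
```

A short usage example: `shortener = UrlShortener.new(TinyStore.new)`, then `code = shortener.shorten("http://www.pikasoft.com/")` and `shortener.expand(code)` returns the original URL. The whole application reduces to one counter and one lookup by key, which is the access pattern key-value stores are built for.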
Redis
• Example:
– http://www.pikasoft.com/journal/2012/1/15/you-only-live-twice-basho-and-riak.html
• Backing Articles:
– http://jit.nuance9.com/2010/07/ruby-192-rails-3-riak-and-ripple.html
– http://jbbarth.com/archives/2011/4/23/basic_usage_of_riak_in/
• Code:
Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
15
Riak
• Example:
– http://www.pikasoft.com/journal/2010/7/31/nosql-on-the-cloud-our-first-application.html
• Backing Articles:
– http://www.mongodb.org/display/DOCS/Building+for+Linux
• Code:
– http://www.pikasoft.com/journal/2010/8/16/why-our-little-nosql-app-matters.html
Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Play with this online at: http://jkr-code.dyndns.org:3000/notes
17
So let's sum up -- after a handful of posts and a small but still sorrowful amount of command-line and rails code, we've managed to accomplish the following "Hello World" tasks in NoSQL on the cloud:
• Created a cloud account
• Got our first app created, and saw it in a browser on the web
• Loaded up real development environments (Ruby/Rails we added, Java we got for free)
• Added a stronger app server (thin >> webrick) and a stronger web server (nginx >> almost anything)
• Added our first NoSQL data store (MongoDB) and mapping software to simulate ActiveRecord in NoSQL
• Created a little NoSQL app to show all this, and made it visible through a dynamic DNS address: Rails Mongo Notes Example
Just to wrap the little app up: I updated John Nunemaker's MongoMapper demo app to work with Rails 3 and the cloud, and if you like you can take a look at the code for it here: Rails Mongo Code.
MongoDB
• Example:
– http://www.pikasoft.com/journal/2011/2/14/casi-casi-cassandra.html
• Backing Articles:
– http://www.25hoursaday.com/weblog/2008/05/23/SomeThoughtsOnTwittersAvailabilityProblems.aspx
• Code:
Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
18
Here's what the code for that broadcast might look like:
# Tweeter
class Tweeter < ActiveRecord::Base
  has_many :followers
end

class Follower < ActiveRecord::Base
  belongs_to :tweeter
end
All fine so far -- that's the twittery world we all live in. I can send out my breathless message of what I had for breakfast, and then Twitter picks it up and broadcasts the message from me (and all the messages from the other tweeters):
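# Load every tweeter
@tweeters = Tweeter.all
@tweeters.each do |tweeter|
  # A further query for this tweeter's followers (via has_many :followers)
  @followers = tweeter.followers
  @followers.each do |follower|
    # broadcast_to is assumed to be defined elsewhere on Tweeter
    tweeter.broadcast_to :recipient => follower
  end
end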
So here we're going to do a query for each of the X tweeters, and for them we'll do another query for each of their Y followers.
Code smell! Fail Whale!!!
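One common way around the nested queries, and the kind of data modeling a store such as Cassandra tends to encourage, is to denormalize: pay the cost once when a message is written by copying it into a timeline kept for each follower, so that reading a timeline becomes a single lookup by key. The sketch below only illustrates that idea with plain Ruby hashes. It is not a Cassandra client, and the `Timelines` class and its method names are hypothetical.

```ruby
# Plain hashes stand in for the data store; all names are hypothetical.
class Timelines
  def initialize
    @followers = Hash.new { |h, k| h[k] = [] }  # tweeter  => [follower, ...]
    @timelines = Hash.new { |h, k| h[k] = [] }  # follower => [message, ...]
  end

  # Record that follower follows tweeter.
  def follow(follower, tweeter)
    @followers[tweeter] << follower
  end

  # Write path: copy the message into every follower's timeline.
  def tweet(tweeter, text)
    @followers[tweeter].each do |follower|
      @timelines[follower] << "#{tweeter}: #{text}"
    end
  end

  # Read path: one lookup by key, no nested queries.
  def timeline(follower)
    @timelines[follower]
  end
end
```

The trade-off is more writes and duplicated data in exchange for cheap reads, which tends to fit a workload where timelines are read far more often than messages are posted.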
Cassandra
• Example:
– http://www.pikasoft.com/journal/2011/1/21/graph-databases-and-star-wars.html
• Backing Articles:
– http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/
• Code
Six Degrees of Kevin Bacon =
Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Play with this online at: http://jkr-blog.dyndns.org:9292/
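The "Six Degrees of Kevin Bacon" question is a shortest-path query, which is the kind of traversal a graph database is built to answer. As a minimal sketch of the underlying idea, the breadth-first search below counts the hops between two nodes in a small in-memory graph. The `GRAPH` data with its placeholder node names and the `degrees_of_separation` function are hypothetical; this is plain Ruby, not Neo4J's API.

```ruby
# Hypothetical graph: each key lists the nodes it is directly linked to.
GRAPH = {
  "A" => ["B", "C"],
  "B" => ["A", "D"],
  "C" => ["A"],
  "D" => ["B", "E"],
  "E" => ["D"],
  "F" => []
}.freeze

# Breadth-first search: returns the number of hops on a shortest path
# from start to target, or nil when no path exists.
def degrees_of_separation(graph, start, target)
  return 0 if start == target
  distance = { start => 0 }
  queue = [start]
  until queue.empty?
    node = queue.shift
    graph.fetch(node, []).each do |neighbor|
      next if distance.key?(neighbor)          # already visited
      distance[neighbor] = distance[node] + 1
      return distance[neighbor] if neighbor == target
      queue << neighbor
    end
  end
  nil
end
```

For example, `degrees_of_separation(GRAPH, "A", "E")` follows A, B, D, E. In a relational schema the same question needs one self-join per hop, which is why relationship-heavy data is a natural fit for a graph store.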
20
Neo4J
• Example:
– http://www.pikasoft.com/journal/2011/1/9/nosql-next-up-hadoop-and-cloudera.html
• Backing Articles:
– http://www.joelonsoftware.com/items/2006/08/01.html
• Code: Map
Reduce
22
MapReduce via Hadoop, Thrift and AWS
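For readers new to the Map and Reduce steps named on this slide, here is a minimal single-process sketch of the classic word-count job in plain Ruby. It only illustrates the shape of the computation (map, then group by key, then reduce). It does not use Hadoop, Thrift or AWS, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are assumptions made for this example.

```ruby
# Map phase: emit a [word, 1] pair for every word in one document.
def map_phase(document)
  document.downcase.scan(/[a-z']+/).map { |word| [word, 1] }
end

# Shuffle phase: group the emitted values by key.
def shuffle(pairs)
  pairs.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(word, count), groups|
    groups[word] << count
  end
end

# Reduce phase: collapse each key's values into a single total.
def reduce_phase(groups)
  groups.each_with_object({}) do |(word, counts), totals|
    totals[word] = counts.inject(0, :+)
  end
end

# Run the three phases over a list of documents.
def word_count(documents)
  pairs = documents.flat_map { |doc| map_phase(doc) }
  reduce_phase(shuffle(pairs))
end
```

Calling `word_count(["big data fast data", "Fast data"])` counts each word across both strings. In a real cluster the map calls run in parallel on different machines and the framework performs the shuffle, but the functions a developer writes keep roughly this shape.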
• Example:
– http://www.control-alt-del.org/2011/09/14/fun-with-bloom-filters-using-riak-mapreduce/
– http://verboselogging.com/2010/03/22/super-mongodb-mapreduce-max-out
• Backing Articles:
– MapReduce on Riak
• http://wiki.basho.com/MapReduce.html
• http://stackoverflow.com/questions/2123004/mapreduce-with-riak
• http://www.readwriteweb.com/hack/2011/06/riak-pipe-rethinks-its-mapreduce.php
• http://www.quora.com/What-are-the-advantages-and-limitations-of-MapReduce-backed-by-distributed-key-value-stores-HBase-Cassandra-Riak
– MapReduce on MongoDB
• http://dllhell.net/2010/07/17/on-mapreduce-in-mongodb/
• http://www.mongodb.org/display/DOCS/MapReduce
• http://jonathanhui.com/mongodb-mapreduce
• http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
Source: http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
23
MapReduce via Riak / MongoDB
• Example:
– http://www.commoncrawl.org/mapreduce-for-the-masses/
• Backing Articles:
– http://www.commoncrawl.org/mapreduce-for-the-masses/
• Code:
24
Elastic MapReduce
This Is Only The Beginning. With A Standard Platform We’ll See Richer Big Data Discoveries Become Routine
The Solution Tools (Slide 9) Become Straightforward if We Run Them on a Standard Architecture
“One man’s noise is another man’s data.” ~ Bill Stensrud, InstantEncore
25
Summary
• John Repko: [email protected]
http://pikasoft.s3.amazonaws.com/Pictures_at_an_Exhibition.pptx
26
Contacts