SMU No SQL Talk
justin-weinberg
No SQL is not about SQL
No SQL is a Zoo...
Key-Value Stores
BigTable
SimpleDB
Azure Table
Wide Column Stores
Document Stores
Graph Databases
Why not Traditional RDBMSs?
They offer incredibly useful guarantees and have been battle-worn and tested.
Referential Integrity
ACID Transactions
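To make those two guarantees concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite stands in for any RDBMS; the Profile and Recipe tables and their rows are hypothetical). A foreign key rejects a recipe that points at a missing profile (referential integrity), and a transaction containing one bad insert is rolled back as a unit (the "A", atomicity, in ACID).

```python
import sqlite3

# In-memory database. SQLite only enforces foreign keys when this pragma is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables for illustration.
conn.execute("CREATE TABLE Profile (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Recipe ("
    " ID INTEGER PRIMARY KEY,"
    " ProfileID INTEGER NOT NULL REFERENCES Profile(ID),"
    " Title TEXT NOT NULL)"
)
conn.execute("INSERT INTO Profile (ID, Name) VALUES (1, 'Ann')")
conn.commit()

# Referential integrity: a recipe owned by a non-existent profile is rejected.
orphan_rejected = False
try:
    conn.execute("INSERT INTO Recipe (ProfileID, Title) VALUES (99, 'Ghost Soup')")
except sqlite3.IntegrityError:
    orphan_rejected = True

# Atomicity: both inserts commit together or neither does.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO Recipe (ProfileID, Title) VALUES (1, 'Pancakes')")
        conn.execute("INSERT INTO Recipe (ProfileID, Title) VALUES (42, 'Waffles')")  # no profile 42
except sqlite3.IntegrityError:
    pass

recipe_count = conn.execute("SELECT COUNT(*) FROM Recipe").fetchone()[0]
print(orphan_rejected, recipe_count)  # True 0
```

The valid "Pancakes" row is gone too: the database, not application code, guarantees that a half-finished change never becomes visible.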
And SQL...
SQL is a powerful, expressive DSL (Domain-Specific Language) that many, many people understand.
So Why No SQL?
Web Scale
Web scale can be done in SQL
How?
• Vertical partitioning / logical sharding (Instagram)
• Caching (28 terabytes at Facebook, 2008)
• SQL + No SQL
• Think about your architecture
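The first two bullets can be sketched in a few lines. This is a minimal illustration with assumed names (the shard count, hostnames, `server_for` and `get_profile` are all hypothetical), not Instagram's or Facebook's actual code: each user ID maps to a fixed logical shard, a lookup table maps logical shards onto a few physical servers, and reads go through a cache-aside layer (a dict standing in for memcached).

```python
# Hypothetical layout: many small logical shards mapped onto a few physical servers.
NUM_LOGICAL_SHARDS = 4096
PHYSICAL_SERVERS = ["db-a", "db-b", "db-c", "db-d"]  # made-up hostnames

# Each physical server initially owns a contiguous range of logical shards.
shard_map = {
    shard: PHYSICAL_SERVERS[shard * len(PHYSICAL_SERVERS) // NUM_LOGICAL_SHARDS]
    for shard in range(NUM_LOGICAL_SHARDS)
}


def logical_shard(user_id: int) -> int:
    """A user's logical shard never changes, so their rows never need re-keying."""
    return user_id % NUM_LOGICAL_SHARDS


def server_for(user_id: int) -> str:
    """Look up which physical server currently hosts the user's logical shard."""
    return shard_map[logical_shard(user_id)]


# Cache-aside: check the cache first, fall back to the sharded database on a miss.
cache: dict = {}  # stands in for a memcached cluster


def get_profile(user_id: int, load_from_db):
    key = f"profile:{user_id}"
    if key not in cache:
        cache[key] = load_from_db(server_for(user_id), user_id)
    return cache[key]


# Usage with a fake loader that records which server it hit.
calls = []


def fake_db(server: str, user_id: int) -> dict:
    calls.append(server)
    return {"id": user_id, "server": server}


get_profile(4095, fake_db)
get_profile(4095, fake_db)  # second read is served from the cache
print(server_for(5), server_for(4095), len(calls))  # db-a db-d 1
```

The indirection is the point of the design: adding a server later means reassigning some `shard_map` entries and copying those shards' data, while every user keeps the same logical shard.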
Want to learn more? Spend time on http://highscalability.com/
But a reasonable question is...
How much time should we be devoting to managing scaling problems versus adding business value to these systems?
So what are we giving up?
CAP
[Diagram: the CAP triangle (Consistency, Availability, Partition tolerance), placing MongoDB, RDBMSs (MySQL, SQL Server, Oracle), HBase (Hadoop), Google BigTable, Dynamo, CouchDB, Cassandra, Voldemort, Redis and SimpleDB]
FriendsWhoCook.com
A social network of friends who enjoy cooking great food.
- Add my recipes
- Add my friends
- Show my friends
- Like / comment on my friends’ recipes
- Search recipes of my friends, their friends, and so on, by...
Problem 1: Store Recipes
Fairly Simple Object
class Recipe {
    Image Photo;
    List<Comment> Comments;
    List<Ingredient> Ingredients;
    List<ProfileId> Likes;
    Category RecipeCategory;
}
Becomes a complex RDBM’ess
Object-Relational Impedance Mismatch
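To see why one fairly simple object becomes a complex relational design, here is a minimal sqlite3 sketch. The table and column names are assumptions, not the talk's schema: the single Recipe splits into four tables, and reading it back takes one query per collection plus code to stitch the object together again. (The likes table is called RecipeLike because LIKE is an SQL keyword.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized schema: one table per collection on the Recipe object.
conn.executescript("""
    CREATE TABLE Recipe (ID INTEGER PRIMARY KEY, Photo BLOB, Category TEXT);
    CREATE TABLE Comment (ID INTEGER PRIMARY KEY, RecipeID INTEGER REFERENCES Recipe(ID), Body TEXT);
    CREATE TABLE Ingredient (ID INTEGER PRIMARY KEY, RecipeID INTEGER REFERENCES Recipe(ID), Name TEXT);
    CREATE TABLE RecipeLike (RecipeID INTEGER REFERENCES Recipe(ID), ProfileID INTEGER,
                             PRIMARY KEY (RecipeID, ProfileID));
    INSERT INTO Recipe VALUES (1, NULL, 'Breakfast');
    INSERT INTO Comment (RecipeID, Body) VALUES (1, 'Yum'), (1, 'Too sweet');
    INSERT INTO Ingredient (RecipeID, Name) VALUES (1, 'Flour'), (1, 'Eggs'), (1, 'Milk');
    INSERT INTO RecipeLike VALUES (1, 7), (1, 8);
""")


def load_recipe(recipe_id: int) -> dict:
    """Reassemble one Recipe object from four tables: the impedance mismatch in action."""
    photo, category = conn.execute(
        "SELECT Photo, Category FROM Recipe WHERE ID = ?", (recipe_id,)).fetchone()
    comments = [row[0] for row in conn.execute(
        "SELECT Body FROM Comment WHERE RecipeID = ? ORDER BY ID", (recipe_id,))]
    ingredients = [row[0] for row in conn.execute(
        "SELECT Name FROM Ingredient WHERE RecipeID = ? ORDER BY ID", (recipe_id,))]
    likes = [row[0] for row in conn.execute(
        "SELECT ProfileID FROM RecipeLike WHERE RecipeID = ? ORDER BY ProfileID", (recipe_id,))]
    return {"Photo": photo, "Category": category, "Comments": comments,
            "Ingredients": ingredients, "Likes": likes}


print(load_recipe(1))
```

An ORM can generate this plumbing, but the object graph and the tables still have to be kept in sync by someone.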
No SQL: Document Store
• Data element is a document
• Documents grouped into collections
• Often stored as JSON
• Works great with Domain-Driven Design
• Schema-less
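For contrast, a toy in-memory document store illustrates these bullets. `DocumentStore` and its methods are hypothetical and are not the MongoDB, CouchDB or RavenDB API: each recipe is one JSON-compatible document holding its comments, ingredients and likes, documents live in named collections, and nothing enforces a schema, so two recipes can carry different fields.

```python
import json
import uuid


class DocumentStore:
    """Toy document store (hypothetical): named collections of JSON documents keyed by ID."""

    def __init__(self):
        self.collections: dict = {}

    def insert(self, collection: str, doc: dict) -> str:
        doc_id = doc.get("_id") or uuid.uuid4().hex
        # Round-trip through JSON so stored documents are plain, detached data.
        stored = json.loads(json.dumps({**doc, "_id": doc_id}))
        self.collections.setdefault(collection, {})[doc_id] = stored
        return doc_id

    def get(self, collection: str, doc_id: str) -> dict:
        return self.collections[collection][doc_id]

    def find(self, collection: str, **criteria) -> list:
        """Return documents whose top-level fields equal every given criterion."""
        docs = self.collections.get(collection, {}).values()
        return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


store = DocumentStore()
# The whole recipe aggregate (comments, ingredients, likes) is one document.
store.insert("recipes", {
    "_id": "pancakes",
    "Category": "Breakfast",
    "Ingredients": ["Flour", "Eggs", "Milk"],
    "Comments": [{"by": 8, "text": "Yum"}],
    "Likes": [7, 8],
})
# Schema-less: this document has a field the first one lacks, and lacks several it has.
store.insert("recipes", {"_id": "chili", "Category": "Dinner", "SpiceLevel": 3})

print([d["_id"] for d in store.find("recipes", Category="Breakfast")])  # ['pancakes']
```

Loading a recipe is now a single lookup that returns the whole aggregate, which is why document stores pair well with Domain-Driven Design's aggregate roots.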
Document Store Examples
• MongoDB (PC)
• CouchDB (PA)
• RavenDB (PA)
Demo: MongoDB
Demo: CouchDB
Problem 2: Model the Social Graph
Friends in RDBMS
For a more sophisticated view of modeling graphs in an RDBMS: http://www.slideshare.net/quipo/rdbms-in-the-social-networks-age
Get my Friends
DECLARE @ProfileID int
SET @ProfileID = 1  -- the profile whose friends we want

SELECT FirstDegreeProfile.ID, FirstDegreeProfile.FirstName, FirstDegreeProfile.LastName
FROM Profile AS FirstDegreeProfile
JOIN Friendship ON FirstDegreeProfile.ID = Friendship.FriendID
WHERE Friendship.ProfileID = @ProfileID
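The query above is T-SQL. To try the same join without SQL Server, here is a sketch in SQLite via Python's sqlite3, using the Profile and Friendship columns the query refers to. The exact schema and the sample rows are assumptions; in particular, each mutual friendship is assumed to be stored as two directed rows, one per direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema, inferred from the column names used in the query.
conn.executescript("""
    CREATE TABLE Profile (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    -- One row per direction: (A, B) and (B, A) for a mutual friendship.
    CREATE TABLE Friendship (ProfileID INTEGER REFERENCES Profile(ID),
                             FriendID  INTEGER REFERENCES Profile(ID),
                             PRIMARY KEY (ProfileID, FriendID));
    INSERT INTO Profile VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'),
                               (3, 'Cat', 'Kim'), (4, 'Dan', 'Fox');
    INSERT INTO Friendship VALUES (1, 2), (2, 1), (1, 3), (3, 1), (2, 4), (4, 2);
""")

# Same join as the T-SQL version; '?' replaces the @ProfileID variable.
FIRST_DEGREE = """
    SELECT FirstDegreeProfile.ID, FirstDegreeProfile.FirstName, FirstDegreeProfile.LastName
    FROM Profile AS FirstDegreeProfile
    JOIN Friendship ON FirstDegreeProfile.ID = Friendship.FriendID
    WHERE Friendship.ProfileID = ?
    ORDER BY FirstDegreeProfile.ID
"""

my_friends = conn.execute(FIRST_DEGREE, (1,)).fetchall()
print(my_friends)  # [(2, 'Bob', 'Ray'), (3, 'Cat', 'Kim')]
```

Storing both directions doubles the rows but keeps "my friends" a single indexed join instead of an OR across two columns.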
Friends and their friends
DECLARE @ProfileID int
SET @ProfileID = 1

-- Join Friendship to itself: my friends' friendships lead to the second-degree profiles.
SELECT FirstDegreeFriendship.FriendID AS MyFriendID,
       SecondDegreeProfile.ID AS SecondDegreeID,
       SecondDegreeProfile.FirstName AS SecondDegreeFirstName,
       SecondDegreeProfile.LastName AS SecondDegreeLastName
FROM Profile AS SecondDegreeProfile
JOIN Friendship AS SecondDegreeFriendship ON SecondDegreeProfile.ID = SecondDegreeFriendship.FriendID
JOIN Friendship AS FirstDegreeFriendship ON SecondDegreeFriendship.ProfileID = FirstDegreeFriendship.FriendID
WHERE FirstDegreeFriendship.ProfileID = @ProfileID
/* Note: A much better solution would use a recursive CTE to compute transitive closure */
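Following up on that note, here is a sketch of the recursive-CTE approach, run in SQLite via Python. The Friendship table has the same shape as in the queries above, but the sample rows are made up. The CTE walks the table up to a hop limit and keeps each person's shortest distance, so friends, friends of friends, and so on come back from one query. In T-SQL the same idea is written with plain `WITH` (no `RECURSIVE` keyword) and requires `UNION ALL` between the anchor and the recursive member.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Friendship (ProfileID INTEGER, FriendID INTEGER,
                             PRIMARY KEY (ProfileID, FriendID));
    -- Made-up mutual friendships, stored in both directions: 1-2, 1-3, 2-4, 4-5.
    INSERT INTO Friendship VALUES (1, 2), (2, 1), (1, 3), (3, 1),
                                  (2, 4), (4, 2), (4, 5), (5, 4);
""")

NETWORK = """
    WITH RECURSIVE Reachable(ProfileID, Depth) AS (
        -- Anchor: my direct friends are one hop away.
        SELECT FriendID, 1 FROM Friendship WHERE ProfileID = :me
        UNION
        -- Recursive step: follow one more Friendship edge, up to the hop limit.
        SELECT f.FriendID, r.Depth + 1
        FROM Reachable AS r
        JOIN Friendship AS f ON f.ProfileID = r.ProfileID
        WHERE r.Depth < :max_depth
    )
    SELECT ProfileID, MIN(Depth) AS Distance  -- shortest path per person
    FROM Reachable
    WHERE ProfileID <> :me
    GROUP BY ProfileID
    ORDER BY Distance, ProfileID
"""

network = conn.execute(NETWORK, {"me": 1, "max_depth": 3}).fetchall()
print(network)  # [(2, 1), (3, 1), (4, 2), (5, 3)]
```

The hop limit keeps the recursion bounded on large, cyclic social graphs; dropping it (with `UNION` deduplicating rows) gives the full transitive closure the note mentions, at a cost that grows quickly with network size.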
Graph Databases
• Optimized for graph data
• Check out Neo4j
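What "optimized for graph data" buys is traversal by following stored relationships instead of repeating joins. The sketch below is plain Python with hypothetical names, not the Neo4j API: each person holds direct references to their friends (an adjacency list), and a breadth-first search finds everyone within N hops by walking only the edges it reaches.

```python
from collections import deque

# Adjacency list: each person points directly at their friends (undirected graph).
friends = {
    1: {2, 3},
    2: {1, 4},
    3: {1},
    4: {2, 5},
    5: {4},
}


def within_hops(graph: dict, start: int, max_hops: int) -> dict:
    """Breadth-first search returning {person: distance} for everyone within max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # don't expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in distance:  # first visit in BFS is the shortest distance
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]  # exclude the starting person
    return distance


print(within_hops(friends, 1, 2))  # {2: 1, 3: 1, 4: 2}
```

Each extra hop is just another iteration of the loop, whereas the SQL version needs another self-join (or a recursive CTE) per level.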
Problem 3: Schemaless / Big Data
Facebook's Network (credit: Traud & Frost, UNC-Chapel Hill)
How do we ask these questions?
• After we changed the “like” button icon for half of our users, did we get more or fewer likes from that sample?
• Of users who click on our ads, which pages did they spend the most time on?
• Which hidden patterns that we aren’t even aware of might make us competitive?
Want to get far ahead of the pack? Read “The Lean Startup” by Eric Ries.
Is this Actionable?
How about this?
Wide Column
“A Bigtable is a sparse, distributed, persistent multidimensional sorted map”
Source: http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
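The quoted definition can be read quite literally. In the Bigtable paper, the map is indexed by a row key, a column key ("family:qualifier") and a timestamp, and each value is an uninterpreted string. The following single-machine sketch (class and method names hypothetical) models only that data model and ignores "distributed" and "persistent": it is sparse because absent cells are simply not stored, multidimensional because of the three-part key, and sorted because scans walk row keys in lexicographic order. The sample rows follow the paper's webtable example.

```python
import bisect
import time
from typing import Optional


class SparseSortedMap:
    """Toy Bigtable data model: (row, 'family:qualifier', timestamp) -> value."""

    def __init__(self):
        self.rows = {}          # row key -> {column -> {timestamp -> value}}
        self.sorted_keys = []   # row keys kept sorted for range scans

    def put(self, row: str, column: str, value: str, ts: Optional[int] = None) -> None:
        if row not in self.rows:
            bisect.insort(self.sorted_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, {})
        versions[ts if ts is not None else time.time_ns()] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of a cell, or None if the cell is absent (sparse)."""
        versions = self.rows.get(row, {}).get(column)
        return versions[max(versions)] if versions else None

    def scan(self, start: str, end: str):
        """Yield (row, columns) for row keys in [start, end), in sorted order."""
        i = bisect.bisect_left(self.sorted_keys, start)
        while i < len(self.sorted_keys) and self.sorted_keys[i] < end:
            key = self.sorted_keys[i]
            yield key, self.rows[key]
            i += 1


table = SparseSortedMap()
table.put("com.cnn.www", "contents:", "<html>v1", ts=1)
table.put("com.cnn.www", "contents:", "<html>v2", ts=2)
table.put("com.cnn.www", "anchor:cnnsi.com", "CNN", ts=1)
table.put("com.example", "contents:", "<html>hi", ts=1)

print(table.get("com.cnn.www", "contents:"))          # <html>v2
print(table.get("com.example", "anchor:cnnsi.com"))   # None
print([row for row, _ in table.scan("com.", "com.d")])  # ['com.cnn.www']
```

Storing row keys as reversed domains, as the paper does, makes pages from the same site adjacent in the sorted order, so a site's rows come back from one short range scan.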
MapReduce
Map(k, v) → [(k1, v1), (k2, v2), (k1, v3), (k3, v4)]
Map(k, v) → list of intermediate key/value pairs
Internal step: takes the list of intermediate key/value pairs and groups it into key → list of values.
Reduce(k, [v1, v2, v3, …]) → (k, n1), (k, n2)
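The three steps fit in a short single-process sketch. The function names and the event log are made up, and a real MapReduce framework runs many map and reduce tasks across machines and does the grouping over the network. To tie it back to the "like" button question, the job counts likes per button variant; here each reduce call emits a single (key, total) pair.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(key: int, event: dict) -> Iterator[tuple]:
    """Map(k, v): emit an intermediate (variant, 1) pair for every 'like' event."""
    if event["action"] == "like":
        yield event["variant"], 1


def shuffle(pairs: Iterable[tuple]) -> dict:
    """Internal step: group intermediate pairs into key -> list of values."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups


def reduce_fn(key: str, values: list) -> tuple:
    """Reduce(k, [v1, v2, ...]): combine all values for one key."""
    return key, sum(values)


def map_reduce(records: dict) -> dict:
    intermediate = (pair for k, v in records.items() for pair in map_fn(k, v))
    return dict(reduce_fn(k, vs) for k, vs in shuffle(intermediate).items())


# Made-up event log: event ID -> event.
log = {
    1: {"variant": "old_icon", "action": "like"},
    2: {"variant": "new_icon", "action": "like"},
    3: {"variant": "new_icon", "action": "view"},
    4: {"variant": "new_icon", "action": "like"},
}
print(map_reduce(log))  # {'old_icon': 1, 'new_icon': 2}
```

Because `map_fn` looks at one record at a time and `reduce_fn` at one key at a time, both can be spread across machines; that restriction is also why the problem must be expressible in this shape.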
One Downside...
• We need smart people to write MapReduce programs, and the problems need to be expressible as MapReduce.
• General solutions are BIG money.
Final thought: Big Data is BIG
= ?
Things to Read
• Bigtable: A Distributed Storage System for Structured Data
• Dynamo: Amazon’s Highly Available Key-value Store
• MapReduce: Simplified Data Processing on Large Clusters
• The Google File System
• Towards Robust Distributed Systems
• http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
Creative Commons Acknowledgments and Thanks!
Bobwitloxrosipaw