MongoDB Hackathon 02

Vivek A. Ganesan

Big Data Gods Meetup, Santa Clara, CA, May 18, 2013

Before we start

Copyright 2013, Vivek A. Ganesan, All rights reserved

o A BIG thank you to our sponsors – Big Data Cloud
  o Meeting Space
  o Food + Drinks
  o Consulting/Training

Agenda

o Review of Hackathon 01
o Data Modeling
o Indexing
o Aggregation
o Map/Reduce

Introduction

o This is a hackathon, not a class
  o Which means we work on stuff together
  o Please consult and help your teammates
o There will be labs (that’s when we learn!)
  o Talk to your teammates
  o Figure out what problem you want to solve
  o Think about your data sets and how to model them in MongoDB

Review – MongoDB Basics

o MongoDB is a document-oriented NoSQL data store
o It stores data internally as Binary JSON (BSON)
o A MongoDB server may hold multiple databases
o A database may have multiple collections (the analog of tables)
o A collection is a container of documents
o Documents contain key/value pairs
  o A default key of “_id” is inserted by MongoDB for all documents
  o Users can set the value of “_id” to anything they want, as long as it is unique within the collection
o Documents are schema-free
  o No fixed structure to a collection
  o A collection can have documents with different key/value pairs

Review – Shell and Clients

o The Mongo shell is a CLI client to MongoDB
  o Shell commands are JavaScript functions
  o You can write your own JavaScript code within the shell
  o You can also load and run JavaScript files using load()
o The Mongo shell looks for an initialization file: ~/.mongorc.js
  o Set up global variables here
o To use your favorite editor within the Mongo shell:
  o Set the environment variable EDITOR to your editor
o MongoDB supports clients in several programming languages:
  o JS, Java, C, C++, C#, Scala, Python, Ruby, Perl and Erlang

Review – Mongo DB Objects

o Note: on the original slides, Mongo shell commands are in blue and output is in green; in this transcript, output follows "=>"
o MongoDB uses a hierarchical naming scheme for database objects
  o The current database is always in the db object
  o The db command prints the name of the current database
o A collection called “mycollection” in the current database:
  o db.mycollection (Note: this is a MongoDB object)
o Commands are methods invoked on objects
  o For example, to insert a document into the db.mycollection collection:
    o db.mycollection.insert command
  o For example, to find documents in the db.mycollection collection:
    o db.mycollection.find command

Review – Create

o First exercise:
  o Create a new database called “blog”
  o Create a collection called “users” and a collection called “posts”
o Solution to first exercise:
  o use blog;
  o db; => blog
  o show collections; => system.indexes
  o db.createCollection("users"); => { "ok" : 1 }
  o db.createCollection("posts"); => { "ok" : 1 }
  o show collections; => posts, system.indexes, users

Review – Insert

o Second exercise:
  o In the “users” collection:
    o Insert a single document, {username: "admin"}
  o In the “posts” collection:
    o Insert ten posts using a loop
    o Blog data: post_title, post_body and post_tags as CSV
o Solution to second exercise:
  o db.users.insert({username: "admin"});
  o for (var i = 1; i <= 10; i++) {
      db.posts.insert({post_title: "Title", post_body: "Post Body", post_tags: "tag1,tag2,tag3,tag4,tag5"});
    }

Review – Updates with modifier

o Third exercise:
  o In the “posts” collection:
    o Update ten posts with an updated_at key and set it to the current timestamp
o Solution to the third exercise:
  o Note: without a modifier, an update call replaces the entire document (modifiers start with a ‘$’ symbol)
  o db.posts.update({}, {$set: {updated_at: new Date()}}, false, true);
  o The trailing false, true are the upsert and multi flags: do not insert when nothing matches, and update every matching document
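The note about modifiers can be illustrated without a database. The following is a minimal sketch in plain JavaScript that models the two behaviours on an in-memory object; the helper names replaceDocument and applySet are hypothetical and are not part of the MongoDB API.

```javascript
// Plain-JavaScript model of the two update styles; no MongoDB connection is used.
// replaceDocument and applySet are hypothetical helper names for illustration.

// Without a modifier, the update document replaces everything except _id.
function replaceDocument(doc, replacement) {
  return Object.assign({ _id: doc._id }, replacement);
}

// With $set, only the named keys change and the rest of the document survives.
function applySet(doc, fields) {
  return Object.assign({}, doc, fields);
}

const post = { _id: 1, post_title: "Title", post_body: "Post Body" };

const replaced = replaceDocument(post, { updated_at: new Date(0) });
const modified = applySet(post, { updated_at: new Date(0) });

console.log(Object.keys(replaced)); // keys: _id, updated_at
console.log(Object.keys(modified)); // keys: _id, post_title, post_body, updated_at
```

The replaced document has lost post_title and post_body, which is why the solution uses $set.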

Review – Selective Updates

o Fourth exercise:
  o In the “posts” collection:
    o Update the posts such that the first three posts have a “foo” tag (use the cursor functionality to iterate)
o Solution to the fourth exercise:
  o var c = db.posts.find().limit(3);
  o while (c.hasNext()) {
      var post = c.next();
      post["post_tags"] = post["post_tags"] + ",foo";
      db.posts.save(post);
    }

Review – Mastering find

o In a Mongo shell:
  o Find all posts but extract only the post_title field
    o db.posts.find({}, {post_title: 1, _id: 0});
  o List all posts but in reverse order of created_on
    o db.posts.find().sort({_id: -1});
    o Note: the posts have no created_on key; a default ObjectId _id begins with a creation timestamp, so sorting on _id approximates creation order
  o Do the same as above but paginate in sets of three (this call returns the second page)
    o db.posts.find().sort({_id: -1}).skip(3).limit(3);
  o Find all posts that contain a tag called “foo”
    o db.posts.find({post_tags: /foo/});
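Two details of these queries are easy to miss: skip(3).limit(3) selects the second page, and /foo/ is an unanchored substring match, so on a CSV string it would also match a tag such as "food". The sketch below works through both in plain JavaScript; pageWindow and tagPattern are hypothetical helper names, and tagPattern assumes the tag contains no regular-expression metacharacters.

```javascript
// Hypothetical helpers; plain JavaScript, no MongoDB connection needed.

// Page numbers start at 1, so page 2 with 3 per page skips the first 3 documents.
function pageWindow(page, perPage) {
  return { skip: (page - 1) * perPage, limit: perPage };
}

// Matches a whole tag inside a comma-separated string, unlike the bare /foo/.
// Assumes the tag has no regex metacharacters (no escaping is done here).
function tagPattern(tag) {
  return new RegExp("(^|,)" + tag + "(,|$)");
}

console.log(pageWindow(2, 3));                     // skip 3, limit 3
console.log(/foo/.test("tag1,food"));              // true: substring match
console.log(tagPattern("foo").test("tag1,food"));  // false
console.log(tagPattern("foo").test("tag1,foo"));   // true
```

In the shell, the same values could be passed along as db.posts.find().sort({_id: -1}).skip(w.skip).limit(w.limit) for a window w, and a RegExp object can be used directly as the value in db.posts.find({post_tags: ...}).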

Review – Modifiers

o Fifth exercise:
  o Modify the “posts” collection
  o Change the post_tags field to an array instead of a CSV list
o Solution to the fifth exercise:
  o var c = db.posts.find();
  o while (c.hasNext()) {
      var post = c.next();
      post["post_tags"] = post["post_tags"].split(",");
      db.posts.save(post);
    }

Data Modeling

o http://docs.mongodb.org/manual/core/data-modeling/
o When to reference?
  o When the relationship calls for it, e.g. many-to-many relationships
  o When document size is a concern
  o Some drivers may resolve references automatically
o When to embed?
  o When it is “natural”, e.g. a blog post and its comments
  o When there is a need for atomic operations (a write to a single document is atomic)
  o When read performance is critical
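The two modeling styles can be compared side by side as document shapes. This is a minimal sketch in plain JavaScript, using in-memory arrays in place of collections; the comments, tag_ids and name keys and the resolveTags helper are hypothetical and chosen only for illustration.

```javascript
// Hypothetical documents; field names are illustrative only.

// Embedded: comments live inside the post, so one read returns everything.
const embeddedPost = {
  _id: 1,
  post_title: "Title",
  comments: [
    { author: "admin", text: "First!" },
    { author: "guest", text: "Nice post" }
  ]
};

// Referenced: the post stores ids, and the tags live in their own collection.
const tags = [
  { _id: "t1", name: "mongodb" },
  { _id: "t2", name: "hackathon" }
];
const referencedPost = { _id: 2, post_title: "Title", tag_ids: ["t1", "t2"] };

// Resolving references takes a second lookup, done here over an in-memory array.
function resolveTags(post, tagCollection) {
  return post.tag_ids.map(function (id) {
    return tagCollection.find(function (t) { return t._id === id; });
  });
}

const tagNames = resolveTags(referencedPost, tags).map(function (t) { return t.name; });
console.log(embeddedPost.comments.length); // 2
console.log(tagNames.join(","));           // mongodb,hackathon
```

The embedded form reads in one step; the referenced form lets many posts share the same tag documents at the cost of the extra lookup.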

Lab 01 – Model your data set

o Break – 15 minutes
o Lab 01 – 45 minutes - With your team:
  o Look at your data set and figure out how you will model it
    o How would you bulk load the data?
    o How would you handle errors while loading?
  o Implement the schema for your data set
  o Bulk load a small portion of your data set
  o Verify the load and also run some sample queries
  o Figure out what queries you would run frequently

Indexes

o http://docs.mongodb.org/manual/core/indexes/
o When to index?
  o To improve find performance
  o To improve sort performance
  o Note: every index adds a cost to writes, because each insert or update must also maintain the index
o What to index?
  o Depends on the query
  o Usually, the most frequently searched fields
  o Sometimes, fields in embedded documents as well

Types of Indexes and Options

o Unique indexes (_id has a unique index by default)
o Simple (single-field) indexes
o Compound indexes
  o Prefix order is important!
o Text indexes
o Sparse indexes
o Multi-key indexes (for arrays)
o Geospatial and geoHaystack indexes
o Indexes can be built in the background (recommended!)
o Indexes can be named explicitly (definitely recommended!)
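To make "prefix order is important" concrete, the sketch below counts how many leading keys of a compound index a query's fields cover. It is a deliberately simplified model in plain JavaScript, not the real query planner; coveredPrefixLength is a hypothetical helper, and the author and created_on keys are assumed for illustration.

```javascript
// Simplified model of the prefix rule; the real query planner considers more than this.

// Counts how many leading keys of a compound index the query's fields cover.
function coveredPrefixLength(indexKeys, queryFields) {
  let n = 0;
  while (n < indexKeys.length && queryFields.includes(indexKeys[n])) {
    n++;
  }
  return n;
}

// Hypothetical compound index on { author: 1, created_on: -1 }.
const indexKeys = ["author", "created_on"];

console.log(coveredPrefixLength(indexKeys, ["author"]));               // 1
console.log(coveredPrefixLength(indexKeys, ["created_on", "author"])); // 2
console.log(coveredPrefixLength(indexKeys, ["created_on"]));           // 0
```

A result of 0 means the query does not touch the leading key, so this index cannot narrow the search. In the 2013-era shell such an index would be created with db.posts.ensureIndex({author: 1, created_on: -1}, {name: "author_created", background: true}), where the index name is made up for this example; newer shells call the method createIndex.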

Lab 02 – Indexes

o Lab 02 – 30 minutes - With your team:
  o Look at the frequent queries from Lab 01 and:
    o Which would you index and why?
    o What kind of indexes are needed?
    o Since this is predominantly a read use case, index away
  o Would you use a sparse index? For what and how?
  o Would you use a geospatial index? For what and how?
  o Would you use a TTL index? For what and how?

Aggregation

o Used for “group by”-like queries
o Aggregation Framework (introduced in the 2.1 development series, first stable in 2.2)
  o http://docs.mongodb.org/manual/aggregation/
o Simple count: db.posts.count();
o Using the Aggregation Framework:
    db.posts.aggregate([{ $group: { _id: null, count: { $sum: 1 } } }]);
o Check the reference for a comparison with SQL GROUP BY
o Map/Reduce is still supported (an older approach that remains relevant)
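The count above groups every document under a single null key. Grouping by a field instead gives one count per distinct value, which is the “group by” case. As a rough sketch of what such a $group stage computes, here is the same logic in plain JavaScript over an in-memory array; countBy is a hypothetical helper and the author key is assumed for illustration.

```javascript
// Plain-JavaScript equivalent of grouping by a key and summing 1 per document.
// countBy is a hypothetical helper; it does not talk to MongoDB.
function countBy(docs, key) {
  const counts = {};
  docs.forEach(function (doc) {
    const k = doc[key];
    counts[k] = (counts[k] || 0) + 1;
  });
  return counts;
}

const samplePosts = [
  { author: "admin", post_title: "A" },
  { author: "guest", post_title: "B" },
  { author: "admin", post_title: "C" }
];

const postsPerAuthor = countBy(samplePosts, "author");
console.log(postsPerAuthor); // admin: 2, guest: 1
```

The corresponding shell pipeline would be db.posts.aggregate([{ $group: { _id: "$author", count: { $sum: 1 } } }]), assuming the posts carried an author key.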

Lab 03 – Aggregation

o Lab 03 – 30 minutes - With your team:
  o Figure out what aggregations to run on the data set:
    o For example, average rating per user?
    o Or, the average number of movies rated, taken across all users?
  o Write the queries for these aggregations and test them
  o Are indexes helpful in aggregations? Why/why not?
  o Are you better off just doing these in your client code? Why/why not?
  o When would you use pipelined aggregations?

Map/Reduce

o Scatter/Gather framework
o db.collection.mapReduce(map_fn, red_fn, {out: output_coll})
  o http://docs.mongodb.org/manual/aggregation/
o Mapper – just emits key/value pairs
o Framework – groups and sorts mapper output => Reducer
o Reducer – applies a function to the values for each key => output collection
o Distributed computation framework for full collection (table) scans
o http://docs.mongodb.org/manual/tutorial/map-reduce-examples/
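The mapper, framework and reducer roles can be traced in a few lines of plain JavaScript. This is an in-memory sketch of the flow only, not MongoDB's implementation: here the map function receives the document and an emit callback as arguments, whereas in MongoDB the map function reads the document from this and calls a global emit. The mapReduce helper below is hypothetical and simply shares its name with the shell method.

```javascript
// In-memory sketch of the map -> group -> reduce flow; not MongoDB's implementation.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = {};
  // Map phase: each document may emit any number of key/value pairs.
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  // Reduce phase: collapse each key's list of values to a single value.
  const out = {};
  Object.keys(groups).sort().forEach(function (key) {
    out[key] = reduceFn(key, groups[key]);
  });
  return out;
}

// Posts with post_tags already converted to arrays, as in the fifth exercise.
const taggedPosts = [
  { post_tags: ["tag1", "foo"] },
  { post_tags: ["tag1"] },
  { post_tags: ["foo", "tag2"] }
];

const tagCounts = mapReduce(
  taggedPosts,
  function (doc, emit) {
    doc.post_tags.forEach(function (tag) { emit(tag, 1); });
  },
  function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
);

console.log(tagCounts); // foo: 2, tag1: 2, tag2: 1
```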

Lab 04 – Map/Reduce

o Lab 04 – 30 minutes - With your team:
  o Go through the Map/Reduce examples
  o Figure out what Map/Reduce functions you would use
  o Implement these functions (on a small data set)
  o Some things to think about:
    o Can you use Map/Reduce to “seed” your recommendations?
    o Can you use incremental Map/Reduce to “update” your recommendations? How would you do this?
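One way to think about the incremental question: run Map/Reduce only over the documents added since the last run, then fold those partial results into the stored output by calling the reduce function again. The sketch below models that merge in plain JavaScript; mergeIncremental and sumReduce are hypothetical names, and it only works when re-reducing already-reduced values gives the same answer, as it does for sums.

```javascript
// Sketch of folding new partial results into stored totals by re-reducing.
function sumReduce(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Merges freshly computed per-key values into the existing output.
function mergeIncremental(existing, fresh, reduceFn) {
  const merged = Object.assign({}, existing);
  Object.keys(fresh).forEach(function (key) {
    merged[key] = key in merged
      ? reduceFn(key, [merged[key], fresh[key]])
      : fresh[key];
  });
  return merged;
}

const storedCounts = { foo: 2, tag1: 2 }; // result of an earlier run
const newCounts = { foo: 1, tag3: 4 };    // result over documents added since

const updatedCounts = mergeIncremental(storedCounts, newCounts, sumReduce);
console.log(updatedCounts); // foo: 3, tag1: 2, tag3: 4
```

In MongoDB itself, the mapReduce out option accepts the form {reduce: "tag_counts"} (the collection name here is made up), combined with a query that selects only the new documents.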

Questions? Comments?

Thank You!

E-mail: [email protected] : onevivek