Building a Directed Graph with MongoDB
Transcript of Building a Directed Graph with MongoDB
![Page 1: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/1.jpg)
BUILDING A DIRECTED GRAPH WITH MONGODB
MongoSF 5/24/2011
By Tony Tam @fehguy
![Page 2: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/2.jpg)
WHO IS WORDNIK
Word + Meaning Discovery Engine
Clustered Application built with:
  Scala/Java/Jetty
Only way in is via REST
  19M API calls/day @ 7ms/query average
Physical servers
  72GB RAM, 8 core
  4.3TB DAS
We’re MongoDB users for ~1.5 yrs
  Used in master/slave
  14B documents in MongoDB
![Page 3: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/3.jpg)
WHY A GRAPH FOR WORDS
Technique to model network relationships
  Properties are dynamic
  Links are “arbitrary”
Runtime performance
  Answers in < 5ms/request
Routing functions based on goals
  “find most likely word for X”
  “find more common form of Y”
![Page 4: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/4.jpg)
WHY A GRAPH FOR WORDS
Misspellings, abbreviations, texting, Twitter
![Page 5: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/5.jpg)
MORE ABOUT GRAPHS
Different types of Graphs
  Decisions have huge impact on design + implementation
Nodes (vertices)
  String and numeric properties
Edges (links)
  Finite set of labeled edge types (~30)
  Multiple target nodes per edge
    Each potentially different weight
  Directed, non-symmetrical
![Page 6: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/6.jpg)
WHY BUILD ON MONGODB?
Word Graph is core to Wordnik
Many ways to build a graph
  Dedicated graph DBs
  Relational DBs
MongoDB Document Storage
  Uber-flexible
  Successfully routes in < 5ms
  Long runway for scale-out
  Limit storage infrastructure components
  Easy to implement
![Page 7: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/7.jpg)
WORDNIK GRAPH DATA MODEL
Nodes
  _id field holds name, object type
    Index at no extra cost
  Arbitrary number of properties
    Only two datatypes for us, String, Double
  Node type info in node ID (_id)
    na_corpusCount => Double
    sa_source => String
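The node layout above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the helper names `nodeId` and `makeNode` are hypothetical, the `name|type` form of `_id` is taken from the `cat|word` example elsewhere in this deck, and reading `na_` as "numeric" and `sa_` as "string" is inferred from the two examples on the slide. The property values are made up.

```javascript
// Hypothetical helper: builds the "name|type" _id seen in the deck's "cat|word" example.
function nodeId(name, type) {
  return `${name}|${type}`;
}

// Hypothetical helper: builds a node document.
// Assumption: "na_" prefixes numeric (Double) properties, "sa_" prefixes String properties.
function makeNode(name, type, props) {
  const doc = { _id: nodeId(name, type) };
  for (const [key, value] of Object.entries(props)) {
    if (typeof value === "number") {
      doc[`na_${key}`] = value;
    } else if (typeof value === "string") {
      doc[`sa_${key}`] = value;
    } else {
      throw new TypeError(`unsupported property type for "${key}"`);
    }
  }
  return doc;
}

// Illustrative values only; they are not Wordnik data.
const cat = makeNode("cat", "word", { corpusCount: 123456, source: "example" });
console.log(cat);
```

Because the name and type live in `_id`, the lookup key is covered by the index MongoDB always keeps on `_id`, which appears to be what "Index at no extra cost" refers to.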
![Page 8: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/8.jpg)
WORDNIK GRAPH DATA MODEL
Edges
  Destination(s)
  Weight
  Link Properties
Stored in Mongo Arrays
  Array size is app limited
  Use $push, $pop
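A minimal sketch of how an application-limited edge array might be maintained with `$push` and `$pop`. Several things here are assumptions rather than Wordnik's design: the field name `targets`, the cap of 3, the "drop the oldest target" policy, and the edge type `synonym`. The `applyUpdate` function is a tiny in-memory stand-in that understands only these two operators, so the example runs without a database; the update documents themselves have the shape MongoDB's update call accepts.

```javascript
const MAX_TARGETS = 3; // hypothetical application-level cap on array size

// Edge _id as "name|type|edgeType", matching the prefix query shown in this deck.
function edgeId(fromNodeId, edgeType) {
  return `${fromNodeId}|${edgeType}`;
}

// $push appends one element to the array field.
function pushTargetUpdate(toNodeId, weight) {
  return { $push: { targets: { to: toNodeId, weight } } };
}

// $pop with -1 removes the first element; with 1 it removes the last.
function popOldestUpdate() {
  return { $pop: { targets: -1 } };
}

// In-memory stand-in for the server, supporting only $push and $pop.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    (doc[field] = doc[field] || []).push(value);
  }
  for (const [field, direction] of Object.entries(update.$pop || {})) {
    if (direction === -1) doc[field].shift();
    else doc[field].pop();
  }
  return doc;
}

// Add a target, then trim the array if the application cap is exceeded.
function addTarget(edge, toNodeId, weight) {
  applyUpdate(edge, pushTargetUpdate(toNodeId, weight));
  if (edge.targets.length > MAX_TARGETS) applyUpdate(edge, popOldestUpdate());
  return edge;
}

const edge = { _id: edgeId("cat|word", "synonym"), targets: [] };
const samples = [["kitty|word", 0.9], ["feline|word", 0.8], ["puss|word", 0.4], ["moggy|word", 0.2]];
for (const [to, weight] of samples) addTarget(edge, to, weight);
console.log(edge.targets.map((t) => t.to));
```

Against a real collection, the same two update documents would be sent to the server instead of being applied locally.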
![Page 9: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/9.jpg)
ACCESS TO MONGO
Mongo Access via DAO layer
  Limit queries to ones that work “well”
  ALL queries use index
Find Node “cat” of type “word”:
  db.node.findOne({_id:"cat|word"})
Find Edge types for above:
  db.edge.find({_id:/^cat\|word\|/},{_id:1})
Serialization/deserialization
  Done “the old-fashioned way”
  BasicDBObject, BasicDBList faster than mappers for our use case
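The edge-type query above relies on an anchored, case-sensitive prefix pattern. A sketch of how a DAO might build that pattern safely from a node id, assuming hypothetical helper names `escapeRegex` and `edgePrefixRegex` and a small in-memory array in place of the `edge` collection:

```javascript
// Escape regex metacharacters so the node id is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored, case-sensitive "starts with" pattern for a node's edges.
function edgePrefixRegex(nodeId) {
  return new RegExp("^" + escapeRegex(nodeId + "|"));
}

// In-memory stand-in for the edge collection; the ids are illustrative.
const edges = [
  { _id: "cat|word|synonym" },
  { _id: "cat|word|hypernym" },
  { _id: "catalog|word|synonym" },
  { _id: "Cat|word|synonym" },
];

const prefix = edgePrefixRegex("cat|word");
const edgeTypes = edges
  .filter((e) => prefix.test(e._id))
  .map((e) => e._id.slice("cat|word|".length));

console.log(prefix.source); // same pattern text as the shell query in the slide
console.log(edgeTypes);
```

Escaping matters because `|` is the alternation operator in a regular expression; without it the pattern would match far more than the intended prefix.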
![Page 10: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/10.jpg)
QUERY EFFICIENCY
Max execution time is f (ahops)
![Page 11: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/11.jpg)
ROUTING, TRAVERSALS, FUNCTIONS
Typically find path from A to B
  Routes have costs
  Low cost or high probability
Our use case is atypical
  LinkedIn vs. Maps
  Not from A to B
  More like “from A with 3 hops”
This matters!
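The "from A with 3 hops" style of traversal can be sketched as a breadth-first walk that stops after a fixed number of hops. This is a generic illustration, not Wordnik's routing code: the function name `withinHops`, the adjacency `Map`, and the single-letter node ids are all hypothetical, and edge weights are ignored.

```javascript
// Returns a Map of reachable node id -> hop count, following directed edges
// outward from `start` for at most `maxHops` hops.
function withinHops(adjacency, start, maxHops) {
  const hopsTo = new Map([[start, 0]]);
  let frontier = [start];
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next = [];
    for (const nodeId of frontier) {
      for (const { to } of adjacency.get(nodeId) || []) {
        if (!hopsTo.has(to)) {
          hopsTo.set(to, hop); // first visit is the shortest hop count
          next.push(to);
        }
      }
    }
    frontier = next;
  }
  hopsTo.delete(start); // report only the nodes reached, not the origin
  return hopsTo;
}

// Illustrative directed graph: a -> b, a -> c, b -> c, c -> d, d -> e, e -> a
const adjacency = new Map([
  ["a", [{ to: "b" }, { to: "c" }]],
  ["b", [{ to: "c" }]],
  ["c", [{ to: "d" }]],
  ["d", [{ to: "e" }]],
  ["e", [{ to: "a" }]],
]);

const twoHops = withinHops(adjacency, "a", 2);
const threeHops = withinHops(adjacency, "a", 3);
console.log([...twoHops.keys()], [...threeHops.keys()]);
```

In a MongoDB-backed version, each frontier expansion would presumably become one batch of indexed edge lookups, so the number of round trips is bounded by the hop limit rather than by the size of the graph.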
![Page 12: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/12.jpg)
PERFORMANCE + SCALING
![Page 13: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/13.jpg)
PERFORMANCE + SCALING
Query by index only
Use regex syntax in restricted fashion
  Starts with only
  No look behind
  Case sensitive
Boring? Fast?
Sharding is a no-brainer
  What about ObjectId()?
![Page 14: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/14.jpg)
PERFORMANCE + SCALING
Horizontal? Vertical? Both? And when?
Separate collections by edge type/object type
  Increases storage needs
  Collections all have padding, 30 collections => ~30x padding
Sharding
  Use slick, built-in Mongo sharding
  Roll your own based on your data
What does Wordnik do?
  Neither! (yet)
  30M Nodes, 50M Edges
  One collection for nodes
  One collection for edges
![Page 15: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/15.jpg)
PERFORMANCE + SCALING
Selecting a shard key
  Done in application logic based on OUR data
  Depends on what you need
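One way a "roll your own" shard choice in application logic could look, purely as a sketch: the deck does not say which key Wordnik would pick, so routing by the source node id, the function names, and the use of an FNV-1a style hash over UTF-16 code units are all assumptions made for illustration.

```javascript
// FNV-1a style 32-bit hash over the string's UTF-16 code units.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Node ids look like "name|type"; edge ids look like "name|type|edgeType".
// Both reduce to the source node id so a node and its edges share a shard.
function sourceNodeId(id) {
  return id.split("|").slice(0, 2).join("|");
}

// Hypothetical routing function: map an id to a shard index.
function shardFor(id, shardCount) {
  return fnv1a(sourceNodeId(id)) % shardCount;
}

const SHARD_COUNT = 4; // illustrative
const nodeShard = shardFor("cat|word", SHARD_COUNT);
const edgeShard = shardFor("cat|word|synonym", SHARD_COUNT);
console.log(nodeShard, edgeShard);
```

Keeping a node's edges on the same shard as the node means a single-hop lookup touches one shard; whether that is the right trade-off depends on the data, which is the point the slide makes.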
![Page 16: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/16.jpg)
END RESULT
Solves Wordnik Graph infrastructure needs
  Store Word nodes with UGC, corpus, structured, analytical data
  Batch fetch Edges @ > 50k/second
  Find Edge + endpoints in 80mS
Powers our…
  Word Selection
  Canonicalization
  Misspelling
  “Did you mean” logic
  Classification + Matching Engine
![Page 17: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/17.jpg)
EXAMPLES
Misspellings
Abbreviations
Lemmatization
![Page 18: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/18.jpg)
EXAMPLES
Term normalization
  Find similar words
Meaning normalization
  Find “more common” form
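Normalization of this kind can be pictured as following the highest-weight target of one edge type for a few hops. The sketch below is an assumption-laden illustration: the edge type names `misspelling-of` and `lemma`, the `targets` field, the weights, and the function names are hypothetical and not taken from Wordnik's graph.

```javascript
// Pick the destination with the largest weight, or null if there is none.
function mostLikelyTarget(edge) {
  if (!edge || edge.targets.length === 0) return null;
  return edge.targets.reduce((best, t) => (t.weight > best.weight ? t : best)).to;
}

// Follow edges of one type from nodeId, always taking the most likely target,
// for at most maxHops hops. Stops early on a dead end or a cycle.
function normalize(edgesById, nodeId, edgeType, maxHops = 3) {
  let current = nodeId;
  const seen = new Set([current]);
  for (let hop = 0; hop < maxHops; hop++) {
    const next = mostLikelyTarget(edgesById.get(`${current}|${edgeType}`));
    if (next === null || seen.has(next)) break;
    seen.add(next);
    current = next;
  }
  return current;
}

// Illustrative edges keyed by "name|type|edgeType"; weights are made up.
const edgesById = new Map([
  ["teh|word|misspelling-of", { targets: [{ to: "tea|word", weight: 0.05 }, { to: "the|word", weight: 0.95 }] }],
  ["geese|word|lemma", { targets: [{ to: "goose|word", weight: 1.0 }] }],
]);

const fixedSpelling = normalize(edgesById, "teh|word", "misspelling-of");
const lemma = normalize(edgesById, "geese|word", "lemma");
const unchanged = normalize(edgesById, "cat|word", "lemma");
console.log(fixedSpelling, lemma, unchanged);
```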
![Page 19: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/19.jpg)
EXAMPLES
Applied Word Graph
Recall:
  “Computers are stupid”
  English is complex
Clustering + classification algorithms:
  Stink without consistent data
    “The” => “the” (duh)
    “geese” => “goose” (ok)
  Stink when they’re slow
Graph + Clustering/Classification
  Just add data
![Page 20: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/20.jpg)
MONGODB MAKES A GREAT GRAPH BACK-END
See more about Wordnik APIs:
  http://developer.wordnik.com
Further Reading
  Migrating from MySQL to MongoDB
    http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik
  Maintaining your MongoDB Installation
    http://www.slideshare.net/fehguy/mongo-sv-tony-tam
Source Code
  Mapping Benchmark
    https://github.com/fehguy/mongodb-benchmark-tools
  Wordnik OSS Tools
    https://github.com/wordnik/wordnik-oss
![Page 21: Building a Directed Graph with MongoDB](https://reader033.fdocuments.in/reader033/viewer/2022061523/554f90a9b4c905d25b8b51b6/html5/thumbnails/21.jpg)
MONGODB MAKES A GREAT GRAPH BACK-END
Questions?