MongoDB @ Fliptop
2011/12/10
Agenda
• Fliptop
o infrastructure
• MongoDB
o architecture
o sharding strategy
o data schema
o index and query
o miscellaneous
What is Fliptop?
• Social profile lookup
o Facebook, Twitter, LinkedIn
o campaign analysis
o API lookup
• Our problem
o scalability
data volume: ~7 billion
infrastructure load: ~1MM lookups/day
Fliptop Infrastructure
• Infrastructure
o Amazon EC2
• NoSQL Database
o MongoDB
• Indexing and full-text search
o Apache Solr
• Distributed computing
o AWS Elastic MapReduce (Hadoop)
Fliptop Databases
• Fliptop data
o ~50MM records
• Without MongoDB
o MySQL: AWS RDS x 1
o Solr: AWS EC2 m1.large x 10
• With MongoDB
o MySQL: AWS RDS x 1
o Solr: AWS EC2 m1.large x 2 (master/slave)
o MongoDB: AWS EC2 m2.xlarge x 10 (replica set)
From Solr to MongoDB
• Our storage requirements
o auto-sharding
o rich queries
o low insert latency
• Other reasons
o documentation
o active community
o word of mouth
• Migration efforts
o queries
o DB driver
o performance tuning
MongoDB Features
• Auto-sharding
o scale out to 1,000 nodes
• Replication & high availability
o master/slave and replica sets
• Querying
o equivalents for most SQL query features
• Document-oriented storage
o JSON, schema-free
• Full index support
o index any field
• Map/Reduce
o JavaScript executed on the server side
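MongoDB's server-side map/reduce takes a JavaScript map function that calls emit(key, value) for each document and a reduce function that folds the values emitted for one key. The sketch below imitates that contract in plain JavaScript so it runs without a server. The mapReduce helper, the sample lookups data, and passing emit as an argument (the real map function calls a global emit) are assumptions for illustration only.

```javascript
// Minimal in-memory imitation of the map/reduce contract (hypothetical helper,
// not the server implementation): map runs with `this` bound to each document
// and calls emit(key, value); reduce folds all values emitted for one key.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);

  const result = {};
  for (const [key, values] of groups) {
    // As in MongoDB, reduce is skipped for a key that emitted a single value.
    result[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return result;
}

// Example with made-up data: count profile lookups per social network.
const lookups = [
  { network: "facebook" },
  { network: "twitter" },
  { network: "facebook" },
  { network: "linkedin" },
];
const perNetwork = mapReduce(
  lookups,
  function (emit) { emit(this.network, 1); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(perNetwork);
```

On a real server the same two functions would be passed as JavaScript to the collection's mapReduce command, so the per-key work runs next to the data.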
MongoDB Servers
MongoDB Sharding
• Automatic balancing for changes in load and data distribution
• Easy addition of new machines
• Scaling out to one thousand nodes
• No single point of failure
• Automatic failover
MongoDB Replication
• master/slave
o easy setup
o manual failover
• replica set
o somewhat more complex setup
o automatic failover
o minimum nodes: 3 (including 1 arbiter)
o maximum nodes: 12
MongoDB Failover
• Voting algorithm (replica set)
o a new primary needs a strict majority of all members: floor(total members / 2) + 1 votes
• Priority
o if 0, the member never becomes primary
useful for a backup member running on a small machine
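The majority rule and the priority-0 rule can be checked with a few lines of JavaScript. This is a deliberately simplified, hypothetical model: electPrimary and the member objects are made up for illustration, arbiters are modelled as priority-0 voters, and the real election protocol also compares how up to date each member's oplog is.

```javascript
// Votes needed to elect a primary: a strict majority of all voting members.
function majority(totalMembers) {
  return Math.floor(totalMembers / 2) + 1;
}

// Hypothetical, simplified election. Members look like { name, priority, up }.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  if (reachable.length < majority(members.length)) return null; // no majority
  // Priority 0 members (arbiters, small backup machines) vote but are never elected.
  const candidates = reachable
    .filter((m) => m.priority > 0)
    .sort((a, b) => b.priority - a.priority);
  return candidates.length > 0 ? candidates[0].name : null;
}

// Minimal three-member set: primary, secondary, arbiter.
const replicaSet = [
  { name: "primary", priority: 1, up: false }, // the primary just failed
  { name: "secondary", priority: 1, up: true },
  { name: "arbiter", priority: 0, up: true },
];
console.log(majority(replicaSet.length)); // 2 votes needed out of 3
console.log(electPrimary(replicaSet)); // "secondary" takes over
```

The arbiter is what lets a two-data-node set survive one failure: without it, the lone surviving secondary would hold 1 of 2 votes and could not reach a majority.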
Fliptop MongoDB Infrastructure
• Data
o 10MM per replica set
• MongoDB servers
o router (mongos) x 1
o config server x 1
o shard servers x 10
5 primary, 5 secondary
o arbiter servers x 5
• AWS EC2 instances
o m2.xlarge x 10
MongoDB and AWS EC2
• Instance type
o m2.xlarge
17.1 GB of memory, 6.5 EC2 Compute Units
• Storage
o local drive
faster I/O
not portable
o EBS
I/O = network + disk I/O
portable
easy backup
RAID 1+0
MongoDB Sharding Strategy
• Shard key strategy
o Ascending shard key
data locality
hotspot for reads/writes
e.g. timestamp, auto-increment PK
o Random shard key
evenly distributed reads/writes
no data locality
e.g. UUID, MD5
o Hybrid shard key
ascending and evenly distributed
e.g. timestamp + UUID
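The hotspot problem with an ascending key can be reproduced with a toy range-partitioned cluster. In this hypothetical sketch, four shards each own a contiguous key range (the split points stand in for chunk boundaries); the same 1,000 inserts are routed once by raw timestamp and once by the MD5 of that timestamp. The split points and insert counts are assumptions chosen for illustration.

```javascript
const crypto = require("crypto");

// Hypothetical cluster: shard i owns keys in [splitPoints[i-1], splitPoints[i]).
function shardForKey(key, splitPoints) {
  let i = 0;
  while (i < splitPoints.length && key >= splitPoints[i]) i++;
  return i; // index of the shard whose range contains key
}

// Number of keys routed to each shard.
function distribution(keys, splitPoints) {
  const counts = new Array(splitPoints.length + 1).fill(0);
  for (const key of keys) counts[shardForKey(key, splitPoints)]++;
  return counts;
}

const md5Hex = (value) => crypto.createHash("md5").update(String(value)).digest("hex");

// Ascending key: chunks were split on past timestamps, so every new
// timestamp sorts after the last split point and lands on the last shard.
const now = 1323475200; // 2011-12-10 00:00:00 UTC, in seconds
const timestampSplits = [now - 300, now - 200, now - 100];
const newTimestamps = Array.from({ length: 1000 }, (_, i) => now + i);
console.log("ascending:", distribution(newTimestamps, timestampSplits));

// Random key: MD5 hex digests split at "4", "8" and "c" into four equal
// ranges, so the same inserts spread roughly evenly across shards.
const hashSplits = ["4", "8", "c"];
console.log("md5:", distribution(newTimestamps.map(md5Hex), hashSplits));
```

The ascending run sends every insert to one shard, while the MD5 run gives each shard roughly a quarter; a hybrid key trades some of that spread for locality on the leading field.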
From Timestamp to UUID
• Why timestamp?
o same sharding key as our Solr setup
o issues
slow count queries (traversal)
maintenance headache: nodes had to be added more frequently
duplicated UUIDs
• From timestamp to UUID
o performance gain for count
2x faster, e.g. counting 1MM records went from 10 s to 5 s
o less maintenance
multiple nodes can be enabled at the same time
o dedup
a unique index is only enforced per shard unless it is on the shard key, so sharding on UUID makes UUIDs unique cluster-wide
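Why the shard key matters for dedup can be shown with a hypothetical model where each shard enforces uniqueness only on the documents it holds. The Shard class, the two routing functions, and the simple string hash are illustrative assumptions, not MongoDB internals.

```javascript
// Hypothetical model: each shard enforces a unique "uuid" only among the
// documents it holds, like a unique index on a non-shard-key field.
class Shard {
  constructor() {
    this.uuids = new Set();
  }
  insert(doc) {
    if (this.uuids.has(doc.uuid)) return false; // duplicate rejected locally
    this.uuids.add(doc.uuid);
    return true;
  }
}

// Route by timestamp range (100 time units per shard) or by a hash of the UUID.
const byTimestamp = (doc, n) => Math.min(Math.floor(doc.ts / 100), n - 1);
const byUuid = (doc, n) => {
  let h = 0;
  for (const c of doc.uuid) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % n;
};

// Insert docs into a fresh 4-shard cluster and count how many were accepted.
function countAccepted(docs, route) {
  const shards = Array.from({ length: 4 }, () => new Shard());
  return docs.filter((doc) => shards[route(doc, shards.length)].insert(doc)).length;
}

// The same profile UUID arrives twice, at two different times.
const docs = [
  { uuid: "a1b2c3", ts: 10 },
  { uuid: "a1b2c3", ts: 250 },
];
console.log(countAccepted(docs, byTimestamp)); // 2: the duplicate slips through
console.log(countAccepted(docs, byUuid)); // 1: the duplicate is rejected
```

With timestamp routing the two copies land on different shards, so neither local check sees the other; routing on the UUID sends both copies to the same shard, where the local check is enough.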
MongoDB Balancer
• If chunks are not evenly distributed across nodes, the balancer fixes it
o stop criterion
balances until the difference between nodes is <= 2 chunks
o balancer window
restricts balancing to an active time window
o blocking if moving massive data
e.g. while adding a brand-new node
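The stop criterion can be sketched as a loop that moves one chunk at a time from the fullest shard to the emptiest. This is an assumed simplification: the real balancer works in migration rounds and its thresholds vary by MongoDB version; only the "difference <= 2" rule is taken from the slide, and the chunk counts are made up.

```javascript
// Hypothetical sketch of the stop criterion: move one chunk at a time from
// the fullest shard to the emptiest until they differ by <= threshold.
function balance(chunkCounts, threshold = 2) {
  const counts = [...chunkCounts];
  let moves = 0;
  for (;;) {
    const max = Math.max(...counts);
    const min = Math.min(...counts);
    if (max - min <= threshold) break;
    counts[counts.indexOf(max)] -= 1; // donor: shard with the most chunks
    counts[counts.indexOf(min)] += 1; // receiver: shard with the fewest chunks
    moves += 1;
  }
  return { counts, moves };
}

// A brand-new, empty shard joins four shards holding 40 chunks each.
const result = balance([40, 40, 40, 40, 0]);
console.log(result.counts, result.moves);
```

Every move copies a whole chunk of data to the new node, which is why filling a brand-new node is when balancing becomes heavy enough to block.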
MongoDB Schema
• Document oriented
o JSON
• Schema free
o pros
no predefined schema is required
data is saved 'as is'
o cons
overhead of headers (field names are stored in every document)
low sensitivity to broken data
MongoDB Schema and Size
• Size matters
o a simple schema is better
payment:[{"publisher_id": 176, "paid": true}] -> payment:["176_1"]
o abbreviate headers (field names)
payment:["176_1"] -> pm:["176_1"]
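The size effect of flattening and abbreviating can be estimated per document. The encodePayment/decodePayment helpers below are hypothetical (the deck does not show how "176_1" is produced), and JSON text length is used as a rough proxy for BSON size, which adds type bytes and length prefixes on top.

```javascript
// Hypothetical helpers for the compact "publisherId_paidFlag" encoding.
function encodePayment({ publisher_id, paid }) {
  return `${publisher_id}_${paid ? 1 : 0}`;
}

function decodePayment(value) {
  const [id, flag] = value.split("_");
  return { publisher_id: Number(id), paid: flag === "1" };
}

const original = { publisher_id: 176, paid: true };
const variants = {
  nested: { payment: [original] },
  flat: { payment: [encodePayment(original)] },
  abbreviated: { pm: [encodePayment(original)] },
};

// JSON length is only a proxy for BSON size, but it shows the trend.
for (const [name, doc] of Object.entries(variants)) {
  console.log(name, JSON.stringify(doc).length);
}
```

Because field names are repeated in every document, the savings multiply by the record count, at the cost of decoding in application code.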
MongoDB Queries
1) COLUMN = VALUE
2) COLUMN in RANGE
3) boolean operators: AND, OR, NOT
4) pagination (start, rows)
5) sort
6) count (of query result)
7) COLUMN is non-existent
8) multi-valued fields
9) dynamic fields
10) dynamic multi-valued fields
11) stats queries (min, max)
12) faceted queries (aggregation on specific fields)
13) free-text search (regular expression)
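Several of these requirements map onto MongoDB query operators that really exist ($gte, $lt, $exists, $or, regular expressions, array matching, sort/skip/limit/count). The toy evaluator below mimics their basic semantics in memory so the mapping can be tried without a server; find, matches and the people data are hypothetical, and the evaluator ignores many details of real MongoDB matching (for example, range operators on array elements).

```javascript
// Toy evaluator for a few MongoDB query operators (illustration only; the
// real matching happens on the server via db.collection.find()).
function matchValue(value, cond) {
  if (cond instanceof RegExp) {
    // Free-text search via regular expression (13).
    const values = Array.isArray(value) ? value : [value];
    return values.some((v) => typeof v === "string" && cond.test(v));
  }
  if (cond !== null && typeof cond === "object") {
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$gte": return value !== undefined && value >= arg; // range (2)
        case "$lt": return value !== undefined && value < arg;
        case "$exists": return (value !== undefined) === arg; // non-existent (7)
        default: throw new Error(`unsupported operator ${op}`);
      }
    });
  }
  // Equality (1); an array field matches if any element equals (8).
  return Array.isArray(value) ? value.includes(cond) : value === cond;
}

function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) =>
    key === "$or" ? cond.some((q) => matches(doc, q)) : matchValue(doc[key], cond)
  );
}

// Sort (5) and pagination (4) after filtering; count (6) is .length.
function find(docs, query, { sortKey = null, skip = 0, limit = Infinity } = {}) {
  let result = docs.filter((doc) => matches(doc, query));
  if (sortKey !== null) {
    result = [...result].sort((a, b) =>
      a[sortKey] < b[sortKey] ? -1 : a[sortKey] > b[sortKey] ? 1 : 0);
  }
  return result.slice(skip, skip + limit);
}

const people = [
  { name: "robbie", age: 30, tags: ["mongodb", "solr"] },
  { name: "alice", age: 25, tags: ["mysql"], job: "dba" },
  { name: "bob", age: 41 },
];

console.log(find(people, { age: { $gte: 25, $lt: 40 } }, { sortKey: "age" }).map((p) => p.name));
console.log(find(people, { $or: [{ name: "bob" }, { tags: "mysql" }] }).length);
console.log(find(people, { job: { $exists: false } }).map((p) => p.name));
console.log(find(people, { name: /^rob/ }, { limit: 1 }).map((p) => p.name));
```

Items such as faceting and min/max stats have no single-operator equivalent here; in the 2011-era server they would typically be handled with map/reduce or group.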
MongoDB Index
• B-tree index structure
• At most 64 indexes per collection (table)
• A query uses only one index, except $or queries (each clause can use its own index)
• Each index adds work on insert, delete, and update
MongoDB Index Types
• Basic index
o db.persons.ensureIndex({name: 1});
• Embedded index
o db.persons.ensureIndex({"location.city": 1});
• Compound index
o db.persons.ensureIndex({name: 1, "location.city": 1});
• Sparse index
o db.persons.ensureIndex({job: 1}, {sparse: true});
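What each index type stores can be pictured as a sorted list of (key, _id) entries. The sketch below is a hypothetical in-memory model, not MongoDB's B-tree: it resolves dotted paths for embedded fields, builds array keys for compound indexes, stores missing fields under null in a regular index, and skips such documents in a sparse one. The persons data is made up.

```javascript
// Resolve a dotted path such as "location.city" inside a nested document.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// Compare two index keys field by field; null (missing) sorts first.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] === b[i]) continue;
    if (a[i] === null) return -1;
    if (b[i] === null) return 1;
    return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// Hypothetical in-memory index: sorted [key, _id] entries over the given fields.
function buildIndex(docs, fields, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const key = fields.map((f) => getPath(doc, f));
    // A sparse index skips documents that have none of the indexed fields;
    // a regular index keeps them under a null key.
    if (sparse && key.every((v) => v === undefined)) continue;
    entries.push([key.map((v) => (v === undefined ? null : v)), doc._id]);
  }
  return entries.sort((x, y) => compareKeys(x[0], y[0]));
}

const persons = [
  { _id: 1, name: "robbie", location: { city: "Taipei" } },
  { _id: 2, name: "alice", job: "engineer" },
  { _id: 3, name: "bob", location: { city: "San Francisco" }, job: "sales" },
];

const ids = (index) => index.map(([, id]) => id);
console.log(ids(buildIndex(persons, ["name"]))); // basic
console.log(ids(buildIndex(persons, ["location.city"]))); // embedded
console.log(ids(buildIndex(persons, ["name", "location.city"]))); // compound
console.log(ids(buildIndex(persons, ["job"], { sparse: true }))); // sparse
```

The sparse variant holds only the documents that actually have a job field, which is why it is recommended for optional fields.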
MongoDB Index Limits
• negation operations
o $ne, $not
o e.g. db.things.find({x: {$ne: 3}});
• arithmetic operations
o $mod
o e.g. db.things.find({a: {$mod: [10, 1]}});
• most regular expressions
o yes (anchored prefix)
db.persons.find({name: /^robbie/});
db.persons.find({name: /^robbie.*/});
db.persons.find({name: /^robbie.*/i}); (case-insensitive: scans the whole index)
o no
db.persons.find({name: /robbie/});
• $where
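Why an anchored regex can use an index while an unanchored one cannot: /^robbie/ only matches keys in the range ["robbie", "robbif"), which a sorted index can locate by binary search, whereas /robbie/ can match anywhere in a key, so every key must be tested. The sorted array below is a hypothetical stand-in for a B-tree on name, and prefixBounds/prefixScan are illustrative helpers, not MongoDB functions.

```javascript
// Sorted array standing in for an index on "name" (hypothetical data).
const nameIndex = ["alice", "bob", "mr robbie", "robbie", "robbie cheng", "robbiex", "robert", "zoe"].sort();

// /^prefix/ can only match keys in [prefix, upper), where upper is the prefix
// with its last character incremented (e.g. "robbie" -> "robbif").
function prefixBounds(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  return [prefix, prefix.slice(0, -1) + String.fromCharCode(last + 1)];
}

// First position whose key is >= target (binary search, O(log n)).
function lowerBound(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Anchored regex: read only the contiguous slice between the bounds.
function prefixScan(keys, prefix) {
  const [low, high] = prefixBounds(prefix);
  return keys.slice(lowerBound(keys, low), lowerBound(keys, high));
}

// Unanchored regex: no bounds, so every key has to be tested.
function fullScan(keys, re) {
  return keys.filter((k) => re.test(k));
}

console.log(prefixScan(nameIndex, "robbie"));
console.log(fullScan(nameIndex, /robbie/));
```

The full scan also finds "mr robbie", which no range on the sorted keys could isolate; that is the cost the "no" case pays.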
MongoDB Index Optimization
• Simple data types
o e.g. int is faster than string
• Simple data schema
o e.g. {payment: "176_1"}
• Sparse index
o for optional fields
MongoDB Miscellaneous
• Monitoring
o CPU
high CPU usage implies an index is broken (missing or not used)
o drive size
time to add a new instance
• Backup
o EBS: snapshots
o mongo import/export tools
mongodump/mongorestore, mongoexport/mongoimport
• Auto deployment
o Hudson + Fabric (Python)
What's Next?
• Further data and index weight loss
o target: 20MM/instance
• Introduce Java POJO/DAO
o Morphia
o Spring Data MongoDB
• Watchdog mechanism
o restart servers automatically
Q & A
Robbie Cheng
Lead Software [email protected]
We're Hiring
• please mail to [email protected]
Thank you!