Compare DynamoDB vs. MongoDB
Higher Ed

Requirements
Unstructured data storage
ACID compliance not necessary
Fast read/write
Ability to index data and search
Full-text search (?)
Java/Spring support
JavaScript support
REST API
Community support
Scaling up and maintenance

Shard – when the database grows large
Horizontal partitioning of a database, where rows are held on separate database servers
Compare that to normalization or vertical partitioning, where data is split by columns
Advantages
• Reduces index size in each table in each database (performance +)
• Load can be spread out over multiple machines (performance ++)
Disadvantages
• Increased reliance on interconnected servers
• Query latency when more than one shard is searched
• Issues with consistency and durability
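
The horizontal partitioning described above can be sketched as follows, assuming a simple hash of the row key decides which server holds the row. The names hashKey, shardFor and partitionRows are illustrative only and belong to neither database; real systems use their own partitioning schemes.

```javascript
// Illustrative FNV-1a-style 32-bit hash over the key's UTF-16 code units.
function hashKey(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h;
}

// Map a key to one of `shardCount` servers (hypothetical helper).
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Split rows across shards so that each server holds and indexes only its own subset.
function partitionRows(rows, keyField, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const row of rows) {
    shards[shardFor(String(row[keyField]), shardCount)].push(row);
  }
  return shards;
}
```

A lookup by key touches a single shard, while a search on any other field has to visit every shard, which is the query latency listed under Disadvantages.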

DynamoDB Internals
Key/Value Pair
• Uses JSON only as a transport protocol
• Data is not stored on disk in the JSON format
• Applications that use DynamoDB must either implement their own JSON parsing or use a library such as one of the AWS SDKs to do the parsing for them
Data Types
• Scalar – string, number and binary (BLOB and CLOB)
• Multivalued – string set, number set and binary set
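
To make the JSON transport format concrete, here is a minimal sketch of how the six data types above can be encoded as typed attribute values (S, N, B, SS, NS, BS) before being sent. toAttributeValue and toItem are hypothetical helper names, not AWS SDK functions, and the sketch assumes Node.js for the base64 step.

```javascript
// Encode one JavaScript value as a DynamoDB-style typed attribute value.
function toAttributeValue(value) {
  const b64 = (bytes) => Buffer.from(bytes).toString("base64");
  if (typeof value === "string") return { S: value };
  if (typeof value === "number") return { N: String(value) }; // numbers travel as strings
  if (value instanceof Uint8Array) return { B: b64(value) };  // binary travels as base64
  if (value instanceof Set) {
    const members = [...value];
    if (members.length === 0) throw new Error("this sketch rejects empty sets");
    if (members.every((m) => typeof m === "string")) return { SS: members };
    if (members.every((m) => typeof m === "number")) return { NS: members.map(String) };
    if (members.every((m) => m instanceof Uint8Array)) return { BS: members.map(b64) };
  }
  throw new TypeError("unsupported attribute type");
}

// Encode every attribute of a plain object.
function toItem(obj) {
  const item = {};
  for (const [name, value] of Object.entries(obj)) {
    item[name] = toAttributeValue(value);
  }
  return item;
}
```

In practice the AWS SDKs perform this conversion, which is the parsing work the slide refers to.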

DynamoDB Internals
Data Model
• Table – no fixed schema (columns, data types, etc.)
  Needs a fixed primary key with its data type, and secondary indexes (if necessary)
  Limit of 256 tables per region per account
• Items – individual records in a table
  Limited to 400 KB
• Attributes
• Supports one-to-one, one-to-many and many-to-many relationships

DynamoDB Internals
Keys – must be defined at table creation time
• Primary Keys – Hash, or Hash and Range keys
• Local Secondary Keys – can access only a single partition
  Limit – 5 indexes per table / 20 attributes max
• Global Secondary Keys – can access any partition
  Limit – 5 indexes per table
When creating a secondary index, you define the alternate key for the index, along with any other attributes that you want projected into the index. DynamoDB copies these attributes into the index, along with the primary key attributes from the table
Add, update and delete actions on the table are automatically reflected in the index
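
The projection behaviour described above can be modelled in memory. This is only a sketch of the idea, not how DynamoDB is implemented: TableWithIndex and its methods are hypothetical names, the table has a single-attribute primary key, and items that lack the index key are simply left out of the index.

```javascript
class TableWithIndex {
  constructor(primaryKey, indexKey, projected) {
    this.primaryKey = primaryKey; // name of the primary key attribute
    this.indexKey = indexKey;     // name of the alternate key attribute
    this.projected = projected;   // extra attribute names copied into the index
    this.items = new Map();       // primary key value -> full item
    this.index = new Map();       // index key value -> Map(primary key value -> projected copy)
  }

  put(item) {
    const pk = item[this.primaryKey];
    this.delete(pk); // remove any stale index entry before re-adding
    this.items.set(pk, item);
    const ik = item[this.indexKey];
    if (ik === undefined) return;
    // The index entry holds the keys plus the projected attributes only.
    const entry = { [this.primaryKey]: pk, [this.indexKey]: ik };
    for (const name of this.projected) {
      if (name in item) entry[name] = item[name];
    }
    if (!this.index.has(ik)) this.index.set(ik, new Map());
    this.index.get(ik).set(pk, entry);
  }

  delete(pk) {
    const old = this.items.get(pk);
    if (old === undefined) return;
    this.items.delete(pk);
    const bucket = this.index.get(old[this.indexKey]);
    if (bucket) {
      bucket.delete(pk);
      if (bucket.size === 0) this.index.delete(old[this.indexKey]);
    }
  }

  queryIndex(value) {
    const bucket = this.index.get(value);
    return bucket ? [...bucket.values()] : [];
  }
}
```

Querying the index returns only the key attributes and the projected attributes; anything else requires a read against the table itself.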

DynamoDB Internals
Throughput
• A read capacity unit covers up to 4 KB
• A write capacity unit covers up to 1 KB
• To read a 5 KB item, the number of read capacity units required = 2
• These units are defined when creating a table
• AWS sends alerts when these limits are exceeded
• AWS also throttles further requests beyond the defined capacity
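
The capacity arithmetic above is a round-up division by the unit size. A minimal sketch, using only the 4 KB and 1 KB unit sizes from the slide (the function names are illustrative):

```javascript
const READ_UNIT_KB = 4;  // one read capacity unit covers up to 4 KB
const WRITE_UNIT_KB = 1; // one write capacity unit covers up to 1 KB

// Capacity units consumed by one read of an item of the given size.
function readUnits(itemKb) {
  return Math.max(1, Math.ceil(itemKb / READ_UNIT_KB));
}

// Capacity units consumed by one write of an item of the given size.
function writeUnits(itemKb) {
  return Math.max(1, Math.ceil(itemKb / WRITE_UNIT_KB));
}

// Units to provision for a steady request rate (requests per second).
function provisionedUnits(unitsPerRequest, requestsPerSecond) {
  return unitsPerRequest * requestsPerSecond;
}

console.log(readUnits(5));  // 2, the example from the slide
console.log(writeUnits(5)); // 5
```

For example, 100 reads per second of 5 KB items would need provisionedUnits(readUnits(5), 100) = 200 read capacity units under these assumptions.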

DynamoDB Operations
Table level – create, update, delete, list, describe
Item/attribute level – add, update, delete
Query – queries a table by hash key and range key. Results are limited to 1 MB
Scan – reads all items from a table. Slower than a query
Parallel scan is also available to make things faster
Supports pagination
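
Because a single Query or Scan response is capped in size, reading a full result set means looping over pages. In the DynamoDB API a truncated response carries a LastEvaluatedKey, which is passed back as ExclusiveStartKey on the next call. The sketch below imitates that loop against an in-memory array: makeFakeTable and readAll are hypothetical names, and the "key" here is just an array offset rather than a real key.

```javascript
// Stand-in for a real Query/Scan call: serves `items` in pages of `pageSize`.
function makeFakeTable(items, pageSize) {
  return function fetchPage(exclusiveStartKey) {
    const start = exclusiveStartKey === undefined ? 0 : exclusiveStartKey;
    const page = items.slice(start, start + pageSize);
    const next = start + page.length;
    return {
      Items: page,
      // Present only when more data remains, like a truncated response.
      LastEvaluatedKey: next < items.length ? next : undefined,
    };
  };
}

// Keep requesting pages until no continuation key is returned.
function readAll(fetchPage) {
  const all = [];
  let startKey;
  do {
    const page = fetchPage(startKey);
    all.push(...page.Items);
    startKey = page.LastEvaluatedKey;
  } while (startKey !== undefined);
  return all;
}
```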

DynamoDB Features
Fully Managed NoSQL database service – handles scaling, partitioning, upgrades
Durable – automatically replicates to different availability zones
Scalable – automatically distributes data across multiple servers as size grows
Fast – from an EC2 instance, single-digit millisecond latency for an item size of 1 KB
• 5 ms for read, 10 ms for write
Simple Administration – AWS Management Console
Fault Tolerant – automatically replicates data
Flexible – each item in a table can have a different number of attributes
Indexing – primary key of each item. Global and local secondary indexes allow users to query non-primary-key attributes
Secure – authentication, use of the latest cryptographic techniques, ability to integrate with IAM (AWS Identity and Access Management)

DynamoDB Features
Could be Cost-Effective – per 1 KB item, $0.01/hour for every 10 writes/sec
• $0.01/hour for every 50 strongly consistent reads/sec
• $0.28 per million writes
• $0.056 per million strongly consistent reads
• $1.00 per GB/month for indexed storage
SDK – AWS SDK for Java/.NET/PHP etc.
• Supports all table operations, queries and scans
Service-Oriented Architecture – REST support – simple API, only 12 operations
• Data transfers as simple GET/POST/DELETE
Large items can be stored in S3 buckets, thereby reducing cost
Monitoring – AWS Management Console, CloudWatch, command-line tool
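
The per-million prices above follow directly from the hourly rates, and a rough monthly bill can be estimated from them. The sketch below uses only the rates quoted on this slide, which may be out of date. It additionally assumes items of at most 1 KB, capacity bought in whole blocks of 10 writes/sec or 50 reads/sec, and a 720-hour month; the function names are illustrative.

```javascript
const WRITES_PER_BLOCK = 10;        // $0.01/hour buys 10 writes/sec
const READS_PER_BLOCK = 50;         // $0.01/hour buys 50 strongly consistent reads/sec
const CENTS_PER_BLOCK_HOUR = 1;     // $0.01 expressed in cents
const STORAGE_CENTS_PER_GB = 100;   // $1.00 per GB per month
const HOURS_PER_MONTH = 720;        // assumption: 30-day month

// Estimated monthly cost in cents (integer arithmetic avoids rounding noise).
function monthlyCostCents(writesPerSec, readsPerSec, storageGb) {
  const blocks =
    Math.ceil(writesPerSec / WRITES_PER_BLOCK) +
    Math.ceil(readsPerSec / READS_PER_BLOCK);
  const throughput = blocks * CENTS_PER_BLOCK_HOUR * HOURS_PER_MONTH;
  const storage = Math.round(storageGb * STORAGE_CENTS_PER_GB);
  return throughput + storage;
}

// Price in dollars per million operations implied by a $0.01/hour block.
function dollarsPerMillion(opsPerSecPerBlock) {
  const opsPerHour = opsPerSecPerBlock * 3600;
  return (0.01 / opsPerHour) * 1e6;
}

console.log(monthlyCostCents(100, 500, 20) / 100); // 164 (dollars per month)
```

Here dollarsPerMillion(10) is about 0.278 and dollarsPerMillion(50) is about 0.056, which matches the $0.28 and $0.056 figures on the slide.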

DynamoDB Features
Can be integrated with Redshift – a data warehousing tool
DynamoDB Local – a small client-side database and server that mimics the DynamoDB service. Available as a .jar file

MongoDB Internals (name derived from "humongous")
Document-Oriented database
• Data is stored in BSON format (Binary JSON)
• Supports up to 100 levels of nesting
Data Types – BSON
• String, Integer, Boolean, Double, Arrays, Date*, Timestamp, Binary*, Null
• Min/Max keys – compare against the lowest and highest BSON elements
• Object – embedded documents
• ObjectId* – stores the document’s ID
• Regular Expression*
• JavaScript code*
• Symbol – reserved for languages that use a specific symbol type
* Indicates non-JSON types

MongoDB Internals (name derived from "humongous")
Data Model
• Collections – groups of documents that share a similar structure
• Document – similar to a row in an RDBMS
  Maximum BSON document size is 16 MB
• Field – similar to a column in an RDBMS

MongoDB Query
Query
• Key/value – key can be any field in the document, including the primary key
• Range – greater than, less than or equal to, between
• Geospatial – proximity criteria, intersection and inclusion
• Text search – results are returned in relevance order
• Aggregation – count, min, max, average, etc.
• Map-reduce
Covered Queries – queries that return only indexed fields
Query Optimization – MongoDB performs automatic optimization
When necessary, the developer can make use of more than one index through index intersection
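
As a rough illustration of what makes a query "covered": every field in the filter and every field returned must be part of the same index, so the answer can be served from the index alone. The checker below is a simplified sketch with a hypothetical name, isCovered; it handles only top-level field names and inclusion projections, and it treats _id as returned unless the projection explicitly excludes it.

```javascript
function isCovered(indexFields, filter, projection) {
  const indexed = new Set(indexFields);
  // Every filtered field must be in the index.
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));
  // Returned fields: those set to 1, plus _id unless it is set to 0.
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return filterOk && returned.length > 0 && returned.every((f) => indexed.has(f));
}

// With a compound index on state and pop:
console.log(isCovered(["state", "pop"], { state: "NY" }, { state: 1, pop: 1, _id: 0 })); // true
console.log(isCovered(["state", "pop"], { state: "NY" }, { state: 1, pop: 1 }));         // false, _id is returned
```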

MongoDB Index
Index
• Unique
• Compound
• Array
• Time-to-live (TTL)
• Geospatial
• Sparse
• Text search
Size of an index entry must be less than 1024 bytes
A single collection can have no more than 64 indexes

MongoDB – Sample Query
Return states with a population above 10 million
  db.zipcodes.aggregate( [
    { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
    { $match: { totalPop: { $gte: 10*1000*1000 } } }
  ] )
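
To show what the two pipeline stages compute, here is a plain JavaScript equivalent that runs without a database. It is a sketch only: statesAbove is a hypothetical name, and the input is assumed to be an array of documents with the state and pop fields used in the query.

```javascript
function statesAbove(zipcodes, minPop) {
  // $group stage: sum pop per state.
  const totals = new Map();
  for (const { state, pop } of zipcodes) {
    totals.set(state, (totals.get(state) || 0) + pop);
  }
  // $match stage: keep only the groups at or above the threshold.
  return [...totals]
    .filter(([, totalPop]) => totalPop >= minPop)
    .map(([state, totalPop]) => ({ _id: state, totalPop }));
}
```

Note that the filter runs on the grouped totals, not on individual zip codes, which is why $match comes after $group in the pipeline.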

MongoDB Features
Mongo Shell – JavaScript shell that supports nearly all MongoDB commands
Auto Shard – automatically balances data in the cluster
Automatic Replica Failover
Query Router – for queries that don’t use the shard key, the query router broadcasts the query to all shards, then aggregates and sorts the results
ACID compliant at the document level
Security – MongoDB Enterprise Advanced provides extensive support for authentication, authorization, auditing and encryption
MongoDB Ops Manager – deploy, upgrade (no downtime), monitor, back up and scale MongoDB instances
• Hosted MongoDB Management Service also provides many of these capabilities
Provides in-memory caching
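
The query router behaviour listed above (broadcast to all shards, then aggregate and sort) can be sketched as a scatter-gather step. This is a simplified model, not MongoDB's implementation: each shard is assumed to be a plain array of documents, and scatterGather is a hypothetical name.

```javascript
function scatterGather(shards, predicate, sortField) {
  // Scatter: every shard evaluates the query against its own documents.
  const partials = shards.map((docs) => docs.filter(predicate));
  // Gather: merge the partial results and sort them into one answer.
  return partials.flat().sort((a, b) => {
    if (a[sortField] < b[sortField]) return -1;
    if (a[sortField] > b[sortField]) return 1;
    return 0;
  });
}
```

A query that includes the shard key can skip the broadcast and go to a single shard, which is why the choice of shard key matters for latency.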

MongoDB Features
Large community support; 4th most widely used database, right behind the SQL databases
Spring Data project for MongoDB
Pluggable storage engines
• For low latency and high performance – WiredTiger or in-memory
• Analytical processing – HDFS storage engine
• Replica sets automatically migrate data independent of storage format – no complex ETL
Both Java and JavaScript APIs are available and documented
MongoDB University provides free education
• https://university.mongodb.com/
Third-party hosted support exists for MongoDB with various price plans
• https://mongolab.com/
• http://mongodirector.com/

References
http://aws.amazon.com/dynamodb/
http://www.mongodb.org/
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide
http://db-engines.com/en/system/Amazon+DynamoDB%3BMongoDB (a little old)
http://blog.cloudthat.in/5-reasons-why-dynamodb-is-better-than-mongodb/
http://www.masonzhang.com/2013/08/7-reasons-you-should-use-mongodb-over.html
http://www.mongodb.com/presentations/automate-mongodb-mongodb-management-service-0
http://www.mongodb.com/presentations/webinar-enterprise-architects-view-mongodb-0