Dynamo: Amazon’s Highly Available Key-value Store
1
DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE
Presenters: Pourya Aliabadi, Boshra Ardallani, Paria Rakhshani
Professor: Dr. Sheykh Esmaili
2
INTRODUCTION
Amazon runs a worldwide e-commerce platform that serves tens of millions of customers at peak times, using tens of thousands of servers located in many data centers around the world
Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust
3
INTRODUCTION
One of the lessons our organization has learned from operating Amazon’s platform is that the reliability and scalability of a system depend on how its application state is managed
To meet the reliability and scaling needs, Amazon has developed a number of storage technologies, of which the Amazon Simple Storage Service (S3) is probably the best known
There are many services on Amazon’s platform that only need primary-key access to a data store
4
SYSTEM ASSUMPTIONS AND REQUIREMENTS
Query Model
Simple read and write operations to a data item that is uniquely identified by a key
State is stored as binary objects
No operations span multiple data items
Dynamo targets applications that need to store objects that are relatively small (usually less than 1 MB)
5
SYSTEM ASSUMPTIONS AND REQUIREMENTS
ACID Properties
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably
Dynamo targets applications that operate with weaker consistency (the “C” in ACID) if this results in high availability
Dynamo does not provide any isolation guarantees and permits only single key updates
6
SYSTEM ASSUMPTIONS AND REQUIREMENTS
Efficiency
The system needs to function on a commodity hardware infrastructure
Services must be able to configure Dynamo such that they consistently achieve their latency and throughput requirements.
The tradeoffs are in performance, cost efficiency, availability, and durability guarantees.
7
SYSTEM ASSUMPTIONS AND REQUIREMENTS
Dynamo is used only by Amazon’s internal services
We will discuss the scalability limitations of Dynamo and possible scalability related extensions
8
SERVICE LEVEL AGREEMENTS (SLA)
To guarantee that the application can deliver its functionality in a bounded time, each and every dependency in the platform needs to deliver its functionality with even tighter bounds
An example of a simple SLA is a service guaranteeing that it will provide a response within 300 ms for 99.9% of its requests for a peak client load of 500 requests per second
For example, a page request to one of the e-commerce sites typically requires the rendering engine to construct its response by sending requests to over 150 services
These services often have multiple dependencies
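The 99.9% figure in the SLA example above is a bound on the tail of the latency distribution, not on its average. A minimal sketch of such a check follows. The function names and the latency samples are hypothetical and chosen only for illustration, and the nearest-rank percentile is one of several common definitions.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: a value with roughly pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[min(max(rank, 1), len(ordered)) - 1]

def meets_sla(latencies_ms, bound_ms=300, pct=99.9):
    """True if the pct-th percentile latency is within the bound."""
    return percentile(latencies_ms, pct) <= bound_ms

# Hypothetical samples: both sets have a low mean, but only one has a short tail.
fast = [20] * 9990 + [250] * 10
slow = [20] * 9980 + [900] * 20

print(meets_sla(fast), meets_sla(slow))
print(sum(slow) / len(slow))  # the mean alone would hide the slow tail
```

Checking the percentile rather than the mean is the point of the example: the second sample set has a low average yet still violates the bound.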
9
Figure: an abstract view of the architecture of Amazon’s platform
10
DESIGN CONSIDERATIONS
Incremental scalability: Dynamo should be able to scale out one storage host (henceforth referred to as “node”) at a time, with minimal impact on both operators of the system and the system itself
Symmetry: Every node in Dynamo should have the same set of responsibilities as its peers; there should be no distinguished node or nodes that take special roles or an extra set of responsibilities
11
DESIGN CONSIDERATIONS
Decentralization: An extension of symmetry, the design should favor decentralized peer-to-peer techniques over centralized control. In the past, centralized control has resulted in outages, and the goal is to avoid it as much as possible. This leads to a simpler, more scalable, and more available system.
Heterogeneity: The system needs to be able to exploit heterogeneity in the infrastructure it runs on, e.g. the work distribution must be proportional to the capabilities of the individual servers. This is essential in adding new nodes with higher capacity without having to upgrade all hosts at once.
12
SYSTEM ARCHITECTURE
The Dynamo data storage system contains items that are associated with a single key
Operations that are implemented: get( ) and put( )
get(key): locates the object with the given key and returns an object or a list of objects with a context
put(key, context, object): places an object at a replica along with the key and context
Context: metadata about the object
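The call shape of get( ) and put( ) can be sketched with a single-node, in-memory stand-in. This is not Dynamo's implementation: the class name TinyStore is hypothetical, and the context here is a plain version counter, whereas the real context is opaque metadata that carries version information.

```python
class TinyStore:
    """Hypothetical single-node stand-in that mimics the get/put call shape."""

    def __init__(self):
        self._data = {}  # key -> (version, object bytes)

    def get(self, key):
        """Return (list of objects, context); the list is empty for unknown keys."""
        if key not in self._data:
            return [], None
        version, obj = self._data[key]
        return [obj], {"version": version}

    def put(self, key, context, obj):
        """Store obj under key; the context says which version the write is based on."""
        version = (context or {}).get("version", 0) + 1
        self._data[key] = (version, obj)

store = TinyStore()
store.put("cart:42", None, b"book")
objects, context = store.get("cart:42")
store.put("cart:42", context, b"book,pen")  # pass back the context from the read
print(store.get("cart:42"))
```

The client reads, modifies, and writes back with the context it received, which is how a store of this kind can tell which version an update was derived from.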
13
PARTITIONING
Provides a mechanism to dynamically partition the data over the set of nodes
Uses consistent hashing, similar to Chord
Each node gets an ID from the space of keys
Nodes are arranged in a ring
Data is stored on the first node clockwise from the position of the data key on the ring
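The ring lookup described above can be sketched in a few lines. This is an illustrative sketch under assumptions: node positions are derived by hashing hypothetical node names with MD5, and the class and function names are made up for the example.

```python
import bisect
import hashlib

def ring_position(name):
    """Map a string to a point on the ring (a 128-bit integer from MD5)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

class Ring:
    def __init__(self, nodes):
        # Sorted (position, node) pairs represent the ring in clockwise order.
        self._points = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._points]

    def owner(self, key):
        """First node clockwise from the key's position, wrapping around at the end."""
        i = bisect.bisect_right(self._positions, ring_position(key))
        return self._points[i % len(self._points)][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("cart:42"))
```

A useful property of this scheme is that removing a node only moves the keys that node owned; keys owned by the remaining nodes stay where they are.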
14
VIRTUAL NODE
A single physical node is assigned to multiple points in the ring, i.e. virtual nodes
Advantages of virtual nodes:
Graceful handling of failure of a node
Easy accommodation of a new node
Heterogeneity in physical infrastructure can be exploited
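The heterogeneity advantage can be illustrated by giving a larger machine more points on the ring. The sketch below is an assumption-laden illustration: the token counts (50 and 150), the node names, and the "name#index" scheme for placing virtual nodes are all hypothetical.

```python
import bisect
import hashlib
from collections import Counter

def ring_position(name):
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

def build_ring(tokens_per_node):
    """tokens_per_node maps a physical node to its number of virtual nodes."""
    return sorted(
        (ring_position(f"{node}#{i}"), node)
        for node, count in tokens_per_node.items()
        for i in range(count)
    )

def owner(ring, positions, key):
    """Physical node behind the first virtual node clockwise from the key."""
    i = bisect.bisect_right(positions, ring_position(key))
    return ring[i % len(ring)][1]

# The larger host gets three times as many virtual nodes as the smaller one.
ring = build_ring({"small": 50, "big": 150})
positions = [p for p, _ in ring]
load = Counter(owner(ring, positions, f"key-{i}") for i in range(10_000))
print(load)
```

With many virtual nodes per host, each host's share of keys tends toward its share of points on the ring, so capacity differences can be expressed simply by the token count.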
15
REPLICATION
Each data item is replicated at N hosts
N is configured per instance
Each node is responsible for the region of the ring between it and its Nth predecessor
Preference list: the list of nodes responsible for storing a particular key
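One way to picture the preference list is as a clockwise walk from the key's position that collects N distinct physical nodes, skipping further virtual nodes of a host already chosen. The sketch below assumes a ring of (position, node) pairs built from hypothetical node names; the function names are illustrative.

```python
import bisect
import hashlib

def ring_position(name):
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

def build_ring(nodes, tokens=8):
    """Place each node at `tokens` points on the ring (hypothetical naming scheme)."""
    return sorted((ring_position(f"{n}#{i}"), n) for n in nodes for i in range(tokens))

def preference_list(ring, key, n):
    """Walk clockwise from the key and collect up to n distinct physical nodes."""
    positions = [p for p, _ in ring]
    start = bisect.bisect_right(positions, ring_position(key))
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen

ring = build_ring(["A", "B", "C", "D", "E"])
print(preference_list(ring, "cart:42", 3))
```

Skipping repeated physical hosts matters once virtual nodes are in use; otherwise two of the N replicas could land on the same machine.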
16
VERSIONING
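Multiple versions of an object can be present in the system at the same time
Vector clocks are used for version control
Vector clock size issue: a clock can grow when many nodes coordinate writes to the same object, so the oldest (node, counter) pair is dropped once the clock reaches a size threshold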
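A vector clock can be modeled as a mapping from node name to counter. The sketch below shows how two versions are compared and how a reconciled version gets its clock; the helper names are hypothetical and the node names Sx, Sy, Sz are used only as labels.

```python
def increment(clock, node):
    """Return a copy of the clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a, b):
    """True if clock a is equal to or newer than clock b for every node in b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def in_conflict(a, b):
    """Neither version descends from the other, so both must be kept."""
    return not descends(a, b) and not descends(b, a)

def merge(a, b):
    """Element-wise maximum, used when a client reconciles two versions."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}

d1 = increment({}, "Sx")   # first write, coordinated by Sx
d2 = increment(d1, "Sx")   # second write, again at Sx
d3 = increment(d2, "Sy")   # update coordinated by Sy
d4 = increment(d2, "Sz")   # concurrent update coordinated by Sz
d5 = increment(merge(d3, d4), "Sx")  # reconciled version written via Sx
print(in_conflict(d3, d4), d5)
```

Versions d3 and d4 both descend from d2 but not from each other, which is exactly the case where the store returns several objects and leaves reconciliation to the client.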
17
EXECUTION OF GET() AND PUT() OPERATIONS
Operations can originate at any node in the system
Coordinator: node handling a read or write operation
The coordinator contacts R nodes for reading and W nodes for writing, where R + W > N
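The reason for the condition R + W > N is that any set of R replicas must then share at least one member with any set of W replicas, so a read always reaches a replica that saw the latest successful write. The brute-force check below illustrates this for small N. It assumes a strict quorum over a fixed set of N replicas, and the function name is hypothetical.

```python
from itertools import combinations

def quorums_overlap(n, r, w):
    """True if every read set of size r intersects every write set of size w."""
    replicas = range(n)
    return all(
        set(reads) & set(writes)
        for reads in combinations(replicas, r)
        for writes in combinations(replicas, w)
    )

print(quorums_overlap(3, 2, 2))  # R + W > N
print(quorums_overlap(3, 1, 1))  # R + W <= N
```

Lower values of R or W reduce latency for that operation, since the coordinator waits for fewer replies, at the cost of giving up the overlap guarantee when R + W is no longer greater than N.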
18
HANDLING FAILURES
Temporary failures: Hinted Handoff
Mechanism to ensure that read and write operations do not fail due to temporary node or network failures.
Handling Permanent Failures: Replica Synchronization
Synchronize with another node
Use Merkle trees
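The benefit of Merkle trees for replica synchronization is that two replicas compare a single root hash first and only descend into subtrees whose hashes differ. The sketch below is a simplified illustration: it assumes both replicas hold the same non-empty set of keys, uses SHA-256 as the hash, and all function names are hypothetical.

```python
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def build_tree(items):
    """items: dict of key -> value bytes. Returns levels, leaves first, root last."""
    level = [h(k.encode() + b"\x00" + v) for k, v in sorted(items.items())]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes the concatenation of up to two children.
        level = [h(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def differing_leaves(a, b):
    """Leaf indices that differ, found by descending only into mismatched subtrees."""
    suspects = [0]
    for depth in range(len(a) - 1, 0, -1):  # from the root down to just above the leaves
        suspects = [
            child
            for i in suspects if a[depth][i] != b[depth][i]
            for child in (2 * i, 2 * i + 1)
            if child < len(a[depth - 1])
        ]
    return [i for i in suspects if a[0][i] != b[0][i]]

replica_1 = {"k1": b"a", "k2": b"b", "k3": b"c", "k4": b"d", "k5": b"e"}
replica_2 = dict(replica_1, k5=b"stale")
print(differing_leaves(build_tree(replica_1), build_tree(replica_2)))
```

When the roots match, the loop discards the only suspect immediately and no further hashes are compared, which is the common case for replicas that are already in sync.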
19
MEMBERSHIP AND FAILURE DETECTION
Explicit mechanism available to initiate the addition and removal of nodes from a Dynamo ring
To prevent logical partitions, some Dynamo nodes play the role of seed nodes
Gossip-based distributed failure detection and membership protocol
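How gossip spreads membership, and why seed nodes help, can be shown with a toy simulation. This is a sketch under simplifying assumptions, not Dynamo's protocol: every node starts out knowing only itself and one seed, a view is a plain set of node names, and each exchange merges the two views in both directions. All names are hypothetical.

```python
import random

def gossip_until_converged(nodes, seed_node, rng, max_rounds=100):
    """Return (rounds needed, final views), or (None, views) if max_rounds is hit."""
    # Each node starts out knowing only itself and the seed node.
    views = {n: {n, seed_node} for n in nodes}
    for round_no in range(1, max_rounds + 1):
        for node in nodes:
            peers = sorted(views[node] - {node})
            if not peers:
                continue  # nobody else known yet
            peer = rng.choice(peers)
            merged = views[node] | views[peer]
            views[node] = views[peer] = merged  # both sides learn from the exchange
        if all(view == set(nodes) for view in views.values()):
            return round_no, views
    return None, views

rounds, views = gossip_until_converged(list("ABCDE"), "A", random.Random(0))
print(rounds, views["B"])
```

Because every node knows the seed, no group of nodes can end up gossiping only among themselves, which is the logical-partition problem the seed nodes are meant to prevent.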
20
IMPLEMENTATION
Storage Node
Request Coordination
• Built on top of event-driven messaging substrate
• Coordinator executes client read & write requests
• State machines created on nodes serving requests
• Each state machine instance handles exactly one client request
• State machine contains entire process and failure handling logic
Membership & Failure Detection
Local Persistence Engine
Pluggable Storage Engines
• Berkeley Database (BDB) Transactional Data Store
• BDB Java Edition
• MySQL
• In-memory buffer with persistent backing store
• Chosen based on application’s object size distribution
21
EXPERIENCES, RESULTS & LESSONS LEARNT
Main Dynamo Usage Patterns
1. Business logic specific reconciliation, e.g. merging different versions of a customer’s shopping cart
2. Timestamp based reconciliation, e.g. maintaining a customer’s session information
3. High performance read engine, e.g. maintaining product catalog and promotional items
Client applications can tune parameters to achieve specific objectives:
N: Performance {no. of hosts a data item is replicated at}
R: Availability {min. no. of participating nodes in a successful read operation}
W: Durability {min. no. of participating nodes in a successful write operation}
Commonly used configuration (N,R,W) = (3,2,2)
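The first two usage patterns on this slide differ in who resolves divergent versions and how. A minimal sketch of each follows. The cart representation (item to quantity), the (timestamp, value) pairs, and the function names are hypothetical, and taking the larger quantity is just one possible merge rule.

```python
def merge_carts(versions):
    """Business-logic reconciliation: keep every item seen in any divergent cart."""
    merged = {}
    for cart in versions:
        for item, quantity in cart.items():
            merged[item] = max(merged.get(item, 0), quantity)
    return merged

def last_write_wins(versions):
    """Timestamp-based reconciliation: keep the value with the newest timestamp."""
    return max(versions, key=lambda version: version[0])[1]

carts = [{"book": 1}, {"book": 1, "pen": 2}]
sessions = [(1700000000, "page=home"), (1700000050, "page=checkout")]
print(merge_carts(carts))
print(last_write_wins(sessions))
```

A union-style merge never loses an added item, but an item deleted in one version can reappear after the merge; last-write-wins is simpler but silently discards the older update.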
22
EXPERIENCES, RESULTS & LESSONS LEARNT
Balancing Performance and Durability
Figure: average and 99.9th percentile latencies of Dynamo’s read and write operations during a period of 30 days
Figure: comparison of 99.9th percentile latencies for buffered vs. non-buffered writes over 24 hours
23
CONCLUSION
Dynamo:
Is a highly available and scalable data store
Is used for storing state of a number of core services of Amazon.com’s e-commerce platform
Has provided desired levels of availability and performance and has been successful in handling:
Server failures
Data center failures
Network partitions
Is incrementally scalable
Sacrifices consistency under certain failure scenarios
Extensively uses object versioning
Shows that decentralized techniques can be combined to provide a single highly available system.
24
thanks