Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)
![Page 1: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/1.jpg)
Riak and Riak CS
Andy Gross <@argv0>
Chief Architect, Basho Technologies
Silicon Valley Cloud Computing Group
April 2, 2013
![Page 2: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/2.jpg)
Basho
120+ employees, offices in SF, MA, London, Japan
Founded in 2008, open sourced Riak in 2009
Sponsors of the Riak open source database (Apache 2)
Sell Enterprise features (multi-DC replication), support, training.
Riak CS (S3-compat storage) released in March 2012
![Page 3: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/3.jpg)
Now Open Source (Apache 2)
Cloud storage software backed by Riak
S3 API
Formerly closed-source
Per-tenant reporting
Pluggable authentication
Detailed stats
DTrace support
Multi-datacenter replication (Enterprise)
Preliminary integration with CloudStack
![Page 4: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/4.jpg)
REDACTED REDACTED
REDACTED
![Page 5: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/5.jpg)
what is a cloud service?
operationally simple
horizontally scalable
globally distributed
highly available
no SPOFs
fault tolerant
![Page 6: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/6.jpg)
you can’t outsource these properties
operationally simple
horizontally scalable
globally distributed
highly available
no SPOFs
fault tolerant
![Page 7: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/7.jpg)
“use pacemaker” = wrong answer
![Page 8: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/8.jpg)
“use mysql best practices for redundancy” = wrong answer
![Page 9: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/9.jpg)
“just plug it into a SAN” = wrong answer
![Page 10: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/10.jpg)
all cloud services need reliable, distributed state storage
![Page 11: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/11.jpg)
storage is the most important and hardest part
![Page 12: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/12.jpg)
Riak CS uses Riak
![Page 13: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/13.jpg)
What is Riak?
![Page 14: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/14.jpg)
Key-Value store (plus extras)
Distributed, horizontally scalable
Eventually consistent
Fault-tolerant
Highly-available
Inspired by Amazon’s Dynamo
![Page 15: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/15.jpg)
Key-Value
Simple operations - get, put, delete
Value is mostly opaque (some metadata)
Extras
MapReduce
Secondary Indexes
Full-text search (optional)
![Page 16: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/16.jpg)
Distributed & Horizontally Scalable
Default configuration is in a cluster
Load and data are spread evenly via consistent hashing
Scalable: Add more nodes to get more X
![Page 17: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/17.jpg)
Fault-Tolerant
Symmetry: All nodes participate equally
Decentralized: no central control, no SPOF
All data is replicated 3x by default
Cluster transparently survives...
node failure
network partitions
Built on Erlang/OTP (designed for FT)
![Page 18: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/18.jpg)
Highly-Available
Any node can serve client requests
Fallbacks (sloppy quorums) are used when nodes are down
Always accepts write requests
Accepts read requests as long as R of the N replica nodes are alive
Per-request quorums
![Page 19: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/19.jpg)
Inspired by Amazon’s Dynamo
Masterless, peer-coordinated replication
Consistent hashing
Eventually consistent
Quorum reads and writes
Anti-entropy: read repair, hinted handoff
![Page 20: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/20.jpg)
Large Object
[Diagram: five Riak CS instances, each exposing an S3 API and a Reporting API, in front of five Riak nodes]
1. User uploads an object
2. Riak CS breaks object into 1 MB chunks
3. Riak CS streams chunks to Riak nodes
4. Riak replicates and stores chunks
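Step 2 of the upload path can be sketched in a few lines. This is only an illustration of fixed-size chunking, not Riak CS code (Riak CS itself is written in Erlang); the names `CHUNK_SIZE` and `split_into_chunks` are hypothetical.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size named on the slide


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split an object into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# A 3 MB + 10 byte object becomes three full chunks and one 10-byte tail.
obj = bytes(3 * CHUNK_SIZE + 10)
chunks = split_into_chunks(obj)
print(len(chunks), len(chunks[-1]))  # prints: 4 10
```

Each chunk would then be stored as its own Riak value, so replication and repair operate on 1 MB units rather than on the whole object.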
![Page 21: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/21.jpg)
Principles
Always-writable
Incrementally scalable
Symmetrical
Decentralized
Focus on SLAs, tail latency
![Page 22: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/22.jpg)
Techniques
Consistent Hashing
Vector Clocks
Read Repair
Anti-Entropy
Hinted Handoff
Gossip Protocol
![Page 23: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/23.jpg)
Consistent Hashing
Invented by Danny Lewin and others @ MIT/Akamai
Minimizes remapping of keys when number of hash slots changes
Originally applied to CDNs, used in Dynamo for replica placement
Enables incremental scalability, even spread
Minimizes hot spots
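A minimal sketch of the idea, assuming the textbook variant where each node owns many points on a hash ring. Riak's own ring uses a fixed number of partitions claimed by nodes, so this illustrates the remapping property rather than Riak's implementation; `HashRing` and `points_per_node` are hypothetical names.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (a 160-bit integer)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")


class HashRing:
    def __init__(self, nodes, points_per_node=64):
        self._points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node claims several pseudo-random positions for an even spread.
        for i in range(self._points_per_node):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def lookup(self, key):
        # The owner is the first point clockwise from the key's position.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["a", "b", "c"])
before = {k: ring.lookup(k) for k in keys}
ring.add("d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node")
```

Adding a node only takes keys away from existing owners and hands them to the newcomer; no key moves between the old nodes, which is what makes incremental scaling cheap.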
![Page 24: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/24.jpg)
![Page 25: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/25.jpg)
Vector Clocks
Introduced by Mattern et al, in 1988
Extends Lamport’s timestamps (1978)
Each value in Dynamo tagged with vector clock
Allows detection of stale values, logical siblings
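The comparison that separates stale values from siblings can be sketched as follows. This assumes a vector clock is a plain mapping from actor name to counter; the function names are hypothetical and this is not Riak's vclock module.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of the clock with the actor's counter bumped by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    if a == b:
        return "equal"
    if descends(a, b):
        return "a is newer"  # b is a stale version of a
    if descends(b, a):
        return "b is newer"  # a is a stale version of b
    return "siblings"        # concurrent writes, neither supersedes the other


v1 = increment({}, "x")   # {"x": 1}
v2 = increment(v1, "x")   # {"x": 2}, a later write by the same actor
v3 = increment(v1, "y")   # {"x": 1, "y": 1}, a concurrent write by another actor
print(compare(v2, v1))    # prints: a is newer
print(compare(v2, v3))    # prints: siblings
```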
![Page 26: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/26.jpg)
Read Repair
Update stale versions opportunistically on reads (instead of writes)
Pushes system toward consistency, after returning value to client
Reflects focus on a cheap, always-available write path
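A rough sketch of read repair, with two simplifying assumptions: each replica is an in-memory dict, and an integer version stands in for the vector clock. `read_with_repair` is a hypothetical name.

```python
def read_with_repair(replicas: list, key: str):
    """Read a key from every replica, return the newest value, repair stale copies.

    Each replica is a dict mapping key -> (version, value).
    """
    replies = [(replica, replica.get(key)) for replica in replicas]
    found = [entry for _, entry in replies if entry is not None]
    if not found:
        return None
    newest = max(found, key=lambda entry: entry[0])
    # A real system would answer the client first and repair afterwards;
    # this sketch does both in one call for brevity.
    for replica, entry in replies:
        if entry is None or entry[0] < newest[0]:
            replica[key] = newest
    return newest[1]


a = {"k": (1, "old")}
b = {"k": (2, "new")}
c = {}
value = read_with_repair([a, b, c], "k")
print(value, a["k"], c["k"])  # prints: new (2, 'new') (2, 'new')
```

The write path never has to wait for lagging replicas; the cost of converging them is paid later, on reads.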
![Page 27: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/27.jpg)
Hinted Handoff
Any node can accept writes for other nodes if they’re down
All messages include a destination
Data accepted by node other than destination is handed off when node recovers
As long as a single node is alive the cluster can accept a write
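The following toy model shows the mechanism under stated assumptions: nodes are in-memory objects, liveness is a boolean flag, and the fallback is simply the first live node. The `Node`, `write`, and `handoff` names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (destination name, key, value) held for a down node


def write(nodes: dict, destination: str, key, value) -> str:
    """Deliver a write to its destination, or to a live fallback with a hint."""
    target = nodes[destination]
    if target.up:
        target.data[key] = value
        return target.name
    for node in nodes.values():
        if node.up:
            # The hint records the intended destination alongside the data.
            node.hints.append((destination, key, value))
            return node.name
    raise RuntimeError("no live node can accept the write")


def handoff(nodes: dict) -> None:
    """Forward hinted data to any destination that has come back up."""
    for node in nodes.values():
        if not node.up:
            continue
        remaining = []
        for dest, key, value in node.hints:
            if nodes[dest].up:
                nodes[dest].data[key] = value
            else:
                remaining.append((dest, key, value))
        node.hints = remaining


nodes = {name: Node(name) for name in "abc"}
nodes["a"].up = False
accepted_by = write(nodes, "a", "k", "v")  # "a" is down, so "b" takes the write
nodes["a"].up = True
handoff(nodes)
print(accepted_by, nodes["a"].data)  # prints: b {'k': 'v'}
```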
![Page 28: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/28.jpg)
Anti-Entropy
Replicas maintain a Merkle Tree of keys and their versions/hashes
Trees periodically exchanged with peer vnodes
Merkle tree enables cheap comparison
Only values with different hashes are exchanged
Pushes system toward consistency
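To illustrate why the comparison is cheap, here is a sketch using a two-level hash tree (one root over a fixed number of key buckets). A production Merkle tree would be deeper; the bucket count and function names are assumptions made for this example.

```python
import hashlib


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_tree(store: dict, buckets: int = 4):
    """Build a two-level hash tree: one leaf hash per key bucket, plus a root."""
    groups = [[] for _ in range(buckets)]
    for key in sorted(store):
        idx = int(_h(key.encode()), 16) % buckets
        groups[idx].append(f"{key}={_h(store[key])}")
    leaves = [_h("|".join(group).encode()) for group in groups]
    root = _h("".join(leaves).encode())
    return root, leaves


def differing_buckets(tree_a, tree_b) -> list:
    """Return the indexes of buckets whose contents differ between two replicas."""
    root_a, leaves_a = tree_a
    root_b, leaves_b = tree_b
    if root_a == root_b:
        return []  # a single comparison suffices when the replicas agree
    return [i for i, (x, y) in enumerate(zip(leaves_a, leaves_b)) if x != y]


replica_a = {"k1": b"v1", "k2": b"v2", "k3": b"v3"}
replica_b = dict(replica_a)
in_sync = differing_buckets(build_tree(replica_a), build_tree(replica_b))
replica_b["k2"] = b"stale"
out_of_sync = differing_buckets(build_tree(replica_a), build_tree(replica_b))
print(in_sync, len(out_of_sync))  # prints: [] 1
```

Only the keys in the differing bucket need to be exchanged, so the cost of a sync tracks the amount of divergence rather than the size of the data set.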
![Page 29: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/29.jpg)
Gossip Protocol
Decentralized approach to managing global state
Trades off atomicity of state changes for a decentralized approach
Volume of gossip can overwhelm networks without care
![Page 30: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/30.jpg)
Hinted Handoff
• Node fails
• Requests go to fallback
• Node comes back
• “Handoff” - data returns to recovered node
• Normal operations resume
hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
![Page 31: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/31.jpg)
Anatomy of a Request
client → Riak: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
Coordinating node: Get Handler (FSM)
hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) == 10, 11, 12
[Diagram: The Ring, showing cluster partitions 6 through 16, with R=2 and replies labeled v1 and v2]
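The step where the key hashes to three consecutive partitions (10, 11, 12 on the slide) can be approximated like this. It is a simplification: the ring size is an assumed value, and taking the hash modulo the ring size stands in for Riak's division of the hash space into equal partitions. `preference_list` is a hypothetical name.

```python
import hashlib

RING_SIZE = 64  # number of partitions; an assumed value for this sketch
N_VAL = 3       # replicas per key, matching the 3x default from the slides


def preference_list(key: str, ring_size: int = RING_SIZE, n_val: int = N_VAL) -> list:
    """Return the n_val consecutive partitions responsible for a key."""
    digest = hashlib.sha1(key.encode()).digest()
    first = int.from_bytes(digest, "big") % ring_size
    # Replicas live on the partitions that follow the first one around the ring.
    return [(first + i) % ring_size for i in range(n_val)]


plist = preference_list("blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA")
print(plist)
```

The coordinating node sends the get to the owners of these partitions and answers the client once R of them have replied.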
![Page 32: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/32.jpg)
Read Repair
client → Riak: get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
Coordinating node: Get Handler (FSM)
[Diagram: cluster partitions 6 through 16, with R=2; replies carry v1 and v2, and the stale v1 replica is updated to v2]
![Page 33: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/33.jpg)
Riak Architecture
Erlang/OTP Runtime
Client APIs: HTTP, Protocol Buffers, Erlang local client
Request Coordination: get, put, delete, map-reduce
Riak Core: membership, consistent hashing, handoff, node-liveness, gossip, buckets
Riak KV: vnode master, vnodes, storage backend, JS Runtime
![Page 34: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/34.jpg)
riak is a solid foundation for building cloud services
![Page 35: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/35.jpg)
Coming Soon:
Riak CS 1.4 (Q2)
Swift API
Keystone Integration
S3 Features
COPY Object
Object Versioning
Riak CS 1.5 (Q3)
Server side encryption
More S3 features
Enhanced CloudStack and OpenStack integration
Riak
![Page 36: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/36.jpg)
Coming Later (2014)
Erasure coding
Reduced redundancy storage
Native indexing/search
![Page 37: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/37.jpg)
RICON East - May 13-14, NYC
A distributed systems conference for developers
Speakers from Comcast, State Farm, UC Berkeley, Harvard, and many more
Use discount code SVCloud20 for 20% off tickets
http://ricon.io/east.html
![Page 38: Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)](https://reader036.fdocuments.in/reader036/viewer/2022062404/554bc09db4c90594278b5112/html5/thumbnails/38.jpg)
thanks!/questions?
download riakcs: http://docs.basho.com/riakcs/latest/riakcs-downloads/
hack riakcs: http://github.com/basho/riak_cs
work at basho: http://bashojobs.theresumator.com
follow basho on twitter: http://twitter.com/basho