Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing
Transcript of Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing
Presenter: Chunyuan Liao
March 6, 2002
Authors: Ben Y. Zhao, John Kubiatowicz, and Anthony D. Joseph
Computer Science Division
University of California, Berkeley
Outline
• Challenges
• System overview
• Operations, issues & solutions: Route, Locate, Publish, Insert, Delete, Move
• Evaluation & Conclusion
• Implementation
• Summary & Comments
Project background
• Driving force: Ubiquitous Computing
• OceanStore – a data utility infrastructure
• Goals:
  – Based on the current untrusted infrastructure
  – Achieve nomadic data: anytime, anywhere
  – Highly scalable, reliable and fault-tolerant
• Basic issues: data location, routing
Challenges
How to achieve naming, location and routing in a complex, chaotic computing environment?
• Dynamic nature
  – Mobile and replicated data & services
  – Complex interaction between components, even while in motion
• Traditional approaches fail to address this extremely dynamic nature
Tapestry: An infrastructure for fault-tolerant wide-area location and routing
An overlay location & routing infrastructure built on top of IP
Features:
• Highly scalable: decentralized, point-to-point, self-organizing
• Highly fault-tolerant: redundancy, adaptation
• Good locality: content-based routing & location
• Highly durable
Basic Model of Tapestry
Originated in the Plaxton scheme. Basic components:
• Nodes: servers, routers, clients
• Objects: data or services
• Links: point-to-point links
Operations in Tapestry
Naming, Routing, Object Location, Publishing Objects, Inserting/Deleting Objects, Mobile Objects
Tapestry - Naming
Node ID / Object ID:
• A fixed-length bit string (4 bits per level), e.g. 84F8, 9098
• Global
• Randomly generated
• Location-independent
• Evenly distributed
• Not unique (shared by replicas)
Routing: Rules
• Suffix matching (similar to Plaxton)
• Incremental routing, digit by digit
• Maximum hops: log_b(N)
[Figure: a message to node 4598 is routed by suffix matching, each hop matching one more suffix digit, e.g. B4F8 (…8) → 9098 (…98) → 7598 (…598) → 4598; other nodes shown: 6789, B437.]
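The digit-by-digit rule above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation: the node list is a stand-in for the neighbor maps, and each hop simply jumps to any node that matches one more suffix digit of the destination.

```python
# Minimal sketch of suffix-matching routing (hex-digit IDs), assuming a
# global node list instead of per-node neighbor maps. Illustrative only.

def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing digits two IDs have in common."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n

def route(nodes: list[str], start: str, dest: str) -> list[str]:
    """Greedily hop to a node matching one more suffix digit per step,
    giving at most log_b(N) hops for N nodes with base-b digits."""
    path = [start]
    current = start
    while current != dest:
        level = shared_suffix_len(current, dest)   # digits already matched
        # pick any known node that extends the matched suffix by one digit
        current = next(n for n in nodes
                       if shared_suffix_len(n, dest) > level)
        path.append(current)
    return path

nodes = ["B4F8", "9098", "7598", "4598", "B437", "6789"]
print(route(nodes, "B437", "4598"))
# ['B437', 'B4F8', '9098', '7598', '4598']
```

Each hop fixes one more trailing digit (…8, …98, …598, 4598), which is why the hop count is bounded by the number of digits, log_b(N).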
Routing: Neighbor maps
• A table with b·log_b(N) entries
• The i-th level neighbors share an (i-1)-digit suffix with the local node
• Entry(i, j): pointer to a neighbor whose ID ends in digit "j" + the (i-1)-digit suffix
• Secondary neighbors
• Back pointers create bi-directional links
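A neighbor map with this shape can be sketched as follows. The IDs, node list, and the "first match wins" entry choice are illustrative assumptions (the real system keeps the nearest candidate per entry plus secondary neighbors):

```python
# Sketch of a b * log_b(N) neighbor map for 4-hex-digit IDs: Entry(i, j)
# points to some node whose ID ends in digit j followed by the local
# node's (i-1)-digit suffix. Toy node list; real entries hold the
# nearest such node plus secondary neighbors.

HEX = "0123456789ABCDEF"

def build_neighbor_map(local: str, nodes: list[str]) -> dict:
    width = len(local)
    nmap = {}
    for i in range(1, width + 1):              # level i matches i-1 suffix digits
        suffix = local[width - (i - 1):] if i > 1 else ""
        for j in HEX:
            wanted = j + suffix                # IDs ending in "j + suffix"
            nmap[(i, j)] = next((n for n in nodes if n.endswith(wanted)), None)
    return nmap

nodes = ["B4F8", "9098", "7598", "4598", "0642"]
nmap = build_neighbor_map("4598", nodes)
print(nmap[(1, "8")])   # level 1, digit 8: any node ending in "8"
print(nmap[(3, "5")])   # level 3, digit 5: a node ending in "598"
```

The table has 16 entries per level and one level per digit, matching the b·log_b(N) size stated above; empty entries (None) are exactly what surrogate routing later has to work around.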
Routing: Fault tolerance
• Detect server/link failures
  – TCP timeout (ping)
  – Periodic "heartbeat" messages along back pointers
• Resist faults
  – Secondary neighbors
• Recover
  – Probing messages
  – Second chance
Locating: basic procedure
Four-phase locating:
1. Map the object ID to a "virtual" node ID
2. Route the request to that node
3. Arrive at the surrogate, or "root", for the object
4. Redirect to the server
[Figure: client B4F8 locates object 1234 stored on server B346. Surrogate routing through 8724, F734, B234 toward virtual node 1234 reaches the root node 6234, which holds the pointer <O:1234, S:B346>.]
Locating: Surrogate Routing (1)
Given clients at different places, how do they all find the same "root"?
• Plaxton:
  1. Find the nodes with the maximum matching suffix (stop at an empty entry in the neighbor map)
  2. Order them using global knowledge
  3. Choose the first one
• Tapestry:
  1. Go further than Plaxton (choose an alternate entry when the exact one is empty)
  2. Stop at a neighbor map whose only non-empty entry points to node R
  3. R is the root
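The key to Tapestry's rule is that the "alternate entry" is chosen deterministically, so every client makes the same detour and converges on the same root. A hedged sketch of one such deterministic rule (a fixed circular scan of the digits; the exact tie-breaking order is an assumption here):

```python
# Sketch of surrogate routing's fallback: when the neighbor-map entry
# for the next digit of the object ID is empty, scan the digits in a
# fixed circular order until a non-empty entry is found. Because the
# order is fixed, all clients deterministically reach the same root.
# The specific scan order is an illustrative assumption.

HEX = "0123456789ABCDEF"

def surrogate_next_digit(entries: dict, wanted: str) -> str:
    """entries maps digit -> node (or None for an empty entry)."""
    start = HEX.index(wanted)
    for k in range(len(HEX)):
        digit = HEX[(start + k) % len(HEX)]
        if entries.get(digit) is not None:
            return digit
    raise RuntimeError("routing level is entirely empty")

# Object digit "3" has no entry; the deterministic scan falls through
# past the empty "4" entry to "5".
level = {"3": None, "4": None, "5": "7598"}
print(surrogate_next_digit(level, "3"))  # "5"
```

When every client applies the same scan, the detours coincide, which is why (per the next slide) the root can always be found with only a small constant expected overhead.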
Locating: Surrogate Routing (2)
[Figure: clients at different nodes (F3145, E1145, 92145, B1145, B3945, B7645) all converge via surrogate routing on the same root 51145 for object 12345, which stores <O:12345, S:B3467>.]
Assumptions:
1. Every node is reachable – ensures the same routing "patterns"
2. Evenly distributed IDs – ensures fewer and fewer candidate nodes in the mapping table at each level
Conclusion:
1. The root can always be found
2. The expected overhead of surrogate routing is 2 extra hops
Publishing
Similar to locating:
1. The server sends a message as if it were locating the object
2. It finds the surrogate node, the "root" for the object
3. The related info, such as <O, S>, is saved there
[Figure: server B4F8 publishes object 1234; surrogate routing through 8724, F734, B234 reaches the root 6234, which stores <O:1234, S:B4F8>.]
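Publishing and the pointer caching described on the next slide fit together in a small sketch. The paths below are hard-coded stand-ins for what suffix routing would produce; storing the pointer at *every* hop (rather than only at the root) is the caching behavior the slides describe:

```python
# Sketch of publish/locate with cached <O, S> pointers. The node-ID
# paths are illustrative stand-ins for suffix-routing paths; the real
# system computes them hop by hop from neighbor maps.

def publish(obj: str, server: str, path_to_root: list[str], store: dict):
    """Every node on the way to the root caches the <obj, server> pointer."""
    for node in path_to_root:
        store.setdefault(node, {})[obj] = server

def locate(obj: str, path_to_root: list[str], store: dict):
    """A client walks toward the root and stops at the first cached pointer."""
    for node in path_to_root:
        hit = store.get(node, {}).get(obj)
        if hit is not None:
            return hit
    return None

store: dict = {}
publish("1234", "B4F8", ["6234", "B234", "F734", "8724", "1234"], store)
# A client whose route shares the tail of the publish path gets a cache
# hit at B234, before ever reaching the root:
print(locate("1234", ["5234", "B234", "F734", "8724", "1234"], store))
```

Because publish and locate paths to the same root share longer and longer suffixes, they tend to intersect early; that intersection is what gives Tapestry its locality.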
Locating/Publishing: Fault Tolerance & Locality
• Multiple "roots" (better than Plaxton)
  – Map the object ID to several "roots"
  – Publish/locate can be executed simultaneously
• Cache the 2-tuple <O, S>
  – Clients can find <O, S> on the way to the root
  – Intermediate nodes can hold multiple <O, S> entries for the same object; the nearest server is chosen
Insert a new node: basic procedure
1. Get a node ID
2. Begin with a "gateway node" G
3. Route as if toward the new node's own ID
4. Establish a nearly optimal neighbor map during this "pseudo-routing" by copying entries and choosing the nearest candidates
5. Go back and notify neighbors
[Figure: new node 1234 joins via gateway node B4F8; surrogate routing through 8724, F734, B234, 6234 builds its neighbor map level by level along the way.]
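Step 4 above (copying and choosing the nearest candidates) can be sketched as follows. The per-level candidate lists and the RTT table are toy values; in the real system, hop i of the pseudo-route supplies the level-i entries, which the new node then refines by measuring distances:

```python
# Sketch of neighbor-map construction during node insertion: the i-th
# hop of the pseudo-route contributes candidate entries for level i,
# and the new node keeps the nearest candidate per entry. Candidate
# lists and RTTs below are illustrative toy data.

def insert_node(new_id: str, hops: list[dict], distance) -> dict:
    """hops[i-1] maps digit -> list of candidate nodes for level i;
    distance(node) returns a measured cost, e.g. an RTT."""
    nmap: dict = {}
    for i, level in enumerate(hops, start=1):
        for digit, candidates in level.items():
            live = [c for c in candidates if c is not None]
            # keep the nearest candidate offered for this entry
            nmap[(i, digit)] = min(live, key=distance) if live else None
    return nmap

# Toy measured RTTs keyed by node ID (hypothetical values):
rtt = {"B234": 5.0, "6234": 12.0, "F734": 3.0}
hops = [{"4": ["B234", "6234"]},   # level-1 candidates, from the gateway
        {"3": ["F734"]}]           # level-2 candidates, from the next hop
nmap = insert_node("1234", hops, rtt.get)
print(nmap[(1, "4")])  # nearest level-1 candidate
```

Choosing the nearest candidate at each level is what makes the resulting map "nearly optimal" for locality, since each entry then points to the closest known node with the required suffix.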
Delete a node
• The simplest operation
• Explicitly notify the neighbors via back pointers
• Or use soft state: simply stop sending "heartbeat" and republish messages
Maintain System Consistency
Components in a Tapestry node:
• Neighbor map
• Back pointers
• Object-location pointers <Object, Node>
• Hotspot monitor <Object, Node, Freq>
• Object store
Ways to maintain correct state:
• Soft state
• Proactive explicit update
Soft state
Advantages:
• Easy to implement
• Suited to slowly changing systems
Disadvantages:
• Trades bandwidth overhead against level of consistency
• Not suited to fast-changing systems
• Example: republishing traffic for a single server can reach 1400 MB (!) in one interval
Proactive explicit update (PEU)
• Epoch number: sequence number of the update rounds
• Expanded 3-tuple <Object ID, Server ID, LastHopID>
• Soft state remains as a backup resort
PEU: Node Mobility
[Figure: object 123 moves from server A to server B. B republishes (123, B) along its path to the root; a delete message (123, A) retraces A's old publish path using the stored LastHopID.]
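The role of the 3-tuple's LastHopID in a move can be sketched as below. This is a simplified model (the stale pointers are pruned before the new publish, and the paths are hard-coded stand-ins for routed paths), but it shows why each pointer records the previous hop: the deletion can retrace the old publish path without any global knowledge.

```python
# Sketch of an object move under proactive explicit update. Each cached
# pointer is a <server, last-hop> pair, so a delete starting at the root
# can walk the stale publish path back toward the old server. Simplified:
# stale pointers are removed before the new server republishes.

def publish(obj: str, server: str, path: list[str], store: dict):
    prev = server
    for node in path:
        # store <server, LastHopID> so deletions can retrace the path
        store.setdefault(node, {})[obj] = (server, prev)
        prev = node

def delete(obj: str, server: str, root: str, store: dict):
    """Walk back from the root along LastHopID, removing stale pointers."""
    node = root
    while node != server:
        entry = store.get(node, {}).pop(obj, None)
        if entry is None or entry[0] != server:
            break                 # not our pointer; stop pruning
        node = entry[1]           # follow LastHopID toward the old server

store: dict = {}
publish("123", "A", ["E", "F", "Root"], store)   # old path via E
delete("123", "A", "Root", store)                # prune A's stale pointers
publish("123", "B", ["D", "F", "Root"], store)   # republish from B via D
print(store["E"].get("123"), store["F"]["123"][0])  # None B
```

After the move, E holds no pointer and the shared nodes F and Root point at B, matching the figure: republish installs the new path, and the LastHopID chain cleans up the old one.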
PEU: Recover location pointers
[Figure: when node B exits, it sends an exiting notification; the server triggers reconstruction (O, S, B) of the location pointers along the path to the root, and the old data is deleted.]
Introspective Optimization: Adapting to the changing environment
• Load balance
  1. Periodic pings by a refresher thread
  2. Update neighbor pointers
• Hotspots
  1. Find the source of heavy traffic, the "hotspot"
  2. Publish the desired data near the hotspot
Evaluation
• Gains
  – Good locality
  – Low location latency
  – High stability
  – High fault tolerance
• Costs
  – Bandwidth overhead linear in the number of replicas
Implementation
• Packet-level simulators have been completed in C
• Used to support other applications
  – such as OceanStore
  – Bayeux, an application-level multicast protocol
• Future work
  – Security issues
  – Mobile-IP-like functionality
Summary
• Urgent need for a new location/routing scheme
• Features of Tapestry:
  – Location-independent naming
  – Integration of location and routing
  – Content-based routing
  – Support for dynamic environments: inserting/deleting/moving nodes and objects
Comments and Questions
• Paradox or discrepancy? The underlying IP has poor scalability; how can Tapestry achieve high scalability?
  – Just for demo!
• What is the relation between IP and Tapestry? Tapestry doesn't intend to replace IP; it tries to establish a higher-level locating & routing infrastructure on top of it to support content-based operations.
• How could we achieve the same goal without IP?