TAO: Facebook’s Distributed Data Store for the Social Graph
Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, Venkat Venkataramani
Presented at USENIX ATC – June 26, 2013
The Social Graph
[Figure: an example social graph (hypothetical encoding): USER nodes (one named Carol), a POST, a COMMENT, a PHOTO with EXIF_INFO and GPS_DATA, and a LOCATION, connected by FRIEND, AUTHOR, COMMENT, LIKE, CHECKIN, AT and EXIF edges]
Dynamically Rendering the Graph
[Figure: a web server (PHP) renders pages from the same social graph, which now also includes an APP node (iPhoto) connected to the photo by an UPLOAD_FROM edge]
Dynamically Rendering the Graph
[Figure: the web server (PHP) now reads the social graph through TAO]
• 1 billion queries/second
• many petabytes of data
What Are TAO’s Goals/Challenges?
▪ Efficiency at scale
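Dynamic Resolution of Data Dependencies
[Figure: a page's data dependencies are resolved in successive rounds of reads (labeled 1, 2, 3), following edges such as AUTHOR, COMMENT, LIKED_BY, CHECKED_IN, ATTACH and UPLOAD_FROM out from a POST to users, comments, a location, a photo and an app (iPhoto)]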
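The numbered rounds on this slide behave like a breadth-first traversal: the ids to fetch in round k+1 are only known once round k returns, so page latency grows with the depth of the dependency chain, not just the number of objects. A minimal sketch, assuming a toy in-memory graph (`GRAPH`) and a hypothetical batched round trip (`fetch_batch`); neither is TAO's actual client interface.

```python
from typing import Dict, List, Set

# Hypothetical toy graph: each object id maps to the ids it points to.
GRAPH: Dict[int, List[int]] = {
    1: [2, 3, 4],  # e.g. a post pointing at its author, a comment, a photo
    2: [],         # the author
    3: [5],        # the comment points at its own author
    4: [6],        # the photo points at the app it was uploaded from
    5: [],
    6: [],
}


def fetch_batch(ids: Set[int]) -> Dict[int, List[int]]:
    """Stand-in for one batched round trip to the data store."""
    return {i: GRAPH[i] for i in ids}


def resolve(root: int) -> List[Set[int]]:
    """Return the set of ids fetched in each round, starting from root."""
    rounds: List[Set[int]] = []
    seen: Set[int] = set()
    frontier = {root}
    while frontier:
        rounds.append(frontier)
        seen |= frontier
        results = fetch_batch(frontier)
        # Newly discovered ids can only be requested in the next round.
        frontier = {n for targets in results.values() for n in targets} - seen
    return rounds


print(resolve(1))  # [{1}, {2, 3, 4}, {5, 6}]
```

Three rounds suffice here because the deepest chain (post, comment, comment author) has three levels; batching within a round keeps the number of round trips at the depth of the graph walk.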
What Are TAO’s Goals/Challenges?
▪ Efficiency at scale
▪ Low read latency
▪ Timeliness of writes
▪ High read availability
Graph in Memcache
[Figure: web server (PHP) → Obj & Assoc API → memcache (nodes, edges, edge lists) → MySQL]
Objects = Nodes
▪ Identified by unique 64-bit IDs
▪ Typed, with a schema for fields

id: 308 => type: USER name: “Alice”
id: 2003 => type: COMMENT str: “how was it …
id: 1807 => type: POST str: “At the summ…

Associations = Edges
▪ Identified by <id1, type, id2>
▪ Bidirectional associations are two edges, same or different type

<1807,COMMENT,2003> time: 1,371,704,655
<308,AUTHORED,2003> time: 1,371,707,355
<2003,AUTHOR,308> time: 1,371,707,355
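The object and association model above can be sketched as two record types: objects keyed by a 64-bit id and associations keyed by the triple <id1, type, id2>. This is a minimal in-memory sketch; the class names (`Obj`, `Assoc`), the `fields` dictionary, and the `put_obj`/`put_assoc` helpers are hypothetical and only illustrate the identity rules stated on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

AssocKey = Tuple[int, str, int]  # <id1, type, id2>


@dataclass
class Obj:
    id: int
    otype: str  # e.g. "USER", "POST", "COMMENT"
    fields: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Objects are identified by unique 64-bit ids.
        if not 0 <= self.id < 2**64:
            raise ValueError("object id must fit in 64 bits")


@dataclass
class Assoc:
    id1: int
    atype: str
    id2: int
    time: int
    fields: Dict[str, str] = field(default_factory=dict)

    @property
    def key(self) -> AssocKey:
        return (self.id1, self.atype, self.id2)


objects: Dict[int, Obj] = {}
assocs: Dict[AssocKey, Assoc] = {}


def put_obj(obj: Obj) -> None:
    objects[obj.id] = obj


def put_assoc(assoc: Assoc) -> None:
    # One edge per <id1, type, id2>; adding the same triple again overwrites it.
    assocs[assoc.key] = assoc


put_obj(Obj(308, "USER", {"name": "Alice"}))
put_obj(Obj(2003, "COMMENT", {"str": "how was it …"}))
# The bidirectional author relationship is stored as two edges of different types.
put_assoc(Assoc(308, "AUTHORED", 2003, time=1_371_707_355))
put_assoc(Assoc(2003, "AUTHOR", 308, time=1_371_707_355))
```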
Association Lists
▪ <id1, type, *>
▪ Descending order by time
▪ Query sublist by position or time
▪ Query size of entire list

id: 1807 => type: POST str: “At the summ…
newer
<1807,COMMENT,4141> time: 1,371,709,009   id: 4141 => type: COMMENT str: “Been wanting to do …
<1807,COMMENT,8332> time: 1,371,708,678   id: 8332 => type: COMMENT str: “The rock is flawless, …
<1807,COMMENT,2003> time: 1,371,707,355   id: 2003 => type: COMMENT str: “how was it, was it w…
older
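An association list can be modeled as a single sequence kept in descending time order, which makes both query shapes on the slide cheap: a sublist by position is a slice, and a sublist by time is a filtered prefix. A minimal sketch; the `AssocList` class and its `range`, `time_range` and `count` methods are hypothetical stand-ins that mirror the roles of `assoc_range`, `assoc_time_range` and `assoc_count`, not their exact signatures.

```python
import bisect
from typing import List, Tuple

Edge = Tuple[int, int]  # (time, id2)


class AssocList:
    """All edges <id1, type, *> for one (id1, type), kept newest first."""

    def __init__(self) -> None:
        self._edges: List[Edge] = []

    def add(self, id2: int, time: int) -> None:
        # bisect needs ascending keys, so search on negated times to keep
        # the list in descending time order.
        neg_times = [-t for t, _ in self._edges]
        pos = bisect.bisect_right(neg_times, -time)
        self._edges.insert(pos, (time, id2))

    def range(self, pos: int, limit: int) -> List[Edge]:
        """Sublist by position: pos 0 is the newest edge."""
        return self._edges[pos:pos + limit]

    def time_range(self, high: int, low: int, limit: int) -> List[Edge]:
        """Sublist by time: edges with low <= time <= high, newest first."""
        return [e for e in self._edges if low <= e[0] <= high][:limit]

    def count(self) -> int:
        """Size of the entire list."""
        return len(self._edges)


# The comment list of post 1807 from the slide, inserted out of order.
comments = AssocList()
comments.add(2003, 1_371_707_355)
comments.add(4141, 1_371_709_009)
comments.add(8332, 1_371_708_678)
print(comments.range(0, 2))  # [(1371709009, 4141), (1371708678, 8332)]
```

Keeping the list sorted on write, rather than sorting on read, matches the read-heavy workload: the newest entries, which pages usually show first, are always at the front.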
Objects and Associations API
Reads – 99.8%
▪ Point queries
  ▪ obj_get 28.9%
  ▪ assoc_get 15.7%
▪ Range queries
  ▪ assoc_range 40.9%
  ▪ assoc_time_range 2.8%
▪ Count queries
  ▪ assoc_count 11.7%
Writes – 0.2%
▪ Create, update, delete for objects
  ▪ obj_add 16.5%
  ▪ obj_update 20.7%
  ▪ obj_del 2.0%
▪ Set and delete for associations
  ▪ assoc_add 52.5%
  ▪ assoc_del 8.3%
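The per-operation percentages above are shares within reads and within writes, so each group sums to 100%. Converting them to shares of all requests shows how lopsided the workload is. The arithmetic below uses only the numbers on the slide; the helper name `share_of_all_requests` is illustrative.

```python
import math

# Within-class percentages from the slide.
READ_FRACTION = 0.998
WRITE_FRACTION = 0.002
READS = {"obj_get": 28.9, "assoc_get": 15.7, "assoc_range": 40.9,
         "assoc_time_range": 2.8, "assoc_count": 11.7}
WRITES = {"obj_add": 16.5, "obj_update": 20.7, "obj_del": 2.0,
          "assoc_add": 52.5, "assoc_del": 8.3}


def share_of_all_requests(mix, class_fraction):
    """Convert within-class percentages to percentages of all requests."""
    return {op: pct * class_fraction for op, pct in mix.items()}


overall = {**share_of_all_requests(READS, READ_FRACTION),
           **share_of_all_requests(WRITES, WRITE_FRACTION)}

# Each class's breakdown sums to 100%, and so does the combined mix.
assert math.isclose(sum(READS.values()), 100.0)
assert math.isclose(sum(WRITES.values()), 100.0)
assert math.isclose(sum(overall.values()), 100.0)

reads_per_write = READ_FRACTION / WRITE_FRACTION
print(f"assoc_range: {overall['assoc_range']:.2f}% of all requests")  # 40.82%
print(f"assoc_add: {overall['assoc_add']:.3f}% of all requests")      # 0.105%
print(f"reads per write: {reads_per_write:.0f}")                      # 499
```

Roughly 500 reads per write is why the design concentrates on read latency and cache efficiency, while even the most common write, assoc_add, is about a tenth of a percent of traffic.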
Independent Scaling by Separating Roles
[Figure: TAO separates web servers, cache and database into independent tiers]
• Web servers: stateless
• Cache (objects, assoc lists, assoc counts): sharded by id; servers –> read qps
• Database: sharded by id; servers –> bytes
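Because both stateful tiers are sharded by id, any request can be routed from the id alone, and each tier can be sized by its own bottleneck: cache servers by read qps, database servers by bytes. A minimal sketch under stated assumptions: `NUM_SHARDS`, the modulo mapping in `shard_of`, and placing an association on the shard of its id1 are all hypothetical choices for illustration, as are the capacity numbers in the example.

```python
import math

NUM_SHARDS = 1024  # hypothetical shard count


def shard_of(object_id: int) -> int:
    """Hypothetical mapping: derive the shard directly from the 64-bit id."""
    return object_id % NUM_SHARDS


def assoc_shard(id1: int, atype: str, id2: int) -> int:
    """Assumption: an association lives on its id1's shard, so the whole
    list <id1, type, *> is served by one cache server and one database."""
    return shard_of(id1)


def servers_needed(demand: float, capacity_per_server: float) -> int:
    """Each tier is sized independently by its own bottleneck resource."""
    return math.ceil(demand / capacity_per_server)


# Illustrative numbers only, not measurements.
cache_servers = servers_needed(2_500_000, 1_000_000)     # read qps
db_servers = servers_needed(9 * 10**12, 2 * 10**12)      # bytes stored
print(shard_of(1807), cache_servers, db_servers)  # 783 3 5
```

Separating the tiers means a growth in read traffic adds cache servers without adding databases, and a growth in stored data does the reverse.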
Subdividing the Data Center
[Figure: web servers, cache and database within one data center]
• Inefficient failure detection
• Many switch traversals
• Many open sockets
• Lots of hot spots
Subdividing the Data Center
[Figure: web servers, cache and database within one data center]
• Thundering herds
• Distributed write control logic
Follower and Leader Caches
[Figure: web servers → follower cache → leader cache → database]
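The two cache tiers address the thundering-herd problem from the previous slide: web servers read from followers, and all follower misses for a shard funnel through that shard's leader, so a hot key reaches the database once rather than once per follower. A minimal sketch with plain dictionaries standing in for the database and caches; the class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class LeaderCache:
    """One leader per shard: the only cache tier that talks to the database."""

    def __init__(self, db: Dict[int, str]) -> None:
        self.db = db
        self.cache: Dict[int, str] = {}
        self.db_reads = 0

    def get(self, key: int) -> Optional[str]:
        if key not in self.cache:
            self.db_reads += 1
            value = self.db.get(key)
            if value is None:
                return None
            self.cache[key] = value
        return self.cache[key]


class FollowerCache:
    """Web servers read from followers; misses go to the shard's leader."""

    def __init__(self, leader: LeaderCache) -> None:
        self.leader = leader
        self.cache: Dict[int, str] = {}

    def get(self, key: int) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        value = self.leader.get(key)
        if value is not None:
            self.cache[key] = value
        return value


db = {1807: "POST: At the summ…"}
leader = LeaderCache(db)
followers: List[FollowerCache] = [FollowerCache(leader) for _ in range(4)]
for f in followers:  # every follower misses on the same hot object
    f.get(1807)
print(leader.db_reads)  # 1
```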
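Write-through Caching – Association Lists
[Figure: a web server adds the edge X –> Y. The write goes from the follower cache to the leader cache and then to the database; after the database replies ok, the leader's cached list for X changes from A,B,C to Y,A,B,C, the leader sends "refill X" to follower caches, and a later range get on a follower returns the updated list]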
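The write path in the figure can be sketched as three steps in the leader: write through to the database, update the leader's own copy of the list, then tell followers to refill that list. A minimal sketch; the `Database`, `Leader` and `Follower` classes are hypothetical, and the refill messages, asynchronous in the figure's design, are delivered synchronously here for simplicity.

```python
from typing import Dict, List


class Database:
    def __init__(self) -> None:
        self.lists: Dict[str, List[str]] = {}

    def assoc_add(self, id1: str, id2: str) -> None:
        self.lists.setdefault(id1, []).insert(0, id2)  # newest first

    def assoc_list(self, id1: str) -> List[str]:
        return list(self.lists.get(id1, []))


class Leader:
    def __init__(self, db: Database) -> None:
        self.db = db
        self.lists: Dict[str, List[str]] = {}
        self.followers: List["Follower"] = []

    def read(self, id1: str) -> List[str]:
        if id1 not in self.lists:
            self.lists[id1] = self.db.assoc_list(id1)
        return list(self.lists[id1])

    def write(self, id1: str, id2: str) -> None:
        self.db.assoc_add(id1, id2)                # 1. write through to the DB
        self.lists[id1] = self.db.assoc_list(id1)  # 2. update the leader's copy
        for follower in self.followers:            # 3. "refill X" to followers
            follower.refill(id1)


class Follower:
    def __init__(self, leader: Leader) -> None:
        self.leader = leader
        self.lists: Dict[str, List[str]] = {}
        leader.followers.append(self)

    def range_get(self, id1: str) -> List[str]:
        if id1 not in self.lists:
            self.lists[id1] = self.leader.read(id1)
        return list(self.lists[id1])

    def write(self, id1: str, id2: str) -> None:
        self.leader.write(id1, id2)  # followers forward every write

    def refill(self, id1: str) -> None:
        if id1 in self.lists:  # only refresh lists this follower holds
            self.lists[id1] = self.leader.read(id1)


db = Database()
db.lists["X"] = ["A", "B", "C"]
leader = Leader(db)
writer, reader = Follower(leader), Follower(leader)
print(reader.range_get("X"))  # ['A', 'B', 'C']
writer.write("X", "Y")        # the edge X -> Y from the figure
print(reader.range_get("X"))  # ['Y', 'A', 'B', 'C']
```

Refilling the list, rather than just deleting it, keeps a popular association list warm in every follower instead of forcing the next reader to take a miss.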
Asynchronous DB Replication
[Figure: web servers, follower caches, leader caches and databases in a master data center and a replica data center]
• Writes forwarded to master
• Inval and refill embedded in SQL
• Delivery after DB replication done
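The three bullets can be read as one pipeline: a replica region forwards writes to the master, the master's replicated change carries the cache invalidation with it, and the replica delivers that invalidation only after applying the change to its local database, so a cache refill cannot reload the old row. A minimal sketch with a Python list standing in for the replication stream; `MasterDB`, `ReplicaRegion` and `replicate` are hypothetical names, and the real system embeds the invalidation in SQL rather than in a tuple.

```python
from typing import Dict, List, Optional, Tuple

# Each replicated entry: (row key, new value, cache key to invalidate).
LogEntry = Tuple[str, str, str]


class MasterDB:
    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.log: List[LogEntry] = []

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value
        self.log.append((key, value, key))  # the invalidation rides with the write


class ReplicaRegion:
    """Replica data center: local database copy plus a local cache."""

    def __init__(self, master: MasterDB) -> None:
        self.master = master
        self.rows: Dict[str, str] = {}
        self.cache: Dict[str, str] = {}
        self.applied = 0  # how far into the master's log we have replicated

    def write(self, key: str, value: str) -> None:
        self.master.write(key, value)  # writes are forwarded to the master

    def read(self, key: str) -> Optional[str]:
        if key not in self.cache and key in self.rows:
            self.cache[key] = self.rows[key]  # fill from the local replica DB
        return self.cache.get(key)

    def replicate(self) -> None:
        while self.applied < len(self.master.log):
            key, value, inval_key = self.master.log[self.applied]
            self.rows[key] = value           # 1. apply the row change
            self.cache.pop(inval_key, None)  # 2. only then invalidate the cache
            self.applied += 1


master = MasterDB()
region = ReplicaRegion(master)
master.write("1807", "v1")
region.replicate()
print(region.read("1807"))  # v1
region.write("1807", "v2")
print(region.read("1807"))  # v1 (replication has not caught up yet)
region.replicate()
print(region.read("1807"))  # v2
```

In this simplified model a replica region can briefly serve the old value after its own write; the ordering guarantee is only that the cache never goes backwards once the database change has been applied.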
Improving Availability: Read Failover
[Figure: web servers, follower caches, leader caches and databases in a master data center and a replica data center, with alternate read paths when a component fails]
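Read failover amounts to trying alternate data sources in order until one answers. A minimal sketch; the `SourceUnavailable` exception, the `read_with_failover` helper and the two example sources are hypothetical, and the actual order in which TAO tries followers, leaders and databases is not specified on this slide.

```python
from typing import Callable, List, Optional

Source = Callable[[str], Optional[str]]


class SourceUnavailable(Exception):
    """Raised by a data source that cannot serve the read right now."""


def read_with_failover(key: str, sources: List[Source]) -> Optional[str]:
    """Try each data source in order, falling back on failure."""
    last_error: Optional[Exception] = None
    for source in sources:
        try:
            return source(key)
        except SourceUnavailable as err:
            last_error = err
    raise SourceUnavailable(f"no source could serve {key!r}") from last_error


def local_follower(key: str) -> Optional[str]:
    raise SourceUnavailable("local follower down")  # simulated failure


def other_follower(key: str) -> Optional[str]:
    return {"1807": "POST"}.get(key)


print(read_with_failover("1807", [local_follower, other_follower]))  # POST
```

Because reads dominate the workload, serving a possibly staler copy from another source is usually preferable to failing the page render.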
TAO Summary
• Efficiency at scale, read latency: separate cache and DB; graph-specific caching; subdivide data centers
• Write timeliness: write-through cache; asynchronous replication
• Read availability: alternate data sources
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
Inverse associations
▪ Bidirectional relationships have separate a→b and b→a edges
  ▪ inv_type(LIKES) = LIKED_BY
  ▪ inv_type(FRIEND_OF) = FRIEND_OF
▪ Forward and inverse types linked only during write
  ▪ TAO assoc_add will update both
  ▪ Not atomic, but failures are logged and repaired
[Figure: Nathan and Carol linked by FRIEND_OF edges in both directions; the post “On the summit” linked to a user by AUTHOR/AUTHORED_BY and LIKES/LIKED_BY edge pairs]
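The bullets describe assoc_add as two separate writes, the forward edge and its inverse, with a failed inverse write logged for later repair instead of rolled back. A minimal sketch of that behavior; the `INVERSE` table, `write_edge`, `repair_log` and `repair` are hypothetical, and the `LIKED_BY → LIKES` entry is an assumed reverse mapping not listed on the slide.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, str, int]  # <id1, type, id2>

# Hypothetical inverse-type table; consulted only on the write path.
INVERSE: Dict[str, str] = {
    "LIKES": "LIKED_BY",
    "LIKED_BY": "LIKES",       # assumed reverse mapping
    "FRIEND_OF": "FRIEND_OF",  # symmetric: its own inverse
}

edges: Set[Edge] = set()
repair_log: List[Edge] = []  # inverse edges whose write failed


def write_edge(edge: Edge, fail: bool = False) -> None:
    if fail:
        raise OSError("simulated storage failure")
    edges.add(edge)


def assoc_add(id1: int, atype: str, id2: int, fail_inverse: bool = False) -> None:
    write_edge((id1, atype, id2))
    inv_type = INVERSE.get(atype)
    if inv_type is None:
        return  # one-directional association
    try:
        # A second, separate write: the pair is not updated atomically.
        write_edge((id2, inv_type, id1), fail=fail_inverse)
    except OSError:
        repair_log.append((id2, inv_type, id1))  # log it for later repair


def repair() -> None:
    while repair_log:
        write_edge(repair_log[0])
        repair_log.pop(0)  # drop the entry only once the write succeeded


assoc_add(1, "LIKES", 2)
assoc_add(3, "FRIEND_OF", 4, fail_inverse=True)
print((4, "FRIEND_OF", 3) in edges)  # False
repair()
print((4, "FRIEND_OF", 3) in edges)  # True
```

Until the repair runs, a reader can see the forward edge without its inverse; the design accepts that window in exchange for avoiding a distributed transaction across the two shards.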
Single-server Peak Observed Capacity
[Figure: peak operations/second of a single server plotted against hit rate (90% to 99%)]
Write latency
More in the Paper
▪ The role of association time in optimizing cache hit rates
▪ Optimized graph-specific data structures
▪ Write failover
▪ Failure recovery
▪ Workload characterization