Web-Scale Data Serving with PNUTS
Adam Silberstein, Yahoo! Research
Outline
• PNUTS Architecture
• Recent Developments
  – New features
  – New challenges
• Adoption at Yahoo!
Yahoo! Cloud Data Systems
• Scan oriented workloads
• Focus on sequential disk I/O

• CRUD
• Point lookups and short scans
• Index organized table and random I/Os

• Object retrieval and streaming
• Scalable file storage
What is PNUTS?
CREATE TABLE Parts (
  ID VARCHAR,
  StockNumber INT,
  Status VARCHAR
  …
)
Parallel database
Structured, flexible schema
Hosted, managed infrastructure
Key1 42342 E
Key2 42521 W
Key3 66354 W
Key4 12352 E
Key5 75656 C
Key6 15677 E
Geographic replication
PNUTS Design Features
Distributed Hash Table
Primary Key Record
Grape {"liquid" : "wine"}
Lime {"color" : "green"}
Apple {"quote" : "Apple a day keeps the …"}
Strawberry {"spread" : "jam"}
Orange {"color" : "orange"}
Avocado {"spread" : "guacamole"}
Lemon {"expression" : "expensive crap"}
Tomato {"classification" : "yes… fruit"}
Banana {"expression" : "goes bananas"}
Kiwi {"expression" : "New Zealand"}
[Figure: records are placed by the hash of their primary key; tablet boundaries in the hash space are 0x0000, 0x2AF3, 0x911F]
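A minimal sketch of how a hash-organized table might map a key to a tablet, using the three boundary values from the slide. The 16-bit hash space, the hash function (CRC32 truncated to 16 bits), and all function names are assumptions for illustration, not the PNUTS implementation.

```python
import bisect
import zlib

# Lower bound of each tablet's hash interval, sorted (values from the slide).
TABLET_BOUNDS = [0x0000, 0x2AF3, 0x911F]

def key_hash(key: str) -> int:
    """Hypothetical hash: CRC32 truncated to 16 bits."""
    return zlib.crc32(key.encode("utf-8")) & 0xFFFF

def tablet_for_hash(h: int) -> int:
    """Index of the tablet whose interval [bound_i, bound_i+1) contains h."""
    return bisect.bisect_right(TABLET_BOUNDS, h) - 1

def tablet_for_key(key: str) -> int:
    """Hash the primary key, then look up the owning tablet."""
    return tablet_for_hash(key_hash(key))
```

Because placement depends only on the hash, neighbouring keys such as Lemon and Lime can land in different tablets, which spreads load but makes range scans over keys expensive.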
Distributed Ordered Table
Primary Key Record
Apple {"quote" : "Apple a day keeps the …"}
Avocado {"spread" : "guacamole"}
Banana {"expression" : "goes bananas"}
Grape {"liquid" : "wine"}
Kiwi {"expression" : "New Zealand"}
Lemon {"expression" : "expensive crap"}
Lime {"color" : "green"}
Orange {"color" : "orange"}
Strawberry {"spread" : "jam"}
Tomato {"classification" : "yes… fruit"}
Tablet clustered by key range
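A minimal sketch of key-range clustering, assuming two hypothetical split keys ("H" and "P") that divide the fruit table above into three tablets. The split keys and function names are illustrative and do not come from the slides.

```python
import bisect

# Hypothetical split keys: tablet 0 holds keys < "H",
# tablet 1 holds ["H", "P"), tablet 2 holds keys >= "P".
SPLIT_KEYS = ["H", "P"]

def tablet_for_key(key: str) -> int:
    """Index of the tablet whose key range contains key."""
    return bisect.bisect_right(SPLIT_KEYS, key)

def tablets_for_range(start: str, end: str) -> list:
    """Tablets that may hold keys in the inclusive range [start, end]."""
    return list(range(tablet_for_key(start), tablet_for_key(end) + 1))
```

A short scan such as Apple through Banana touches a single tablet here, which is the property that key-range clustering buys over hashing.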
PNUTS-Single Region
[Figure: Table FOO is split into Tablet 1 … Tablet M; the tablets are spread across storage units 1 … n, each holding Key/JSON records; clients reach the routers through a VIP]

Tablet Controller
• Maintains map from database.table.key to tablet to storage-unit

Routers
• Routes client requests to correct storage unit
• Caches the maps from the tablet controller

Storage Units
• Stores records
• Services get/set/delete requests
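The division of labour between tablet controller and routers can be sketched as follows. This is a toy model under stated assumptions: the class names, the version counter, and the explicit `refresh()` call are hypothetical, and the slide does not say how a router learns that its cached map is stale.

```python
import bisect

class TabletController:
    """Authoritative owner of the tablet -> storage unit map (toy version)."""
    def __init__(self, split_keys, owners):
        self.split_keys = list(split_keys)   # sorted tablet boundaries
        self.owners = list(owners)           # owners[i] = storage unit of tablet i
        self.version = 1

    def move_tablet(self, tablet, new_owner):
        self.owners[tablet] = new_owner
        self.version += 1

    def snapshot(self):
        return self.version, list(self.split_keys), list(self.owners)


class Router:
    """Routes keys using a cached copy of the controller's map."""
    def __init__(self, controller):
        self.controller = controller
        self.refresh()

    def refresh(self):
        # Re-read the full map from the controller.
        self.version, self.split_keys, self.owners = self.controller.snapshot()

    def route(self, key):
        tablet = bisect.bisect_right(self.split_keys, key)
        return self.owners[tablet]
```

Keeping the map cached in the router means the controller stays off the request path; the cost is that a router can briefly route to the old owner after a tablet move until it refreshes.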
Tablet Splitting & Balancing
Each storage unit has many tablets (horizontal partitions of the table)
Tablets may grow over time
Overfull tablets split
Storage unit may become a hotspot
Shed load by moving tablets to other servers
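A minimal sketch of splitting an overfull tablet, assuming "overfull" means exceeding a record-count limit and that the split point is the median key. Both choices, and the function name, are assumptions; the slide does not specify the split policy.

```python
def split_tablet(records: dict, max_records: int) -> list:
    """Split an overfull tablet at its median key; otherwise return it unchanged."""
    if len(records) <= max_records:
        return [records]
    keys = sorted(records)
    mid = keys[len(keys) // 2]          # first key of the right-hand tablet
    left = {k: records[k] for k in keys if k < mid}
    right = {k: records[k] for k in keys if k >= mid}
    return [left, right]
```

After a split, either half can be moved to another storage unit to shed load.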
PNUTS Multi-Region
[Figure: three data centers (DC1, DC2, DC3) serving applications. Each data center has its own VIP, routers, tablet controller, and storage units (1 … n, 1 … m, 1 … k) holding Key/JSON records for the tablets (Tablet 1 … Tablet M) of Table XYZ. The regions are connected through the messaging layer, Tribble (message bus); a filer is also shown.]
Asynchronous Replication
Consistency Options
Eventual Consistency
  – Low latency updates and inserts done locally

Record Timeline Consistency
  – Each record is assigned a “master region”
  – Inserts succeed, but updates could fail during outages*

Primary Key Constraint + Record Timeline
  – Each tablet and record is assigned a “master region”
  – Inserts and updates could fail during outages*

[Figure axis: Availability ↔ Consistency]
Record Timeline Consistency
Transactions:
• Alice changes status from “Sleeping” to “Awake”
• Alice changes location from “Home” to “Work”

[Figure: Region 1 and Region 2 both start at (Alice, Home, Sleeping). Region 1 applies “Awake” to reach (Alice, Home, Awake), then “Work” to reach (Alice, Work, Awake). Region 2 receives the updates “Awake” and “Work” and also ends at (Alice, Work, Awake).]

No replica should see record as (Alice, Work, Sleeping)
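One way to picture the guarantee: the record's master region orders every update, and each replica applies updates strictly in that order. The sketch below assumes per-record sequence numbers and a replica that buffers early arrivals; the class names and the buffering are illustrative assumptions, not a description of how PNUTS or its message bus actually delivers updates.

```python
class MasterRegion:
    """Orders all updates to one record by assigning sequence numbers."""
    def __init__(self, record):
        self.record = dict(record)
        self.seq = 0

    def update(self, field, value):
        self.seq += 1
        self.record[field] = value
        return (self.seq, field, value)   # message published to replicas


class Replica:
    """Applies updates strictly in sequence order, buffering early arrivals."""
    def __init__(self, record):
        self.record = dict(record)
        self.applied = 0
        self.pending = {}

    def receive(self, msg):
        seq, field, value = msg
        self.pending[seq] = (field, value)
        # Apply every update whose predecessors have all been applied.
        while self.applied + 1 in self.pending:
            f, v = self.pending.pop(self.applied + 1)
            self.record[f] = v
            self.applied += 1
```

If the "Work" update reaches a replica before "Awake", the replica holds it back, so the record moves from (Home, Sleeping) to (Home, Awake) to (Work, Awake) and never passes through (Work, Sleeping).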
Eventual Consistency
• Timeline consistency comes at a price
  – Writes not originating in record master region forward to master and have longer latency
  – When master region down, record is unavailable for write
• We added eventual consistency mode
  – On conflict, latest write per field wins
  – Target customers
    • Those that externally guarantee no conflicts
    • Those that understand/can cope
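The "latest write per field wins" rule can be sketched as a merge of two record versions. The representation (each field mapped to a `(timestamp, value)` pair), the assumption that timestamps are comparable across regions, and the function name are all hypothetical; tie-breaking for equal timestamps is deliberately left out.

```python
def merge_lww(a: dict, b: dict) -> dict:
    """Merge two versions of a record; each field maps to (timestamp, value)."""
    merged = dict(a)
    for field, (ts, value) in b.items():
        # Keep whichever write to this field carries the later timestamp.
        if field not in merged or ts > merged[field][0]:
            merged[field] = (ts, value)
    return merged
```

Resolving conflicts per field rather than per record means concurrent writes to different fields both survive, at the cost of possibly producing a combination that no single writer ever wrote.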
Outline
• PNUTS Architecture
• Recent Developments
  – New features
  – New challenges
• Adoption at Yahoo!
Ordered Table Challenges
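[Figure: the keys apple, carrot, tomato, banana, avocado, lemon loaded into three tablets, first with boundaries MIN, I, S, MAX and then with boundaries MIN, B, L, MAX]

• Carefully choose initial tablet boundaries
  – Sample input keys
• Same goes for any big load
  – Pre-split and move tablets if needed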
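A minimal sketch of choosing boundaries from a sample of the input keys, assuming evenly spaced quantiles are good enough. The function name and the quantile rule are illustrative assumptions; the slides do not give the exact procedure.

```python
def choose_boundaries(sample_keys, num_tablets):
    """Pick num_tablets - 1 split keys at evenly spaced quantiles of the sample."""
    keys = sorted(sample_keys)
    return [keys[(i * len(keys)) // num_tablets] for i in range(1, num_tablets)]
```

With the six keys from the slide and three tablets, this picks split keys starting with B and L, so each tablet receives two keys. Boundaries at I and S would instead put four of the six keys in the first tablet.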
Ordered Table Challenges
• Dealing with skewed workloads
  – Tablet split, tablet moves
    • Initially operator driven
    • Now driven by Yak load balancer
• Yak
  – Collect storage unit stats
  – Issue move, split requests
  – Be conservative, make sure loads are here to stay!
    • Moves are expensive
    • Splits not reversible
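The "be conservative" point can be illustrated with a balancer that reacts only when a storage unit stays hot for several consecutive samples. This is a sketch of the general idea, not Yak's actual policy, which the slide does not describe; the class name, threshold, and streak rule are assumptions.

```python
from collections import defaultdict

class ConservativeBalancer:
    """Flags a storage unit only after it stays hot for several samples in a row."""
    def __init__(self, threshold, required_samples):
        self.threshold = threshold
        self.required = required_samples
        self.hot_streak = defaultdict(int)

    def observe(self, loads):
        """loads: {storage_unit: load}. Returns units that warrant a move or split."""
        actions = []
        for unit, load in loads.items():
            if load > self.threshold:
                self.hot_streak[unit] += 1
            else:
                self.hot_streak[unit] = 0   # a cool sample resets the streak
            if self.hot_streak[unit] >= self.required:
                actions.append(unit)
        return actions
```

Requiring a streak filters out short spikes, which matters because a move is expensive and a split cannot be undone.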
Notifications
• Many customers want a stream of updates made to their tables
  – Update external indexes, e.g., Lucene-style index
  – Maintain cache
  – Dump as logs into Hadoop
• Under the covers, notification stream is actually our pub/sub replication layer, Tribble

[Figure: client → PNUTS → notification client → index, logs, etc.]
Materialized Views
Items
Key Value
item123 type=bike, price=100
item456 type=toaster, price=20
item789 type=bike, price=200

Does not efficiently support “list all bikes for sale”!

Index on type!
Key Value
bike_item123 price=100
bike_item789 price=200
toaster_item456 price=20

• Async updates via pub/sub layer
• Adding/deleting item triggers add/delete on index
• Updating item type triggers delete and add on index
• Get bikes for sale with prefix scan: bike*
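The index maintenance rules above can be sketched with two in-memory dictionaries standing in for the base table and the view. All names are hypothetical, and the index update runs synchronously here for simplicity, whereas the slides say it is applied asynchronously via the pub/sub layer.

```python
items = {}   # base table: item key -> {"type": ..., "price": ...}
index = {}   # view: "<type>_<item key>" -> {"price": ...}

def index_key(item_key, item):
    return f"{item['type']}_{item_key}"

def on_item_change(item_key, old, new):
    """Apply one base-table change (old or new may be None) to the index."""
    if old is not None:
        index.pop(index_key(item_key, old), None)      # delete the old entry
    if new is not None:
        index[index_key(item_key, new)] = {"price": new["price"]}

def put_item(item_key, item):
    old = items.get(item_key)
    items[item_key] = item
    on_item_change(item_key, old, item)   # asynchronous in the real system

def prefix_scan(prefix):
    """Return index keys starting with prefix, in key order."""
    return sorted(k for k in index if k.startswith(prefix))
```

Changing an item's type removes its old index entry and adds a new one under the new prefix, which is the "delete and add" case on the slide.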
Bulk Operations
1) User click history logs stored in HDFS
2) Hadoop job builds models of user preferences
3) Hadoop reduce writes models to PNUTS user table
4) Models read from PNUTS help decide users’ frontpage content

[Figure: HDFS → Hadoop → PNUTS; PNUTS models are combined with candidate content to build the frontpage]
PNUTS-Hadoop
Reading from PNUTS
1. Split PNUTS table into ranges
2. Each Hadoop task assigned a range
3. Task uses PNUTS scan API to retrieve records in range
4. Task feeds scan results to map function

[Figure: Hadoop tasks, each with a RecordReader, issue scan(0x0-0x2), scan(0x2-0x4), scan(0x8-0xa), scan(0xa-0xc), scan(0xc-0xe) against PNUTS and pass the records to Map]

Writing to PNUTS
1. Call PNUTS set to write output

[Figure: map or reduce Hadoop tasks issue set calls to PNUTS through the router]
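The four reading steps can be sketched as below. The `scan` function is only a stand-in for the PNUTS scan API, whose real signature is not shown on the slide; the in-memory table keyed by hash value and all function names are assumptions, and the tasks run sequentially here rather than in parallel.

```python
def split_ranges(lo, hi, num_tasks):
    """Divide [lo, hi) into num_tasks contiguous ranges, one per task."""
    step, extra = divmod(hi - lo, num_tasks)
    ranges, start = [], lo
    for i in range(num_tasks):
        end = start + step + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def scan(table, start, end):
    """Stand-in for the scan API: yield records whose hash is in [start, end)."""
    for h in sorted(table):
        if start <= h < end:
            yield table[h]

def run_map_tasks(table, lo, hi, num_tasks, map_fn):
    """Give each task one range; each task scans it and feeds records to map_fn."""
    out = []
    for start, end in split_ranges(lo, hi, num_tasks):
        out.extend(map_fn(rec) for rec in scan(table, start, end))
    return out
```

Splitting 0x0 to 0x10 across eight tasks yields ranges of width 2, matching the scan(0x0-0x2), scan(0x2-0x4), … calls in the figure.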
Bulk w/Snapshot
• Send PNUTS tablet map to Hadoop tasks
• Tasks write output to per-tablet snapshot files
• Sender daemons send snapshots to PNUTS
• Receiver daemons load snapshots into PNUTS storage units

[Figure: Hadoop tasks, per-tablet snapshot files, snapshot daemons, PNUTS tablet map, PNUTS storage units]
Selective Replication
• PNUTS replicates at the table-level, potentially among 10+ data centers
  – Some records only read in 1 or a few data centers
  – Legal reasons prevent us from replicating user data except where created
  – Tables are global, records may be local!
• Storing unneeded replicas wastes disk
• Maintaining unneeded replicas wastes network capacity
Selective Replication
• Static
  – Per-record constraints
  – Client sets mandatory, disallowed regions
• Dynamic
  – Create replicas in regions where record is read
  – Evict replicas from regions where record not read
  – Lease-based
    • When a replica read, guaranteed to survive for a time period
    • Eviction lazy; when lease expires, replica deleted on next write
  – Maintains minimum replication levels
  – Respects explicit constraints
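A toy model of the static and dynamic rules for a single record. The class, its parameters, and the region names used in the usage example are hypothetical; the sketch only illustrates read leases, lazy eviction on write, a minimum replica count, and mandatory/disallowed regions as listed above.

```python
class RecordReplicas:
    """Tracks which regions hold a replica of one record, with read leases."""
    def __init__(self, lease_seconds, mandatory=(), disallowed=(), min_replicas=1):
        self.lease_seconds = lease_seconds
        self.mandatory = set(mandatory)
        self.disallowed = set(disallowed)
        self.min_replicas = min_replicas
        # Mandatory regions never expire.
        self.lease_expiry = {r: float("inf") for r in self.mandatory}

    def on_read(self, region, now):
        """A read creates (or renews) the replica in that region, if allowed."""
        if region in self.disallowed:
            return False
        if region not in self.mandatory:
            self.lease_expiry[region] = now + self.lease_seconds
        return True

    def on_write(self, now):
        """Lazy eviction: drop expired replicas, then return regions to update."""
        expired = sorted(
            (r for r, t in self.lease_expiry.items() if t <= now),
            key=lambda r: self.lease_expiry[r],
        )
        for region in expired:
            if len(self.lease_expiry) > self.min_replicas:
                del self.lease_expiry[region]
        return set(self.lease_expiry)
```

For example, with a 60-second lease, a read from a hypothetical "asia" region at time 0 keeps that replica alive for a write at time 30, while a write at time 61 drops it and updates only the mandatory region.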
Outline
• PNUTS Architecture
• Recent Developments
  – New features
  – New challenges
• Adoption at Yahoo!
PNUTS in production
• Over 100 Yahoo! applications/platforms on PNUTS
  – Movies, Travel, Answers
  – Over 450 tables, 50K tablets
• Growth, past 18 months
  – 10s to 1000s of storage servers
  – Less than 5 data centers to over 15
Customer Experience
• PNUTS is a hosted service
  – Customers don’t install
  – Customers usually don’t wait for hardware requests
• Customer interaction
  – Architects and dev mailing list help with design
  – Ticketing to get tables
  – Latency SLA and REST API
• Ticketing ensures PNUTS stays sufficiently provisioned for all customers
  – We check on intended use, expected load, etc.
Sandbox
• Self-provisioned system for getting test PNUTS tables
• Start using REST API in minutes
• No SLA
  – Just running on a few storage servers, shared among many clients
• No replication
  – Don’t put production data here!
Thanks!
• Adam Silberstein
• Further Reading
  – System Overview: VLDB 2008
  – Pre-planning for big loads: SIGMOD 2008
  – Materialized views: SIGMOD 2009
  – PNUTS-Hadoop: SIGMOD 2011
  – Selective replication: VLDB 2011
  – YCSB: https://github.com/brianfrankcooper/YCSB/, SOCC 2010