PS1 Prototype Systems Design Jan Vandenberg, JHU
![Page 1: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/1.jpg)
slide 1
PS1 Prototype Systems Design
Jan Vandenberg, JHU
Early PS1 Prototype
![Page 2: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/2.jpg)
slide 2
Engineering Systems to Support the Database Design
Raw data size
Index size
Most end-user operations I/O bound
Loading/Ingest more cpu-bound, though we still need solid write performance
Time to do full table scans
Time to do index scans
Need to do most work where the data is; can’t sling TB’s over the network quickly
• …though we can brute-force past 1 Gbit Ethernet if necessary
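To put the "can’t sling TB’s over the network quickly" point in numbers, here is a minimal back-of-the-envelope sketch. The data sizes and the `efficiency` parameter are illustrative assumptions, not figures from the slides; only the 1 Gbit Ethernet link speed comes from the slide.

```python
def transfer_hours(size_tb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb (decimal terabytes) over a network link.

    efficiency is an assumed fraction of line rate actually achieved
    (1.0 means the ideal case with no protocol overhead).
    """
    bits = size_tb * 1e12 * 8                              # terabytes -> bits
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits / usable_bits_per_s / 3600                 # seconds -> hours


if __name__ == "__main__":
    # Hypothetical data sizes, chosen only to show the scaling.
    for size_tb in (1, 10, 100):
        hours = transfer_hours(size_tb, link_gbit_per_s=1.0)
        print(f"{size_tb:>4} TB over 1 Gbit/s: {hours:8.1f} h")
```

Even at full line rate, a single terabyte takes 8000 seconds (a bit over two hours) on 1 Gbit Ethernet, which is why the work needs to happen where the data lives.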
![Page 3: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/3.jpg)
slide 3
Fibre Channel, SAN
Expensive but not-so-fast physical links (4 Gbit, 10 Gbit)
Expensive switch
Potentially very flexible
Industrial strength manageability
Little control over RAID controller bottlenecks
![Page 4: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/4.jpg)
slide 4
SATA
Fast
Cheap
Ugly, spooky
• <cabling pic>
Tough to manage
• <dlmsdb/sdssdb drive bay map>
![Page 5: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/5.jpg)
slide 5
SAS
For our purposes, it’s SATA without the ugliness
Fast: 12 Gbit/s FD building blocks
Cheap: PS1 prototype MD1000 pricing versus Newegg media costs
Not Ugly: IB cables versus rats’ nest
Industrial strength manageability: pretty blinking lights and mgmt apps versus downtime plus white knuckles
<cabling pic>
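A rough sketch of what a "12 Gbit/s FD" building block could mean in payload terms. It assumes the 12 Gbit/s figure is a 4-lane wide port at 3 Gbit/s per lane with 8b/10b line encoding; the slide does not state the lane count or encoding, so treat both as assumptions.

```python
def usable_mb_per_s(lanes: int, gbit_per_lane: float, encoding_efficiency: float = 0.8) -> float:
    """Approximate payload bandwidth of a wide port, per direction, in MB/s.

    encoding_efficiency=0.8 models 8b/10b encoding (8 data bits carried in
    every 10 line bits). Lane count and per-lane speed are caller-supplied
    assumptions, not values taken from the slides.
    """
    raw_bits_per_s = lanes * gbit_per_lane * 1e9
    return raw_bits_per_s * encoding_efficiency / 8 / 1e6   # bits -> bytes -> MB


if __name__ == "__main__":
    per_direction = usable_mb_per_s(lanes=4, gbit_per_lane=3.0)
    print(f"~{per_direction:.0f} MB/s per direction, full duplex")
```

Under those assumptions the link carries roughly 1200 MB/s in each direction before any controller or protocol overhead.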
![Page 6: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/6.jpg)
slide 6
I/O Performance of Dell SAS Systems in the PS1 Prototype
![Page 7: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/7.jpg)
slide 7
SAS Performance, Gory Details
SAS v. SATA differences
<chart: Native SAS v. SATA Performance; x-axis: Disks (1 to 7); y-axis: MB/s (0 to 500); annotation: 20%>
![Page 8: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/8.jpg)
slide 8
Per-Controller Performance
Luckily, one controller is fast enough for one SATA disk box
<performance chart>
![Page 9: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/9.jpg)
slide 9
Resulting PS1 Prototype I/O Topology
<topo diagram> <aggregate performance chart>
![Page 10: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/10.jpg)
slide 10
RAID-5 v. RAID-10?
Primer, anyone?
RAID-5 probably feasible with contemporary controller…
…though tough to predict real-world effects of latency…
…and not a ton of redundancy
But after we add enough disks to meet performance goals, we have enough storage to run RAID-10 anyway!
• Remember sub-Newegg media costs
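The "enough disks for performance means enough storage for RAID-10" argument can be sketched as simple arithmetic. The throughput target, per-disk rate and disk size below are made-up illustrative values, and the spindle-count sizing deliberately ignores controller limits and RAID-level effects.

```python
import math


def disks_for_throughput(target_mb_s: float, per_disk_mb_s: float) -> int:
    """Number of spindles needed to reach a sequential throughput target."""
    return math.ceil(target_mb_s / per_disk_mb_s)


def usable_tb(level: str, n_disks: int, disk_tb: float) -> float:
    """Usable capacity of one array built from n_disks identical disks."""
    if level == "raid5":
        return (n_disks - 1) * disk_tb      # one disk's worth of parity
    if level == "raid10":
        return (n_disks // 2) * disk_tb     # every block is mirrored once
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    # Hypothetical sizing inputs, for illustration only.
    n = disks_for_throughput(target_mb_s=1000, per_disk_mb_s=50)
    for level in ("raid5", "raid10"):
        print(f"{level}: {n} disks -> {usable_tb(level, n, disk_tb=0.5):.1f} TB usable")
```

If the RAID-10 capacity that falls out of the performance-driven disk count already covers the data size, the extra capacity of RAID-5 buys nothing.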
![Page 11: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/11.jpg)
slide 11
RAID-10 Performance
Executive summary: RAID0/2 (half of RAID0 throughput) for single-threaded reads, RAID0 perf for 2-user/2-thread workloads, RAID0/2 for writes
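The executive summary reads naturally as a small rule-of-thumb model, sketched here. It takes "RAID0/2" to mean half the throughput of a RAID-0 stripe over the same disks, and assumes that more than two concurrent readers also see full RAID-0 read throughput; the slide only speaks to the one- and two-thread cases.

```python
def raid10_throughput(raid0_mb_s: float, op: str, threads: int = 1) -> float:
    """Estimated RAID-10 throughput given RAID-0 throughput on the same disks."""
    if op == "write":
        return raid0_mb_s / 2              # every write lands on both mirrors
    if op == "read":
        if threads >= 2:
            return raid0_mb_s              # concurrent readers keep both mirror halves busy
        return raid0_mb_s / 2              # one reader only drives one half
    raise ValueError(f"unknown operation: {op}")


if __name__ == "__main__":
    baseline = 400.0   # hypothetical RAID-0 figure in MB/s, not a measured value
    print(raid10_throughput(baseline, "read", threads=1))
    print(raid10_throughput(baseline, "read", threads=2))
    print(raid10_throughput(baseline, "write"))
```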
![Page 12: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/12.jpg)
slide 12
PS1 Prototype Servers
<diagram of server roles plus storage and network interconnects>
![Page 13: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/13.jpg)
slide 13
PS1 Prototype Servers
<iron photo (w/Will?)>
![Page 14: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/14.jpg)
slide 14
Projected PS1 Systems Design
<diagram of 8-slice triply-replicated systems> <plus geoplex?>
![Page 15: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/15.jpg)
slide 15
Backup/Recovery/Replication Strategies
No formal backup
• …except maybe for mydb’s, f(cost*policy)
3-way replication
• Replication != backup
– Little or no history
– Replicas can be a bit too cozy: must notice badness before replication propagates it
• Replicas provide redundancy and load balancing…
• Fully online: zero time to recover
• Replicas needed for happy production performance plus ingest, anyway
Off-site geoplex
• Provides continuity if we lose HI (local or trans-Pacific network outage, facilities outage)
– <lava pic?>
• Could help balance trans-Pacific bandwidth needs (service continental traffic locally)
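One way to picture "must notice badness before replication propagates it" is a hold period between loading a batch and copying it to the replicas. The sketch below is purely illustrative: `Batch`, `DelayedReplicator` and the hold period are hypothetical names and parameters, not part of the PS1 design.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Batch:
    batch_id: int
    loaded_at: float           # hour at which the batch was loaded on the first copy
    flagged_bad: bool = False


class DelayedReplicator:
    """Hold each loaded batch for hold_hours before copying it to replicas."""

    def __init__(self, hold_hours: float):
        self.hold_hours = hold_hours
        self.pending = deque()     # batches waiting out the hold period, oldest first
        self.replicated = []       # ids of batches pushed to the replicas
        self.rejected = []         # ids of batches caught before propagation

    def load(self, batch: Batch) -> None:
        self.pending.append(batch)

    def flag_bad(self, batch_id: int) -> bool:
        """Mark a pending batch as bad; returns False if it already left the queue."""
        for batch in self.pending:
            if batch.batch_id == batch_id:
                batch.flagged_bad = True
                return True
        return False

    def tick(self, now: float) -> None:
        """Release every batch whose hold period has elapsed."""
        while self.pending and now - self.pending[0].loaded_at >= self.hold_hours:
            batch = self.pending.popleft()
            target = self.rejected if batch.flagged_bad else self.replicated
            target.append(batch.batch_id)
```

A longer hold period gives more time to catch a mangled load, at the cost of replicas that lag further behind.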
![Page 16: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/16.jpg)
slide 16
Why No Traditional Backups?
Not super pricey…
…but not very useful relative to a replica for our purposes
• Time to recover
Money no object… do traditional backups too!!!
Synergy, economy of scale with other collaboration needs (IPP?)… do traditional backups too!!!
![Page 17: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/17.jpg)
slide 17
Failure Scenarios
Easy, zero-downtime:
• Disks
• Power supplies
• Fans
Not so spooky, maybe some downtime and manual replica cutover:
• System board (rare)
• Memory (rare and usually proactively detected and handled via scheduled maintenance)
• Disk controller (rare, potentially minimal downtime via cold-spare controller)
• CPU (not utterly uncommon, can be tough and time consuming to diagnose correctly)
More spooky:
• Database mangling by human or pipeline error
– Gotta catch this before replication propagates it everywhere
– Can’t replicate too aggressively
– (and so off-the-shelf near-realtime replication tools don’t help us)
• Catastrophic loss of datacenter
– Have the geoplex
– …but we’re dangling by a single copy ‘till recovery complete
– …but are we still screwed? Depending on colo scenarios, did we also lose the IPP and flatfile archive?
Terrifying:
• Unrecoverable badness fully replicated before detection
• Catastrophic loss of datacenter without geoplex
• Can we ever catch back up with the data rate if we need to start over?
– At some point in the survey, the answer likely becomes “no”.
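The "can we ever catch back up?" question can be sketched with a simple linear model. It assumes constant ingest and reload rates and that new data keeps arriving during the reload; the rates used below are hypothetical, not PS1 figures.

```python
from typing import Optional


def catch_up_days(elapsed_days: float,
                  ingest_tb_per_day: float,
                  reload_tb_per_day: float) -> Optional[float]:
    """Days needed to reload everything ingested so far while ingest continues.

    Returns None when the reload rate does not exceed the ingest rate,
    i.e. the backlog never shrinks.
    """
    if reload_tb_per_day <= ingest_tb_per_day:
        return None
    backlog_tb = elapsed_days * ingest_tb_per_day
    return backlog_tb / (reload_tb_per_day - ingest_tb_per_day)


if __name__ == "__main__":
    # Hypothetical rates: the later in the survey, the longer the catch-up.
    for elapsed in (100, 500, 1000):
        print(elapsed, catch_up_days(elapsed, ingest_tb_per_day=1.0, reload_tb_per_day=3.0))
```

In this model the catch-up time grows linearly with the time already elapsed, so once it exceeds the time left in the survey the practical answer is "no".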
![Page 18: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/18.jpg)
slide 18
State Diagram for Replicas?
Loading
Replicating
Load balancing
Failing
Recovering
• Possibly repeat-loading
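The states listed above could be captured as a small state machine. Only the state names come from the slide; the allowed transitions below are guesses made for illustration (including Recovering returning to Loading for the "repeat-loading" case) and would need to be replaced by the real diagram.

```python
from enum import Enum


class ReplicaState(Enum):
    LOADING = "loading"
    REPLICATING = "replicating"
    LOAD_BALANCING = "load balancing"
    FAILING = "failing"
    RECOVERING = "recovering"


# Assumed transition table; not taken from the slides.
ALLOWED = {
    ReplicaState.LOADING: {ReplicaState.REPLICATING, ReplicaState.FAILING},
    ReplicaState.REPLICATING: {ReplicaState.LOAD_BALANCING, ReplicaState.FAILING},
    ReplicaState.LOAD_BALANCING: {ReplicaState.LOADING, ReplicaState.FAILING},
    ReplicaState.FAILING: {ReplicaState.RECOVERING},
    # Recovery may require loading again ("possibly repeat-loading").
    ReplicaState.RECOVERING: {ReplicaState.LOADING, ReplicaState.REPLICATING},
}


def transition(current: ReplicaState, target: ReplicaState) -> ReplicaState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Writing the table down explicitly makes it easy to reject moves such as a failing replica going straight back into load balancing.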
![Page 19: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/19.jpg)
slide 19
Operating Systems, DBMS?
Sql2005 EE x64
• Why?
• Why not DB2, Oracle RAC, PostgreSQL, MySQL, <insert your favorite>?
(Win2003 EE x64)
<Why EE?>
Platform rant from JVV available over beers
• <JVV/beer graphic?>
![Page 20: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/20.jpg)
slide 20
Systems/Database Management
Active Directory infrastructure
Windows patching tools, methodology
Linux patching tools, methodology
Monitoring
Staffing requirements
![Page 21: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/21.jpg)
slide 21
Facilities/Infrastructure Projections for PS1
Cooling
Rack space
Network ports
(plus AD/WSUS/monitoring infrastructure above)
![Page 22: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/22.jpg)
slide 22
Operational Handoff to UofH
![Page 23: PS1 Prototype Systems Design Jan Vandenberg, JHU](https://reader036.fdocuments.in/reader036/viewer/2022062422/56813c78550346895da60e29/html5/thumbnails/23.jpg)
Mahalo! (See Ya, Hon!)