What Every Data Programmer Needs to Know about...
![Page 1: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/1.jpg)
Not proprietary or confidential. In fact, you’re risking a career by listening to me.
What Every Data Programmer Needs to Know about Disks
Ted Dziuba
@dozba
OSCON Data – July, 2011 - Portland
![Page 2: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/2.jpg)
Who are you and why are you talking?
A few years ago: Technical troll for The Register.
Recently: Co-founder of Milo.com, local shopping engine.
Present: Senior Technical Staff for eBay Local
First job: Like college but they pay you to go.
![Page 3: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/3.jpg)
The Linux Disk Abstraction
Volume: /mnt/volume
File System: xfs, ext
Block Device: HDD, HW RAID array
![Page 4: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/4.jpg)
What happens when you read from a file?
f = open("/home/ted/not_pirated_movie.avi", "rb")
avi_header = f.read(56)
f.close()
user buffer -> page cache -> disk controller -> platter
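The user buffer at the front of that chain can be observed from Python. A minimal sketch, assuming a throwaway temporary file as a stand-in for the movie on the slide: the program asks for 56 bytes, but the buffered file object pulls a larger chunk from the kernel, so the kernel-level offset runs ahead of the offset the program sees.

```python
import os
import tempfile

# Hypothetical stand-in for the file on the slide: 1 MiB of zero bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (1 << 20))
    path = tmp.name

with open(path, "rb") as f:          # default buffering: a user-space buffer
    header = f.read(56)              # what the program asked for
    logical_offset = f.tell()        # position as the program sees it
    # Position as the kernel sees it: the buffer was filled by a larger read.
    kernel_offset = os.lseek(f.fileno(), 0, os.SEEK_CUR)

os.remove(path)
print(len(header), logical_offset, kernel_offset)
```

The exact kernel offset depends on the buffer size your Python build picks, so treat the number as illustrative.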
![Page 5: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/5.jpg)
What happens when you read from a file?
user buffer -> page cache -> disk controller -> platter
•Main memory lookup
•Latency: 100 nanoseconds
•Throughput: 12GB/sec on good hardware
![Page 6: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/6.jpg)
What happens when you read from a file?
user buffer -> page cache -> disk controller -> platter
•Needs to actuate a physical device
•Latency: 10 milliseconds
•Throughput: 768 MB/sec on SATA 3 (raw 6 Gbit/s link rate; about 600 MB/sec of payload after 8b/10b encoding)
•(Faster if you have a lot of money)
![Page 7: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/7.jpg)
Sidebar: The Horror of a 10ms Seek Latency
A disk read is 100,000 times slower than a memory read.
100 nanoseconds: Time it takes you to write a really clever tweet
10 milliseconds: Time it takes to write a novel, working full time
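The 100,000x figure is plain division of the two latencies quoted on the slides. A small sketch of the arithmetic; the "one minute per memory read" scaling is a hypothetical analogy added here, not something from the talk.

```python
MEMORY_READ_NS = 100             # page-cache hit, as quoted on the slides
DISK_SEEK_NS = 10 * 1_000_000    # 10 ms seek, as quoted on the slides

slowdown = DISK_SEEK_NS // MEMORY_READ_NS            # how many times slower
seeks_per_second = 1_000_000_000 // DISK_SEEK_NS     # random reads a disk can do

# Hypothetical human scale: if a memory read took one minute,
# a disk read would take this many days.
days_per_disk_read = slowdown / (60 * 24)

print(slowdown, seeks_per_second, round(days_per_disk_read, 1))
```

The second number is the one that hurts in practice: a single spindle only manages on the order of a hundred random reads per second.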
![Page 8: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/8.jpg)
What happens when you write to a file?
f = open("/home/ted/nosql_database.csv", "wb")
f.write(key)
f.write(",")
f.write(value)
f.close()
user buffer -> page cache -> disk controller -> platter
![Page 9: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/9.jpg)
What happens when you write to a file?
f = open("/home/ted/nosql_database.csv", "wb")
f.write(key)
f.write(",")
f.write(value)
f.close()
user buffer -> page cache -> disk controller -> platter
You need to make this part happen
Mark the page dirty, call it a day and go have a smoke.
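The first hop of the write path is easy to see for yourself. A minimal sketch, assuming a temporary file instead of the CSV on the slide: after write() the bytes sit in Python's user buffer and the kernel has not seen them; flush() hands them to the page cache, which is still not the platter.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()    # hypothetical scratch file
os.close(fd)

f = open(path, "wb")
f.write(b"key,value")
size_before_flush = os.path.getsize(path)   # bytes are still in the user buffer
f.flush()                                   # user buffer -> page cache
size_after_flush = os.path.getsize(path)    # the kernel now knows about them
f.close()
os.remove(path)

print(size_before_flush, size_after_flush)
```

Nothing here forces the dirty page out to the disk; that is a separate step.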
![Page 10: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/10.jpg)
Aside: Stick your finger in the Linux Page Cache
Clear your page cache: echo 1 > /proc/sys/vm/drop_caches
Dirty pages: grep -i "dirty" /proc/meminfo
Before Linux 2.6.32, flushing was done by “pdflush” threads; now it is done by per-Backing Device Info (BDI) flush threads
/proc/sys/vm Love:
•dirty_expire_centisecs : how old a dirty page must be before it gets flushed
•dirty_ratio : percent of memory that can be dirty before writing processes are forced to flush
•dirty_writeback_centisecs : how often to wake up and start flushing
Crusty sysadmin’s hail-Mary pass: sync; sync; sync
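The same dirty-page check can be done from a program. A small sketch, assuming the usual "Name:   value kB" layout of /proc/meminfo; the helper name and the sample text are made up for illustration, and the live read is skipped on systems without /proc.

```python
def dirty_kilobytes(meminfo_text):
    """Return the Dirty and Writeback counters (in kB) from /proc/meminfo text."""
    wanted = {"Dirty", "Writeback"}
    found = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            found[name] = int(rest.split()[0])
    return found

# Hypothetical sample in the /proc/meminfo layout.
sample = (
    "MemTotal:       16384000 kB\n"
    "Dirty:              1234 kB\n"
    "Writeback:             0 kB\n"
    "WritebackTmp:          0 kB\n"
)
sample_result = dirty_kilobytes(sample)

# On a Linux box, read the live numbers; elsewhere leave them as None.
try:
    with open("/proc/meminfo") as f:
        live_result = dirty_kilobytes(f.read())
except OSError:
    live_result = None

print(sample_result, live_result)
```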
![Page 11: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/11.jpg)
Fsync: force a flush to disk
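import os
f = open("/home/ted/nosql_database.csv", "wb")
f.write(key)
f.write(",")
f.write(value)
f.flush()  # push Python's user buffer into the page cache first
os.fsync(f.fileno())  # then force the dirty pages out to the disk
f.close()
user buffer -> page cache -> disk controller -> platter
Also note, fsync() has a cousin, fdatasync(), that skips metadata that is not needed to read the data back (such as the modification time).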
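Put together as a helper, the pattern looks roughly like this. A sketch under assumptions: durable_write and its sync_metadata flag are hypothetical names, the path is a temporary file, and os.fdatasync is only used where the platform provides it.

```python
import os
import tempfile

def durable_write(path, data, sync_metadata=True):
    """Write data and ask the kernel to push it past the page cache."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                          # user buffer -> page cache
        if sync_metadata or not hasattr(os, "fdatasync"):
            os.fsync(f.fileno())           # data and metadata
        else:
            os.fdatasync(f.fileno())       # data, plus only the metadata needed to read it

# Usage on a hypothetical scratch file.
scratch_dir = tempfile.mkdtemp()
scratch_path = os.path.join(scratch_dir, "nosql_database.csv")
durable_write(scratch_path, b"key,value")
durable_write(scratch_path, b"key,value2", sync_metadata=False)
with open(scratch_path, "rb") as f:
    round_trip = f.read()
os.remove(scratch_path)
os.rmdir(scratch_dir)
```

This only gets the data as far as the kernel can push it. What the disk controller does with it afterwards is out of the program's hands.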
![Page 12: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/12.jpg)
Aside: point and laugh at MongoDB
Mongo’s “fsync” command:
> db.runCommand({fsync:1, async:true});
wat.
Also supports “journaling”, like a WAL in the SQL world, however…
•It only fsync()s the journal every 100ms… “for performance”.
•It’s not enabled by default.
![Page 13: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/13.jpg)
Fsync: bitter lies
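import os
f = open("/home/ted/nosql_database.csv", "wb")
f.write(key)
f.write(",")
f.write(value)
f.flush()  # push Python's user buffer into the page cache first
os.fsync(f.fileno())  # then force the dirty pages out to the disk
f.close()
user buffer -> page cache -> disk controller -> platter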
Drives will lie to you.
![Page 14: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/14.jpg)
Fsync: bitter lies
page cache -> disk controller -> platter
Disk controller: …it’s a cache!
•Two types of caches: writethrough and writeback
•Writeback is the demon
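Why writeback is the demon can be shown with a toy model. This is a sketch, not how any real controller is implemented: the classes and the power_loss method are invented for illustration. The only difference between the two caches is whether the write reaches the platter before or after the acknowledgement.

```python
class Platter:
    """Stand-in for persistent storage."""
    def __init__(self):
        self.blocks = {}

class WritethroughCache:
    def __init__(self, platter):
        self.platter = platter
        self.cache = {}
    def write(self, block, data):
        self.cache[block] = data
        self.platter.blocks[block] = data   # on the platter before the ack
        return "ack"
    def power_loss(self):
        self.cache.clear()                  # volatile contents vanish

class WritebackCache:
    def __init__(self, platter):
        self.platter = platter
        self.dirty = {}
    def write(self, block, data):
        self.dirty[block] = data
        return "ack"                        # acked while only in volatile RAM
    def flush(self):
        self.platter.blocks.update(self.dirty)
        self.dirty.clear()
    def power_loss(self):
        self.dirty.clear()                  # acknowledged writes are gone

def survives_power_loss(cache_cls):
    platter = Platter()
    cache = cache_cls(platter)
    cache.write(0, b"key,value")            # the caller gets its ack
    cache.power_loss()                      # then the lights go out
    return platter.blocks.get(0) == b"key,value"

writethrough_ok = survives_power_loss(WritethroughCache)
writeback_ok = survives_power_loss(WritebackCache)
print(writethrough_ok, writeback_ok)
```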
![Page 15: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/15.jpg)
(Just dropped in) to see what condition your caches are in
A Typical Workstation
Disk controller: No controller cache
Platter: Writeback cache on disk
![Page 16: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/16.jpg)
(Just dropped in) to see what condition your caches are in
A Good Server
Disk controller: Writethrough cache on controller
Platter: Writethrough cache on disk
![Page 17: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/17.jpg)
(Just dropped in) to see what condition your caches are in
An Even Better Server
Disk controller: Battery-backed writeback cache on controller
Platter: Writethrough cache on disk
![Page 18: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/18.jpg)
(Just dropped in) to see what condition your caches are in
The Demon Setup
Disk controller: Battery-backed writeback cache or writethrough cache
Platter: Writeback cache on disk
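The four setups boil down to one rule of thumb, sketched below. This is one reading of the slides, not a statement from the talk: an acknowledged write can only be lost in a writeback cache that has no battery behind it. The function name and the mode strings are made up for the example, and the verdict for the workstation is this sketch's inference.

```python
# Cache modes named on the slides.
NONE = "none"
WRITETHROUGH = "writethrough"
WRITEBACK = "writeback"
BATTERY_WRITEBACK = "battery-backed writeback"

def fsync_is_honest(controller_cache, disk_cache):
    """True if no cache on the path can drop an acknowledged write."""
    return WRITEBACK not in (controller_cache, disk_cache)

# (controller cache, on-disk cache) for each setup from the slides.
setups = {
    "A Typical Workstation": (NONE, WRITEBACK),
    "A Good Server": (WRITETHROUGH, WRITETHROUGH),
    "An Even Better Server": (BATTERY_WRITEBACK, WRITETHROUGH),
    "The Demon Setup": (BATTERY_WRITEBACK, WRITEBACK),
}
verdicts = {name: fsync_is_honest(*caches) for name, caches in setups.items()}

for name, ok in verdicts.items():
    print(name, "->", "fsync can be trusted" if ok else "fsync may lie")
```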
![Page 19: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/19.jpg)
Disks in a virtual environment
The Trail of Tears to the Platter
user buffer -> page cache -> virtual controller -> (Hypervisor) -> host page cache -> physical controller -> platter
![Page 20: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/20.jpg)
Disks in a virtual environment
Why EC2 I/O is Slow and Unpredictable
Image Credit: Ars Technica
Shared Hardware
•Physical Disk
•Ethernet Controllers
•Southbridge
•How are the caches configured?
•How big are the caches?
•How many controllers?
•How many disks?
•RAID?
![Page 21: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/21.jpg)
Aside: Amazon EBS
Please stop doing this: MySQL on Amazon EBS.
![Page 22: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/22.jpg)
What’s Killing That Box?
ted@u235:~$ iostat -x
Linux 2.6.32-24-generic (u235) 07/25/2011 _x86_64_ (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.15    0.14    0.05    0.00    0.00   99.66

Device:  rrqm/s  wrqm/s     r/s     w/s  rsec/s  wsec/s avgrq-sz   %util
sda        0.00    3.27    0.01    2.38    0.58   45.23    19.21    0.24
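If you want to watch for a pegged disk from a script, the device table is easy to parse. A rough sketch, assuming the header-then-rows layout shown on the slide (real iostat versions print more columns, which this handles by reading the header); the function name and the 80% threshold are arbitrary choices for the example.

```python
def parse_iostat_devices(output):
    """Map device name -> {column: value} from `iostat -x` text."""
    devices = {}
    columns = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            columns = None                      # a blank line ends the table
        elif fields[0].startswith("Device"):
            columns = fields[1:]                # remember the header names
        elif columns and len(fields) == len(columns) + 1:
            values = [float(v) for v in fields[1:]]
            devices[fields[0]] = dict(zip(columns, values))
    return devices

# The output from the slide, used as sample input.
sample = """\
Linux 2.6.32-24-generic (u235) 07/25/2011 _x86_64_ (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.15    0.14    0.05    0.00    0.00   99.66

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz %util
sda 0.00 3.27 0.01 2.38 0.58 45.23 19.21 0.24
"""
stats = parse_iostat_devices(sample)
busy = [name for name, s in stats.items() if s["%util"] > 80.0]
print(stats["sda"]["%util"], busy)
```

On the box from the slide nothing is busy: sda sits at 0.24% utilization.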
![Page 23: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/23.jpg)
Cool Hardware Tricks
Beginner Hardware Trick: SSD Drives
[Chart: $/GB for SATA vs SSD, axis from 0 to 3]
•$2.50/GB (SSD) vs 7.5c/GB (SATA)
•Negligible seek time (SSD) vs 10ms seek time (SATA)
•Not a lot of space
![Page 24: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/24.jpg)
Cool Hardware Tricks
Intermediate Hardware Trick: RAID Controllers
•Standard RAID Controller
•SSD as writeback cache
•Battery-backed
•Adaptec “MaxIQ”
•$1,200
Image Credit: Tom’s Hardware
![Page 25: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/25.jpg)
Cool Hardware Tricks
Advanced Hardware Trick: FusionIO
•SSD Storage on the Northbridge (PCIe)
•6.0 GB/sec throughput. Gigabytes.
•30 microsecond latency (30k ns)
•Roughly $20/GB
•Top-line card > $100,000 for around 5TB
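Lining up the numbers from the three tricks makes the trade-off concrete. A back-of-the-envelope sketch using only figures quoted on the slides; treating 5TB as 5,000 GB is an assumption, and the SSD latency is left out because the slides only call it negligible.

```python
# Figures quoted on the slides.
options = {
    "SATA":     {"usd_per_gb": 0.075, "latency_s": 10e-3},
    "SSD":      {"usd_per_gb": 2.50,  "latency_s": None},
    "FusionIO": {"usd_per_gb": 20.0,  "latency_s": 30e-6},
}

# How much more an SSD gigabyte costs than a SATA gigabyte.
ssd_premium = options["SSD"]["usd_per_gb"] / options["SATA"]["usd_per_gb"]

# How much lower FusionIO latency is than a 10 ms seek.
fusion_speedup = options["SATA"]["latency_s"] / options["FusionIO"]["latency_s"]

# Sanity check on the top-line card: "> $100,000 for around 5TB".
top_card_usd_per_gb = 100_000 / 5_000

print(round(ssd_premium, 1), round(fusion_speedup), top_card_usd_per_gb)
```

The top-line card works out to the same roughly $20/GB quoted on the slide.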
![Page 26: What Every Data Programmer Needs to Know about Disksmomjian.us/main/writings/pgsql/Dziuba_OSCON_2011_Data.pdf · Not proprietary or confidential. In fact, you’re risking a career](https://reader031.fdocuments.in/reader031/viewer/2022021416/5a9ed03e7f8b9a62178bdb16/html5/thumbnails/26.jpg)
Questions
Thank You
http://teddziuba.com/
@dozba
Questions & Heckling