Introduction to Solid State Drives
Uploaded by matt-simmons (Category: Technology)
Transcript of Introduction to Solid State Drives
Introduction to Solid State Drive Technology
Saturday, November 16, 13
Class Overview
• The Evolution of Storage Technology
• Spinning Disks
• Storage Metrics
• Solid State Technology
Goals: a better understanding of spinning disks; understand high- and low-level flash; the issues with SSDs.
The Evolution of Storage Technology
Pre-History
[Charts: storage density over time; storage speed over time]
Spinning Disks
The Parts of a Hard Drive
• Platters
• Actuator Arms and Heads
• Controller and Interface
Voltron Force Assemble!
Disk Interface
• Removable (USB/CF)
• SATA I / II / III
• Nearline SAS
• SAS
• Fibre Channel
• PCI-e
Disk Interface: Spinning Disk Removable Media (USB)
• Advantages:
• Nigh universal
• Disadvantages:
• Slower
• Fragile
• Easily lost
• Abstraction layer
SATA I / II / III
• Speeds: 1.5 / 3 / 6 Gb/s
• Requires AHCI for things like NCQ
• Subset of SAS
• Shares IDE command set
AHCI: Advanced Host Controller Interface (IDE mode is OK for TRIM). NCQ on an SSD ensures the drive has things to do while the host is latent (Intel controllers can queue 32 requests). Logo from SATA-IO (the international SATA organization).
SATA 3.1
• Approved July 2011
• Universal Storage Module
• mSATA
• QTRIM
Disk Interface
(This time, it’s personal)
QTRIM: queued TRIM commands. USM (Universal Storage Module) is a mobile drive standard.
SAS / Nearline SAS
• SAS
• Enhanced CRC checking
• 512/520/528 byte blocks
• Low density, high reliability
• Nearline SAS
• ...not so much
SAS: Serial Attached SCSI; 16 bits of CRC.
Disk Geometry
• Platters
• Tracks
• Cylinders
• Sectors
Logical Block Addressing
• First introduced as an abstraction layer
• Replaced CHS addressing
• Address Space is Linear (block 0 - n)
• Size of address space depends on the standard at time of manufacture.
Currently at 48-bit LBA.
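As a quick check on the address-space bullet above, a sketch of how much a 48-bit LBA can address with 512-byte sectors:

```python
# 48-bit LBA with 512-byte sectors: how much space is addressable?
SECTOR_BYTES = 512
LBA_BITS = 48

max_bytes = (2 ** LBA_BITS) * SECTOR_BYTES  # 2^57 bytes
print(max_bytes // 2 ** 50, "PiB")          # 128 PiB
```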
Variables Affecting Spinning Disk IO Rate
• Platter Rotational Speed
• Seek Speed
• Data Density
• Controller Cache (size / battery-backed / hybrid drives)
Spinning Disk Damage Vectors
Movement
• Movement vertical or parallel to platter
• Measured in G forces
• Head Crashes
• Spinning Down
• Head uses “Landing Strip”
• Repeated platter contact causes damage to the read/write head
Heads used to be parked manually. Putting your computer to sleep can cause the head to park. The bumpy landing strip has a nanocoating.
Protection against movement
• “Active Drive Protection”: Free-fall sensor
• Has a lift arm to lift the head away from the platter
• Some protection systems are in the drive, some are in the controller
• Don’t mix the two
Apple: Sudden Motion Sensor; Lenovo/IBM: Hard Drive Active Protection System.
You know, vibrations are movement...
Yes, vibrations are important, too
Shelf Life
• Oil / Lubricants in bearings
• Temperature fluctuations
• Magnetic “events” (bit rot)
• Outgassing / vapor removal
Long-term "archival quality" drives use long-life lubricant. Long-term "cold storage" arrays periodically spin up drives every few weeks to clean and scrub the data.
Spinning Disks in RAID
• Redundant Array of Inexpensive Disks
• Common RAID levels:
• 0,1,5,6,10
• Software / Hardware
Important Considerations
• Redundancy
• Capacity
• Speed
• Robustness
Speed: dedicated hardware? Single point of failure? Parity calculation? How long to rebuild a drive? Number of spindles! Redundancy: how many drive failures? URE errors? Capacity: parity stripe or mirrors? (Harder, better, faster, stronger.)
Advantages
• Linear Speed
• Price (Per Gigabyte)
• Well-Understood
Disadvantages
• Random Speed
• Price (per IOPS)
• Failure Rate
• Rebuild Speed
Storage Metrics
IOPS
• What are they?
• What aren’t they?
The Simplified Equation
IOPS = 1 / (((R + W) / 2) / 1000 + L / 1000)

R = Average Read Time (ms)
W = Average Write Time (ms)
L = Average Latency (ms)
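The equation above can be sketched in Python; the seek and latency figures in the example are illustrative assumptions, not vendor specs:

```python
def rough_iops(avg_read_ms, avg_write_ms, avg_latency_ms):
    """Rule-of-thumb spinning-disk IOPS: one IO costs the average of the
    read/write seek times plus the average rotational latency (all in ms)."""
    seconds_per_io = ((avg_read_ms + avg_write_ms) / 2) / 1000 + avg_latency_ms / 1000
    return 1 / seconds_per_io

# Hypothetical 7200 RPM drive: 6.5/7.4 ms seeks, 4.16 ms rotational latency
print(round(rough_iops(6.5, 7.4, 4.16)))  # 90
```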
Rule of Thumb Assumptions
RPM    IOPS
5400   50-80
7200   80-100
10k    130-150
15k    180-200
Determining IOPS
• Per Drive
• Manufacturer’s Stated Numbers
• Rule of Thumb
• Per RAID Array
• Write penalty
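A sketch of the per-array step, using the commonly cited write penalties (RAID 1/10: 2, RAID 5: 4, RAID 6: 6); the drive count and per-drive IOPS below are illustrative:

```python
# Effective IOPS of a RAID array: each logical write costs extra backend
# IOs. Commonly cited write penalties per RAID level:
WRITE_PENALTY = {0: 1, 1: 2, 5: 4, 6: 6, 10: 2}

def raid_effective_iops(drives, iops_per_drive, read_fraction, level):
    raw = drives * iops_per_drive
    return raw / (read_fraction + (1 - read_fraction) * WRITE_PENALTY[level])

# Illustrative: 8 drives x 150 IOPS each, 70% reads, RAID 5
print(round(raid_effective_iops(8, 150, 0.7, 5)))  # 632
```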
IO Profiling
• Active Tools
• Bonnie++
• dd
• Intel NAS Toolkit
• Passive Tools
• io(stat/meter/top), atop
• Resource Monitor / Process Explorer
http://www.intel.com/products/server/storage/NAS_Perf_Toolkit.htm
Solid State Drive Technology
NOR Flash
• Reads and writes are atomic, single-bit operations
• Expensive
• Small specific use cases
Won’t talk about NOR much.
NAND Flash
• Reads are based on “read blocks” (4k)
• Writes are based on “erasure blocks”
• Cheap (and getting cheaper)
• Broad use cases
Read / Write Profiles
• Logical addresses abstracted from LBA
• No seek time
• Reads are generally very fast
• Writes are typically slower
Random and linear IO have identical access times.
The Magic
Insulating Barrier
Pure Silicon
Doped silicon capable of holding an electrical charge
Barrier is a dielectric film (silicon oxide)
Quantum Tunneling
(transmission coefficient for a particle tunneling through a single potential barrier)
Hot Carrier Injection. Storage/erase uses Fowler-Nordheim tunnel injection/release.
Doped Silicon
Single-Level Cell (SLC)
Multi-Level Cell (MLC)
Triple-Level Cell (TLC)
Charge pumps are used to get through the barrier. Each charge level maps to a binary state (1 or 0).
Gradual Destruction
Energy increases with cell layers
Multiple cells need multiple writes
Barrier accumulates electrons
Electrical potential difference of barrier and cells disappears
Difficulty Going Forward

SLC: 0, 1
MLC: 00, 01, 10, 11
TLC: 000, 001, 010, 011, 100, 101, 110, 111
4LC: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
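The tables above follow directly from bits per cell; a one-line sketch:

```python
# Charge states double with each added bit per cell.
for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("4LC", 4)]:
    print(f"{name}: {2 ** bits} states")
```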
Density
• 3-Dimensional
• Charge levels
• Size of cells
• “Dot Pitch” (Cells Per Inch)
• 5nm, 3nm, 2nm
• Varies with “level” count
SLC / ESLC
• Low Density
• Single (bit) Level Cell
• Quick: 25µs Read / 200-300µs Write
• More robust & long wear time
• Write endurance near 100,000 cycles
Expensive per gigabyte of capacity; only made in 5nm / 3nm densities.
MLC / EMLC
• Reasonably High Density
• Two (bit) Level Cell
• Decently fast: 50µs Read / 600-900µs Write
• Medium lifetime
• Write endurance near 3,000 cycles
TLC
• Very High Density
• Three (bit) Level Cell
• Decently fast: 75µs Read / 900-1350µs Write
• Not very robust or durable :-(
• Write endurance ~ 1,000 cycles
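Putting the endurance numbers above to work, a rough lifetime estimate; the capacity and workload figures below are made-up assumptions:

```python
# Rough SSD lifetime: total program/erase capacity divided by daily
# writes, inflated by write amplification. All inputs are assumptions.
def lifetime_years(capacity_gb, pe_cycles, daily_write_gb, write_amp=1.0):
    total_writable_gb = capacity_gb * pe_cycles
    return total_writable_gb / (daily_write_gb * write_amp) / 365

# Hypothetical 256 GB TLC drive (~1,000 cycles), 50 GB/day, write amp 2.0
print(round(lifetime_years(256, 1000, 50, 2.0), 1))  # 7.0 years
```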
Write Amplification and Garbage Collection
Block Sizes
• Read Block
• 4k (aka “page”)
• Erasure Block
• (Large) multiple of 4k
• aka "block"; 256KB erasure block size
e-ink parallel
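The read-block vs. erasure-block mismatch is what drives write amplification; a minimal sketch using the 4k page / 256KB erasure block sizes above:

```python
# Worst-case write amplification for an in-place page update: the host
# writes one 4 KB page, but the drive must rewrite the 256 KB erasure
# block that contains it.
PAGE_KB = 4
ERASURE_BLOCK_KB = 256

amplification = ERASURE_BLOCK_KB / PAGE_KB
print(amplification)  # 64.0
```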
Write Amplification
[Diagram sequence: erasure-block grids showing written data, empty cells, old data, and new data]
• We want to change the data in the upper right quadrant.
• A big chunk of new data needs to be written.
• Where does this go? We're out of empty erasure blocks!
• New data written over old cells, without TRIM.
• New data written over old cells, with TRIM.
Garbage Collection
IO Performance Profiles
Remember:
• Spinning Disks
• Linear is fast
• Random is slow
• Reads marginally faster than writes (sometimes)
writes slower when switching tracks
With SSDs:
• Reads are fast
• Writes are slow(ish)
• Random or linear doesn’t matter (as much)
SSD Performance Overview
• Depends on
• Number of flash chips in use
• Number of busses from the processor
• Performance of controller CPU
• Contention
• Bus speed
• Number of erasure blocks used
• Number of previous writes to flash cells
• Chips
• IO Busses
• CPU Cores
Causes of Contention
• Legitimate use
• Garbage collection
• Legitimate (but latent) usage
• IO Blender!
(Bender Blender: http://bit.ly/10vc7Sf)
Latent: updatedb? atime? app-level garbage collection? (t-shirt at threadless)
Bus Speed
• SATA - 3 or 6 Gb/s?
• IOPS Calc
• Can your controller handle your disks?
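For the IOPS calc bullet, a sketch of the ceiling a SATA link imposes on 4k IOPS (assuming 8b/10b encoding, i.e. 10 wire bits per payload byte):

```python
# Ceiling on 4 KB IOPS over a SATA link. SATA's 8b/10b encoding means
# 10 wire bits per payload byte, so 6 Gb/s carries ~600 MB/s of data.
def sata_iops_ceiling(link_gbps, io_kb=4):
    payload_kb_s = link_gbps * 1e9 / 10 / 1000  # bytes/s -> KB/s
    return payload_kb_s / io_kb

print(int(sata_iops_ceiling(6)))  # 150000
```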
Read
• Very fast
• No seek time
• Moderately improved over spinning disk for linear IO (greatly improved for random)
• Causes no damage to the media
• Generally scales up with capacity
Write
• Usually fast (depending on drive usage)
• No seek time
• Highly improved over spinning disk
• Causes no damage to the media
• Generally scales up with capacity
Spinning Disk Read/Write Matrix
Read Write
Linear
Random
SSD Read/Write Matrix
Read Write
Linear
Random
Solid State in Practice
Solid State Form Factors
Removable Media
Drives
PCI Cards
Parts of an SSD
Interface
USB
PCI
SATA/SAS
IDE (sadly?)
Controller
• Main Processor
• I/O Bus Lanes
• RAM Cache
• Battery / SuperCapacitor
Flash Chips
If individual chip capacity is finite, how do bigger drives increase capacity? What does this mean for performance?
Flash Controllers
• Flash Translation Layer (FTL)
• Stripe Writes
• Interpret bus instructions
• Wear Leveling
• Garbage Collection
These do the heavy lifting. The controller is the single largest problem with flash drives, without a doubt.
Flash Translation Layer
[Diagram: LBA (0...n blocks) mapped onto the flash chips]
SSD Aspects & Concerns
Longevity
• Primarily determined by the class of flash
• (e)SLC, (e)MLC, TLC
• Related to wear-leveling
• Under-reported capacity
• Short-stroking improves lifetime (not speed)
Partition Alignment
• Performance and longevity
• As big an issue as (or bigger than) it was with spinning disks
• Native 4k read blocks
• Far larger erasure blocks
• larger than is practical for alignment
TRIM
• As a command, refers to ATA-8 spec
• SCSI equivalent is UNMAP, but both are often referred to as TRIM.
• Does not immediately delete unused blocks
• Allows for GC
Linux calls this "discard"; TRIM properly refers to the ATA command.
Linux TRIM Support
• EXT4 / XFS / JFS / BTRFS - Native using ‘discard’ option
• Consider NOOP or Deadline IO scheduler
• fstrim (part of util-linux) for R/W vols
• zerofree for R/O vols
fstrim & zerofree - userland - important for thin-provisioned volumes on SAN arrays which support it. Check docs on schedulers for details - deadline prefers read queues (under /sys/block)
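A sketch of the 'discard' mount option above; the device, filesystem, and mount point here are placeholders, not a recommendation:

```
# /etc/fstab -- online TRIM via the "discard" mount option (ext4 shown)
# <device>    <mount>  <fs>   <options>                  <dump> <pass>
/dev/sda1     /        ext4   defaults,noatime,discard   0      1
```

Batched trimming with `fstrim <mountpoint>` is often preferred over the discard option, since per-delete TRIMs can add latency.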
OS X TRIM Support
• Comes by default on factory-installed SSDs
• Trim-Enabler
• http://www.groths.org/trim-enabler/
ZFS and SSDs
• ZFS Intent Log (ZIL)
• Adaptive Replacement Cache (ARC)
• arc_summary can help you decide
ZIL is almost like a journal - ARC is a RAM cache that has disk backing it. SSDs can be L2ARC - https://code.google.com/p/jhell/wiki/arc_summary
Filesystems in General
• Standard journaling filesystems
• Mount options (atime/relatime, etc), /tmp->tmpfs
• Next-Gen
• ZFS / BTRFS
• Distributed Filesystems
• DRBD
ZFS: SSD cache pool. ZFS/BTRFS are copy-on-write (COW). DRBD: no TRIM.
Monitor Health w/ S.M.A.R.T.
• S.M.A.R.T. information
• vendor-specific
• Includes flash erase count
• smartctl on Linux and Mac
• Dozens of tools on Windows (check wiki)
Forensics
(http://bit.ly/fast11-wei-paper)
...Our results lead to three conclusions:
First, built-in commands are effective, but manufacturers sometimes implement them incorrectly.
Second, overwriting the entire visible address space of an SSD twice is usually, but not always, sufficient to sanitize the drive.
Third, none of the existing hard drive-oriented techniques for individual file sanitization are effective on SSDs.
"Reliably Erasing Data From Flash-Based Solid State Drives", Michael Wei, Laura M. Grupp, Frederick E. Spada, Steven Swanson; Department of Computer Science and Engineering and Center for Magnetic Recording and Research, University of California, San Diego.
SSD-enhanced RAID Array Considerations
Hardware / Software
Hardware RAID Controllers:
• Dedicated CPU Power
• Battery-backed storage
• Trust (eyes on code)
• Excessive cost of HW
Software RAID Controllers:
• Commercial Support
• Proprietary Tech
• Portability
• Spare CPU Cycles
single point of failure
TRIM / GC?
• Does the RAID software/device know enough to pass along TRIM?
• Will the array eventually crawl because of ongoing GC issues?
No software RAID that I know of supports it; Intel's chipset RAID 0 does support TRIM.
Access Bandwidth
• How much data can a single drive transmit?
• How many drives are in the array?
• What is the aggregate bus speed to the array controller?
• What is the bus speed to the host(s)?
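A sketch of the aggregate-bandwidth question above; the drive counts and speeds are illustrative assumptions:

```python
# Will the SSDs out-run the controller's bus? Figures are illustrative.
def bus_saturated(drive_mb_s, n_drives, controller_gb_s):
    aggregate_gb_s = drive_mb_s * n_drives / 1000
    return aggregate_gb_s > controller_gb_s

# 24 SSDs at 500 MB/s each (12 GB/s) vs. a single ~0.6 GB/s 6 Gb/s link
print(bus_saturated(500, 24, 0.6))  # True
```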
SSD Throughput Example
From Tech Radar: http://bit.ly/100UhvY
4.15Gb/s
Controller / Bus
• Speed / Ports
• How mature / reliable / tested?
Remember Me?
Just because buses exist in a storage array doesn't make them magic and infinite in size.
Tiering / Caching
Very slow, cheap disks
Faster spinning disks
SSD tier - hot blocks
Very fast SDRAM
Future Technology
Enhanced Capacity
Kowloon Walled City
Enhanced Longevity
Telomeres in chromosomes
Smart SSDs
Active Flash
What should I buy?
Questions?