Quantcast File System (QFS) - Alternative to HDFS

Quantcast: Petabyte Storage at Half Price with QFS. Presented by Silvius Rus, Director, Big Data Platforms, at the Big Data Gurus Meetup, December 2013.


Presentation by Silvius Rus (Quantcast) at the Big Data Gurus meetup, 10 December 2013.

Transcript of Quantcast File System (QFS) - Alternative to HDFS

Page 1: Quantcast File System (QFS) - Alternative to HDFS

Presented by Silvius Rus, Director, Big Data Platforms

December 2013

Quantcast: Petabyte Storage at Half Price with QFS

Big Data Gurus Meetup

Page 2: Quantcast File System (QFS) - Alternative to HDFS

© Quantcast 2012

Quantcast File System (QFS)

A high-performance alternative to the Hadoop Distributed File System (HDFS).

12-13 | Quantcast File System | 2

Manages multi-petabyte Hadoop workloads with significantly faster I/O than HDFS, using only half the disk space.

Offers massive cost savings to large-scale Hadoop users (fewer disks = fewer machines).

Production-hardened at Quantcast under massive processing loads (multi-exabyte).

Fully compatible with Apache Hadoop.

100% open source.

Page 3: Quantcast File System (QFS) - Alternative to HDFS


Quantcast Technology Innovation Timeline, 2006 to 2013

Milestones shown: Quantcast Measurement launched; Quantcast Advertising launched; launched QFS; started using Hadoop; using and sponsoring KFS; turned off HDFS.

Data growth shown: receiving 1 TB/day, 10 TB/day, 20 TB/day, and 40 TB/day; processing 1 PB/day, 10 PB/day, and 20 PB/day.

Page 4: Quantcast File System (QFS) - Alternative to HDFS


Architecture

[Diagram: a Client and a Metaserver, with chunk servers in Rack 1 and Rack 2]

Chunk Server
•  Handles I/O to locally stored 64 MB chunks
•  Monitors host file system health
•  Replicates and recovers chunks as the metaserver directs

Metaserver
•  Maps /file/paths to chunk IDs
•  Manages chunk locations
•  Directs clients to chunk servers

Client
•  Implements a high-level file interface (read/write/delete)
•  On write, Reed-Solomon (RS) encodes chunks and distributes stripes to nine chunk servers
•  On read, collects RS stripes from six chunk servers and recomposes the chunk

Interactions shown in the diagram:
•  Client and Metaserver: locate or allocate chunks
•  Client and chunk servers: read/write RS-encoded data
•  Metaserver to chunk servers: chunk replication and rebalancing instructions
•  Chunk server to chunk server: copy/recover chunks
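To make the metaserver's role concrete, here is a minimal Python sketch of its bookkeeping: a map from file paths to chunk IDs, a map from chunk IDs to the chunk servers holding each chunk's nine stripes, and an allocate call for a new 6+3 chunk. The class and method names are hypothetical; this is not the QFS metaserver API, and it ignores persistence, leases, and failures.

```python
import itertools
import random

STRIPES_PER_CHUNK = 9  # 6 data + 3 parity stripes, per the slide


class ToyMetaserver:
    """Hypothetical in-memory model of the metaserver's namespace and chunk maps."""

    def __init__(self, chunk_servers, seed=0):
        self.chunk_servers = list(chunk_servers)
        self.files = {}       # "/file/path" -> [chunk_id, ...]
        self.locations = {}   # chunk_id -> [server holding stripe 0, ..., stripe 8]
        self._ids = itertools.count(1)
        self._rng = random.Random(seed)

    def allocate_chunk(self, path):
        """Append a new chunk to `path` and pick nine distinct servers for its stripes."""
        if len(self.chunk_servers) < STRIPES_PER_CHUNK:
            raise RuntimeError("need at least nine chunk servers")
        chunk_id = next(self._ids)
        servers = self._rng.sample(self.chunk_servers, STRIPES_PER_CHUNK)
        self.files.setdefault(path, []).append(chunk_id)
        self.locations[chunk_id] = servers
        return chunk_id, servers

    def locate(self, path):
        """Return [(chunk_id, servers)] so the client can then talk to chunk servers directly."""
        return [(cid, self.locations[cid]) for cid in self.files.get(path, [])]


# Usage: 12 chunk servers across two racks, one file with two chunks.
meta = ToyMetaserver([f"rack{r}-cs{i}" for r in (1, 2) for i in range(6)])
meta.allocate_chunk("/logs/day1")
meta.allocate_chunk("/logs/day1")
for chunk_id, servers in meta.locate("/logs/day1"):
    print(chunk_id, servers)
```

`random.sample` stands in for a real placement policy, which would presumably also spread stripes across racks and, per the I/O balancing slide, favor lightly loaded chunk servers.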

Page 5: Quantcast File System (QFS) - Alternative to HDFS


QFS vs. HDFS

Broadly comparable feature set, with significant storage efficiency advantages.

Feature | QFS | HDFS
Scalable, distributed storage designed for efficient batch processing | Yes | Yes
Open source | Yes | Yes
Hadoop compatible | Yes | Yes
Unix-style file permissions | Yes | Yes
Error recovery mechanism | Reed-Solomon encoding | Multiple data copies
Disk space required (as a multiple of raw data) | 1.5x | 3x
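The 1.5x and 3x figures follow from simple arithmetic: a 6+3 Reed-Solomon layout stores 9 stripes for every 6 stripes of data, and can lose any 3 of them, while 3-way replication stores 3 full copies and can lose 2. A small Python check, with hypothetical helper names:

```python
from fractions import Fraction


def erasure_overhead(data_stripes, parity_stripes):
    """Disk used per byte of data, and how many lost stripes are survivable."""
    return Fraction(data_stripes + parity_stripes, data_stripes), parity_stripes


def replication_overhead(copies):
    """Disk used per byte of data, and how many lost copies are survivable."""
    return Fraction(copies), copies - 1


qfs = erasure_overhead(6, 3)
hdfs = replication_overhead(3)
print(f"QFS 6+3: {float(qfs[0])}x disk, tolerates {qfs[1]} losses")   # 1.5x, 3 losses
print(f"HDFS 3x: {float(hdfs[0])}x disk, tolerates {hdfs[1]} losses")  # 3.0x, 2 losses
```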

Page 6: Quantcast File System (QFS) - Alternative to HDFS


Reed-Solomon Error Correction: Leveraging High-Speed Modern Networks

HDFS optimizes for data locality, a design suited to older, slower networks.

10 Gbps networks are now common, making disk I/O the more critical bottleneck.

QFS leverages faster networks to achieve better parallelism and encoding efficiency.

Result: higher error tolerance and faster performance, with half the disk space.

1.  Break the original data into 64 KB stripes.
2.  Reed-Solomon generates three parity stripes for every six data stripes.
3.  Write those nine stripes to nine different drives.
4.  Up to three stripes can become unreadable...
5.  …yet the original data can still be recovered.

Every write is parallelized across 9 drives, every read across 6.

[Diagram: Reed-Solomon Parallel Data I/O]
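The five steps above can be sketched in Python. This is a minimal, illustrative Reed-Solomon code, not the QFS implementation. It treats one byte position across the stripes as a set of symbols, works in the prime field GF(257) so ordinary integer arithmetic suffices (production coders usually work in GF(2^8) so each symbol fits in a byte), and uses the polynomial-evaluation form of Reed-Solomon. The six data symbols are the values of a degree-5 polynomial at x = 0..5, the three parity symbols are its values at x = 6..8, and any six surviving values pin the polynomial down again. All function names are hypothetical.

```python
P = 257        # prime modulus; GF(257) is an illustrative stand-in for GF(2^8)
K, M = 6, 3    # 6 data stripes + 3 parity stripes, as on the slide


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # modular inverse needs Python 3.8+
    return total


def encode(data):
    """Systematic encoding: return the 6 data symbols followed by 3 parity symbols."""
    if len(data) != K:
        raise ValueError("expected six data symbols")
    points = list(enumerate(data))  # data[i] is the polynomial's value at x = i
    parity = [lagrange_eval(points, x) for x in range(K, K + M)]
    return list(data) + parity      # symbol i goes to drive i (nine drives in total)


def recover(survivors):
    """Rebuild the 6 data symbols from any 6 surviving {drive_index: symbol} entries."""
    if len(survivors) < K:
        raise ValueError("more than three stripes lost; data is unrecoverable")
    points = sorted(survivors.items())[:K]
    return [lagrange_eval(points, x) for x in range(K)]


# One symbol (one byte offset) from each of six data stripes.
data = [72, 101, 108, 108, 111, 33]
codeword = encode(data)
lost = {1, 4, 8}  # any three drives fail or are too slow to wait for
survivors = {i: s for i, s in enumerate(codeword) if i not in lost}
print(recover(survivors) == data)  # True
```

In a full stripe the same calculation would run independently at every byte offset of the six 64 KB data stripes. The slides credit vector instructions for QFS's coding speed; this sketch makes no attempt at that kind of optimization.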

Page 7: Quantcast File System (QFS) - Alternative to HDFS


MapReduce on 6+3 Erasure-Coded Files versus 3x Replicated Files

Positives

Writing costs half as much, in both space and time.

Any 3 broken or slow devices are tolerated, versus any 2 with 3-way replication.

Re-executed stragglers run faster because they read from multiple devices (striping).

Negatives

There is no data locality; every read goes over the network.

On a read failure, recovery is needed; however, it is lightning fast on modern CPUs (2 GB/s per core).

Writes don't reach network line rate, because a single client writes both the original and the parity data.

Page 8: Quantcast File System (QFS) - Alternative to HDFS


Read/Write Benchmarks

[Chart: end-to-end time (minutes, axis 0 to 18) for the write and read tests, comparing HDFS 64 MB, HDFS 2.5 GB, and QFS 64 MB]

End-to-end 20 TB write test and end-to-end 20 TB read test: 8,000 workers × 2.5 GB each. Tests ran as Hadoop MapReduce jobs.

Page 9: Quantcast File System (QFS) - Alternative to HDFS


Read/Write Benchmarks (same chart and test setup as Page 8)

Host network behavior during tests:
•  QFS write = ½ the disk I/O of HDFS write
•  QFS write → network/disk = 8/9
•  HDFS write → network/disk = 6/9
•  QFS read → network/disk = 1
•  HDFS read → network/disk = very small
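The write ratios in this list can be reproduced with simple arithmetic under one assumption that the slide does not state: the writer runs on a host that also stores one of the written pieces, so one of QFS's nine stripes, or the first of HDFS's three replicas, never crosses the network. The sketch below, with hypothetical helper names, derives the 1/2, 8/9, and 6/9 figures from that assumption.

```python
from fractions import Fraction


def disk_per_byte(pieces, data_pieces):
    """Bytes written to disk per byte of user data."""
    return Fraction(pieces, data_pieces)


def network_to_disk(pieces, local_pieces=1):
    """Share of disk writes that cross the network (assumes local_pieces stay on the writer's host)."""
    return Fraction(pieces - local_pieces, pieces)


qfs_disk = disk_per_byte(9, 6)   # 6 data + 3 parity stripes
hdfs_disk = disk_per_byte(3, 1)  # three full replicas

print(qfs_disk / hdfs_disk)      # 1/2: QFS write = half the disk I/O of HDFS write
print(network_to_disk(9))        # 8/9: QFS write network/disk
print(network_to_disk(3))        # 2/3, i.e. 6/9: HDFS write network/disk
```

The read figures are consistent with the locality trade-off on Page 7: a QFS reader gathers its stripes from other hosts, while an HDFS task scheduled next to its block reads mostly from local disk.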

Page 10: Quantcast File System (QFS) - Alternative to HDFS


Metaserver Performance

[Chart: operations per second (thousands, axis 0 to 300) for ls, mkdir, rmdir, and stat, QFS vs. HDFS]

Test setup: Intel E5-2670, 64 GB RAM, 70 million directories.

Page 11: Quantcast File System (QFS) - Alternative to HDFS


Production Hardening for Petascale

Continuous I/O Balancing
•  Full feedback loop
•  Metaserver knows the I/O queue size of every device
•  Activity biased towards under-loaded chunkservers
•  Direct I/O = short loop

Optimization
•  Direct I/O and fixed buffer space = predictable RAM and storage device usage
•  C++, own memory allocation and layout
•  Vector instructions for Reed-Solomon coding

Operations
•  Hibernation
•  Evacuation through recovery
•  Continuous space/integrity rebalancing
•  Monitoring and alerts
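The "activity biased towards under-loaded chunkservers" bullet can be illustrated with a weighted random choice. This Python sketch is an assumption about the general idea, not QFS's actual placement policy: it picks nine distinct chunkservers, weighting each by 1 / (1 + queue size) using the per-device I/O queue sizes the metaserver is said to know. The function name and the weighting formula are hypothetical.

```python
import random


def pick_chunkservers(queue_sizes, count=9, rng=None):
    """Pick `count` distinct chunkservers, favoring those with short I/O queues.

    queue_sizes: {server_name: pending I/O requests}, as reported to the metaserver.
    Weighting by 1 / (1 + queue size) is a hypothetical choice for illustration.
    """
    if rng is None:
        rng = random.Random()
    remaining = dict(queue_sizes)
    if count > len(remaining):
        raise ValueError("not enough chunkservers")
    chosen = []
    for _ in range(count):
        names = list(remaining)
        weights = [1.0 / (1 + remaining[name]) for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # without replacement: one stripe per chunkserver
    return chosen


queues = {"cs0": 0, "cs1": 2, "cs2": 50, "cs3": 1, "cs4": 0, "cs5": 3,
          "cs6": 100, "cs7": 4, "cs8": 0, "cs9": 1, "cs10": 7, "cs11": 2}
print(pick_chunkservers(queues, rng=random.Random(7)))
```

If the weights are recomputed from fresh queue sizes on every decision, a busy device quickly stops attracting new work; this is one way to read the "full feedback loop" bullet.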

Page 12: Quantcast File System (QFS) - Alternative to HDFS


Fast and Efficient MapReduce
Quantsort: All I/O over QFS

Concurrent append: 10,000 writers append to the same file at once.

http://qc.st/QCQuantsort

Largest sort: 1 PB. Daily: 1 to 2 PB, max 3 PB.
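The concurrent-append bullet describes many writers appending to one file at once. A minimal Python sketch of the semantics, assuming record-level atomicity (each append lands as one contiguous record, and records from different writers may appear in any order), uses threads and a lock around a toy in-memory file. It is not the QFS client API; the class and function names are hypothetical, and 50 threads stand in for the slide's 10,000 writers.

```python
import threading


class AppendOnlyFile:
    """Toy stand-in for a shared file that accepts concurrent record appends."""

    def __init__(self):
        self._records = []
        self._lock = threading.Lock()

    def append(self, record: bytes) -> int:
        with self._lock:  # serialize appends so records never interleave
            self._records.append(record)
            return len(self._records) - 1  # position assigned to this record

    def records(self):
        with self._lock:
            return list(self._records)


def writer(shared_file, writer_id, n):
    for i in range(n):
        shared_file.append(f"w{writer_id}:r{i}".encode())


shared = AppendOnlyFile()
threads = [threading.Thread(target=writer, args=(shared, w, 100)) for w in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(shared.records()))  # 5000 records, each intact, in some interleaved order
```

Here a single process lock makes each record land whole; in a distributed file system the equivalent coordination has to happen across the metaserver and chunk servers instead.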

Page 13: Quantcast File System (QFS) - Alternative to HDFS


How Well Does It Work

Reliable at Scale

Hundreds of days of metaserver uptime are common.

The Quantcast MapReduce sorter uses QFS as a distributed, virtualized store instead of local disk.

8 petabytes of compressed data

Close to 1 billion chunks

7,500 I/O devices

Page 14: Quantcast File System (QFS) - Alternative to HDFS


How Well Does It Work


Fast and Large

Ran a petabyte sort last weekend.

Direct I/O is not hurting fast scans: Sawzall query performance is similar to Presto.

Metric | Presto/HDFS | Turbo/QFS
Seconds | 16 | 16
Rows | 920 M | 970 M
Bytes | 31 G | 294 G
Rows/sec | 57.5 M | 60.6 M
Bytes/sec | 2.0 G | 18.4 G

Page 15: Quantcast File System (QFS) - Alternative to HDFS


How Well Does It Work


Easy to Use

One ops engineer runs QFS and MapReduce on a 1,000+ node cluster.

Neustar set up a multi-petabyte instance without help from Quantcast.

Migrate from HDFS using hadoop distcp.

Hadoop MapReduce “just works” on QFS.


Page 16: Quantcast File System (QFS) - Alternative to HDFS


Metaserver Statistics in Production

QFS metaserver statistics over Quantcast production file systems in July 2013.

•  High Availability is nice to have but not a must-have for MapReduce. There are certainly other use cases where High Availability is a must.
•  Federation may be needed to support file systems beyond 10 PB, depending on file size.

Page 17: Quantcast File System (QFS) - Alternative to HDFS


Other Features: Tiered Storage

[Diagram: chunkservers, each with RAM, 2 SSDs, and 10 disks, "and 450 more just like them"]

Tier range as a file attribute. Use a tier across 450 machines. Used in production to accelerate MapReduce fan-out.
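A tier range as a file attribute can be modeled as a (min, max) pair on each file that restricts which devices a chunkserver may use for that file's chunks. The Python sketch below is a hypothetical model built from the slide's description (RAM, 2 SSDs, and 10 disks per chunkserver); the tier numbering, field names, and function are assumptions, not QFS's actual attribute names.

```python
from dataclasses import dataclass

# Hypothetical tier numbering: a lower number means a faster medium.
TIERS = {"ram": 0, "ssd": 1, "disk": 2}


@dataclass
class FileAttrs:
    min_tier: int  # fastest tier the file may use
    max_tier: int  # slowest tier the file may use


def eligible_devices(devices, attrs):
    """Return the devices whose tier lies inside the file's [min_tier, max_tier] range.

    devices: list of (device_name, tier_name) pairs for one chunkserver.
    """
    return [name for name, tier in devices
            if attrs.min_tier <= TIERS[tier] <= attrs.max_tier]


# One chunkserver as on the slide: RAM, 2 SSDs, and 10 disks.
server = ([("ram0", "ram"), ("ssd0", "ssd"), ("ssd1", "ssd")]
          + [(f"disk{i}", "disk") for i in range(10)])
print(eligible_devices(server, FileAttrs(min_tier=0, max_tier=1)))  # ['ram0', 'ssd0', 'ssd1']
```

Applying the same attribute on every chunkserver would place a file's chunks on the fast tier across all machines, which fits the slide's use of a tier across 450 machines to speed up MapReduce fan-out.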

Page 18: Quantcast File System (QFS) - Alternative to HDFS


Other Features: Fast Broadcast through Wide Striping

Configuration | Broadcast time (s)
HDFS Default | 94.5
HDFS Small Blocks | 16.7
QFS on Disk | 8.5
QFS in RAM | 4.8

Page 19: Quantcast File System (QFS) - Alternative to HDFS


Refreshingly Fast Command Line Tool: hadoop fs -ls / versus qfs -ls /

[Chart: time (msec) to run the command; HDFS (hadoop fs -ls /): 700; QFS (qfs -ls /): 7]

Page 20: Quantcast File System (QFS) - Alternative to HDFS


Who Will Find QFS Valuable?

Likely to benefit from QFS:

Existing Hadoop users with large-scale data clusters.

Data-heavy, tech-savvy organizations for whom performance and efficient use of hardware are high priorities.

May find HDFS a better fit:

Small or new Hadoop deployments, as HDFS has been deployed in a broader variety of production environments.

Clusters with slow or unpredictable network connectivity.

Environments needing specific HDFS features such as head node federation or hot standby.

Page 21: Quantcast File System (QFS) - Alternative to HDFS


Summary

Key Benefits of QFS

Delivers a stable, high-performance alternative to HDFS in a production-hardened 1.0 release.

Offers high-performance management of multi-petabyte workloads.

Faster I/O than HDFS with half the disk space.

Fully compatible with Apache Hadoop.

100% open source.

Page 22: Quantcast File System (QFS) - Alternative to HDFS


Future Work

What QFS Doesn't Have Just Yet

Kerberos security: under development.

HA: no strong case at Quantcast, but nice to have.

Federation: not a strong case at Quantcast either.

Contributions welcome!

Page 23: Quantcast File System (QFS) - Alternative to HDFS

New York: 432 Park Avenue South, New York, NY 10016

San Francisco: 201 Third Street, San Francisco, CA 94103

London: 48 Charlotte Street, London, W1T 2NS


Thank you. Questions?

Download QFS for free at: github.com/quantcast/qfs