BitRot (The silent corruption of data on disk ) detection in GlusterFS (GlusterFS_Meetup_Sep2015)

GlusterFS Meetup, Sep 2015

BitRot detection in GlusterFS

(Detecting silent corruption of data)

Gaurav Kumar Garg (GlusterFS Developer)

[email protected] freenode nick: ggarg

GlusterFS Meetup, Sep 12, 2015

Agenda

GlusterFS – What is it?

BitRot – What is it?

BitRot detection

– Signing

– Scrubbing

Correction

Performance

Demo

Q & A

GlusterFS – What is it?

2.5+ exabytes of data produced every day

90% of data “produced” in the last two years

Storage demands

GlusterFS (contd..)

Scale-out distributed storage system

Scalability to petabytes and beyond

Aggregates storage exports over a network interconnect to provide a unified namespace

GlusterFS (contd..)

Software only, runs on commodity hardware

Scale-out with elasticity

No external metadata server

Flexibility

Extensible and modular

BitRot – What is it?

BitRot Detection in GlusterFS

Signing

- Change notification

- Object versioning

- Checksumming

Scrubbing

- Access Check

Signing

A single daemon, BitD, runs per node.

BitD registers with the changelog translator (xlator) for all the bricks on the node.

Signing (contd..)

Object expiry tracking by BitD

Signing of expired objects
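Expiry tracking can be pictured as a quiet-period timer: each change notification restarts the clock for an object, and the object becomes eligible for signing once no further change has been seen for a while. A minimal sketch of that idea follows. The `ExpiryTracker` class, its method names, and the 120-second quiet period are hypothetical and chosen only for illustration; none of them are taken from the slides or from BitD's code.

```python
class ExpiryTracker:
    """Hypothetical model of object expiry tracking (not BitD's actual code)."""

    def __init__(self, quiet_period=120.0):
        # Assumed value: seconds an object must stay unmodified before signing.
        self.quiet_period = quiet_period
        self._last_change = {}  # object id -> time of the latest notification

    def notify(self, obj_id, now):
        """Record a change notification; this restarts the object's timer."""
        self._last_change[obj_id] = now

    def pop_expired(self, now):
        """Return (and forget) the objects whose quiet period has elapsed."""
        expired = sorted(
            obj for obj, seen in self._last_change.items()
            if now - seen >= self.quiet_period
        )
        for obj in expired:
            del self._last_change[obj]
        return expired


tracker = ExpiryTracker(quiet_period=120.0)
tracker.notify("file10", now=0.0)
tracker.notify("file11", now=50.0)
tracker.notify("file10", now=100.0)      # modified again: the timer restarts
ready = tracker.pop_expired(now=180.0)   # objects that are now due for signing
```

Passing `now` explicitly instead of reading the clock keeps the sketch deterministic and easy to test.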

Signing (contd..)

Object versioning

Two versions are persisted as extended attributes:

1). ongoing version

trusted.bit-rot.version=0x020000000000000055f2ceed00065b1c

Fields, in order: ongoing version, time stamp

2). signing version
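The sample version value on this slide can be split into the two labelled fields. The sketch below decodes it under an assumed byte layout inferred from the sample itself, not confirmed by the slides: an 8-byte little-endian ongoing version, followed by a time stamp made of two 4-byte big-endian integers (seconds and microseconds). The function name `decode_version` is hypothetical.

```python
import struct
from datetime import datetime, timezone

# Sample value from the slide, split into its two labelled fields.
SAMPLE_VERSION = (
    "0x"
    "0200000000000000"   # ongoing version
    "55f2ceed00065b1c"   # time stamp
)


def decode_version(hex_value):
    """Decode a version attribute value under the assumed layout."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    # Assumption: the ongoing version is a 64-bit little-endian counter.
    (ongoing_version,) = struct.unpack("<Q", raw[:8])
    # Assumption: the time stamp is (seconds, microseconds), both big-endian.
    seconds, microseconds = struct.unpack(">II", raw[8:])
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return ongoing_version, stamp.replace(microsecond=microseconds)


version, stamp = decode_version(SAMPLE_VERSION)
print(version, stamp.isoformat())
```

Under these assumptions the time stamp decodes to the same day as the scrubber log entries shown later in the deck, which suggests the layout is plausible.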

Signing (contd..)

Objects are signed with a hash, which is stored as an extended attribute of the object.

trusted.bit-rot.signature=0x0102000000000000005891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03

Fields, in order: sha256sum, signing version, object data checksum
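To make the three labelled fields concrete, here is a sketch that builds and parses a signature value. The layout is an assumption read off the sample: one type byte (taken here to mean SHA-256), an 8-byte little-endian signing version, then the 32-byte SHA-256 digest of the object data. The helper names `build_signature` and `parse_signature` are hypothetical.

```python
import hashlib
import struct

# Sample value from the slide, split into its three labelled fields.
SAMPLE_SIGNATURE = (
    "0x"
    "01"                 # signature type (assumed: 1 = SHA-256)
    "0200000000000000"   # signing version
    "5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"
)


def parse_signature(hex_value):
    """Split a signature value into (type, signing version, checksum)."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    sig_type = raw[0]
    (signing_version,) = struct.unpack("<Q", raw[1:9])  # assumed little-endian
    return sig_type, signing_version, raw[9:]


def build_signature(data, signing_version, sig_type=1):
    """Produce a signature value for `data` in the same assumed layout."""
    digest = hashlib.sha256(data).digest()
    raw = bytes([sig_type]) + struct.pack("<Q", signing_version) + digest
    return "0x" + raw.hex()


sig_type, signing_version, checksum = parse_signature(SAMPLE_SIGNATURE)
roundtrip = parse_signature(build_signature(b"hello", signing_version=2))
```

Parsing the sample yields a checksum of exactly 32 bytes, the size of a SHA-256 digest, which is consistent with the "sha256sum" label.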

Scrubbing

Object integrity verification

Dirty objects are skipped

Scrubber options

A corrupted object is marked with an extended attribute on the (bad) file:

trusted.bit-rot.bad-file=0x3100

The value is a constant.
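The scrubber's per-object decision can be sketched as three outcomes: skip, clean, or corrupted. This is an illustration under stated assumptions, not the scrubber's real code. It assumes an object is "dirty" when its ongoing version differs from its signing version (changed since it was last signed), and it uses a plain dictionary as a stand-in for the file's extended attributes. The function name `scrub_object` is hypothetical.

```python
import hashlib

# Constant from the slide: 0x3100 is the ASCII character "1" followed by NUL.
BAD_FILE_KEY = "trusted.bit-rot.bad-file"
BAD_FILE_VALUE = bytes.fromhex("3100")


def scrub_object(data, ongoing_version, signing_version, stored_checksum, xattrs):
    """Return 'skipped', 'clean' or 'corrupted' for one object.

    `xattrs` is a dict standing in for the object's extended attributes.
    """
    # Assumption: a version mismatch means the object changed after signing,
    # so its stored checksum is stale and the object is skipped.
    if ongoing_version != signing_version:
        return "skipped"
    if hashlib.sha256(data).digest() == stored_checksum:
        return "clean"
    # Checksum mismatch on a signed, unmodified object: mark it as bad.
    xattrs[BAD_FILE_KEY] = BAD_FILE_VALUE
    return "corrupted"


good = b"payload"
signed = hashlib.sha256(good).digest()
attrs = {}
results = [
    scrub_object(good, 2, 2, signed, attrs),        # data intact
    scrub_object(good, 3, 2, signed, attrs),        # modified, not yet re-signed
    scrub_object(b"pay1oad", 2, 2, signed, attrs),  # silent corruption
]
```

Skipping dirty objects avoids false alarms: a file that is legitimately being written would otherwise never match its old checksum.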

Scrubbing (contd..)

Accessing a bad file from the mount point

- When the volume is purely distributed

- When the volume is distributed-replicate

[2015-09-11 08:05:00.204530] A [MSGID: 118023] [bit-rot-scrub.c:226:bitd_compare_ckum] 0-vol-bit-rot-0: CORRUPTION DETECTED: Object /file10 {Brick: /br1 | GFID: d1bad272-0f2b-453a-829d-52a2cb4ede52}

[2015-09-11 08:05:00.204685] A [MSGID: 118024] [bit-rot-scrub.c:246:bitd_compare_ckum] 0-vol-bit-rot-0: Marking /file10 [GFID: d1bad272-0f2b-453a-829d-52a2cb4ede52 | Brick: /br1] as corrupted..
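Alert lines like the ones above can be collected with a small parser, for example to list corrupted objects per brick. The sketch below assumes each log entry sits on a single line in exactly the shape shown on the slide; the regular expression and the function name `find_corrupted` are illustrative and not part of GlusterFS.

```python
import re

# Matches only the "CORRUPTION DETECTED" alert, in the shape shown on the slide.
ALERT = re.compile(
    r"^\[(?P<time>[^\]]+)\] A \[MSGID: (?P<msgid>\d+)\] \[[^\]]+\] \S+ "
    r"CORRUPTION DETECTED: Object (?P<path>\S+) "
    r"\{Brick: (?P<brick>\S+) \| GFID: (?P<gfid>[0-9a-f-]+)\}"
)


def find_corrupted(lines):
    """Return one dict per corruption alert found in `lines`."""
    found = []
    for line in lines:
        match = ALERT.match(line.strip())
        if match:
            found.append(match.groupdict())
    return found


sample_log = [
    "[2015-09-11 08:05:00.204530] A [MSGID: 118023] "
    "[bit-rot-scrub.c:226:bitd_compare_ckum] 0-vol-bit-rot-0: "
    "CORRUPTION DETECTED: Object /file10 "
    "{Brick: /br1 | GFID: d1bad272-0f2b-453a-829d-52a2cb4ede52}",
    "[2015-09-11 08:05:00.204685] A [MSGID: 118024] "
    "[bit-rot-scrub.c:246:bitd_compare_ckum] 0-vol-bit-rot-0: "
    "Marking /file10 [GFID: d1bad272-0f2b-453a-829d-52a2cb4ede52 "
    "| Brick: /br1] as corrupted..",
]
alerts = find_corrupted(sample_log)
```

Only the first sample line matches; the follow-up "Marking ... as corrupted" message describes the same object, so ignoring it avoids double counting.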

BitRot Correction

Performance

Demo

Q/A