
CPSC-608 Database Systems

Fall 2008

Instructor: Jianer Chen

Office: HRBB 309B

Phone: 845-4259

Email: chen@cs.tamu.edu

Notes #6

Optimizing Disk Access

(By reducing the seek time and rotational delay via Operating Systems and Disk Controllers)

[Figure: seek time versus number of cylinders traveled, from 1 to N; the time axis is marked x and 3 or 5x]

Average seek time = 20 ms

Shortest seek time = 5 ms

Average rotational delay = 8 ms

Dealing with many random accesses (disk scheduling)

• Suppose that we have a large (dynamic) sequence of disk read/write tasks, on blocks randomly distributed on the disk.

• How do we order the tasks so that the total time can be minimized?

• Elevator algorithm

Elevator algorithm

The disk head makes sweeps across the disk, stops at a cylinder if a task reads/writes a block in that cylinder, and reverses its direction if there are no read/write tasks ahead (at that moment).

Intuitively good, in particular when there are a large number of tasks reading/writing blocks uniformly distributed on the disk.

Works in a real-time manner.

Precise analysis is difficult.
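The sweep described above can be sketched in Python. This is only a minimal illustration under simplifying assumptions: the set of pending tasks is static (the real algorithm works on a dynamic queue), each task is identified by its cylinder number alone, and the names `elevator_order` and `cylinders_traveled` are hypothetical.

```python
def elevator_order(pending, head, direction=1):
    """Return the order in which cylinders are visited by the elevator sweep.

    pending   -- cylinder numbers with outstanding read/write tasks (assumed static)
    head      -- cylinder where the disk head currently sits
    direction -- +1 to start toward higher cylinders, -1 toward lower ones
    """
    remaining = sorted(set(pending))
    order = []
    while remaining:
        if direction > 0:
            ahead = [c for c in remaining if c >= head]
        else:
            ahead = [c for c in reversed(remaining) if c <= head]
        if not ahead:
            # No task ahead in the current direction: reverse the sweep.
            direction = -direction
            continue
        for c in ahead:
            order.append(c)
            remaining.remove(c)
        head = order[-1]
    return order


def cylinders_traveled(order, head):
    """Total number of cylinders the head crosses when visiting `order`."""
    total = 0
    for c in order:
        total += abs(c - head)
        head = c
    return total


requests = [98, 183, 37, 122, 14, 124, 65, 67]   # made-up cylinder numbers
visit = elevator_order(requests, head=53)
print(visit, cylinders_traveled(visit, head=53))
```

With the made-up request list above, the head sweeps upward to cylinder 183 and then reverses to serve cylinders 37 and 14.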

Dealing with a long sequence of data on disk

• Data in consecutive cylinders

• Larger buffer

• Pre-fetch/double buffering

• Disk arrays

• Mirrored disks

Example. Sorting on disk (again)

• A relation R of 10M tuples takes 100K blocks

• Main memory can store 6400 blocks

• A disk block read/write: 40 ms (seek time = 31 ms, rotational delay = 8 ms, transfer time = 1 ms)

• Two-phase Multiway Sorting on randomly distributed blocks takes about 4.5 hours (each block is read once and written once in each phase: 4 × 100K × 40 ms = 16,000 s).

• Also assume that a track holds 500 blocks, and that traversing one cylinder takes 5 ms
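The 4.5-hour figure can be checked with a few lines of Python. The sketch assumes that every block is read once and written once in each of the two phases, and that each of those accesses pays the full 40 ms; the variable names are illustrative only.

```python
BLOCKS = 100_000            # relation R occupies 100K blocks
BLOCK_IO_MS = 31 + 8 + 1    # seek + rotational delay + transfer = 40 ms

# Two phases, each reading and writing every block once: 4 I/Os per block.
random_io_count = 4 * BLOCKS
total_ms = random_io_count * BLOCK_IO_MS
total_hours = total_ms / 1000 / 3600

print(f"{total_ms} ms = {total_hours:.2f} hours")
```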

Data in consecutive cylinders

In phase 1, suppose that the input relation is stored in consecutive tracks.

We can then read/write 6400 consecutive blocks at a time between main memory and disk.

Phase 1 read/write: 2*(100K/6400)(31 + 8 + 12*5 + 6400*1) ≈ 208000 ms < 4 minutes (saves about 2 hours)

Here 100K/6400 is rounded up to 16 chunks, and the 12*5 ms term accounts for the roughly 12 cylinder-to-cylinder moves needed to cover 6400 blocks at 500 blocks per track.
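The phase 1 estimate can be reproduced as follows. This is a sketch, not the author's derivation: rounding 100K/6400 up to 16 chunks and taking 6400 // 500 = 12 cylinder moves are assumptions chosen to match the numbers on the slide, and the helper name `sequential_chunk_ms` is hypothetical.

```python
import math

BLOCKS = 100_000
MEMORY_BLOCKS = 6400
SEEK_MS, ROTATE_MS, TRANSFER_MS = 31, 8, 1
BLOCKS_PER_TRACK = 500
CYLINDER_STEP_MS = 5


def sequential_chunk_ms(chunk_blocks):
    """Time to read (or write) one run of consecutive blocks."""
    # One seek and one rotational delay up front, then short moves
    # between neighbouring cylinders, then pure transfer time.
    cylinder_steps = chunk_blocks // BLOCKS_PER_TRACK   # 6400 // 500 = 12 (assumed)
    return (SEEK_MS + ROTATE_MS
            + cylinder_steps * CYLINDER_STEP_MS
            + chunk_blocks * TRANSFER_MS)


chunks = math.ceil(BLOCKS / MEMORY_BLOCKS)               # 16 memory loads (assumed rounding)
phase1_ms = 2 * chunks * sequential_chunk_ms(MEMORY_BLOCKS)   # read + write
phase1_minutes = phase1_ms / 1000 / 60

print(phase1_ms, round(phase1_minutes, 2))
```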

Larger Buffer

In phase 2, we have 16 sublists, each taking one block in main memory, with 6384 blocks left.

If we use all these 6384 blocks as an output buffer, and write them to disk only when they are all full:

Phase 2 writing: (100K/6384)(31 + 8 + 12*5 + 6384*1) ≈ 104000 ms < 2 minutes (saves about 1 hour)
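To see where the saving of about one hour comes from, the following sketch compares the buffered phase 2 writes with writing every block at the random-access cost of 40 ms. As before, rounding the number of buffer flushes up to 16 is an assumption, and the variable names are illustrative.

```python
import math

BLOCKS = 100_000
OUTPUT_BUFFER_BLOCKS = 6400 - 16     # 16 blocks are reserved for the sublists
RANDOM_BLOCK_IO_MS = 40

# Cost of flushing one full output buffer to consecutive cylinders.
flush_ms = 31 + 8 + 12 * 5 + OUTPUT_BUFFER_BLOCKS * 1
flushes = math.ceil(BLOCKS / OUTPUT_BUFFER_BLOCKS)   # 16 flushes (assumed rounding)

buffered_write_ms = flushes * flush_ms
random_write_ms = BLOCKS * RANDOM_BLOCK_IO_MS
saved_minutes = (random_write_ms - buffered_write_ms) / 1000 / 60

print(buffered_write_ms, round(saved_minutes, 1))
```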

However

The reading in phase 2 seems harder to improve: it is kind of random.

Double Buffering

For applications where the read/write pattern is predictable.

Have a program:

» Process B1

» Process B2

» Process B3

...

Single Buffer Solution

1. Read B1 into the buffer

2. Process the data in the buffer

3. Read B2 into the buffer

4. Process the data in the buffer

...

Let P = time to process one block

R = time to read in one block

n = # blocks

Single buffer time = n(P+R)

Double Buffering

[Figure, four animation frames: the disk holds blocks A B C D E F G. First, block A is read into memory. Then B is read while A is processed. Then, with A done, C is read while B is processed.]

If P ≥ R:

• Double buffering time = R + nP

• Single buffering time = n(R+P)

P = processing time/block

R = I/O time/block

n = # blocks
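The two formulas can be checked with a small timeline simulation. The sketch below assumes exactly two buffers, one read in flight at a time, and that a buffer becomes free as soon as its block has been processed; the function names are hypothetical.

```python
def single_buffer_time(n, P, R):
    """Read and process strictly alternate: n(P + R)."""
    return n * (P + R)


def double_buffer_time(n, P, R):
    """Finish time when the next block is read while the current one is processed."""
    read_done = []   # read_done[i]: time at which block i is in memory
    proc_done = []   # proc_done[i]: time at which block i has been processed
    for i in range(n):
        prev_read = read_done[i - 1] if i >= 1 else 0
        # Block i reuses the buffer that held block i-2 (two buffers in total).
        buffer_free = proc_done[i - 2] if i >= 2 else 0
        finished_read = max(prev_read, buffer_free) + R
        prev_proc = proc_done[i - 1] if i >= 1 else 0
        finished_proc = max(finished_read, prev_proc) + P
        read_done.append(finished_read)
        proc_done.append(finished_proc)
    return proc_done[-1] if n else 0


n, P, R = 3, 3, 1    # made-up values with P >= R
print(single_buffer_time(n, P, R), double_buffer_time(n, P, R), R + n * P)
```

For the made-up values above the simulation agrees with R + nP; when P < R the same simulation gives nR + P instead, since reading becomes the bottleneck.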

Disk Arrays

Take advantage of the fact that disk reads/writes can be done in parallel between a single CPU and multiple disks.

logically one disk

Does not help if the blocks of interest are all on the same disk.

Mirrored Disks

Duplicating disks so that multiple reads in the same disk can be done in parallel.

A A B B

Writing is more expensive, but not by much.

Disk Failures

• Partial vs. total

• Intermittent vs. permanent

Coping with Disk Failures

• Detection

– Checksum

• Correction

– Redundancy

At what level do we cope?

• Operating System Level (Stable Storage): each logical block is stored as two copies, Copy A and Copy B

• Database System Level (Log File): a log is kept alongside the current DB and yesterday’s DB

Intermittent Failure Detection (Checksums)

• Idea: add n parity bits for every m data bits

– Ex.: m=8, n=1

• Block A: 01101000:1 (odd # of 1’s)

• Block B: 11101110:0 (even # of 1’s)

• But suppose Block A instead contains:

• Block A’: 01000000:1 (also has odd # of 1’s), so the error goes undetected

• 50% chance of detection per parity bit

• More parity bits decrease the probability of an undetected failure to 1/2^n (with n ≤ m independent parity bits)
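The parity example can be replayed in Python. This is a minimal sketch for the m=8, n=1 case only; the function names `parity_bit`, `passes_check` and `undetected_probability` are hypothetical.

```python
def parity_bit(bits):
    """Parity bit that makes the total number of 1's (data + parity) even."""
    return bits.count("1") % 2


def passes_check(bits, stored_parity):
    """True if the data bits are consistent with the stored parity bit."""
    return parity_bit(bits) == stored_parity


def undetected_probability(n):
    """Chance that a random corruption slips past n independent parity bits."""
    return 1 / 2 ** n


print(parity_bit("01101000"))        # Block A
print(parity_bit("11101110"))        # Block B
print(passes_check("01000000", 1))   # Block A' still passes: error undetected
print(passes_check("01101001", 1))   # a single flipped bit is caught
```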

Disk Crash (Disk Arrays)

• RAIDs (Redundant Arrays of Inexpensive Disks)

logically one disk

Disk Arrays

• RAID Level 1 (Mirroring)

– Keep exact copy of data on redundant disks

AA BB AA BB

Disk Arrays

• RAID Level 4

– Keep only one redundant disk

– Entire parity blocks on redundant disk

AA BB CC PP

Parity Blocks & Modulo-2 Sums

• Have an array of 3 data disks

– Disk 1, block 1: 11110000

– Disk 2, block 1: 10101010

– Disk 3, block 1: 00111000

• … and 1 parity disk

– Disk 4, block 1: 01100010

Note:

- Sum over each column is always an even # of 1’s

- Mod-2 sum can recover any missing single row (e.g., a logical block)

Using Mod-2 Sums for Error Recovery

– Suppose we have:

– Disk 1, block 1: 11110000

– Disk 2, block 1: ????????

– Disk 3, block 1: 00111000

– Disk 4, block 1: 01100010 (Parity)

– Mod-2 sum for block 1 over disks 1, 3, 4 gives Disk 2, block 1: 10101010
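The mod-2 sum is a bitwise XOR, so both the parity block and the recovered block can be computed the same way. The sketch below treats each block as an 8-bit string, as on the slides; the function name `mod2_sum` is hypothetical.

```python
from functools import reduce


def mod2_sum(blocks):
    """Column-wise modulo-2 sum (bitwise XOR) of equal-length bit strings."""
    width = len(blocks[0])
    total = reduce(lambda a, b: a ^ b, (int(block, 2) for block in blocks))
    return format(total, f"0{width}b")


disk1, disk2, disk3 = "11110000", "10101010", "00111000"

# Parity block stored on disk 4.
parity = mod2_sum([disk1, disk2, disk3])

# Disk 2 is lost: rebuild its block from the surviving disks and the parity.
recovered_disk2 = mod2_sum([disk1, disk3, parity])

print(parity, recovered_disk2)
```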

Disk Arrays

• RAID Level 5 (Striping)

– Like level 4, but with a balanced read & write load

DDCCBBAA

Parity partition on each disk

Disk Arrays

• RAID Level 6 (error-correcting code)

– More powerful: can recover from more than one disk crash