Data Management for Decision Support Session-5 Prof. Bharat Bhasker.

Data Management for Decision Support

Session-5

Prof. Bharat Bhasker

Server Hardware Architecture: Disk Technology -- RAID

• Disk (i.e., I/O) speed has not kept pace with CPU speed

• I/O throughput is the weakest link in the chain

• Greatest possibility of failure => loss of data

What is required?

• A robust, reliable, possibly fail-safe storage mechanism

• Devices with better I/O throughput

Server Hardware Architecture: Disk Technology -- RAID

• Redundant Array of Independent Disks

– Cheap (small) disks can be combined to offer large storage

– Plug and Play

– Hot Swappable

– Reliability and Availability

– Disk block access time = seek time + rotational latency + block transfer time

Server Hardware Architecture: Disk Technology -- RAID

RAID-1

• Disk mirroring/shadowing

• Based on VMS shadowing - uses two disks in place of one

• Both disks contain an exact copy of the same data

• It is a constant backup/shadow/mirror, but requires twice the disk drives

• The VMS model has a common point of failure

• RAID-1 has independent drives/controllers/power supplies

Server Hardware Architecture: Disk Technology -- RAID

RAID-3

• Data striping for fault tolerance

• Doesn’t require twice the disks for backup/mirroring

• Based on a parity drive, i.e., one extra drive for recreating the data

• Assume five drives for data; then RAID-3 needs 6 drives

• Striping is done at the byte/bit level

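Data drives: 5 1 3 4 2    Parity drive: 15 (= 5 + 1 + 3 + 4 + 2)

One drive fails: 5 1 ? 4 2    Parity drive: 15, so ? = 15 - (5 + 1 + 4 + 2) = 3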
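
The figure above uses a plain sum as a stand-in for parity. Parity-based arrays commonly use bitwise XOR instead, which works the same way for a single lost drive. The following is a minimal sketch of both variants; the function names are hypothetical and each "drive" is reduced to a single integer for illustration.

```python
from functools import reduce
from operator import xor

def sum_parity(data):
    """Parity as in the slide's illustration: the plain sum of the data values."""
    return sum(data)

def rebuild_from_sum(surviving, parity):
    """Recover the single missing value from the survivors and the sum parity."""
    return parity - sum(surviving)

def xor_parity(data):
    """Bitwise XOR parity over all data values."""
    return reduce(xor, data, 0)

def rebuild_from_xor(surviving, parity):
    """XOR of the parity with all surviving values yields the missing value."""
    return reduce(xor, surviving, parity)

data = [5, 1, 3, 4, 2]          # values from the slide
survivors = [5, 1, 4, 2]        # the drive holding 3 has failed

print(sum_parity(data))                               # 15
print(rebuild_from_sum(survivors, sum_parity(data)))  # 3
print(rebuild_from_xor(survivors, xor_parity(data)))  # 3
```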

Server Hardware Architecture: Disk Technology -- RAID

RAID-4

• Data striping for fault tolerance

• Striping is done at the block level

• Better performance

• Assume five drives for data; then RAID-4 needs 6 drives

• Parallel reads from multiple heads

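Data drives: 5 1 3 4 2    Parity drive: 15 (= 5 + 1 + 3 + 4 + 2)

One drive fails: 5 1 ? 4 2    Parity drive: 15, so ? = 15 - (5 + 1 + 4 + 2) = 3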

Server Hardware Architecture: Disk Technology -- RAID

RAID-5

• Data striping for fault tolerance

• Striping is done at the block/record-segment level, but parity is rotated across the drives

• In RAID 3/4, all drives are used for reading/writing

• RAID 5 can access as many drives as it needs at the same time to serve different individual read/write requests
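
To illustrate what "parity is rotated" means, here is a minimal sketch that prints which drive holds parity (P) and which hold data (D) for each stripe. It assumes one simple rotation rule (parity moves one drive to the left on every stripe); real controllers may use other layouts, and the function names are hypothetical.

```python
def raid4_parity_drive(stripe, n_drives):
    """RAID 4: parity always lives on the same dedicated drive (here, the last)."""
    return n_drives - 1

def raid5_parity_drive(stripe, n_drives):
    """RAID 5 (assumed rotation): parity shifts one drive left on each stripe."""
    return (n_drives - 1 - stripe) % n_drives

def layout(parity_drive_fn, n_stripes, n_drives):
    """Return one row per stripe, marking the parity drive 'P' and data drives 'D'."""
    rows = []
    for stripe in range(n_stripes):
        p = parity_drive_fn(stripe, n_drives)
        rows.append(["P" if d == p else "D" for d in range(n_drives)])
    return rows

for name, fn in (("RAID 4", raid4_parity_drive), ("RAID 5", raid5_parity_drive)):
    print(name)
    for row in layout(fn, n_stripes=6, n_drives=6):
        print(" ".join(row))
```

With a dedicated parity drive, every write has to update that one drive; with rotation, parity updates for different stripes land on different drives.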

Data Organizations

Operations on organized data

– Find (Locate)

– Read (Get)

– FindNext

– Delete

– Insert

– Modify

– Findall

– Find Ordered

Data Organizations

Unordered File Organization

• Find - average b/2 block accesses, O(b)

• Read - O(1)

• Insert - O(1)

• Modify - O(b)

• Delete - O(b)

[Figure: records stored in arrival order: A, v, b, x, c, d, w, e]
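
The costs above can be seen in a small sketch of an unordered (heap) file. The file is modelled as a list of blocks, using the records from the figure and an assumed capacity of two records per block; the function names and the block capacity are hypothetical.

```python
def find_unordered(blocks, key):
    """Linear scan: returns (block_no, slot, blocks_read); block_no is None if absent."""
    for block_no, block in enumerate(blocks):
        if key in block:
            return block_no, block.index(key), block_no + 1
    return None, None, len(blocks)

def insert_unordered(blocks, record, capacity):
    """Append at the end of the file: only the last block is touched."""
    if blocks and len(blocks[-1]) < capacity:
        blocks[-1].append(record)
    else:
        blocks.append([record])

# Records from the slide, two per block (assumed capacity).
blocks = [["A", "v"], ["b", "x"], ["c", "d"], ["w", "e"]]

print(find_unordered(blocks, "c"))   # (2, 0, 3): found after reading 3 blocks
print(find_unordered(blocks, "z"))   # (None, None, 4): every block was read
insert_unordered(blocks, "k", capacity=2)
print(blocks[-1])                    # ['k']
```

Modify and delete first have to find the record, which is why they inherit the O(b) search cost.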

Data Organizations

Ordered File Organization

• Find - O(log b)

• Read - O(1)

• Insert - O(b)

• Modify - O(log b)

• Delete - O(log b)

[Figure: records stored in key order: A, b, c, d, f, t, u, v]
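
The O(log b) find can be sketched as a binary search over blocks rather than over individual records. The example reuses the records from the figure with an assumed capacity of two records per block; the function name is hypothetical, and every block is assumed to be non-empty.

```python
def find_ordered(blocks, key):
    """Binary search over the blocks of a key-ordered file.

    Returns (block_no, blocks_read); block_no is None if the key is absent.
    """
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]          # one block access
        reads += 1
        if key < block[0]:
            hi = mid - 1             # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1             # key can only be in a later block
        else:
            return (mid if key in block else None), reads
    return None, reads

# Records from the slide, two per block (assumed capacity).
blocks = [["A", "b"], ["c", "d"], ["f", "t"], ["u", "v"]]

print(find_ordered(blocks, "t"))   # (2, 2)
print(find_ordered(blocks, "z"))   # (None, 3)
```

Insert stays O(b) because keeping the file in key order can force records in later blocks to shift.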

Data Organizations

Primary Index: an ordered file with fixed-length records and two fields, a key field and a block pointer field. The primary index is built on the ordering key field.

[Figure: primary index with entries A, f, j, t, each pointing to a block of the ordered data file]

Data Organizations

Assume 30,000 records, block size = 1024 bytes, and record size R = 100 bytes

Each block can store floor(1024/100) = 10 records

Total blocks b = 30,000/10 = 3000

In an ordered file, binary search needs ceil(log2 3000) = 12 block accesses

Ordering key = 9 bytes and block pointer = 6 bytes

Primary Index

Index entry size R = 9 + 6 = 15 bytes; entries per block = floor(1024/15) = 68

Blocks required to hold 3000 entries (one per data block) = ceil(3000/68) = 45 blocks

ceil(log2 45) = 6 block accesses + 1 for the data block = 7 block accesses
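
The arithmetic above can be reproduced with a short sketch. The function names are hypothetical; the inputs are the figures stated on the slide.

```python
import math

def ordered_file_cost(n_records, block_size, record_size):
    """Return (records per block, data blocks, binary-search block accesses)."""
    per_block = block_size // record_size
    n_blocks = math.ceil(n_records / per_block)
    return per_block, n_blocks, math.ceil(math.log2(n_blocks))

def primary_index_cost(n_data_blocks, block_size, key_size, pointer_size):
    """Return (entries per index block, index blocks, total block accesses).

    The primary index holds one entry per data block; the total adds one
    access for the data block itself.
    """
    per_block = block_size // (key_size + pointer_size)
    n_index_blocks = math.ceil(n_data_blocks / per_block)
    return per_block, n_index_blocks, math.ceil(math.log2(n_index_blocks)) + 1

print(ordered_file_cost(30_000, 1024, 100))   # (10, 3000, 12)
print(primary_index_cost(3000, 1024, 9, 6))   # (68, 45, 7)
```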

Data Organizations

Clustering Index: an ordered file with fixed-length records and two fields, a key field and a block pointer field. A clustering index is built on a file that is ordered on a non-key field.

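[Figure: clustering index with one entry per distinct value (A, d, j, t), each pointing to the first data block that contains that value]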

Data Organizations

Secondary Index: an ordered file with fixed-length records and two fields, a non-ordering field and a block pointer field. The secondary index is built on a non-ordering field (dense).

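[Figure: dense secondary index with one ordered entry per record, each pointing to the data block that holds the record]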

Data Organizations

Assume 30,000 records, block size = 1024 bytes, and record size R = 100 bytes

Each block can store floor(1024/100) = 10 records

Total blocks b = 30,000/10 = 3000

In an unordered file, a linear search needs 3000/2 = 1500 block accesses on average

Indexing field = 9 bytes and block pointer = 6 bytes

Secondary Index

Index entry size R = 9 + 6 = 15 bytes; entries per block = floor(1024/15) = 68

Each record requires an entry

Blocks required to hold 30,000 entries = ceil(30,000/68) = 442 blocks

ceil(log2 442) = 9 block accesses + 1 for the data block = 10 block accesses
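
As a check on these figures, here is a minimal sketch comparing the average linear-search cost with a lookup through a dense index that has one entry per record. The function names are hypothetical; the inputs are the values from the slide.

```python
import math

def linear_search_cost(n_data_blocks):
    """Average block accesses for a linear scan of an unordered file."""
    return n_data_blocks // 2

def dense_index_cost(n_records, block_size, key_size, pointer_size):
    """Return (entries per index block, index blocks, total block accesses).

    A dense index holds one entry per record; the total adds one access
    for the data block itself.
    """
    per_block = block_size // (key_size + pointer_size)
    n_index_blocks = math.ceil(n_records / per_block)
    return per_block, n_index_blocks, math.ceil(math.log2(n_index_blocks)) + 1

print(linear_search_cost(3000))               # 1500
print(dense_index_cost(30_000, 1024, 9, 6))   # (68, 442, 10)
```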

Data Organizations

Multiple Data Pointers: for handling duplicates

Multi-Level: by creating a primary index on top of the base-level secondary index

The 442 blocks of the ordered secondary index can be addressed by the primary index mechanism with 68 entries per block: ceil(442/68) = 7 blocks

ceil(log2 7) = 3 accesses in the top-level index to locate the secondary-index block + 1 for the secondary-index block + 1 for the data block = 5 block accesses
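
The two-level calculation can be sketched the same way. It follows the slide's approach of binary-searching the top-level index; the function name is hypothetical.

```python
import math

def two_level_cost(base_index_blocks, entries_per_block):
    """Return (top-level blocks, top-level search accesses, total accesses).

    The top level holds one entry per block of the base-level index. The
    total is the top-level binary search, plus one access for the base-level
    index block, plus one access for the data block.
    """
    top_blocks = math.ceil(base_index_blocks / entries_per_block)
    top_search = math.ceil(math.log2(top_blocks))
    return top_blocks, top_search, top_search + 1 + 1

print(two_level_cost(442, 68))   # (7, 3, 5)
```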