Data Management for Decision Support Session-5 Prof. Bharat Bhasker.
Data Management for Decision Support
Session-5
Prof. Bharat Bhasker
Server Hardware Architecture: Disk Technology -- RAID
• The DISK (i.e. I/O ) speed has not kept pace with the CPU speed
• I/O throughput is the weakest link in the chain
• Greatest Possibility of Failure => loss of data
What is required?
• A robust, reliable, possibly fail-safe storage mechanism
• Devices with better I/O throughput
Server Hardware Architecture: Disk Technology -- RAID
• Redundant Array of Independent Disks
– Cheap (small) disks can be combined to offer large storage
– Plug and Play
– Hot Swappable
– Reliability and Availability
– Disk Block Access = Seek Time + Block Transfer Time
Server Hardware Architecture: Disk Technology -- RAID
• RAID-1
• Disk mirroring/shadowing
• Based on VMS shadowing: uses two disks in place of one
• Both disks contain the exact same copy of the data
• It is a constant backup/shadow/mirror, and requires twice the disk drives
• The VMS model has a common failure point
• RAID-1 has independent drives/controllers/power
Server Hardware Architecture: Disk Technology -- RAID
RAID-3
• Data striping for fault tolerance
• Doesn't require twice the disks for backup/mirroring
• Based on a parity drive, i.e. one extra drive for recreating the data
• With five drives for data, RAID-3 needs six drives in total
• Striping is done at the byte/bit level
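Example: data drives hold 5 1 3 4 2; the parity drive holds 15 (shown here as their sum)
If one drive fails (5 1 ? 4 2, parity 15), the missing value is 15 - (5 + 1 + 4 + 2) = 3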
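The parity illustration above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, the sum-based parity mirrors the numbers on the slide, and the XOR variant is included on the assumption that bitwise XOR is what parity-based RAID levels typically use in practice.

```python
from functools import reduce
from operator import xor

def sum_parity(data):
    """Parity as drawn on the slide: the sum of the data values."""
    return sum(data)

def recover_sum(stripe_with_gap, parity):
    """Rebuild the single missing value (marked None) from sum parity."""
    known = sum(v for v in stripe_with_gap if v is not None)
    return parity - known

def xor_parity(data):
    """Bitwise XOR parity over all data values."""
    return reduce(xor, data, 0)

def recover_xor(stripe_with_gap, parity):
    """XOR the parity with every surviving value to get the lost one."""
    survivors = (v for v in stripe_with_gap if v is not None)
    return reduce(xor, survivors, parity)

data = [5, 1, 3, 4, 2]            # five data drives
damaged = [5, 1, None, 4, 2]      # third drive has failed

recovered_from_sum = recover_sum(damaged, sum_parity(data))
recovered_from_xor = recover_xor(damaged, xor_parity(data))
print(recovered_from_sum, recovered_from_xor)
```

Either scheme recovers exactly one lost value per stripe; losing two drives at once leaves the stripe unrecoverable with a single parity drive.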
Server Hardware Architecture: Disk Technology -- RAID
RAID-4
• Data striping for fault tolerance
• Striping is done at the block level
• Better performance
• With five drives for data, RAID-4 needs six drives in total
• Parallel reads from multiple heads
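Example: data drives hold 5 1 3 4 2; the parity drive holds 15 (shown here as their sum)
If one drive fails (5 1 ? 4 2, parity 15), the missing value is 15 - (5 + 1 + 4 + 2) = 3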
Server Hardware Architecture: Disk Technology -- RAID
RAID-5
• Data striping for fault tolerance
• Striping is done at the block/record-segment level, but the parity is rotated across the drives
• In RAID 3/4, all drives are used for reading/writing
• RAID 5 can access as many drives as it needs at the same time to serve different individual read/write requests
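To make "parity is rotated" concrete, here is a minimal sketch of one possible rotation rule. The function names and the specific rule (parity starts on the last drive and moves one drive to the left per stripe) are assumptions for illustration; real controllers use several different layout conventions.

```python
def parity_drive(stripe: int, n_drives: int) -> int:
    """Index of the drive holding parity for a given stripe (assumed rule)."""
    return (n_drives - 1 - stripe) % n_drives

def layout(n_stripes: int, n_drives: int):
    """One row per stripe: 'P' marks the parity block, 'D' a data block."""
    rows = []
    for stripe in range(n_stripes):
        p = parity_drive(stripe, n_drives)
        rows.append(["P" if drive == p else "D" for drive in range(n_drives)])
    return rows

# Six drives, as in the five-data-plus-one-parity example from the slides.
for row in layout(6, 6):
    print(" ".join(row))
```

Because every drive takes its turn holding parity, parity updates are spread across all drives rather than concentrated on one dedicated parity drive.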
Data Organizations
Operations on organized data
– Find (Locate)
– Read (Get)
– FindNext
– Delete
– Insert
– Modify
– Findall
– Find Ordered
Data Organizations
Unordered File Organization
• Find - O(b); b/2 block accesses on average (b = number of blocks in the file)
• Read - O(1)
• Insert - O(1)
• Modify - O(b)
• Delete - O(b)
[Figure: records stored in arrival order: A, v, b, x, c, d, w, e]
Data Organizations
Ordered File Organization
• Find - O(log b)
• Read - O(1)
• Insert - O(b)
• Modify - O(log b)
• Delete - O(log b)
[Figure: records stored in key order: A, b, c, d, f, t, u, v]
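The Find costs quoted for unordered and ordered files can be illustrated by counting block reads. The sketch below is a toy model, not a storage engine: each "block" is a Python list of integer keys, and the function names are hypothetical.

```python
def linear_find(blocks, key):
    """Unordered file: scan blocks in sequence. Returns (found, blocks_read)."""
    for reads, block in enumerate(blocks, start=1):
        if key in block:
            return True, reads
    return False, len(blocks)

def binary_find(blocks, key):
    """Ordered file: binary search over blocks. Returns (found, blocks_read)."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1                      # one block fetched per probe
        block = blocks[mid]
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads
    return False, reads

# 3,000 blocks of 10 sorted keys each (30,000 records in total).
blocks = [list(range(i * 10, i * 10 + 10)) for i in range(3000)]

print(linear_find(blocks, 29_999))   # worst case for a sequential scan
print(binary_find(blocks, 29_999))   # at most about log2(3000) probes
```

The same data is used for both searches only for convenience; the linear scan does not rely on the keys being sorted, while the binary search does.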
Data Organizations
Primary Index: an ordered file with fixed-length records and two fields, a key field and a block pointer field. A primary index is built on the ordering key field of the data file.
[Figure: primary index entries pointing to the blocks of the ordered data file]
Data Organizations
Assume 30,000 records, block size = 1024 bytes, and record size R = 100 bytes
Each block can store ⌊1024/100⌋ = 10 records
Total blocks b = 30,000/10 = 3,000
In an ordered file, ⌈log2(3000)⌉ = 12 block accesses
Ordering key = 9 bytes and block pointer = 6 bytes
Primary Index
Index entry size R = 15 bytes; entries per block ⌊1024/15⌋ = 68
Blocks required to hold 3,000 entries: ⌈3000/68⌉ = 45 blocks
⌈log2(45)⌉ = 6 block accesses + 1 for the data block = 7 block accesses
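The arithmetic on this slide can be reproduced with a short calculation. This is a sketch under the slide's own assumptions (one index entry per data block, binary search over the index blocks); the function and key names are illustrative only.

```python
import math

def primary_index_cost(n_records=30_000, block_size=1024,
                       record_size=100, key_size=9, ptr_size=6):
    """Block-access estimates for an ordered file with a primary index."""
    records_per_block = block_size // record_size              # whole records only
    data_blocks = math.ceil(n_records / records_per_block)
    ordered_file_accesses = math.ceil(math.log2(data_blocks))  # binary search on data
    entries_per_block = block_size // (key_size + ptr_size)
    index_blocks = math.ceil(data_blocks / entries_per_block)  # one entry per data block
    index_accesses = math.ceil(math.log2(index_blocks)) + 1    # +1 to read the data block
    return {
        "records_per_block": records_per_block,
        "data_blocks": data_blocks,
        "ordered_file_accesses": ordered_file_accesses,
        "entries_per_block": entries_per_block,
        "index_blocks": index_blocks,
        "index_accesses": index_accesses,
    }

cost = primary_index_cost()
print(cost)
```

Changing the default arguments shows how sensitive the result is to the entry size: smaller keys or pointers mean more entries per index block and fewer blocks to search.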
Data Organizations
Clustering Index: an ordered file with fixed-length records and two fields, a key field and a block pointer field. A clustering index is built on a file that is ordered on a non-key field.
[Figure: clustering index entries (one per distinct value) pointing to the first block that contains that value]
Data Organizations
Secondary Index: an ordered file with fixed-length records and two fields, a non-ordering field and a block pointer field. A secondary index is built on a non-ordering field of the data file and is dense.
[Figure: dense secondary index entries, one per record, pointing into the data file]
Data Organizations
Assume 30,000 records, block size = 1024 bytes, and record size R = 100 bytes
Each block can store ⌊1024/100⌋ = 10 records
Total blocks b = 30,000/10 = 3,000
In an unordered file, 3000/2 = 1,500 block accesses on average
Indexing field = 9 bytes and block pointer = 6 bytes
Secondary Index
Index entry size R = 15 bytes; entries per block ⌊1024/15⌋ = 68
Each record requires an entry
Blocks required to hold 30,000 entries: ⌈30000/68⌉ = 442 blocks
⌈log2(442)⌉ = 9 block accesses + 1 for the data block = 10 block accesses
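The same kind of calculation applies to the dense secondary index, where every record (not every block) needs an index entry. The sketch below follows the slide's assumptions; the function and key names are illustrative only.

```python
import math

def secondary_index_cost(n_records=30_000, block_size=1024,
                         record_size=100, field_size=9, ptr_size=6):
    """Block-access estimates for a dense secondary index on an unordered file."""
    records_per_block = block_size // record_size
    data_blocks = math.ceil(n_records / records_per_block)
    linear_scan_avg = data_blocks // 2                        # average without an index
    entries_per_block = block_size // (field_size + ptr_size)
    index_blocks = math.ceil(n_records / entries_per_block)   # one entry per record
    index_accesses = math.ceil(math.log2(index_blocks)) + 1   # +1 to read the data block
    return {
        "linear_scan_avg": linear_scan_avg,
        "entries_per_block": entries_per_block,
        "index_blocks": index_blocks,
        "index_accesses": index_accesses,
    }

cost = secondary_index_cost()
print(cost)
```

The index is roughly ten times larger than the primary index in the earlier example, yet it still cuts the average search from about 1,500 block reads to 10.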
Data Organizations
Multiple data pointers: used for handling duplicates
Multi-level index: built by creating a primary index on top of the base-level secondary index
The 442 blocks of ordered index data can be addressed by the primary index mechanism at 68 entries per block: ⌈442/68⌉ = 7 blocks
⌈log2(7)⌉ = 3 accesses to locate the block in the second level + 1 for the base-level secondary index block + 1 for the data block = 5 block accesses
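The two-level count above can be checked with the sketch below. The first function follows the slide's method (binary search over the second-level blocks). The second function is an added comparison, not taken from the slides: it assumes levels keep being added until the top level fits in a single block, so that each level costs exactly one block read. All names are hypothetical.

```python
import math

def two_level_cost(base_index_blocks=442, entries_per_block=68):
    """Slide's method: binary-search the second level, then base level, then data."""
    top_blocks = math.ceil(base_index_blocks / entries_per_block)
    search_top = math.ceil(math.log2(top_blocks))
    return top_blocks, search_top + 1 + 1    # + base-level block + data block

def full_multilevel_cost(base_index_blocks=442, entries_per_block=68):
    """Assumed variant: add levels until the top level is a single block."""
    levels = 1                                # the base level itself
    blocks = base_index_blocks
    while blocks > 1:
        blocks = math.ceil(blocks / entries_per_block)
        levels += 1
    return levels + 1                         # one read per level + data block

print(two_level_cost())
print(full_multilevel_cost())
```

Under these assumptions the fully built multi-level index needs one block read fewer than the two-level arrangement, because a third level of a single block replaces the binary search over seven blocks.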