Yet More on Indexes
Hash Tables
Source: our textbook, slides by Hector Garcia-Molina
Main Memory Hash Tables
A hash function h maps search keys to integers in the range 0 to B-1, where B is the number of buckets.
There is a B-element bucket array, and each entry holds a pointer to a linked list.
A record with key k is put in the linked list that starts at entry h(k) of the bucket array.
Example of Hash Table (B = 5, h(k) = k mod 5)
bucket 0: 15 → 10
bucket 1: (empty)
bucket 2: 22
bucket 3: (empty)
bucket 4: 104 → 29 → 34
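A minimal Python sketch of the main-memory scheme, assuming integer keys and the hash function h(k) = k mod B from the example. Python lists stand in for the linked lists, and the class name `ChainedHashTable` is hypothetical. Inserting the example's keys reproduces the table above.

```python
class ChainedHashTable:
    """B-element bucket array; each entry heads a chain of keys.

    Python lists stand in for linked lists (an assumption for brevity).
    """

    def __init__(self, num_buckets):
        self.B = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, k):
        return k % self.B          # assumed hash function, as in the k mod B example

    def insert(self, k):
        self.buckets[self.h(k)].append(k)   # add the key to the chain for bucket h(k)

    def lookup(self, k):
        return k in self.buckets[self.h(k)]  # only chain h(k) needs to be searched


table = ChainedHashTable(5)
for k in (15, 10, 22, 104, 29, 34):
    table.insert(k)
print(table.buckets)   # [[15, 10], [], [22], [], [104, 29, 34]]
```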
Changes for Secondary Storage
The bucket array contains blocks, not pointers to linked lists.
Records that hash to a certain bucket are put in the corresponding block.
If a bucket overflows, start a chain of overflow blocks for it.
Insertion into Static Hash Table
To insert a record with key K:
• Compute h(K).
• Insert the record into one of the blocks in the chain of blocks for bucket number h(K), adding a new block to the chain if necessary.
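EXAMPLE: 2 records/bucket
Insert: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
bucket 0: d
bucket 1: a, c
bucket 2: b
bucket 3: (empty)
Then insert e with h(e) = 1: bucket 1 is full, so e goes into an overflow block chained to bucket 1.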
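The insertion procedure for secondary storage can be sketched in Python under simplifying assumptions: each block is a list holding at most `block_capacity` keys, a bucket is a chain of such blocks whose first entry is the primary block, and the hash function is passed in. The class and parameter names are hypothetical. Replaying the 2-records/bucket example puts e in an overflow block of bucket 1.

```python
class StaticHashTable:
    """Static hash table on 'disk': each bucket is a chain of fixed-size blocks."""

    def __init__(self, num_buckets, block_capacity, hash_fn):
        self.h = hash_fn
        self.cap = block_capacity
        # chains[b][0] is bucket b's primary block; later entries are overflow blocks
        self.chains = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.chains[self.h(key)]
        for block in chain:              # use the first block in the chain with room
            if len(block) < self.cap:
                block.append(key)
                return
        chain.append([key])              # every block is full: add an overflow block

    def lookup(self, key):
        return any(key in block for block in self.chains[self.h(key)])


h = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}.get   # hash values from the example
table = StaticHashTable(num_buckets=4, block_capacity=2, hash_fn=h)
for key in "abcde":
    table.insert(key)
print(table.chains)   # [[['d']], [['a', 'c'], ['e']], [['b']], [[]]]
```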
Deletion from a Static Hash Table
To delete records with key K:
• Go to the bucket numbered h(K).
• Search for records with key K, deleting any that are found.
• Possibly condense the chain of overflow blocks for that bucket.
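A Python sketch of the deletion steps for one bucket, assuming the bucket is a chain of blocks (lists) with `chain[0]` as the primary block. The condensing policy here, re-packing all surviving records front to back, is one simple choice assumed for illustration; it can move a record up from an overflow block into an earlier block so the overflow block is freed. The function name is hypothetical.

```python
def delete_and_condense(chain, key, capacity):
    """Remove every record equal to `key` from one bucket's chain of blocks.

    Survivors are re-packed front to back so the chain uses as few blocks
    as possible (an assumed condensing policy).
    """
    survivors = [rec for block in chain for rec in block if rec != key]
    condensed = [survivors[s:s + capacity] for s in range(0, len(survivors), capacity)]
    return condensed or [[]]   # keep an (empty) primary block for the bucket


print(delete_and_condense([["b", "c"], ["e"]], "e", 2))   # [['b', 'c']]
print(delete_and_condense([["d", "f"], ["g"]], "f", 2))   # [['d', 'g']]
```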
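EXAMPLE: deletion
(Figure: a static hash table with buckets 0 to 3 holding records a through g, some of them in overflow blocks.)
Delete: e, f
After a deletion, maybe move "g" up from an overflow block, condensing the chain.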
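Rule of thumb: try to keep space utilization between 50% and 80%, where
$\text{Utilization} = \dfrac{\#\text{ records used}}{\text{total }\#\text{ records that fit}}$
• If utilization is below 50%, we are wasting space.
• If it is above 80%, overflows become significant; how significant depends on how good the hash function is and on the number of records per bucket.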
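A small Python helper for the rule of thumb, assuming utilization counts every allocated block, primary and overflow, in the denominator; the function names are hypothetical. For instance, the 2-records/bucket insertion example ends with 5 records in 5 blocks (4 buckets plus one overflow block), giving 5 / (5 × 2) = 50%.

```python
def utilization(records_used, blocks, records_per_block):
    """Fraction of record slots in use across all blocks (primary and overflow)."""
    return records_used / (blocks * records_per_block)


def assess(u, low=0.5, high=0.8):
    """Apply the 50%-80% rule of thumb to a utilization value."""
    if u < low:
        return "wasting space"
    if u > high:
        return "overflows significant"
    return "ok"


u = utilization(records_used=5, blocks=5, records_per_block=2)
print(u, assess(u))   # 0.5 ok
```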
Efficiency of Static Hash Tables
If the hash table size is large enough and the distribution of keys by the hash function is sufficiently "even", then most buckets have no overflow blocks
In this case a lookup typically takes one disk I/O, and an insertion or deletion takes two (read the block, then write it back).
This is significantly better than sequential indexes and B-trees.
(But hash tables do not support range queries efficiently, as B-trees do.)
What if there are long overflow chains?
How do we cope with growth?
• Overflows and reorganizations
• Dynamic hashing: extensible or linear
Extensible Hash Tables
• Each bucket in the bucket array contains a pointer to a block, instead of the block itself.
• The bucket array can grow by doubling in size.
• Certain buckets can share a block if they are small enough.
• The hash function computes a sequence of k bits, but only the first i bits are used at any time to index into the bucket array.
• The value of i can increase (which corresponds to the bucket array doubling in size).
(b) Use a directory: h(K)[i], the first i bits of h(K), selects a directory entry that points to the bucket.
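To make "first i bits" concrete: assuming the hash function returns a k-bit integer, the directory entry is obtained by shifting away the low-order k - i bits. The sketch (function name hypothetical) also checks the property behind doubling the directory: entry w at depth i becomes entries w0 and w1 at depth i + 1, so dropping the last bit of the new index gives back the old one.

```python
def directory_index(h, k, i):
    """Directory entry for a k-bit hash value h when the first i bits are used."""
    return h >> (k - i)


h = 0b1010                                             # a 4-bit hash value
print([directory_index(h, 4, i) for i in (1, 2, 3)])   # [1, 2, 5]

# Doubling: the entry at depth i+1, with its last bit dropped, is the entry at depth i.
assert all(directory_index(x, 4, i + 1) >> 1 == directory_index(x, 4, i)
           for x in range(16) for i in range(1, 4))
```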
Inserting into Extensible Hash Table
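To insert a record with key K:
• Compute h(K).
• Go to the bucket indexed by the first i bits of h(K), and follow the pointer to get to block B.
• If there is room in B, insert the record.
• Otherwise, let j be the number of bits of the hash value used to determine membership in B (the value kept in B's header), and proceed by case 1 or case 2.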
Insertion cont'd
Case 1: j < i.
• Split block B in two.
• Distribute the records in B to the two new blocks based on the value of their (j+1)-st bit.
• Set the header of each new block to j+1.
• Adjust the pointers in the bucket array so that entries that used to point to B now point to the correct new block.
• If there is still no room in the appropriate block for the new record, repeat this process.
Insertion cont'd
Case 2: j = i.
• Increment i by 1.
• Double the length of the bucket array: for each old entry w, the new entries w0 and w1 both point to the block that w pointed to (the block is shared).
• Apply case 1 to split block B.
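Example: h(k) is 4 bits; 2 keys/bucket
Start with i = 1:
0 → block (j = 1): 0001
1 → block (j = 1): 1001, 1100
Insert 1010: the block for entry 1 is full and j = i = 1, so case 2 applies: i becomes 2, the directory doubles, and the full block splits on its second bit.
New directory (i = 2):
00, 01 → block (j = 1): 0001
10 → block (j = 2): 1001, 1010
11 → block (j = 2): 1100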
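Example continued
Insert 0111, then 0000:
0111 goes to the block shared by entries 00 and 01, which now holds 0001, 0111.
0000 finds that block full; since j = 1 < i = 2, case 1 splits it on the second bit.
Directory (i = 2):
00 → block (j = 2): 0000, 0001
01 → block (j = 2): 0111
10 → block (j = 2): 1001, 1010
11 → block (j = 2): 1100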
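Example continued
Insert 1001: the block for entry 10 (j = 2) holds 1001, 1010 and is full. Since j = i = 2, case 2 applies: i becomes 3, the directory doubles to entries 000 through 111, and the block splits on its third bit.
Directory (i = 3):
000, 001 → block (j = 2): 0000, 0001
010, 011 → block (j = 2): 0111
100 → block (j = 3): 1001, 1001
101 → block (j = 3): 1010
110, 111 → block (j = 2): 1100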
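The insertion procedure (cases 1 and 2) can be sketched in Python as follows. Assumptions, all hypothetical choices for illustration: keys are already k-bit hash values, a block is an object holding its header value j (`depth`) and a list of keys, and the directory is a Python list indexed by the first i bits. The sketch raises an error if a full block's keys agree on all k bits, since splitting cannot help then. Replaying the 4-bit, 2-keys/bucket example reproduces the directories above.

```python
class Block:
    def __init__(self, depth):
        self.depth = depth   # j: number of leading hash bits that determine membership
        self.keys = []


class ExtensibleHashTable:
    def __init__(self, k_bits, capacity):
        self.k = k_bits                    # the hash function yields k bits
        self.cap = capacity                # records per block
        self.i = 1                         # number of leading bits currently used
        self.dir = [Block(1), Block(1)]    # directory indexed by the first i bits

    @staticmethod
    def _bit(value, pos, width):
        """pos-th bit (1-based, counted from the left) of a width-bit value."""
        return (value >> (width - pos)) & 1

    def insert(self, h):
        while True:
            block = self.dir[h >> (self.k - self.i)]   # first i bits pick the entry
            if len(block.keys) < self.cap:
                block.keys.append(h)
                return
            j = block.depth
            if j == self.i:
                if self.i == self.k:
                    raise OverflowError("all k hash bits in use; overflow blocks needed")
                # Case 2: double the directory; entries w0 and w1 share w's block.
                self.dir = [b for b in self.dir for _ in (0, 1)]
                self.i += 1
            # Case 1 (now j < i): split the block on the (j+1)-st bit.
            zero, one = Block(j + 1), Block(j + 1)
            for key in block.keys:
                (one if self._bit(key, j + 1, self.k) else zero).keys.append(key)
            for w, b in enumerate(self.dir):
                if b is block:             # repoint entries that referred to the old block
                    self.dir[w] = one if self._bit(w, j + 1, self.i) else zero
            # loop again: retry the insert, splitting further if still no room


t = ExtensibleHashTable(k_bits=4, capacity=2)
for key in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001):
    t.insert(key)
print(t.i)                                         # 3
print([format(x, "04b") for x in t.dir[4].keys])   # ['1001', '1001']
```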
Extensible hashing: deletion
• No merging of blocks, or
• Merge blocks and cut the directory if possible (reverse the insert procedure).
Extensible hashing: summary
+ Can handle growing files, with less wasted space and with no full reorganizations.
- Indirection (not bad if the directory is in memory).
- The directory doubles in size (now it fits in memory, now it does not).
Linear Hash Tables
• The number of buckets increases more slowly than with extensible hashing.
• The number of buckets is chosen so that, on average, each block is x% full (say 80%); x is the threshold.
• Overflow blocks can occur, but the average number of overflow blocks per bucket is much less than 1.
• Use the i low-order bits of the hash value to index into the bucket array.
Linear hashing: another dynamic hashing scheme
Two ideas:
(a) Use the i low-order bits of the hash value (figure: the b-bit hash value 01110101, with the i low-order bits in use; i grows over time).
(b) The bucket array grows linearly.
Inserting into Linear Hash Table
To insert a record with key K, where the last i bits of h(K) are a1a2…ai:
• Let m be the integer represented by a1a2…ai in binary.
• If m < n (the number of buckets), then bucket m exists: put the record in that bucket.
• If m ≥ n, then bucket m does not exist yet, so put the record in the bucket whose index is 0a2…ai, that is, bucket $m - 2^{i-1}$.
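The bucket-address rule can be written as a small Python function (name hypothetical), assuming h is already the integer hash value and that $2^{i-1} \le n \le 2^i$, so clearing the leading bit a1 always lands on an existing bucket.

```python
def linear_bucket(h, i, n):
    """Bucket for hash value h in a linear hash table with n buckets, using the
    last i bits of h. Assumes 2**(i - 1) <= n <= 2**i."""
    m = h & ((1 << i) - 1)        # m = a1a2...ai, the last i bits of h
    if m < n:
        return m                  # bucket m exists
    return m - (1 << (i - 1))     # bucket m does not exist yet: use 0a2...ai


# b = 4 bits, i = 2, n = 2 buckets (00 and 01)
print([linear_bucket(h, 2, 2) for h in (0b0000, 0b1010, 0b0101, 0b1111)])  # [0, 0, 1, 1]
```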
Inserting cont'd
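• If there is no room in the indicated bucket, create an overflow block for it.
• Compare # records / # buckets to the threshold.
• If it exceeds the threshold, add a new bucket; if the number of buckets now exceeds $2^i$, increment i by 1.
• Then rearrange records: if the new bucket's index is 1a2…ai in binary, the records of bucket 0a2…ai whose last i bits are 1a2…ai move to the new bucket.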
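Example: b = 4 bits, i = 2, 2 keys/bucket
bucket 00: 0000, 1010
bucket 01: 0101, 1111
buckets 10, 11: future growth buckets (not yet used)
m = 01 (max used bucket)
Rule, where h(k)[i] denotes the last i bits of h(k): if $h(k)[i] \le m$, then look at bucket $h(k)[i]$; else, look at bucket $h(k)[i] - 2^{i-1}$.
• Insert 0101: bucket 01 is full, so 0101 goes into an overflow block. Linear hash tables can have overflow chains!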
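Example (continued): b = 4 bits, i = 2, 2 keys/bucket
As the file grows, the future growth buckets come into use: bucket 10 is added and 1010 moves to it from bucket 00; then bucket 11 is added and 1111 moves to it from bucket 01, leaving both 0101 records in bucket 01.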
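Example continued: how to grow beyond this?
bucket 00: 0000
bucket 01: 0101, 0101
bucket 10: 1010
bucket 11: 1111
m = 11 (max used bucket), i = 2
To grow further, i becomes 3 and buckets are indexed by 3 bits (000, 001, …, 111). Adding bucket 100 splits bucket 000; adding bucket 101 splits bucket 001, and both 0101 records move to bucket 101.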
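A Python sketch of linear hashing with the insertion and growth rules above. Assumptions, all hypothetical choices for illustration: keys are already hash values; each bucket is a flat list, with records beyond `capacity` understood to sit in overflow blocks; the fullness test compares r / (n × capacity) with `threshold`; and i is kept as the smallest value with n ≤ 2^i, so with two buckets this sketch uses i = 1 where the figure shows i = 2 (the bucket contents are the same). `add_bucket` is public so the example's growth steps can be replayed.

```python
class LinearHashTable:
    """Buckets are flat lists of hash values; entries beyond `capacity` in a
    bucket are understood to live in that bucket's overflow blocks."""

    def __init__(self, capacity, threshold=0.8):
        self.cap = capacity            # records per block
        self.threshold = threshold     # max average fullness before adding a bucket
        self.i = 1                     # number of low-order bits in use
        self.buckets = [[], []]        # n = len(self.buckets)
        self.r = 0                     # number of records stored

    def _bucket_for(self, h):
        m = h & ((1 << self.i) - 1)        # last i bits of h
        if m < len(self.buckets):
            return m
        return m - (1 << (self.i - 1))     # 1a2...ai -> 0a2...ai

    def lookup(self, h):
        return h in self.buckets[self._bucket_for(h)]

    def insert(self, h):
        self.buckets[self._bucket_for(h)].append(h)   # may spill into overflow
        self.r += 1
        if self.r / (len(self.buckets) * self.cap) > self.threshold:
            self.add_bucket()

    def add_bucket(self):
        n = len(self.buckets)              # index of the new bucket
        if n == 1 << self.i:               # buckets will exceed 2**i: need one more bit
            self.i += 1
        self.buckets.append([])
        src = n - (1 << (self.i - 1))      # bucket 0a2...ai that bucket 1a2...ai splits from
        old = self.buckets[src]
        self.buckets[src] = [h for h in old if self._bucket_for(h) == src]
        self.buckets[n] = [h for h in old if self._bucket_for(h) == n]


t = LinearHashTable(capacity=2, threshold=1.0)
for key in (0b0000, 0b1010, 0b0101, 0b1111, 0b0101):
    t.insert(key)      # the fifth insert pushes r/(n*capacity) above 1.0 and adds bucket 10
t.add_bucket()         # bucket 11: 1111 moves over from bucket 01
print([[format(x, "04b") for x in b] for b in t.buckets])
# [['0000'], ['0101', '0101'], ['1010'], ['1111']]
```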
Linear Hashing
Summary
+ Can handle growing files, with less wasted space and with no full reorganizations.
+ No indirection, unlike extensible hashing.
- Can still have overflow chains.
Comparing Index Approaches
Hashing is good for probes given a key, e.g.,
SELECT … FROM R WHERE R.A = 5
Indexing vs Hashing
Sequential indexes and B-trees are good for range searches, e.g.,
SELECT … FROM R WHERE R.A > 5
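A toy Python illustration of the difference, assuming a hypothetical relation R reduced to (A, row id) pairs: a dict (Python's built-in hash table) answers the equality probe R.A = 5 directly, while a list sorted on A, standing in for a sequential index or a B-tree's leaf level, answers R.A > 5 with a binary search followed by a scan. The hash structure could answer the range query only by examining every key.

```python
import bisect

# Hypothetical relation R, reduced to (A value, row id) pairs.
rows = [(7, "r1"), (3, "r2"), (5, "r3"), (9, "r4"), (5, "r5")]

# Hash-style index: A value -> row ids (a Python dict is a hash table).
hash_index = {}
for a, rid in rows:
    hash_index.setdefault(a, []).append(rid)

# Ordered index: rows sorted on A, standing in for a sequential index / B-tree leaves.
sorted_index = sorted(rows)
keys = [a for a, _ in sorted_index]


def probe_eq(value):
    """WHERE R.A = value: one hash probe."""
    return hash_index.get(value, [])


def range_gt(value):
    """WHERE R.A > value: binary search for the start, then a sequential scan."""
    start = bisect.bisect_right(keys, value)
    return [rid for _, rid in sorted_index[start:]]


print(probe_eq(5))   # ['r3', 'r5']
print(range_gt(5))   # ['r1', 'r4']
```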
Index definition in SQL
CREATE INDEX name ON rel (attr)
CREATE UNIQUE INDEX name ON rel (attr)   (defines a candidate key)
DROP INDEX name
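These statements can be tried with Python's built-in `sqlite3` module; the table R and the index names below are hypothetical. The unique index on B makes B behave as a candidate key, so a second row with the same B value is rejected. SQLite's CREATE INDEX likewise offers no clause for choosing a hash structure or its parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B TEXT)")
conn.execute("CREATE INDEX r_a ON R (A)")          # ordinary index on R.A
conn.execute("CREATE UNIQUE INDEX r_b ON R (B)")   # unique index: B acts as a candidate key

conn.execute("INSERT INTO R VALUES (5, 'x')")
conn.execute("INSERT INTO R VALUES (5, 'y')")      # duplicate A values are allowed
try:
    conn.execute("INSERT INTO R VALUES (6, 'x')")  # duplicate B violates the unique index
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("DROP INDEX r_a")
remaining = conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'").fetchall()
print(duplicate_rejected, remaining)   # True [('r_b',)]
```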
Note
CANNOT SPECIFY TYPE OF INDEX (e.g., B-tree, hashing, …)
OR PARAMETERS (e.g., load factor, size of hash, …)
... at least in SQL...