SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse:...
-
Upload
destiny-pipp -
Category
Documents
-
view
227 -
download
1
Transcript of SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse:...
![Page 1: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/1.jpg)
SUGI 28, PAPER 4
![Page 2: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/2.jpg)
SUGI 25
Paper 129: Paul M. DorfmanPrivate Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing
SUGI 26Paper 8: Paul M. DorfmanTable Look-Up by Direct Addressing: Key-Indexing — Bitmapping — Hashing
Paper 128: Paul M. DorfmanQuick Disk Table Look-up via Hybrid Indexing into a Directly Addressed SAS Data Set
SUGI 27Paper 12: Paul M. Dorfman and Gregg P. SnellHashing Rehashed
GENERATION I
Hand-Coded Direct-Addressing Routines Presented At:
![Page 3: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/3.jpg)
Associative Array
SUGI 27Presentation: Jason Secosky
The DATA step in Version 9: What’s New?
SAS® Version 9 DATA Step Object: Associative Array
Hash
GENERATION II
==
1
![Page 4: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/4.jpg)
2
Main Entry: pro·pae·deu·tic Pronunciation: "prO-pi-'dü-tik, -'dyü-Function: nounEtymology: Greek propaideuein to teach beforehand, from pro- before + paideuein to teach, from paid-, pais childDate: 1798
1: preparatory study or instruction
Merriam-Webster online http://www.m-w.com
PROPAEDEUTICS
1
![Page 5: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/5.jpg)
3
Main Entry: di·rect ad·dress·ing Pronunciation: "d&-'rekt, &-'dres-ingFunction: speedy table lookup
1: accessing key values “directly” by their location (address, node) in a table, as opposed to searching for them by comparing the search key to all or some table values
2: key-indexed search
Hashing Rehashed. SUGI 27, Orlando, FL, 2002.
DIRECT ADDRESSING
21
![Page 6: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/6.jpg)
WORK.SMALL
KEY S_SAT
185 00
971 01
400 02
260 03
922 04
970 05
543 06
532 07
050 08
067 09
data match ; array hkey (0:999) _temporary_ ; do until ( eof1 ) ; set small end = eof1 ; if missing (hkey(key)) then hkey(key) = s_sat ; end ; do until ( eof2 ) ; set large end = eof2 ; s_sat = hkey(key) ; if s_sat > . then output ; end ; stop ;run ;
ARRAY HKEY(0:999)
HKEY(000)=.
…
HKEY(185)=.
HKEY(400)=.
HKEY(971)=.
…
HKEY(999)=.
DEFINE
KEY-INDEXING
43
![Page 7: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/7.jpg)
WORK.SMALL
KEY S_SAT
185 00
971 01
400 02
260 03
922 04
970 05
543 06
532 07
050 08
067 09
data match ; array hkey (0:999) _temporary_ ; do until ( eof1 ) ; set small end = eof1 ; if missing (hkey(key)) then hkey(key) = s_sat ; end ; do until ( eof2 ) ; set large end = eof2 ; s_sat = hkey(key) ; if s_sat > . then output ; end ; stop ;run ;
ARRAY HKEY(0:999)
HKEY(000)=.
…
HKEY(185)=.
HKEY(400)=.
HKEY(971)=.
…
HKEY(999)=.
00
LOAD
DEFINE
KEY-INDEXING
2
![Page 8: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/8.jpg)
WORK.SMALL
KEY S_SAT
185 00
971 01
400 02
260 03
922 04
970 05
543 06
532 07
050 08
067 09
data match ; array hkey (0:999) _temporary_ ; do until ( eof1 ) ; set small end = eof1 ; if missing (hkey(key)) then hkey(key) = s_sat ; end ; do until ( eof2 ) ; set large end = eof2 ; s_sat = hkey(key) ; if s_sat > . then output ; end ; stop ;run ;
ARRAY HKEY(0:999)
HKEY(000)=.
…
HKEY(185)=.
HKEY(400)=.
HKEY(971)=.
…
HKEY(999)=.
00
01
DEFINE
LOAD
KEY-INDEXING
2
![Page 9: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/9.jpg)
WORK.SMALL
KEY S_SAT
185 00
971 01
400 02
260 03
922 04
970 05
543 06
532 07
050 08
067 09
data match ; array hkey (0:999) _temporary_ ; do until ( eof1 ) ; set small end = eof1 ; if missing (hkey(key)) then hkey(key) = s_sat ; end ; do until ( eof2 ) ; set large end = eof2 ; s_sat = hkey(key) ; if s_sat > . then output ; end ; stop ;run ;
ARRAY HKEY(0:999)
HKEY(000)=.
…
HKEY(185)=.
HKEY(400)=.
HKEY(971)=.
…
HKEY(999)=.
00
02
01
DEFINE
LOAD
KEY-INDEXING
2
![Page 10: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/10.jpg)
WORK.SMALL
KEY S_SAT
185 00
971 01
400 02
260 03
922 04
970 05
543 06
532 07
050 08
067 09
data match ; array hkey (0:999) _temporary_ ; do until ( eof1 ) ; set small end = eof1 ; if missing (hkey(key)) then hkey(key) = s_sat ; end ; do until ( eof2 ) ; set large end = eof2 ; s_sat = hkey(key) ; if s_sat > . then output ; end ; stop ;run ;
ARRAY HKEY(0:999)
HKEY(000)=.
…
HKEY(185)=.
HKEY(400)=.
HKEY(971)=.
…
HKEY(999)=.
00
02
01
DEFINE
LOAD
KEY-INDEXING
2
![Page 11: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/11.jpg)
WORK.SMALL
KEY S_SAT
185 00
971 01
400 02
260 03
922 04
970 05
543 06
532 07
050 08
067 09
data match ; array hkey (0:999) _temporary_ ; do until ( eof1 ) ; set small end = eof1 ; if missing (hkey(key)) then hkey(key) = s_sat ; end ; do until ( eof2 ) ; set large end = eof2 ; s_sat = hkey(key) ; if s_sat > . then output ; end ; stop ;run ;
ARRAY HKEY(0:999)
HKEY(000)=.
…
HKEY(185)=.
HKEY(400)=.
HKEY(971)=.
…
HKEY(999)=.
00
01
02
SEARCH
DEFINE
LOAD
KEY-INDEXING
1
![Page 12: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/12.jpg)
How can one possibly use key-indexing when SSN (or any other large or non-integer value) is the key?
LIMITATIONS
Key Must Be Integer
Key Range Limited By Memory
(9-Digit SSN Would Require 8 GB)
So you may be asking yourself...
KEY-INDEXING
321
![Page 13: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/13.jpg)
Main Entry: 1hash Pronunciation: 'hashFunction: transitive verbEtymology: French hacher, from Old French hachier, from hache battle-ax, of Germanic origin; akin to Old High German hAppa sickle; akin to Greek koptein to cutDate: 1590
1 a : to chop (as meat and potatoes) into small pieces
Merriam-Webster online http://www.m-w.com
HASHING
21
![Page 14: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/14.jpg)
Main Entry: 1hash·ing Pronunciation: 'hash-ingFunction: make BIG keys smaller
1: converting a long-range key (numeric or character) to a smaller-range integer number with a mathematical algorithm or function that must be:
• Rapidly computable
• Distribute the resulting keys uniformly
• Produce an integer in [0:HSIZE-1] range
2: H = MOD(KEY,HSIZE)
Private Detectives in a Data Warehouse. SUGI 25, Indianapolis, IN, 2000.
HASHING
21
![Page 15: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/15.jpg)
GENERATIONS
Which generation of hashing is the most efficient for subsetting LARGE based on the values of KEY in SMALL to produce a file MATCH?
data small ; input key s_sat ; cards ; 185 00 971 11 400 22 260 33 922 44 970 55 543 66 532 77 050 88 067 99 ; run ;
Our Mission:Match unsorted SMALL
(satellite variable S_SAT)To unsorted LARGE
LARGE cannot be sorted
Assume:
Memory will hold SMALL
(integer variable KEY)
7654321
![Page 16: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/16.jpg)
h = mod(key,&hsize);
%let load = 0.625;
data _null_;
do p=ceil(p/&load) by 1 until(j=up+1);
up = ceil(sqrt(p));
do j=2 to up until(not mod(p,j)); end;
end;
call symput('hsize',left(put(p,best.)));
stop;
set small nobs=p;
run;
%put hsize=&hsize;
hash_size=17
GENERATION I
Choose a proper function
Determine Load Factor
Calculate Optimal Array Size
KEY HASH_ADDR
185
971
400
260
922
970
543
532
050
067
Hash15
Hash
Hash
Hash
Hash
Hash
HashHash
Hash
Hash
02
09
05
04
01
16
05
16
16
Collision Resolution(Linear Probing)
54321
![Page 17: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/17.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT
185 00
971 11
400 22
260 33
922 44
970 55
543 66
532 77
050 88
067 99
HKEY HSAT
(00)=. (00)=.
(01)=. (01)=.
(02)=. (02)=.
(03)=. (03)=.
(04)=. (04)=.
(05)=. (05)=.
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=. (09)=.
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=. (16)=.
(17)=. (17)=.
DEFINELOADSEARCH
DEFINE• Load Factor• Array Size• Define
13121110
![Page 18: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/18.jpg)
KEY S_SAT
185 00
971 11
400 22
260 33
922 44
970 55
543 66
532 77
050 88
067 99
KEY S_SAT H(KEY)
185 00
971 11
400 22
260 33
922 44
970 55
543 66
532 77
050 88
067 99
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
DEFINE
HKEY HSAT
(00)=. (00)=.
(01)=. (01)=.
(02)=. (02)=.
(03)=. (03)=.
(04)=. (04)=.
(05)=. (05)=.
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=. (09)=.
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=. (16)=.
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
15
185 00
9
![Page 19: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/19.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT H(KEY)
185 00 15
971 11
400 22
260 33
922 44
970 55
543 66
532 77
050 88
067 99
DEFINE
HKEY HSAT
(00)=. (00)=.
(01)=. (01)=.
(02)=. (02)=.
(03)=. (03)=.
(04)=. (04)=.
(05)=. (05)=.
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=. (09)=.
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=185 (15)=00
(16)=. (16)=.
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
02
11971
9
![Page 20: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/20.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT H(KEY)
185 00 15
971 11 02
400 22
260 33
922 44
970 55
543 66
532 77
050 88
067 99
DEFINE
HKEY HSAT
(00)=. (00)=.
(01)=. (01)=.
(02)=971 (02)=11
(03)=. (03)=.
(04)=. (04)=.
(05)=. (05)=.
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=. (09)=.
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=. (16)=.
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
185 00
09
22400
9
![Page 21: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/21.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT H(KEY)
185 00 15
971 11 02
400 22 09
260 33
922 44
970 55
543 66
532 77
050 88
067 99
DEFINE
HKEY HSAT
(00)=. (00)=.
(01)=. (01)=.
(02)=971 (02)=11
(03)=. (03)=.
(04)=. (04)=.
(05)=. (05)=.
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=400 (09)=22
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=. (16)=.
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
185 00
05
33260
9
![Page 22: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/22.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT H(KEY)
185 00 15
971 11 02
400 22 09
260 33 05
922 44
970 55
543 66
532 77
050 88
067 99
DEFINE
HKEY HSAT
(00)=. (00)=.
(01)=. (01)=.
(02)=971 (02)=11
(03)=. (03)=.
(04)=. (04)=.
(05)=260 (05)=33
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=400 (09)=22
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=. (16)=.
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
185 00
04 44922
9
![Page 23: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/23.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT H(KEY)
185 00 15
971 11 02
400 22 09
260 33 05
922 44 04
970 55
543 66
532 77
050 88
067 99
DEFINE
HKEY HSAT
(00)=. (00)=.
(01)=. (01)=.
(02)=971 (02)=11
(03)=. (03)=.
(04)=922 (04)=44
(05)=260 (05)=33
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=400 (09)=22
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=. (16)=.
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
185 00
01
55970
9
![Page 24: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/24.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT H(KEY)
185 00 15
971 11 02
400 22 09
260 33 05
922 44 04
970 55 01
543 66
532 77
050 88
067 99
DEFINE
HKEY HSAT
(00)=. (00)=.
(01)=970 (01)=55
(02)=971 (02)=11
(03)=. (03)=.
(04)=922 (04)=44
(05)=260 (05)=33
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=400 (09)=22
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=. (16)=.
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
185 00
16
66543
9
![Page 25: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/25.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT H(KEY)
185 00 15
971 11 02
400 22 09
260 33 05
922 44 04
970 55 01
543 66 16
532 77
050 88
067 99
DEFINE
HKEY HSAT
(00)=. (00)=.
(01)=970 (01)=55
(02)=971 (02)=11
(03)=. (03)=.
(04)=922 (04)=44
(05)=260 (05)=33
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=400 (09)=22
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=543 (16)=66
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
185 00
77532
05
98
![Page 26: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/26.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT H(KEY)
185 00 15
971 11 02
400 22 09
260 33 05
922 44 04
970 55 01
543 66 16
532 77 05
050 88
067 99
DEFINE
HKEY HSAT
(00)=. (00)=.
(01)=970 (01)=55
(02)=971 (02)=11
(03)=. (03)=.
(04)=922 (04)=44
(05)=260 (05)=33
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=400 (09)=22
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=543 (16)=66
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
185 00
16
88050
765
![Page 27: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/27.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT H(KEY)
185 00 15
971 11 02
400 22 09
260 33 05
922 44 04
970 55 01
543 66 16
532 77 05
050 88 16
067 99
DEFINE
HKEY HSAT
(00)=050 (00)=88
(01)=970 (01)=55
(02)=971 (02)=11
(03)=. (03)=.
(04)=922 (04)=44
(05)=260 (05)=33
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=400 (09)=22
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=543 (16)=66
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
185 00
16
99067
4321
![Page 28: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/28.jpg)
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat); retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_; array hsat (0:&hsize) _temporary_; ...
GENERATION I
KEY S_SAT H(KEY)
185 00 15
971 11 02
400 22 09
260 33 05
922 44 04
970 55 01
543 66 16
532 77 05
050 88 16
067 99 16
DEFINE
HKEY HSAT
(00)=050 (00)=88
(01)=970 (01)=55
(02)=971 (02)=11
(03)=067 (03)=99
(04)=922 (04)=44
(05)=260 (05)=33
(06)=. (06)=.
(07)=. (07)=.
(08)=. (08)=.
(09)=400 (09)=22
(10)=. (10)=.
(11)=. (11)=.
(12)=. (12)=.
(13)=. (13)=.
(14)=. (14)=.
(15)=. (15)=.
(16)=543 (16)=66
(17)=. (17)=.
*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep=key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey(0:&hsize) _temporary_; array hsat(0:&hsize) _temporary_; do until ( eof1 ) ; set small end = eof1 ; do h=mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h)=key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; ...
LOAD
185 00
SEARCH
...
do until ( eof2 ) ; set large end = eof2 ; do h=mod (key, &hsize) by +1 until ( hkey(h) = . ) ; if h = &hsize then h = 0 ; if hkey(h) = key then do ; s_sat = hsat(h) ; output ; if nodupes then leave ; end ; end ; end ; stop ; run ;
![Page 29: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/29.jpg)
h = mod(key,&hsize);
%let load = 0.625;
data _null_;
do p=ceil(p/&load) by 1 until(j=up+1);
up = ceil(sqrt(p));
do j=2 to up until(not mod(p,j)); end;
end;
call symput('hsize',left(put(p,best.)));
stop;
set small nobs=p;
run;
%put hsize=&hsize;
hash_size=17
Choose a proper function
Determine Load Factor
Calculate Optimal Array Size
KEY HASH_ADDR
185
971
400
260
922
970
543
532
050
067
Hash15
Hash
Hash
Hash
Hash
Hash
HashHash
Hash
Hash
02
09
05
04
01
16
05
16
16
Collision Resolution(Linear Probing)
GENERATION II
54321
![Page 30: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/30.jpg)
GENERATION II
DEFINE
LOAD
SEARCH
data match ( drop = rc ) ; length key $9 s_sat 8 ; declare AssociativeArray hh (); rc = hh.DefineKey ( 'key' ); rc = hh.DefineData ( 's_sat' ); rc = hh.DefineDone (); do until ( eof1 ) ; set small end = eof1 ; rc = hh.add () ; end ; do until ( eof2 ) ; set large end = eof2 ; rc = hh.find () ; if rc = 0 then output ; end ; stop ; run ;
LOAD
SEARCH
DEFINE
321
![Page 31: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/31.jpg)
return code = object.method;
GENERATION II
data match ( drop = rc ) ; length key $9 s_sat 8 ; declare AssociativeArray hh (); rc = hh.DefineKey ( 'key' ); rc = hh.DefineData ( 's_sat' ); rc = hh.DefineDone (); do until ( eof1 ) ; set small end = eof1 ; rc = hh.add () ; end ; do until ( eof2 ) ; set large end = eof2 ; rc = hh.find () ; if rc = 0 then output ; end ; stop ; run ;
data match ; length key $9 s_sat 8 ; declare AssociativeArray hh (); hh.DefineKey ( 'key' ); hh.DefineData ( 's_sat' ); hh.DefineDone (); do until ( eof1 ) ; set small end = eof1 ; hh.add () ; end ; do until ( eof2 ) ; set large end = eof2 ; if hh.find ()=0 then output ; end ; stop ; run ;
data match ; length key $9 s_sat 8 ; declare AssociativeArray hh (); hh.DefineKey ( 'key' ); hh.DefineData ( 's_sat' ); hh.DefineDone (); do until ( eof1 ) ; set small end = eof1 ; hh.add () ; end ; do until ( eof2 ) ; set large end = eof2 ; if hh.find ()=0 then output ; end ; stop ; run ;
dcl hash hh ();
Parameter Type MatchingVerbose Syntax
DEFINE
7654321
set small(keep=key s_sat) point = _n_ ;
![Page 32: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/32.jpg)
GENERATION II
data match ( drop = rc ) ; length key $9 s_sat 8 ; declare AssociativeArray hh (); rc = hh.DefineKey ( 'key' ); rc = hh.DefineData ( 's_sat' ); rc = hh.DefineDone (); do until ( eof1 ) ; set small end = eof1 ; rc = hh.add () ; end ; do until ( eof2 ) ; set large end = eof2 ; rc = hh.find () ; if rc = 0 then output ; end ; stop ; run ;
LOAD
data match ; length key $9 s_sat 8 ; declare AssociativeArray hh (); hh.DefineKey ( 'key' ); hh.DefineData ( 's_sat' ); hh.DefineDone (); do until ( eof1 ) ; set small end = eof1 ; hh.add () ; end ; do until ( eof2 ) ; set large end = eof2 ; if hh.find ()=0 then output ; end ; stop ; run ;
DEFINE set small(keep=key s_sat) point = _n_ ;dcl hash hh ();(dataset:'small');
1
![Page 33: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/33.jpg)
GENERATION II
data match ( drop = rc ) ; length key $9 s_sat 8 ; declare AssociativeArray hh (); rc = hh.DefineKey ( 'key' ); rc = hh.DefineData ( 's_sat' ); rc = hh.DefineDone (); do until ( eof1 ) ; set small end = eof1 ; rc = hh.add () ; end ; do until ( eof2 ) ; set large end = eof2 ; rc = hh.find () ; if rc = 0 then output ; end ; stop ; run ;
LOAD
SEARCH
data match ; length key $9 s_sat 8 ; declare AssociativeArray hh (); hh.DefineKey ( 'key' ); hh.DefineData ( 's_sat' ); hh.DefineDone (); do until ( eof1 ) ; set small end = eof1 ; hh.add () ; end ; do until ( eof2 ) ; set large end = eof2 ; if hh.find ()=0 then output ; end ; stop ; run ;
DEFINE set small(keep=key s_sat) point = _n_ ;dcl hash hh ();(dataset:'small');
![Page 34: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/34.jpg)
DEFINE
GENERATIONS
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep = key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_ ; array hsat (0:&hsize) _temporary_ ; do until ( eof1 ) ; set small end = eof1 ; do h = mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h) = key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; do until ( eof2 ) ; set large end = eof2 ; do h = mod (key, &hsize) by +1 until ( hkey(h) = . ) ; if h = &hsize then h = 0 ; if hkey(h) = key then do ; s_sat = hsat(h) ; output ; if nodupes then leave ; end ; end ; end ; stop ; run ;
data match ( drop = rc ) ; length key $9 s_sat 8 ; declare AssociativeArray hh (); hh.DefineKey ( 'key' ); hh.DefineData ( 's_sat' ); hh.DefineDone (); do until ( eof2 ) ; set large end = eof2 ; if hh.find ()=0 then output ; end ; stop ; run ;
set small(keep=key s_sat) point = _n_ ;dcl hash hh ();(dataset:'small');
32
![Page 35: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/35.jpg)
data match ( drop = rc ) ; length key $9 s_sat 8 ; declare AssociativeArray hh (); hh.DefineKey ( 'key' ); hh.DefineData ( 's_sat' ); hh.DefineDone (); do until ( eof2 ) ; set large end = eof2 ; if hh.find ()=0 then output ; end ; stop ; run ;
set small(keep=key s_sat) point = _n_ ;dcl hash hh ();(dataset:'small');
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep = key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_ ; array hsat (0:&hsize) _temporary_ ; do until ( eof1 ) ; set small end = eof1 ; do h = mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h) = key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; do until ( eof2 ) ; set large end = eof2 ; do h = mod (key, &hsize) by +1 until ( hkey(h) = . ) ; if h = &hsize then h = 0 ; if hkey(h) = key then do ; s_sat = hsat(h) ; output ; if nodupes then leave ; end ; end ; end ; stop ; run ;
LOAD
GENERATIONS
1
![Page 36: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/36.jpg)
data match ( drop = rc ) ; length key $9 s_sat 8 ; declare AssociativeArray hh (); hh.DefineKey ( 'key' ); hh.DefineData ( 's_sat' ); hh.DefineDone (); do until ( eof2 ) ; set large end = eof2 ; if hh.find ()=0 then output ; end ; stop ; run ;
set small(keep=key s_sat) point = _n_ ;dcl hash hh ();(dataset:'small');
%let load = 0.625 ; data _null_ ; do p=ceil(p/&load) by 1 until(j=up+1); up = ceil(sqrt(p)); do j=2 to up until(not mod(p,j)); end; end; call symput('hsize', left(put(p,best.))); stop; set small nobs=p;run;*0 if dupes to be pulled ; %let nodupes = 1 ; data match (keep = key s_sat l_sat) ; retain nodupes &nodupes.. ; array hkey (0:&hsize) _temporary_ ; array hsat (0:&hsize) _temporary_ ; do until ( eof1 ) ; set small end = eof1 ; do h = mod (key, &hsize) by +1 ; if h = &hsize then h = 0 ; if hkey(h) = key and nodupes
then leave ; if hkey(h) = . then do ; hkey(h) = key ; hsat(h) = s_sat ; leave ; end ; end ; end ; do until ( eof2 ) ; set large end = eof2 ; do h = mod (key, &hsize) by +1 until ( hkey(h) = . ) ; if h = &hsize then h = 0 ; if hkey(h) = key then do ; s_sat = hsat(h) ; output ; if nodupes then leave ; end ; end ; end ; stop ; run ;
SEARCH
GENERATIONS
![Page 37: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/37.jpg)
Modify your hash code
• %let nodupes = 1 ; • if nodupes then leave ;
Three options !
• hh.add (); Keeps 1st occurrence
• hh.replace (); Keeps last
• create additional keys to discriminate further
DUPLICATE KEYS
CHALLENGES
4321
![Page 38: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/38.jpg)
Change your hash function
• rescale the key h = mod (key*1000, &hsize)
• use a numeric informat h=mod (input(key,16.), &hsize)
Change your array declaration
• array hkey(0:&hsize) $9 _temporary_
Already taken care of !
• set small(keep=key s_sat) point = _n_ ;
• hh.DefineKey ( 'key' );
CHALLENGES
21
NON-INTEGER KEYS• Fractional signed SAS numbers• Digit strings (char var of digits)• Arbitrary characters
![Page 39: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/39.jpg)
DYNAMIC TABLE PROCESSING
Simple coding
• DO LOOP through the array
do h=0 to &hsize – 1; if missing(hkey(h)) then continue; key = hkey(h); s_sat = hsat(h); < … further processing … >end;
GENERATION I GENERATION II
Use the HASH ITERATOR !
Dcl hash hh (dataset: ‘’sample’, ordered: 1);dcl hiter hi (‘hh’);< … hh.define stuff … >do rc=hi.first() by 0 while (rc=0); (key is automatically populated) rc=hi.next();end;
CHALLENGES
654321
![Page 40: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/40.jpg)
COMPOSITE KEYS
Change your hash function
• Possible but VERY difficult
• Complexity grows as the number of components to the key grows
GENERATION I GENERATION II
Just add the key(s) !
• set small(keep=k1 k2 k3 s_sat) point = _n_ ;
• hh.DefineKey ( 'k1', 'k2', 'k3' );
CHALLENGES
321
![Page 41: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/41.jpg)
PEEK UNDER THE HOOD
STARFLEET COMMAND ENGINEERINGACCESSING VERSION 9 SCHEMATICSACCESSING VERSION 9 SCHEMATICSACCESSING VERSION 9 SCHEMATICSACCESSING VERSION 9 SCHEMATICSACCESSING VERSION 9 SCHEMATICSACCESSING VERSION 9 SCHEMATICS
![Page 42: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/42.jpg)
PEEK UNDER THE HOOD
DATA STEP COMPONENT INTERFACEHASH ITERATORADELSON-VELSKII & LANDIS TREES
![Page 43: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/43.jpg)
PEEK UNDER THE HOOD
DATA STEP COMPONENT INTERFACE
declare AssociativeArray hh
• instantiates or creates the object
( dcl hash myhash )
![Page 44: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/44.jpg)
PEEK UNDER THE HOOD
DATA STEP COMPONENT INTERFACE
hh.DefineKey
• define a set of hash keys
![Page 45: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/45.jpg)
PEEK UNDER THE HOOD
DATA STEP COMPONENT INTERFACE
hh.DefineData
• define a set of hash table satellites
![Page 46: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/46.jpg)
PEEK UNDER THE HOOD
DATA STEP COMPONENT INTERFACE
hh.DefineDone
• tell SAS the definitions are done
![Page 47: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/47.jpg)
PEEK UNDER THE HOOD
DATA STEP COMPONENT INTERFACE
hh.Add
• insert the key and satellites
(if the key is not yet in the table)
![Page 48: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/48.jpg)
PEEK UNDER THE HOOD
DATA STEP COMPONENT INTERFACE
hh.Replace
• insert the key and satellites
(overwrites any existing data)
![Page 49: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/49.jpg)
PEEK UNDER THE HOOD
DATA STEP COMPONENT INTERFACE
hh.Find
• search for the key
• if found, extract the satellite
and update the host DATA STEP variables
![Page 50: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/50.jpg)
PEEK UNDER THE HOOD
DATA STEP COMPONENT INTERFACE
hh.Check
• search for the key
• if found, just return rc=0
![Page 51: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/51.jpg)
PEEK UNDER THE HOOD
DATA STEP COMPONENT INTERFACE
hh.Delete
• delete the hash table from memory
![Page 52: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/52.jpg)
PEEK UNDER THE HOOD
HASH ITERATOR
declare hiter hi
dcl hiter myhashiterator
• instansiates or creates the object
(don’t forget to add “ordered: 1”
to the hash declaration)
![Page 53: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/53.jpg)
PEEK UNDER THE HOOD
HASH ITERATOR
hi.First
• fetch the smallest key into the host variable
![Page 54: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/54.jpg)
PEEK UNDER THE HOOD
HASH ITERATOR
hi.Next
• fetch the next key in ascending order
![Page 55: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/55.jpg)
PEEK UNDER THE HOOD
HASH ITERATOR
hi.Last
• fetch the largest key into the host variable
![Page 56: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/56.jpg)
PEEK UNDER THE HOOD
HASH ITERATOR
hi.Prev
• fetch the previous key in descending order
![Page 57: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/57.jpg)
PEEK UNDER THE HOOD
ADELSON-VELSKII & LANDIS TREES
http://www.seanet.com/users/arsen/avltree.html
Binary Trees populated such that they average O(log(N)) search behavior regardless of the distribution.
For example:
• Insert the value: 05• Insert the value: 02
AVL maintains balance by rotating the values and preserving the search structure
HASHEXP
• controls the number of trees to create
• 2**
• EXP=16 HSIZE=65,536
HASHEXP
54321
![Page 58: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/58.jpg)
CONCLUSION
Simple numeric key falling in a limited range
• SAS date and time values are good examples
• This is the area where Generation I key-indexed search complete dominates the competition both in computer and programming efficiency
A short comparison of I vs. II
4321
![Page 59: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/59.jpg)
CONCLUSION
Simple numeric key with the range up to 9 digits; no satellites needed
• Bitmapping is king
1
![Page 60: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/60.jpg)
CONCLUSION
Simple numeric key or short (up to 10 bytes) character key
• Both generations do well
• If ultimate speed is the issue, Generation I (barely)
• Generation II has the advantage of coding simplicity
321
![Page 61: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/61.jpg)
CONCLUSION
Composite keys
• Generation I is better if the keys can be rapidly combined in a short integer
• Otherwise, Generation II dominates
21
![Page 62: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/62.jpg)
CONCLUSION
Retrieving data by key from a hash table in order
• Generation I can provide such functionality only through array sorting
• Generation II hash iterator object is designed for this purpose, works very fast
21
![Page 63: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/63.jpg)
CONCLUSION
Storing and handeling duplicate key entries in a hash table
• Generation I is more flexible
• Generation II only lets you control which duplicate takes over
2121
![Page 64: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/64.jpg)
CONCLUSION
Dynamic Data Step Processing
• Generation II is ideal
• Table grows at run-time as new entries are added• No need to allocate giant memories beforehand
321
![Page 65: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/65.jpg)
CONCLUSION
Dynamic DATA Step Structure
is the first ever…
Finally…
Although still experimental in V9…
The Generation II Hash Object
(AssociativeArray)
4321
![Page 66: SUGI 28, PAPER 4 SUGI 25 Paper 129: Paul M. Dorfman Private Detectives in a Data Warehouse: Key-Indexing, Bitmapping, and Hashing SUGI 26 Paper 8: Paul.](https://reader031.fdocuments.in/reader031/viewer/2022031909/56649cc25503460f94989c39/html5/thumbnails/66.jpg)
HASHING:
GENERATIONS
SUGI 28
GENERATIONS