Hash, Little Baby…
Some practical examples when SAS hash objects are really helpful
Dmitry Shopin, Data Analyst
BC Centre for Excellence in HIV/AIDS
Vancouver SAS User Group, 28 May 2014
What Are They
SAS Component Objects:
• Hash
• Hash Iterator
• Java Object
• Logger
• Appender
What Are They Exactly
[Diagram: the DATASET PDV (key, var1, var2, var3, …) sends a Request to the HASH OBJECT (key, var1, var2), which sends a Return back.]
How To Work With Hash Objects
• Declare It
• Define It
• Load It
• Access/Change It
Using Dot.Notation:
A=Object.Attribute or RC=Object.Method(tag1:'value1', …)
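The four steps and the dot notation above can be sketched in one small DATA step. This is only an illustration: the variables code and label and the province values are made up for the example and are not part of the presentation's data.

```sas
data _null_;
  length code $2 label $20;        /* hypothetical host variables; they must exist in the PDV */

  declare hash h();                /* Declare It */
  rc = h.defineKey('code');        /* Define It: the key ...      */
  rc = h.defineData('label');      /* ... and the data            */
  rc = h.defineDone();

  code = 'BC'; label = 'British Columbia';
  rc = h.add();                    /* Load It, one item at a time */
  code = 'AB'; label = 'Alberta';
  rc = h.add();

  n = h.num_items;                 /* A=Object.Attribute          */

  code = 'BC';
  call missing(label);
  rc = h.find();                   /* RC=Object.Method()          */
  put n= rc= label=;               /* n=2 rc=0 label=British Columbia */

  rc = h.find(key:'ON');           /* RC=Object.Method(tag:value); 'ON' was never loaded */
  put rc=;                         /* a non-zero return code means "not found" */
run;
```

A return code of 0 always means success, so testing rc=0 after find() is the usual way to tell whether a key was present.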
Hash Objects Classic: Look Up. 1/2
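[Diagram: the Patients dataset is combined with the HA lookup table to produce Pat_HA.]
Hash Objects Classic: Look Up. 2/2
    data pat_ha;
      if _N_=1 then do;
        /* Creating variables */
        length ha 8 ha_name $20;
        /* Creating HASH (key: HA, data: HA_NAME) */
        declare hash h();
        h.defineKey('ha');
        h.defineData('ha_name');
        h.defineDone();
        /* Loading dictionary */
        do until(eof);
          set ha end=eof;
          h.add();
        end;
      end;
      set patient;
      /* Looks up HA_NAME for the current patient's HA.               */
      /* Called without a return-code variable, find() writes an      */
      /* error to the log whenever the key is not in the hash object. */
      h.find();
    run;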
Interaction between hash object and data
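[Diagram: DATA (key, var1, var2, var3, …) on one side, HASH (key, var1, var2) on the other.
• var1 = … changes the DATA step variable only
• h.find() copies var1 and var2 from HASH to DATA
• h.add() and h.replace() copy var1 and var2 from DATA to HASH]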
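The direction of each transfer can be checked with a short sketch. The variable names key and var1 follow the diagram labels; the values 'old', 'changed' and 'new' are invented for illustration.

```sas
data _null_;
  length key 8 var1 $10;
  declare hash h();
  h.defineKey('key');
  h.defineData('var1');
  h.defineDone();

  key = 1; var1 = 'old';
  rc = h.add();                 /* DATA -> HASH: stores (1, 'old')                 */

  var1 = 'changed';             /* an assignment touches the DATA step side only   */
  rc = h.find();                /* HASH -> DATA: var1 is overwritten with 'old'    */
  put 'after find: ' var1=;     /* after find: var1=old                            */

  var1 = 'new';
  rc = h.replace();             /* DATA -> HASH: the stored value becomes 'new'    */
  call missing(var1);
  rc = h.find();
  put 'after replace: ' var1=;  /* after replace: var1=new                         */
run;
```

The practical consequence is that a value changed in the DATA step is lost at the next find() unless replace() is called first.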
Hash Objects Advantages
• Memory resident
• Direct-addressing
• Natural way to sort/get distinct values
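As a rough sketch of the last point: loading a dataset into a hash object keyed on one variable keeps one item per key value, and the ordered: argument tag returns the items in key order. The example assumes the SASHELP.CLASS sample dataset is available; the output name work.distinct_ages is hypothetical.

```sas
data _null_;
  if 0 then set sashelp.class(keep=age);    /* adds AGE to the PDV without reading any rows */
  declare hash h(dataset:'sashelp.class(keep=age)', ordered:'a');
  h.defineKey('age');                       /* repeated ages are ignored while loading */
  h.defineData('age');
  h.defineDone();
  h.output(dataset:'work.distinct_ages');   /* one row per distinct age, in ascending order */
  stop;                                     /* no SET statement ever executes, so end the step explicitly */
run;
```

This does in one step what would otherwise take PROC SORT with NODUPKEY, at the cost of holding the distinct keys in memory.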
Case 1. Dictionary-based replacement. 1/2
    data address;
      /* Declares hash object during the 1st iteration only */
      if _N_=1 then do;
        /* Adds variables with all attributes to PDV, but never reads their values */
        if 0 then set dict;
        /* Loads the hash object from the dictionary */
        dcl hash h(dataset:'dict');
        h.defineKey('type');
        h.defineData('abbr');
        h.defineDone();
      end;
      set address;
      do i=1 to 99;
        /* Grabs a word */
        call scan(street_addr, i, position, length);
        if not position then leave;
        /* Extracts a word and puts it into the key variable */
        type=substr(street_addr,position,length);
        rc=h.find();
        /* If found, replaces a word with corresponding abbreviation */
        if rc=0 then substr(street_addr,position,length)=abbr;
      end;
      drop rc type abbr i position length;
    run;
Case 1. Dictionary-based replacement. 2/2
Case 2. Multiple counts. 1/2
[Diagram: Episodes of illness + Visits. For each episode of illness: # of visits before it? # of visits during it? # of visits after it?]
Case 2. Multiple counts. 2/2
    data _NULL_;
      if _N_=1 then do;
        length before during after 8;
        dcl hash h();
        h.defineKey('id');
        h.defineData('id','start','end','before','during','after');
        h.defineDone();
        /* Loads data with illness periods */
        do until(eof1);
          set epi end=eof1;
          h.add();
        end;
      end;
      set visits end=eof2;
      rc=h.find();
      /* If current patient found in hash object, increments corresponding counter in data */
      if rc=0 then do;
        select;
          when(visit<start) before+1;
          when(start<=visit<=end) during+1;
          when(end<visit) after+1;
          otherwise;
        end;
        /* Updates current hash object's record */
        h.replace();
      end;
      /* Outputs hash object as a dataset after the last visit has been processed */
      if eof2 then h.output(dataset:'counts');
    run;
Case 3. Find Some – Take All. 1/4
Tests
All tests of patients with 2+ tests >50
Case 3. Find Some – Take All. 2/4
Tests:
    id   vl    date
    1    120   1-Jan-10
    1    50    10-Mar-10
    1    200   17-Jul-10
    1    43    28-Feb-11
    1    40    4-Aug-11
    2    50    13-Apr-12
    2    55    19-Sep-12
    2    45    25-Dec-12
    2    45    21-Jan-13
    3    200   14-Feb-09
    3    230   31-May-09
Hash object with unique IDs: id = 1, 2, 3
Hash object with multiple records per key (ID)
Case 3. Find Some – Take All. 3/4
    data _NULL_;
      if _N_=1 then do;
        if 0 then set tests;
        /* Hash object with multiple data per key. */
        /* dataset: loads data right away; all:'yes' uses all variables. */
        dcl hash h_test(dataset:'tests', multidata:'yes');
        h_test.defineKey('id');
        h_test.defineData(all:'yes');
        h_test.defineDone();
        /* Hash object with unique IDs (1, 2, 3) as a key. No need for DefineData() */
        dcl hash h_id(dataset:'tests');
        h_id.defineKey('id');
        /* Iterator for the hash object with unique IDs */
        dcl hiter iter_id('h_id');
        h_id.defineDone();
      end;
      …
Case 3. Find Some – Take All. 4/4
      …
      /* Finds the first patient, using the iterator of the hash object with unique IDs */
      rc=iter_id.first();
      do while(rc=0);
        /* Finds this patient in the multidata hash object */
        rc2=h_test.find();
        /* Iterates through all tests of the current patient, */
        /* leaving when 2 tests >50 are found or no more tests remain */
        i=0;
        do while(rc2=0 and i<2);
          if vl>50 then i+1;
          rc2=h_test.find_next();
        end;
        /* If fewer than 2 tests >50, deletes all tests of this patient from the multidata hash object */
        if i<2 then h_test.remove();
        /* Finds the next patient */
        rc=iter_id.next();
      end;
      h_test.output(dataset:'high_2VL');
    run;
Case 4. Breadth First Tree Search. 1/5
[Diagram: a graph of connections between ten people: John, David, Ken, Chris, Elena, Adam, Fred, Berta, Mary, Peter.]
Case 4. Breadth First Tree Search. 2/5
Adjacency list (“edges”)
Connected components (“clusters”)
Case 4. Breadth First Tree Search. 3/5
[Animation: the list of Vertices (John, David, Ken, Chris, Elena, Adam, Fred, Berta, Mary, Peter) and the Queue, shown step by step next to the graph as the search proceeds.]
Case 4. Breadth First Tree Search. 4/5
    data _null_;
      /* Hash object for Vertices, with iterator */
      dcl hash V();
      V.defineKey('name');
      V.defineData('name','cluster');
      dcl hiter Vi('V');
      V.defineDone();

      /* Hash object for Edges */
      dcl hash E(dataset:'Connections', multidata:'y');
      E.defineKey('name');
      E.defineData('name','friend');
      E.defineDone();

      /* Hash object for Queue, with iterator */
      dcl hash Q(ordered:'y');
      Q.defineKey('qnum','name');
      Q.defineData('qnum','name');
      dcl hiter Qi('Q');
      Q.defineDone();

      /* Loading the unique names. A name can occur on several rows of  */
      /* Connections; capturing the return code keeps the resulting     */
      /* duplicate-key failures of add() out of the log.                */
      do until(eof);
        set Connections end=eof;
        call missing(cluster);
        rc=V.add();
      end;

      rc1=Vi.first();
      do while(rc1=0);
        /* Selecting next name to start new cluster, when queue is empty */
        if missing(cluster) then do;
          qnum=1;
          Q.add();
          n+1;
          cluster=n;
          V.replace();
          /* Dequeueing all names in queue one-by-one until it's empty */
          rc2=Qi.first();
          do while(rc2=0);
            qnum=qnum+Q.num_items-1;
            /* Enqueueing all connections of dequeued name */
            rc3=E.find();
            do while(rc3=0);
              name=friend;
              rc4=V.find();
              if rc4=0 and missing(cluster) then do;
                qnum+1;
                Q.add();
                cluster=n;
                V.replace();
              end;
              rc3=E.find_next();
            end;
            /* Removes the head of the queue, then re-creates the iterator */
            Qi.first();
            Qi.delete();
            Q.remove();
            Qi=_new_ hiter('Q');
            rc2=Qi.first();
          end;
        end;
        rc1=Vi.next();
      end;
      V.output(dataset:'clusters');
    run;
Hash More!
?