The Complete Guide to SAS Indexes by Michael A. Raithel Reviewed by Horst Wolter
-
Upload
aimee-cobb -
Category
Documents
-
view
16 -
download
0
description
Transcript of The Complete Guide to SAS Indexes by Michael A. Raithel Reviewed by Horst Wolter
Slide 2
About the Author
• Wrote two previous SAS books and over twenty SAS technical papers.
• Section Chair at SUGI, SESUG and NESUG and co-chaired NESUG in 1995.
• First book entitled Tuning SAS Applications in the MVS Environment resides in the Smithsonian Institution of American History’s Permanent Research Collection of Information Technology.
Slide 3
Goals of Indexing
• The main goal of using a SAS index is to read only a small portion of a large SAS dataset instead of reading the entire SAS dataset.
• Reduce I/Os.
• Reduce wall clock time.
• Reduce CPU time.
Slide 4
When to Index
Consider
• the size of the subsets (the overhead of using an index can become greater than the overhead of a sequential read of the entire dataset)
• frequency of use (is building an index cost effective)
• variability of the data (indexed SAS datasets have an overhead to maintain if data changes often and may not be cost effective)
Slide 5
1% to 15% An index will definitely improve processing. There should be dramatic resource savings in the lower end of this range.
16% to 30% An index will improve processing. However, the resource savings will not be as dramatic as in the lower range.
31% to 60% An index may improve processing, or it might worsen processing. Be very careful in this subset range.
61% to 100% Do not use an index. A sequential read of the entire dataset is very likely to be more efficient.
Indexing ActionSubset Size
Indexing Guidelines(Table 2.1 from book)
Slide 6
Index Variable Selection
What variables to consider for an index?
• Often used to subset data
• Values that represent a small subset
• A SAS dataset sorted by the index variable is more efficient
Slide 7
How to create an Index
• Data Step Method
DATA data-set-name(INDEX=(index-name=(var1 var2 etc.) </UNIQUE> </NOMISS>));
• SQL Method
proc sql;CREATE <UNIQUE> INDEX index-name ON table-name ( column
<, ... column>); quit;
Slide 8
Viewing Index Information
proc contents data=data-set-name <centiles>;run;
Indexes 1
Alphabetic List of Indexes and Attributes # of Unique # Index Values 1 var1 1535958
Slide 9
Using Indexes
• With a WHERE clause
• With a BY statement
SAS determines whether or not to use an index based on a number of rules as described in the book.
Slide 10
Other Topics covered in the book
• Composite indexes
• Creating indexes while using PROC SORT, PROC DATASETS
• Creating Multiple Indexes
• Rules for SAS Using an Index
• Removing Indexes
• Index options e.g. Unique Option, NoMiss Option
• Recovering Missing Index Files
• Repairing Damaged Index Files
• And more