The Complete Guide to SAS Indexes by Michael A. Raithel Reviewed by Horst Wolter

12
Slide 1 The Complete Guide to SAS Indexes by Michael A. Raithel Reviewed by Horst Wolter

description

The Complete Guide to SAS Indexes by Michael A. Raithel Reviewed by Horst Wolter. About the Author. Wrote two previous SAS books and over twenty SAS technical papers. Section Chair at SUGI, SESUG and NESUG and co-chaired NESUG in 1995. - PowerPoint PPT Presentation

Transcript of The Complete Guide to SAS Indexes by Michael A. Raithel Reviewed by Horst Wolter

Slide 1

The Complete Guide to SAS IndexesbyMichael A. Raithel

Reviewed by Horst Wolter

Slide 2

About the Author

• Wrote two previous SAS books and over twenty SAS technical papers.

• Section Chair at SUGI, SESUG and NESUG and co-chaired NESUG in 1995.

• First book entitled Tuning SAS Applications in the MVS Environment resides in the Smithsonian Institution of American History’s Permanent Research Collection of Information Technology.

Slide 3

Goals of Indexing

• The main goal of using a SAS index is to read only a small portion of a large SAS dataset instead of reading the entire SAS dataset.

• Reduce I/Os.

• Reduce wall clock time.

• Reduce CPU time.

Slide 4

When to Index

Consider

• the size of the subsets (the overhead of using an index can become greater than the overhead of a sequential read of the entire dataset)

• frequency of use (is building an index cost effective)

• variability of the data (indexed SAS datasets have an overhead to maintain if data changes often and may not be cost effective)

Slide 5

1% to 15% An index will definitely improve processing. There should be dramatic resource savings in the lower end of this range.

16% to 30% An index will improve processing. However, the resource savings will not be as dramatic as in the lower range.

31% to 60% An index may improve processing, or it might worsen processing. Be very careful in this subset range.

61% to 100% Do not use an index. A sequential read of the entire dataset is very likely to be more efficient.

Indexing ActionSubset Size

Indexing Guidelines(Table 2.1 from book)

Slide 6

Index Variable Selection

What variables to consider for an index?

• Often used to subset data

• Values that represent a small subset

• A SAS dataset sorted by the index variable is more efficient

Slide 7

How to create an Index

• Data Step Method

DATA data-set-name(INDEX=(index-name=(var1 var2 etc.) </UNIQUE> </NOMISS>));

• SQL Method

proc sql;CREATE <UNIQUE> INDEX index-name ON table-name ( column

<, ... column>); quit;

Slide 8

Viewing Index Information

proc contents data=data-set-name <centiles>;run;

Indexes 1

Alphabetic List of Indexes and Attributes # of Unique # Index Values 1 var1 1535958

Slide 9

Using Indexes

• With a WHERE clause

• With a BY statement

SAS determines whether or not to use an index based on a number of rules as described in the book.

Slide 10

Other Topics covered in the book

• Composite indexes

• Creating indexes while using PROC SORT, PROC DATASETS

• Creating Multiple Indexes

• Rules for SAS Using an Index

• Removing Indexes

• Index options e.g. Unique Option, NoMiss Option

• Recovering Missing Index Files

• Repairing Damaged Index Files

• And more

Slide 11

From SAS Press

Slide 12

Questions?