Going Beyond SAS Default Formats

30
1 Going Beyond SAS Default Formats September 12th, 2007 Fall RTSUG Meeting

description

Going Beyond SAS Default Formats. September 12th, 2007 Fall RTSUG Meeting. Objectives. Easy to create multiple user-defined formats Easy to maintain these formats Customize these formats Create your own SAS-Catalog of formats. Efficiency. Lets say your data is like this -. - PowerPoint PPT Presentation

Transcript of Going Beyond SAS Default Formats

Page 1: Going Beyond  SAS Default Formats

1

Going Beyond SAS Default Formats

September 12th, 2007Fall RTSUG Meeting

Page 2: Going Beyond  SAS Default Formats

2

Objectives

Easy to create multiple user-defined formats Easy to maintain these formats Customize these formats Create your own SAS-Catalog of formats. Efficiency

Page 3: Going Beyond  SAS Default Formats

3

Lets say your data is like this - CUS_ID VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR11 VAR10 VAR12 VAR13 VAR14 VAR15 VAR16 ---VAR80

1 UG 3 01 A 8050 A 01 999 01 R003 N N 134 2 UG 3 01 A 5020 A 01 999 01 B099 N N 152 3 UG 3 01 A 7999 A 01 528 01 C035 N N 505 4 UG 3 01 A 00 3670 A 01 999 01 N N 119 5 OH 3 01 A 02 00 8060 A 01 014 01 B010 N N 144 6 OH 3 01 A 02 00 8060 A 01 014 01 B010 N N 144 7 OH 3 01 A 01 00 8060 A 01 014 01 B010 N N 144 8 UG 3 01 A 00 8060 A 01 014 01 B010 N N 144 9 UG 1 01 A 8060 A 01 014 01 B010 N N 144 10 UG 3 01 R 01 8060 A 01 014 01 B010 N N 144 11 OH 1 01 A 01 0002 F 02 014 01 B010 N N 144 12 OH 3 01 A 2510 A 01 528 01 C027 N N 505 13 OH 1 01 A 2510 A 01 528 01 C027 N N 505 14 UG 3 01 A 2510 A 01 528 01 C027 N N 505 15 OH 1 01 A 2510 A 01 528 01 C027 N N 505 16 OH 3 01 A 02 00 2510 A 01 528 01 C027 N N 505 17 UG 3 01 A 02 00 2510 A 01 528 01 C027 N N 505 18 UG 3 01 A 01 00 2510 A 01 528 01 C027 N N 505 19 UG 3 01 A 5812 A 02 192 01 C010 N N 138 20 UG 3 01 A 00 8060 A 01 016 01 B099 N N 107 21 OH 3 01 A 0724 002 A 01 598 01 B001 N N 511 22 OH 3 01 A 00 0724 002 A 01 598 01 B001 N N 511 23 UG 3 01 A 6512 A 01 320 01 N N 153 24 UG 1 01 A 7310 A 01 320 01 N N 153 25 UG 3 01 A 0001 A 01 320 01 B099 N N 153 26 OH 3 01 A 00 5140 A 01 320 01 B099 N N 153 27 UG 3 01 A 00 5410 A 01 542 01 B011 N N 505 28 UG 3 01 A 6512 A 01 076 01 N N 153 29 OH 3 01 A 00 8090 A 01 999 01 B012 N N 505 30 OH 3 01 A 00 8050 A 01 062 01 R004 N N 141 31 OH 3 01 A 00 2030 005 A 01 269 01 B099 N N 155 32 UG 3 01 A 2030 005 A 01 269 01 B099 N N 155

Page 4: Going Beyond  SAS Default Formats

4

An Example :

Cus_Num Acct_Cls 1 02 2 01 3 03 4 03 5 02 6 05 7 05 8 08 9 08 10 01 11 08 . . …more data...

Page 5: Going Beyond  SAS Default Formats

5

-Continued-

Lets assume your groups of customers are classified like this :

ACCT_CLS CODE_DESCRIPTION

01 CROWN 02 DIAMOND 03 PLATINUM 04 RUBY 05 EMERALD 06 SAPPHIRE 07 PEARL 08 GOLD 09 SILVER 10 BRONZE

Page 6: Going Beyond  SAS Default Formats

6

-continued-

Desired report/result for business folks,

who usually don’t know the raw-data very

well….

* Produce a summary of our customers

by all account types …. ?

Page 7: Going Beyond  SAS Default Formats

7

-continued-

Outputs in 2 ways: 1st way

The FREQ Procedure

ACCT_CLS

Cumulative CumulativeACCT_CLS Frequency Percent Frequency Percent---------------------------------------------------------------

01 1201772 83.44 1201772 83.44 02 234502 16.28 1436274 99.72 03 5 0.00 1436279 99.72 05 27 0.00 1436306 99.73 06 435 0.03 1436741 99.76 08 3507 0.24 1440248 100.00

Frequency Missing = 1

Page 8: Going Beyond  SAS Default Formats

8

-continued-

2nd way (preferable)

The FREQ Procedure ACCT_CLS

Cumulative CumulativeACCT_CLS Frequency Percent Frequency Percent------------------------------------------------------------------01: CROWN 1201772 83.44 1201772 83.4402: DIAMOND 234502 16.28 1436274 99.7203: PLATINUM 5 0.00 1436279 99.7205: EMERALD 27 0.00 1436306 99.7306: SAPPHIRE 435 0.03 1436741 99.7608: GOLD 3507 0.24 1440248 100.00 Frequency Missing = 1

Page 9: Going Beyond  SAS Default Formats

9

….SAS code.…

PROC FORMAT; VALUE $ACCT ’01’ = ’01: CROWN’ ’02’ = ’02: DIAMOND’ ’03’ = ’03: PLATINUM ’04’ = ’04: RUBY’ ’05’ = ’05: EMERALD’ ’06’ = ’06: SAPPHIRE’ ’07’ = ’07: PEARL’ ’08’ = ’08: GOLD’ ’09’ = ’09: SILVER’ ’10’ = ’10: BRONZE’ ;RUN;

PROC FREQ DATA = ACCOUNTS; TABLES ACCT_CLS ; FORMAT ACCT_CLS $ACCT. ;RUN;

The following code can easily create such results using ‘decodes’ as the labels …..

Page 10: Going Beyond  SAS Default Formats

10

Problem …. ? ?

What if …

Page 11: Going Beyond  SAS Default Formats

11

Problem …. ? ?

Need formatted reports for 10 or more coded variables, and very quickly….say in an hour….!!!!

Amount of time required to create individual formats….!!!!

How do we know which ‘format’ to use for a desired variable if there are lots of coded variables…!!!!

Page 12: Going Beyond  SAS Default Formats

12

An attempt to find a solution for this.

Lets say your dataset with the decodes look

like this …..

Page 13: Going Beyond  SAS Default Formats

13

NAME CODE DECODE

VAR1 01 Residential VAR1 02 Commercial VAR1 03 Industrial VAR1 04 Government VAR2 A On leave VAR2 B On Vacation VAR2 C In Hospital VAR2 D Working From Home VAR2 E Retired VAR2 F On Contract VAR2 G Resigned VAR2 H Laid Off VAR3 101 Salary < $20,000 VAR3 102 Salary between $20,000 and $40,000 VAR3 103 Salary between $40,001 and $60,000 VAR3 104 Salary between $60,001 and $80,000 VAR3 105 Salary between $80,001 and $100,000 VAR3 106 Salary > $100,000 VAR4 A001 Family of 1 member VAR4 A002 Family of 2 members VAR4 A003 Family of 3 members VAR4 A004 Family of 4 members VAR4 A005 Family of 5 members VAR4 A006 Family of 6 members VAR4 A007 Family of 7 or more members VAR5 00A1 In Batch # - 1 VAR5 00A2 In Batch # - 2 VAR5 00A3 In Batch # - 3 VAR5 00A4 In Batch # - 4 VAR5 00A5 In Batch # - 5

Page 14: Going Beyond  SAS Default Formats

14

NAME CODE DECODE VAR5 00A6 In Batch # - 6 VAR5 00A7 In Batch # - 7 VAR5 00A8 In Batch # - 8 VAR5 00A9 In Batch # - 9 VAR5 00A10 In Batch # - 10 VAR5 00A11 In Batch # - 11 VAR5 00A12 In Batch # - 12 VAR5 00A13 In Batch # - 13 VAR5 00A14 In Batch # - 14 VAR5 00A15 In Batch # - 15 VAR5 00A16 In Batch # - 16 VAR5 00A17 In Batch # - 17 VAR5 00A18 In Batch # - 18 VAR5 00A19 In Batch # - 19 VAR5 00A20 In Batch # - 20 VAR5 00A21 In Batch # - 21 VAR5 00A22 In Batch # - 22 VAR5 00A23 In Batch # - 23 VAR5 00A24 In Batch # - 24 VAR5 00A25 In Batch # - 25 VAR5 00A26 In Batch # - 26 VAR6 SS01 Service Territory Area 1 VAR6 SS02 Service Territory Area 2 VAR6 SS03 Service Territory Area 3 VAR6 SS04 Service Territory Area 4 VAR6 SS05 Service Territory Area 5 VAR6 SS06 Service Territory Area 6 VAR6 SS07 Service Territory Area . . . . . . . so on . . . .

Page 15: Going Beyond  SAS Default Formats

15

SAS code

DATA ONE (KEEP = NAME CODE DECODE2); SET Codes_decodes_data; DECODE2 = trim(CODE)||": "||DECODE;RUN;

/* TO CLEAN DATA */PROC SORT DATA=ONE DUPOUT=DUPS NODUPKEY; BY NAME CODE; RUN;

/* TO SEE JUST THE FIELD-NAMES */PROC FREQ NOPRINT DATA = ONE; TABLES NAME / OUT=FREQOUT (DROP=PERCENT);RUN;

Page 16: Going Beyond  SAS Default Formats

16

-continued-

/* CREATING FORMATS */DATA FMT; RETAIN N 0; RETAIN FMTNAME; SET ONE; BY NAME; IF FIRST.NAME THEN DO; N+1; FMTNAME='C'||PUT(N,Z3.)||'T'; END; START=CODE; LABEL=DECODE2; TYPE='C'; RUN;

PROC FORMAT LIBRARY = WORK.My_Formats CNTLIN=FMT FMTLIB; RUN;QUIT;

Page 17: Going Beyond  SAS Default Formats

17

‘FMT’ dataset looks like this….

N FORMAT NAME CODE DECODE2 START LABEL TYPE

1 C001T VAR1 01 01: Residential 01 01: Residential C1 C001T VAR1 02 02: Commercial 02 02: Commercial C1 C001T VAR1 03 03: Industrial 03 03: Industrial C1 C001T VAR1 04 04: Government 04 04: Government C2 C002T VAR10 M M: Sent 3 Notices M M: Sent 3 Notices C2 C002T VAR10 N N: Sent 2 Notices N N: Sent 2 Notices C2 C002T VAR10 Y Y: Sent 1 Notice Y Y: Sent 1 Notice C3 C003T VAR11 1A 1A: Sent 4 Notices 1A 1A: Sent 4 Notices C3 C003T VAR11 1B 1B: Sent 5 Notices 1B 1B: Sent 5 Notices C3 C003T VAR11 1C 1C: Sent 6 Notices 1C 1C: Sent 6 Notices C3 C003T VAR11 1D 1D: Sent 7 or more Notices 1D 1D: Sent 7 or more Notices C4 C004T VAR12 SSSS201 SSSS201: Credit Value - 100 SSSS201 SSSS201: Credit Value - 100 C4 C004T VAR12 SSSS202 SSSS202: Credit Value - 75 SSSS202 SSSS202: Credit Value - 75 C4 C004T VAR12 SSSS203 SSSS203: Credit Value - 50 SSSS203 SSSS203: Credit Value - 50 C4 C004T VAR12 SSSS204 SSSS204: Credit Value - 25 SSSS204 SSSS204: Credit Value - 25 C4 C004T VAR12 SSSS205 SSSS205: Credit Value - 0 SSSS205 SSSS205: Credit Value - 0 C4 C004T VAR12 SSSS206 SSSS206: Credit Value - Negative SSSS206 SSSS206: Credit Value - Negative C5 C005T VAR13 1 1: Rejected 1 1: Rejected C5 C005T VAR13 2 2: Selected 2 2: Selected C5 C005T VAR13 3 3: Missing 3 3: Missing C6 C006T VAR14 N N: No - Not On Schedule N N: No - Not On Schedule C6 C006T VAR14 Y Y: Yes - On Schedule Y Y: Yes - On Schedule C7 C007T VAR15 N N: No - Customer is not an Employee N N: No - Customer is not an Employee C7 C007T VAR15 Y Y: Yes - Customer is an Employee Y Y: Yes - Customer is an Employee C8 C008T VAR16 M M: Batch Missing M M: Batch Missing C8 C008T VAR16 N N: Batch Not Present N N: Batch Not Present C8 C008T VAR16 Y Y: Batch Present Y Y: Batch Present C9 C009T VAR17 M M: Data Unavailable M M: Data Unavailable C9 C009T VAR17 N N: Payment Not Done N N: Payment Not Done C9 C009T VAR17 Y Y: Payment Done Y Y: Payment Done C

…. SO ON….

Page 18: Going Beyond  SAS Default Formats

18

APPLICATION

OPTIONS FMTSEARCH = (WORK.MY_Formats);

PROC FREQ DATA = EXAMPLE; TABLES VAR1 VAR2 VAR3 VAR4 VAR5; FORMAT VAR1 $C002T. VAR2 $C013T. VAR3 $C015T. VAR4 $C016T. VAR5 $C017T. ;RUN;

Page 19: Going Beyond  SAS Default Formats

19

Output :

Cumulative CumulativeVAR1 Frequency Percent Frequency Percent-----------------------------------------------------------------------01: Residential 12 50.00 12 50.0002: Commercial 4 16.67 16 66.6703: Industrial 4 16.67 20 83.3304: Government 4 16.67 24 100.00

Cumulative CumulativeVAR2 Frequency Percent Frequency Percent-------------------------------------------------------------------------A: On leave 12 50.00 12 50.00B: On Vacation 4 16.67 16 66.67C: In Hospital 4 16.67 20 83.33D: Working From Home 4 16.67 24 100.00

Cumulative Cumulative

VAR3 Frequency Percent Frequency Percent -----------------------------------------------------------------------------------------101: Salary < $20,000 4 16.67 4 16.67102: Salary between $20,000 and $40,000 4 16.67 8 33.33103: Salary between $40,001 and $60,000 4 16.67 12 50.00104: Salary between $60,001 and $80,000 4 16.67 16 66.67105: Salary between $80,001 and $100,000 8 33.33 24 100.00

Page 20: Going Beyond  SAS Default Formats

20

Output continued…

Cumulative Cumulative

VAR4 Frequency Percent Frequency Percent

--------------------------------------------------------------------------------------

A001: Family of 1 member 17 70.83 17 70.83

A002: Family of 2 members 1 4.17 18 75.00

A003: Family of 3 members 3 12.50 21 87.50

A004: Family of 4 members 1 4.17 22 91.67

A005: Family of 5 members 2 8.33 24 100.00

Cumulative Cumulative

VAR5 Frequency Percent Frequency Percent

-----------------------------------------------------------------------------------------------------------------------------------------------

00A1: In Batch # - 1 2 8.33 2 8.33

00A10: In Batch # - 10 18 75.00 20 83.33

00A11: In Batch # - 11 1 4.17 21 87.50

00A12: In Batch # - 12 1 4.17 22 91.67

00A2: In Batch # - 2 1 4.17 23 95.83

00A4: In Batch # - 4 1 4.17 24 100.00

Page 21: Going Beyond  SAS Default Formats

21

References For Formats

SAS Dataset as a reference

Data My_Fmts_Reference (Rename = (Start = Code Label = Decode));

Set Fmt (Keep = Fmtname Name Start Label);

Run;

(Input dataset on next slide)

Page 22: Going Beyond  SAS Default Formats

22

Used this FMT dataset to create reference table

N FORMAT NAME CODE DECODE2 START LABEL TYPE

1 C001T VAR1 01 01: Residential 01 01: Residential C1 C001T VAR1 02 02: Commercial 02 02: Commercial C1 C001T VAR1 03 03: Industrial 03 03: Industrial C1 C001T VAR1 04 04: Government 04 04: Government C2 C002T VAR10 M M: Sent 3 Notices M M: Sent 3 Notices C2 C002T VAR10 N N: Sent 2 Notices N N: Sent 2 Notices C2 C002T VAR10 Y Y: Sent 1 Notice Y Y: Sent 1 Notice C3 C003T VAR11 1A 1A: Sent 4 Notices 1A 1A: Sent 4 Notices C3 C003T VAR11 1B 1B: Sent 5 Notices 1B 1B: Sent 5 Notices C3 C003T VAR11 1C 1C: Sent 6 Notices 1C 1C: Sent 6 Notices C3 C003T VAR11 1D 1D: Sent 7 or more Notices 1D 1D: Sent 7 or more Notices C4 C004T VAR12 SSSS201 SSSS201: Credit Value - 100 SSSS201 SSSS201: Credit Value - 100 C4 C004T VAR12 SSSS202 SSSS202: Credit Value - 75 SSSS202 SSSS202: Credit Value - 75 C4 C004T VAR12 SSSS203 SSSS203: Credit Value - 50 SSSS203 SSSS203: Credit Value - 50 C4 C004T VAR12 SSSS204 SSSS204: Credit Value - 25 SSSS204 SSSS204: Credit Value - 25 C4 C004T VAR12 SSSS205 SSSS205: Credit Value - 0 SSSS205 SSSS205: Credit Value - 0 C4 C004T VAR12 SSSS206 SSSS206: Credit Value - Negative SSSS206 SSSS206: Credit Value - Negative C5 C005T VAR13 1 1: Rejected 1 1: Rejected C5 C005T VAR13 2 2: Selected 2 2: Selected C5 C005T VAR13 3 3: Missing 3 3: Missing C6 C006T VAR14 N N: No - Not On Schedule N N: No - Not On Schedule C6 C006T VAR14 Y Y: Yes - On Schedule Y Y: Yes - On Schedule C7 C007T VAR15 N N: No - Customer is not an Employee N N: No - Customer is not an Employee C7 C007T VAR15 Y Y: Yes - Customer is an Employee Y Y: Yes - Customer is an Employee C8 C008T VAR16 M M: Batch Missing M M: Batch Missing C8 C008T VAR16 N N: Batch Not Present N N: Batch Not Present C8 C008T VAR16 Y Y: Batch Present Y Y: Batch Present C9 C009T VAR17 M M: Data Unavailable M M: Data Unavailable C9 C009T VAR17 N N: Payment Not Done N N: Payment Not Done C9 C009T VAR17 Y Y: Payment Done Y Y: Payment Done C

…. SO ON….

Page 23: Going Beyond  SAS Default Formats

23

Dataset as a referenceFMTNAME NAME CODE DECODE

C001T VAR1 01 01: ResidentialC001T VAR1 02 02: CommercialC001T VAR1 03 03: IndustrialC001T VAR1 04 04: GovernmentC002T VAR10 M M: Sent 3 NoticesC002T VAR10 N N: Sent 2 NoticesC002T VAR10 Y Y: Sent 1 NoticeC003T VAR11 1A 1A: Sent 4 NoticesC003T VAR11 1B 1B: Sent 5 NoticesC003T VAR11 1C 1C: Sent 6 NoticesC003T VAR11 1D 1D: Sent 7 or more NoticesC004T VAR12 SSSS201 SSSS201: Credit Value - 100C004T VAR12 SSSS202 SSSS202: Credit Value - 75C004T VAR12 SSSS203 SSSS203: Credit Value - 50C004T VAR12 SSSS204 SSSS204: Credit Value - 25C004T VAR12 SSSS205 SSSS205: Credit Value - 0C004T VAR12 SSSS206 SSSS206: Credit Value - NegativeC005T VAR13 1 1: RejectedC005T VAR13 2 2: SelectedC005T VAR13 3 3: MissingC006T VAR14 N N: No - Not On ScheduleC006T VAR14 Y Y: Yes - On Schedule

Page 24: Going Beyond  SAS Default Formats

24

References For Formats

SAS Catalogs as a reference

Page 25: Going Beyond  SAS Default Formats

25

SAS CATALOG as a reference

Page 26: Going Beyond  SAS Default Formats

26

-Continued-

/* SAS CODE TO CREATE FORMAT-CATALOG */

DATA LABELS; SET My_fmts_reference; KEEP FMTNAME NAME;RUN;

PROC SORT DATA = LABELS NODUPKEY; BY NAME;RUN; %MACRO CATMAC(FMTNAME,NAME); PROC CATALOG CAT = MY_Formats; MODIFY &FMTNAME (DESC=&NAME) / ET=FORMATC; CONTENTS OUT = CATOUT; RUN; QUIT;%MEND;

DATA _NULL_; SET LABELS; CALL EXECUTE("%CATMAC(" || FMTNAME || ",'" || NAME || "')");RUN;

Page 27: Going Beyond  SAS Default Formats

27

EfficiencyOption 1:

This SAS code (autoexec.sas) opens-up the catalogwindow automatically as soon as the SAS session starts,provided these formats were saved in a “PermanentSAS-Library”.

It’s saved here (Ask your SAS technical support where to save):<C:\Program Files\SAS\SAS 9.1>

DM 'CATALOG Paper.My_Formats';

Page 28: Going Beyond  SAS Default Formats

28

OPTIONS FMTSEARCH = (WORK.MY_Formats);

Run this …Run this…Run this again …Run this again again….Run this again again again….

Page 29: Going Beyond  SAS Default Formats

29

Option 2:

This code should be added in SASV9.txt (SAS ConfigurationInformation). It loads all the formats automatically in the memory assoon as the SAS session starts, provided these formats were savedin a “Permanent SAS-Library”.

It’s saved here (Ask your SAS technical support where to save):

<C:\Program Files\SAS\SAS 9.1\nls\en\SASV9.txt>

-FMTSEARCH (Paper.My_Formats)

No need to run OPTIONS FMTSEARCH = () every time to use the formats !!!!!

Page 30: Going Beyond  SAS Default Formats

30

For Questions…

Contact:

Divya Jain

[email protected]

(919)-546-5129

Progress Energy Company