An Alternative Way to Import Multiple Excel Files with Multiple Worksheets into SAS

Saurabh Patel

Copyright © 2013 Quintiles

All opinions expressed in this presentation are the author’s personal views, and may not reflect the opinions or views of Quintiles.


We will see:

• What is it?

• Why is an alternative method required?

• How does the alternative method work?

• Example

• Benefits

• Drawbacks


What is it?

• In the pharmaceutical industry, data transfers often arrive as Microsoft Excel files (XLS, XLSX, or XLSM).

• This method is an alternative way to import multiple Excel files with multiple worksheets into SAS datasets.

• It lets users import Excel data easily without specifying file names, worksheet names, variable lengths, or formats.

[Diagram: Input → VBScript Processing → SAS Macro Processing → Output]


Why is an alternative method required?

Conventional methods already exist, such as:

• LIBNAME statement with the Excel engine

• Import Wizard

• PROC IMPORT

• DDE Method


Limitations of Regular process

Three criteria can be used to choose the "optimum" method:

1. Precision:

• Are we sure the input data are read properly? Automated SAS procedures such as PROC IMPORT or the LIBNAME statement classify an entire column as numeric data, which often results in a loss of decimal precision or of the values' formats.

[Figure: Excel input file compared with the SAS dataset imported using the LIBNAME statement]
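The precision concern is not specific to SAS. The following is a minimal sketch in Python, used here only for illustration (it is not the author's code, and the sample cell values are invented), of how a numeric conversion discards trailing and leading zeros that a character import keeps.

```python
# Hypothetical cell values as they appear in a spreadsheet column.
raw_cells = ["0.10", "007", "1.2300"]

# Numeric import: every value is converted to a floating-point number.
as_numeric = [float(cell) for cell in raw_cells]

# Character import: every value is kept exactly as it was typed.
as_text = list(raw_cells)

# Writing the numeric values back out as text shows what was lost.
round_trip = [str(value) for value in as_numeric]

for original, numeric in zip(as_text, round_trip):
    print(f"{original!r} -> {numeric!r}")
```

The trailing zero in 0.10 and the leading zeros in 007 survive only in the character version, which is the motivation for reading everything as text first.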


Limitations of Regular process

2. Flexibility:

1. LIBNAME statement with the Excel engine

• Offers more practical advantages than PROC IMPORT and DDE.

• It does not provide the flexibility to define variable formats and lengths.

• It does not work for file formats such as .csv or other delimited files.

2. Import Wizard and PROC IMPORT

• They can be used for both Excel spreadsheets and delimited files.

• They can process only a single spreadsheet at a time, which is time consuming, and they also do not provide the flexibility to define variable formats and lengths.

3. DDE method

• Offers more flexibility to define variable formats and lengths.

• It always requires more input parameters, such as file name, worksheet name, worksheet ranges, and variable lengths.


Limitations of Regular process

3. Automation:

• When handling multiple Excel files with multiple worksheets and numerous data transfers, the most important criterion is saving time.

• All conventional methods require only a small number of input parameters, but they demand a great deal of attention to worksheet names, worksheet ranges, variable formats, and variable lengths.

• If the programmer first needs to save each Excel spreadsheet as a CSV or TXT file, or run an import wizard, the process is not automated.

Put simply, everything should be done within a single SAS program.


How does the alternative method work?

1. VBScript Processing

[Diagram: Input → VBScript Processing → Output]


1. VBScript Processing

• What is VBScript?

• VBScript (Visual Basic Scripting Edition) is an Active Scripting language developed by Microsoft that is modeled on Visual Basic.

• How is VBScript helpful here?

• VBScript is modeled on Visual Basic, and Excel's own scripting is also based on Visual Basic.

• It helps in two important ways:

1. It converts multiple Excel files, worksheet by worksheet, into separate .CSV or delimited text files, given only a path name or file name.

2. It unmerges cells in Excel worksheets and fills the remaining cells with the duplicated value. It also removes carriage returns (Alt+Enter) so that the data come out in the proper order.
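The two clean-up tasks in item 2 can be sketched on an in-memory grid. This is an illustrative Python sketch rather than the presenter's VBScript: the function names, the sample sheet, and the convention that `None` marks a cell left empty by unmerging are all assumptions made for the example.

```python
def fill_down(rows, columns):
    """Copy the value from the row above into empty (None) cells.

    Only the listed column indexes are filled, so genuinely blank
    cells in other columns are left untouched.
    """
    filled = []
    for row in rows:
        new_row = list(row)
        if filled:
            for col in columns:
                if new_row[col] is None:
                    new_row[col] = filled[-1][col]
        filled.append(new_row)
    return filled


def strip_line_breaks(rows):
    """Replace line breaks inside text cells with a single space."""
    cleaned = []
    for row in rows:
        new_row = []
        for cell in row:
            if isinstance(cell, str):
                cell = (cell.replace("\r\n", " ")
                            .replace("\n", " ")
                            .replace("\r", " "))
            new_row.append(cell)
        cleaned.append(new_row)
    return cleaned


# Hypothetical worksheet: "Site" was a merged cell spanning two rows,
# and one comment contains an in-cell line break.
sheet = [
    ["Site", "Subject", "Comment"],
    ["101", "A01", "Visit 1\nrescheduled"],
    [None, "A02", "OK"],
    ["102", "B01", None],
]

cleaned = strip_line_breaks(fill_down(sheet, columns=[0]))
for row in cleaned:
    print(row)
```

Restricting the fill to named columns is a design choice of this sketch; a script working on the real workbook can instead ask Excel which cells were merged.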


VB Script Code


How does the alternative method work?

2. SAS Macro Processing

[Diagram: Input → SAS Macro Processing → Output]


2. SAS %CSV Macro Processing

1. Get the list of all CSV file names in the input directory that are to be converted into SAS datasets.

2. Get the variable names from the first row of each CSV file and convert them into valid SAS variable names.

3. Using the INFILE statement and the variable names, import all data as character variables with maximum lengths.
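The three steps above can be sketched as follows. This is a Python illustration of the logic only, not the %CSV macro itself, which is not reproduced in this transcript. The function names and the sample file are hypothetical, and the naming rules assumed are the usual SAS ones: letters, digits, and underscores only, no leading digit, at most 32 characters, and case-insensitive uniqueness.

```python
import csv
import re
import tempfile
from pathlib import Path


def list_csv_files(input_dir):
    """Step 1: return every .csv file in the input directory, sorted by name."""
    return sorted(p for p in Path(input_dir).iterdir()
                  if p.is_file() and p.suffix.lower() == ".csv")


def to_sas_name(header, used):
    """Step 2: turn a column header into a valid, unique SAS-style name."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", header.strip())
    if not name or name[0].isdigit():
        name = "_" + name              # names must not start with a digit
    name = name[:32]                   # assumed 32-character limit
    candidate, n = name, 1
    while candidate.upper() in used:   # SAS names are case-insensitive
        suffix = str(n)
        candidate = name[:32 - len(suffix)] + suffix
        n += 1
    used.add(candidate.upper())
    return candidate


def read_as_character(path):
    """Step 3: read all values as text and record each column's maximum length."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        used = set()
        names = [to_sas_name(h, used) for h in next(reader)]
        rows = [dict(zip(names, record)) for record in reader]
    # A character variable needs a length of at least 1.
    lengths = {n: max([len(r.get(n, "")) for r in rows] + [1]) for n in names}
    return rows, lengths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "labs.csv"   # hypothetical transfer file
        sample.write_text("Subject ID,2nd Dose (mg)\nA01,0.10\nA02,12.5\n",
                          encoding="utf-8")
        for csv_path in list_csv_files(folder):
            rows, lengths = read_as_character(csv_path)
            print(csv_path.name, lengths)
            print(rows)
```

The recorded maximum lengths play the role of the variable lengths that would otherwise have to be typed by hand for each transfer.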


Example


Benefits

• A dynamic process (the user provides only the input directory path or a file name) saves time.

• Converting multiple Excel files to CSV, unmerging cells, and removing carriage returns with VBScript is more convenient and user-friendly.

• Reading all variables as character with maximum length makes it easy to find differences between frequent data transfers.

• For QC purposes, it provides a better way to check that the data were read properly, with their native formats and values.
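As an illustration of the third benefit, here is a small Python sketch (hypothetical function, variable names, and data; not part of the presented macro) that compares two transfers cell by cell. Because every value is held as text, a change such as 0.10 becoming 0.1 is reported rather than hidden by numeric conversion.

```python
def changed_cells(old_rows, new_rows, key):
    """Return (key value, column, old text, new text) for every changed cell."""
    old_by_key = {row[key]: row for row in old_rows}
    changes = []
    for new in new_rows:
        old = old_by_key.get(new[key])
        if old is None:
            continue  # a new record, not a cell-level change
        for column, new_value in new.items():
            old_value = old.get(column)
            if old_value != new_value:
                changes.append((new[key], column, old_value, new_value))
    return changes


# Two hypothetical transfers of the same dataset, all values as text.
previous = [{"SUBJID": "A01", "DOSE": "0.10"}, {"SUBJID": "A02", "DOSE": "12.5"}]
current = [{"SUBJID": "A01", "DOSE": "0.1"}, {"SUBJID": "A02", "DOSE": "12.5"}]

differences = changed_cells(previous, current, key="SUBJID")
print(differences)
```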


Drawbacks

• It creates datasets in which all variables are character. When there are more numeric variables than character variables, this is less user-friendly.

• It also imports data from hidden worksheets, so the programmer needs to define more input parameters to import only selected worksheets.


References

• An Optimal Way to Import Excel Worksheets into PC SAS. http://analytics.ncsu.edu/sesug/2008/SBC-134.pdf

• So, Your Data are in Excel! http://www2.sas.com/proceedings/sugi31/020-31.pdf

• CSV: A Macro Which Writes SAS® Programs to Read CSV Files. http://www.lexjansen.com/nesug/nesug03/ps/ps019.pdf

