GET 236: ENTERPRISE DATA ANALYSIS: TOOLS AND TECHNIQUES
WEEK 03: IMPORT AND VALIDATE DATA

WEEK 03 AGENDA
Week 01 Recap
Main Topics Covered:
    38: Importing Data from Text Files
    39: Importing Data from the Internet
    40: Validating Data
    46: Filtering and Removing Duplicates
    25: Sorting in Excel
Next Class Information

WEEK 01 RECAP
GET 236: What and Why, Expectations, Syllabus and Schedule
Circular References
Range Names
Paste Special
Cell Referencing
    $A1, A$1, $A$4
    Shortcut via F4
Other useful things on various tabs of Excel

IMPORTING FILES
Record / Column Maximums:
    Excel 2003: 65,536 rows by 256 columns
    Excel 2010: 1,048,576 rows by 16,384 columns
    Excel 2013: no increase in limits from the 2010 version
Why is this important for importing?
    If a file has more rows or columns than these limits, the data will get cut off.
    It will NOT automatically go to another tab. It will just not import past the max.
    Larger data sets therefore require other tools (e.g., Access, ACL, SQL)
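
One way to avoid a silent cut-off is to measure the file before importing it. The sketch below is only an illustration, not part of the course material: the function name `fits_in_worksheet` and the bar delimiter are assumptions, and the column count is a rough estimate that ignores text qualifiers.

```python
# Worksheet limits for Excel 2010 / 2013, as listed on the slide.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_worksheet(lines, delimiter="|"):
    """Return (row_count, widest_row_in_columns, fits) for an iterable of text lines.

    The column count is a rough estimate: it counts delimiters and
    does not account for delimiters inside quoted fields.
    """
    rows = 0
    max_cols = 0
    for line in lines:
        rows += 1
        max_cols = max(max_cols, line.count(delimiter) + 1)
    fits = rows <= EXCEL_MAX_ROWS and max_cols <= EXCEL_MAX_COLS
    return rows, max_cols, fits


# Hypothetical two-line file, small enough to import safely.
small_file = ["name|product|region\n", "Jen|Lipstick|East\n"]
print(fits_in_worksheet(small_file))
```

With a real file, the same function can be given an open file handle (for example `open(path, encoding="utf-8")`), since a file handle yields its lines one at a time.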

IMPORTING DATA – TEXT FILES – FIXED WIDTH
Fixed Width
    Real Aging Example:
    Excel guesses where the data should be broken into columns. You can easily modify Excel’s assumptions.

IMPORTING DATA – TEXT FILES – FIXED WIDTH (CONT.)
Fixed Width
    Import File 01 to tab “01 Fixed”
Key things to remember:
    Start import at row: does not need to be 1 (but often is)
    You can change the column breaks suggested
    Data Format: Choose text when it is text
        Example of subscriber field
    Watch out for additional headers / footers throughout
        Footer may not align based on file separators
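
The same three reminders (start row, column breaks, text format) apply when a fixed-width file is parsed outside Excel. The following is a minimal sketch under stated assumptions: the column layout, the sample aging lines, and the `SUBSCRIBER` header prefix are all made up for illustration and do not come from File 01.

```python
# Hypothetical column breaks: (field name, start position, end position).
LAYOUT = [("subscriber", 0, 10), ("name", 10, 25), ("balance", 25, 35)]


def parse_fixed_width(lines, layout=LAYOUT, start_row=1, header_prefix="SUBSCRIBER"):
    """Slice each line at the given column breaks, keeping every field as text."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        if line_no < start_row:
            continue  # same idea as "Start import at row"
        if not line.strip() or line.startswith(header_prefix):
            continue  # skip blank lines and headers repeated mid-file
        records.append({name: line[start:end].strip() for name, start, end in layout})
    return records


def make_line(subscriber, name, balance):
    """Build one sample line that matches LAYOUT (widths 10, 15, 10)."""
    return f"{subscriber:<10}{name:<15}{balance:>10}"


sample = [
    "AGING REPORT",
    make_line("SUBSCRIBER", "NAME", "BALANCE"),
    make_line("0001234567", "Jen Smith", "120.50"),
    make_line("SUBSCRIBER", "NAME", "BALANCE"),  # header repeated on a later page
    make_line("0007654321", "Colleen Jones", "75.00"),
]

rows = parse_fixed_width(sample, start_row=2)
for row in rows:
    print(row)
```

Because the subscriber field stays a string, its leading zeros survive. That is the reason for choosing the text data format for such a field in Excel's import wizard.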

IMPORTING DATA – TEXT FILES – DELIMITED
Delimited
    Delimiters are the separators. Often comma, bar, dash, etc. BAR (|) is best.
    Text qualifiers are optional but preferred. A double quote should be used.
        Tell me why I worry about comma with no text qualifiers
    Import File 03 to tab “03 - Delimited”
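
The worry about commas without text qualifiers can be shown with Python's standard `csv` module. The three sample lines are made up for illustration: the same record is written without a qualifier, with double-quote qualifiers, and with a bar delimiter.

```python
import csv
import io


def parse_line(text, delimiter=","):
    """Parse one delimited line, treating a double quote as the text qualifier."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    return next(reader)


unqualified = "Smith, Jen,Lipstick,East\n"    # comma inside the name, no qualifier
qualified = '"Smith, Jen",Lipstick,East\n'    # same record with text qualifiers
bar_delimited = "Smith, Jen|Lipstick|East\n"  # same record with a bar delimiter

print(parse_line(unqualified))         # the name is split into two fields
print(parse_line(qualified))           # three fields, name intact
print(parse_line(bar_delimited, "|"))  # three fields, name intact
```

The unqualified line yields four fields instead of three, so every value after the name lands one column to the right. A bar rarely appears inside real data, which is one reason to prefer it as the delimiter.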

IMPORTING DATA – TEXT FILES – DELIMITED (CONT.)
Delimited
    Can copy / paste into Excel and then Text To Columns
        Tab “03 - Delimited - Text to Col”
        Copy the contents of the text file into Excel and then separate the columns via Text To Columns

IMPORTING DATA – SKEWED DATA
Often with text files, you run the risk of skewed data.
    Skewed data is sometimes referred to as “spill over”. It occurs when data does not appear in the column it is supposed to, because of one or more additional delimiters or a missing delimiter.
    Read excerpt from page 308
In the real world, you have to know when to fix and when to push back.
Import file 04 to tab “04 - Delimited – Skewed”
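
A quick way to spot skewed rows before importing is to compare each line's field count with the header's. This is only a sketch: the function name `find_skewed_rows` and the sample lines are hypothetical, and the check assumes the first non-blank line is a correct header.

```python
def find_skewed_rows(lines, delimiter="|"):
    """Return (expected_field_count, [(line_no, field_count, text), ...])."""
    expected = None
    skewed = []
    for line_no, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if not text:
            continue  # ignore blank lines
        count = len(text.split(delimiter))
        if expected is None:
            expected = count  # the first non-blank line sets the expected width
        elif count != expected:
            skewed.append((line_no, count, text))
    return expected, skewed


sample = [
    "name|product|region|dollars",
    "Jen|Lipstick|East|290.00",
    "Cici|Lip|Gloss|South|120.00",  # extra delimiter inside the product name
    "Colleen|Mascara East|310.00",  # missing delimiter
]

expected, skewed = find_skewed_rows(sample)
for line_no, count, text in skewed:
    print(f"line {line_no}: {count} fields (expected {expected}): {text}")
```

A short list of flagged lines can usually be fixed by hand. A long list suggests a problem with the extract itself, which is the point at which to push back on whoever supplied the file.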

IMPORTING DATA – FROM THE WEB
Data – From Web
    Paste link: http://www.boxofficeguru.com/blockbusters.htm

QUESTIONS?

VALIDATING DATA
The book talks about validating data in terms of making sure a column / row / set of cells contains the appropriate data
    All numbers, all text, all dates
    Whole numbers in a given range, dates greater than x, text with a length less than x, etc.
In the business setting, validating data is sometimes a sanity check (with filtering) to make sure WHAT you are looking at is as expected
    No import errors
    Totals make sense, date ranges are appropriate, etc.

VALIDATING DATA (CONT.)
Data – Data Validation
    What are you allowing?
    What are you telling the user to do?
    What happens if they don’t do that?
To clear data validation from a range, select the range, choose Data Validation and then select Clear All.
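
The rule types mentioned earlier (whole numbers in a range, dates greater than x, text shorter than x) can also be written as small checks and run over a column after import. The sketch below is illustrative only: the helper names and the sample `units` column are assumptions, and it is not how Excel implements Data Validation.

```python
from datetime import date


def is_whole_number_in_range(value, low, high):
    """True for an int between low and high inclusive (booleans excluded)."""
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def is_date_after(value, cutoff):
    """True for a date strictly later than the cutoff."""
    return isinstance(value, date) and value > cutoff


def is_text_shorter_than(value, max_len):
    """True for text with fewer than max_len characters."""
    return isinstance(value, str) and len(value) < max_len


def invalid_cells(values, rule):
    """Return (row_number, value) for every value that fails the rule."""
    return [(row, value) for row, value in enumerate(values, start=1) if not rule(value)]


# Hypothetical imported column that should hold whole numbers from 0 to 200.
units = [95, 120, "n/a", 87.5, 300]
bad_units = invalid_cells(units, lambda v: is_whole_number_in_range(v, 0, 200))
print(bad_units)
```

Listing the failing rows, rather than only rejecting them, fits the sanity-check use described above: the point is to see what is wrong with the import.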

FILTERING DATA
Filtering is selecting a specific subset of your data
Makeup example
    Jen, Lipstick, East region
    Cici or Colleen, Lipstick or Mascara, East or South region
    Units > 90 and Dollars > $280
    Names begin with C
    By Color
    Top 30 Dollar values for Hallagan or Jen
Note: Top x items or dollars is based on the ENTIRE population, NOT what you have filtered.
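
The filters in the makeup example, and the note about Top x, can be sketched on a handful of records. The five rows below are invented for illustration (they are not the class data set), and a Top 2 is used instead of Top 30 so the effect shows up on a tiny sample.

```python
# Hypothetical sales records, not the class data set.
sales = [
    {"name": "Jen", "product": "Lipstick", "region": "East", "units": 95, "dollars": 290.0},
    {"name": "Cici", "product": "Mascara", "region": "South", "units": 40, "dollars": 120.0},
    {"name": "Colleen", "product": "Lipstick", "region": "East", "units": 99, "dollars": 310.0},
    {"name": "Hallagan", "product": "Eye Liner", "region": "West", "units": 10, "dollars": 35.0},
    {"name": "Jen", "product": "Foundation", "region": "Midwest", "units": 60, "dollars": 200.0},
]

# Units > 90 and Dollars > $280
big_sales = [r for r in sales if r["units"] > 90 and r["dollars"] > 280]

# Names begin with C
c_names = [r for r in sales if r["name"].startswith("C")]

# Top 2 dollar values for Hallagan or Jen, the way the note describes it:
# the cutoff comes from the ENTIRE population, then the name filter applies.
cutoff = sorted((r["dollars"] for r in sales), reverse=True)[:2][-1]
top_on_population = [
    r for r in sales if r["name"] in {"Hallagan", "Jen"} and r["dollars"] >= cutoff
]

# What one might expect instead: filter by name first, then take the top 2.
named = [r for r in sales if r["name"] in {"Hallagan", "Jen"}]
top_after_filter = sorted(named, key=lambda r: r["dollars"], reverse=True)[:2]

print(len(top_on_population), len(top_after_filter))
```

On this sample the population-based Top 2 leaves only one row for Hallagan or Jen, while filtering first would return two. The difference is exactly what the note warns about.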

SORTING
Data – Sort
    Can sort alphabetically, by color, or by your own criteria
    Can add multiple levels (e.g., sort by A, then B, then C)
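
A multi-level sort and a custom sort order can both be expressed with a sort key. The records and the `REGION_ORDER` mapping below are made-up examples; the point is only that the levels are listed in priority order.

```python
# Hypothetical (region, salesperson, units) records.
records = [
    ("East", "Jen", 95),
    ("South", "Cici", 40),
    ("East", "Colleen", 99),
    ("East", "Jen", 60),
]

# Three levels: region A-Z, then salesperson A-Z, then units largest first.
by_levels = sorted(records, key=lambda r: (r[0], r[1], -r[2]))

# Custom criteria: an explicit region order instead of alphabetical order.
REGION_ORDER = {"South": 0, "East": 1}
by_custom_order = sorted(records, key=lambda r: (REGION_ORDER[r[0]], r[1]))

for row in by_levels:
    print(row)
```

Python's sort is stable, so rows that tie on every listed level keep their original relative order, as the two Jen rows do in `by_custom_order`.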

REMOVING DUPLICATES
If duplicates exist in your data, clicking “Remove Duplicates” will do just that – remove them. Excel won’t identify and show you the duplicates. It will only take them out.
Command often used for sanity checking the data as well as understanding unique values
Makeup example
    Unique list of salespeople
    Unique combination of salesperson, product, and location
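
Since the command only takes duplicates out, it can help to see which rows would go before removing them. The sketch below keeps the first occurrence of each key and also returns the rows it dropped. The function name and the sample rows are hypothetical.

```python
def remove_duplicates(rows, key_columns):
    """Split rows into (unique, removed), keeping the first occurrence of each key."""
    seen = set()
    unique = []
    removed = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            removed.append(row)  # a repeat of a key combination seen earlier
        else:
            seen.add(key)
            unique.append(row)
    return unique, removed


# Hypothetical makeup sales rows.
rows = [
    {"salesperson": "Jen", "product": "Lipstick", "location": "East"},
    {"salesperson": "Cici", "product": "Mascara", "location": "South"},
    {"salesperson": "Jen", "product": "Lipstick", "location": "East"},
    {"salesperson": "Jen", "product": "Mascara", "location": "East"},
]

# Unique list of salespeople
people, _ = remove_duplicates(rows, ["salesperson"])

# Unique combination of salesperson, product, and location
combos, dropped = remove_duplicates(rows, ["salesperson", "product", "location"])

print([r["salesperson"] for r in people])
print(len(combos), len(dropped))
```

The choice of key columns decides what counts as a duplicate: one column gives the list of salespeople, all three give the distinct combinations.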

QUESTIONS?

NEXT CLASS: WEEK 04
Assignment 01 tests your understanding of this lesson. This is due prior to the start of class for full credit.
    Anything received after 5:15pm will be marked late and eligible for a maximum of half credit.
    The assignment will be reviewed and discussed at the beginning of next class to solidify learning and close any gaps on these topics
Formatting and Analyzing Data
    06: Text Functions
    07: Dates and Date Functions
    13: Time and Time Functions
    20: Count Functions
    21: Sum / Average Functions
Quiz will be in Week 05, covering ALL material from weeks 03 and 04