Basic Stats with Tableau - tc18.tableau.com · Basic Stats with Tableau Tyler Martin Senior...

Basic Stats with Tableau

Tyler Martin

Senior Software Engineer

Tableau Software

#TC18

Agenda

• Confidence intervals

• Hypothesis testing

• Trend lines

• Forecasting

• Q&A

Confidence Intervals

Confidence Interval: Definition

Definition: For 95% of samples, the confidence interval will contain the population average

For a particular sample: There is a 95% chance that the confidence interval contains the population average
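The "95% of samples" definition can be checked with a small simulation. The sketch below is illustrative only: it assumes a normally distributed population of mile-run times (the mean of 600 seconds and spread of 90 seconds are made-up values) and uses a normal-approximation interval, which may not match what Tableau computes.

```python
import random
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    margin = z * stdev(sample) / len(sample) ** 0.5
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical population of mile-run times in seconds (assumed values).
POPULATION_MEAN = 600.0
POPULATION_SD = 90.0

random.seed(18)  # fixed seed so the run is repeatable
trials = 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(50)]
    low, high = confidence_interval(sample)
    hits += low <= POPULATION_MEAN <= high  # True counts as 1

coverage = hits / trials
print(f"Share of intervals containing the population average: {coverage:.3f}")
```

The printed share should land close to 0.95. Each individual interval either contains the population average or misses it; the 95% describes how often the procedure succeeds across repeated samples.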

Confidence Intervals Answer Questions Like…

What does my sample of mile-run times tell me about the average 2nd grader in Seattle?

Is the average 2nd grader in Seattle likely to run a faster mile than me?

Hypothesis Testing

Hypothesis Testing: Test Statistic

A value calculated from your data

This value always follows the same distribution, regardless of the distribution of your data*

Hypothesis Testing: Procedure

1. State the hypothesis and the null hypothesis

2. Choose an appropriate test statistic

This usually follows a well-known distribution

3. Choose a threshold probability

Usually small; we will use 0.005 (0.5%)

Hypothesis Testing: Procedure (continued)

4. Calculate the p-value

The probability, under the null hypothesis, of sampling a test statistic at least as extreme as what we observed.

5. Accept or reject the hypothesis

Accept: p < 0.005

Reject: p > 0.005
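The five steps can be walked through on a deliberately simple case. The sketch below is not from the talk: it assumes a coin-flip example, where the hypothesis is "the coin is biased" and the null hypothesis is "the coin is fair", and it uses a z statistic with the normal approximation to the binomial. The function name and its parameters are hypothetical.

```python
from statistics import NormalDist

def z_test_proportion(successes, trials, null_proportion=0.5, threshold=0.005):
    """Two-sided z-test of an observed proportion against a null proportion."""
    # Step 2: the test statistic. Under the null hypothesis it is
    # approximately standard normal when `trials` is large.
    expected = trials * null_proportion
    std_error = (trials * null_proportion * (1 - null_proportion)) ** 0.5
    z = (successes - expected) / std_error

    # Step 4: probability, under the null hypothesis, of a statistic
    # at least as extreme as the one observed.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Step 5: accept the hypothesis (that is, reject the null hypothesis)
    # only when the p-value falls below the threshold chosen in step 3.
    return z, p_value, p_value < threshold

for heads in (60, 70):  # made-up outcomes of 100 flips
    z, p_value, accepted = z_test_proportion(heads, 100)
    print(f"{heads} heads: z = {z:.2f}, p = {p_value:.5f}, accept hypothesis: {accepted}")
```

With the 0.005 threshold, 60 heads out of 100 is not enough evidence of bias, while 70 heads is.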

Student’s t-test

Test statistic follows Student’s t-distribution

We will use a two-sample location test

Tests the null hypothesis that the means of two populations are equal
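A two-sample location test can be sketched with the standard library alone. The version below assumes the classic pooled-variance form of Student's t-test (equal population variances), which may differ from the variant used in the session. The p-value is obtained by numerically integrating the t density with Simpson's rule, and the lift values are invented for illustration.

```python
import math
from statistics import mean, variance

def t_pdf(t, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    inside = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * inside)

def student_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t, df, two-sided p-value)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df, two_sided_p_value(t, df)

# Hypothetical lift weights in kilograms (not real CrossFit Games data).
lifts_2007 = [140, 150, 155, 160, 170]
lifts_2018 = [165, 170, 180, 185, 190]

t, df, p_value = student_t_test(lifts_2007, lifts_2018)
print(f"t = {t:.3f}, df = {df}, p = {p_value:.4f}")
```

A small p-value says the observed difference in means would be unusual if the two population means were equal.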

Hypothesis Testing Can Answer Questions Like…

Are CrossFit Games athletes stronger on average in 2018 than they were in 2007? (t-test)

Are observations of two groups independent of one another? (Chi-squared test)

Is my sample drawn from a normally distributed population? (Shapiro-Wilk test)

Trend Lines

Trend Lines: Null Hypothesis

What if there is no relationship?

Trend Lines: Residuals

Trend Lines: OLS Questions

1. Do I suspect there is a relationship between two variables? What do I suspect that relationship is?

2. Do the residuals have mean = 0? Do they appear unrelated to the independent variable?

3. Are the residuals unlikely to be correlated with one another?

4. Does the spread of the residuals look roughly the same with changes in the independent variable?
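Questions 2 to 4 can be given rough numeric counterparts. The sketch below fits an ordinary least squares line by hand and then inspects the residuals; the wind-speed and power values are made up, and the two diagnostics (lag-1 autocorrelation and a comparison of spread between the lower and upper half of the x range) are simple stand-ins for looking at a residual plot, not a formal test.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def lag1_autocorrelation(values):
    """Correlation between each value and the one that follows it."""
    m = mean(values)
    numerator = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    denominator = sum((v - m) ** 2 for v in values)
    return numerator / denominator

# Hypothetical observations, already sorted by wind speed.
wind_speed = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
power = [1.1, 2.3, 2.8, 4.2, 4.9, 6.1, 6.8, 8.3, 8.9, 10.2]

intercept, slope = fit_line(wind_speed, power)
residuals = [y - (intercept + slope * x) for x, y in zip(wind_speed, power)]

# Question 2: with an intercept in the model the mean is zero by construction,
# so a plot is still needed to spot curved patterns against x.
print(f"mean residual: {mean(residuals):.6f}")

# Question 3: values far from 0 suggest neighbouring residuals are related.
lag1 = lag1_autocorrelation(residuals)
print(f"lag-1 autocorrelation: {lag1:.3f}")

# Question 4: compare residual spread at low x with spread at high x.
half = len(residuals) // 2
print(f"spread (low x):  {pstdev(residuals[:half]):.3f}")
print(f"spread (high x): {pstdev(residuals[half:]):.3f}")
```

If the two spreads differ greatly, or the autocorrelation is far from zero, the trend line's p-value deserves less trust.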

Trend Lines Answer Questions Like…

What is the relationship between profit and CEO compensation?

When wind speed changes, how does windmill power output change?

Does compensation change in a meaningful way when age changes?

Forecasting

Forecasting: Model Quality

We will consider only Mean Absolute Scaled Error (MASE)

MASE compares the error of your model with the error of the naïve forecast

MASE is typically between 0 (good) and 1 (bad); a value above 1 means the model does worse than the naïve forecast
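The comparison behind MASE can be written out in a few lines. This is a minimal sketch under stated assumptions: the naïve forecast predicts each value with the previous observation, both errors are measured on the same points, and the series and model predictions are invented. Tableau's own calculation may differ in detail.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the forecast errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mase(actual, predicted):
    """Model error divided by the error of the one-step naive forecast."""
    # The naive forecast for point i is simply the observation at point i - 1,
    # so the first point has no naive forecast and is skipped for both errors.
    naive_error = mean_absolute_error(actual[1:], actual[:-1])
    model_error = mean_absolute_error(actual[1:], predicted[1:])
    return model_error / naive_error

# Hypothetical observed series and model predictions.
actual = [10, 12, 14, 13, 15, 17]
predicted = [10.5, 11.5, 13.5, 13.5, 15.5, 16.5]

score = mase(actual, predicted)
print(f"MASE = {score:.3f}")
```

A score well below 1 means the model's typical error is a fraction of the naïve forecast's error.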

Forecasting: Naïve Forecast

Forecast values copied from the last observed value.

For seasonal forecasts, values are copied from the last observed season.
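Both flavours of the naïve forecast fit in a few lines. The function names, the example series, and the season length of 3 are hypothetical choices made for this sketch.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon

def seasonal_naive_forecast(history, season_length, horizon):
    """Repeat the last observed season, cycling through it as needed."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]

# Hypothetical series with a repeating pattern every 3 periods.
history = [5, 7, 9, 6, 8, 10]

print(naive_forecast(history, 4))
print(seasonal_naive_forecast(history, 3, 4))
```

The seasonal version keeps the shape of the most recent cycle, which makes it a harder baseline to beat on strongly seasonal data.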

Forecasting: Unexpected and Poor Forecast

1. Does it look like there is a structural break in my data?

2. Is there a lot of short-scale variation at the current date level?

Forecasts Answer Questions Like…

How many visitors to my page can I expect in the future, given data on past visits?

Based on past data, what will my inventory be in the future?

How is the value of my collection likely to change in the future?

Questions

Please complete the session survey from the Session Details screen in your TC18 app