STAT 5101 Foundations of Data Science


Instructor: Xinyuan Song Office: LSB 114, 39437929, email: [email protected]

Teaching Assistant: Lai-Fan Pun Office: LSB G28, 39438529, email: [email protected]

Assessment Scheme

Exercise 20%
Mid-term examination 30%: October 31, 2012, 7:00-9:00pm. No make-up examination.
Final examination 50%: December 12, 2012, 7:00-9:00pm


Course Description

This course provides comprehensive coverage of basic concepts of statistics.

Topics include
exploratory data analysis,
statistical graphics,
sampling variability,
point and confidence interval estimation,
hypothesis testing,
other selected topics.

Two software packages, R and Microsoft Excel, will be introduced to describe and analyze data.


Learning Outcomes

After completing the course, students should be able to
understand basic concepts in statistics;
use various statistical methods and techniques to summarize, present, and analyze data;
read statistical reports and recognize when the quantitative information presented is accurate or misleading;
use computer software (R and Excel) to analyze data and draw conclusions.


Textbook and Reference Books

Textbook

Levine, D. M., Stephan, D., Krehbiel, T. C. and Berenson, M. L. Statistics for Managers Using Microsoft Excel. 5th Edition, Pearson Prentice Hall, 2008.

Reference books

1. Siegel, A. F. Practical Business Statistics. 5th Edition, McGraw-Hill, 2003.

2. Agresti, A. and Franklin, C. Statistics: The Art and Science of Learning from Data. 2nd Edition, Pearson Prentice Hall, 2009.

3. Fraenkel, J., Wallen, N. and Sawin, E. I. Visual Statistics.

4. Any other textbook introducing basic statistics.


Organization of Textbook

Presenting and Describing Information
Introduction and Data Collection (Chapter 1)
Presenting Data in Tables and Charts (Chapter 2)
Numerical Descriptive Measures (Chapter 3)

Drawing Conclusions About Populations Using Sample Information
Basic Probability (Chapter 4)
Some Important Discrete Probability Distributions (Chapter 5)
The Normal Distribution and Other Continuous Distributions (Chapter 6)
Sampling and Sampling Distributions (Chapter 7)
Confidence Interval Estimation (Chapter 8)
Hypothesis Testing (Chapters 9-12)
Decision Making (Chapter 17)


Making Reliable Forecasts
Simple Linear Regression (Chapter 13)
Introduction to Multiple Regression (Chapter 14)
Multiple Regression Model Building (Chapter 15)
Time-Series Forecasting (Chapter 16)

Improving Business Process
Statistical Applications in Quality Management (Chapter 18)


Course Outline

Chapter 1 Data Collection and Data Presentation

Chapter 2 Numerical Descriptive Measures

Chapter 3 Important Discrete Probability Distributions

Chapter 4 Important Continuous Distributions

Chapter 5 Sampling and Sampling Distributions

Chapter 6 Confidence Interval Estimation

Chapter 7 Hypothesis Testing: One Sample Tests

Chapter 8 Two-Sample Tests

Chapter 9 Chi-Square Tests and Nonparametric Tests

Chapter 10* Selected topic


Chapter 1 Data Collection and Data Presentation

Explain key definitions:
Population vs. Sample
Primary vs. Secondary Data
Parameter vs. Statistic
Descriptive vs. Inferential Statistics

Describe key data collection methods

Describe different sampling methods:
Probability Samples vs. Nonprobability Samples

Identify types of data and levels of measurement

Use graphical techniques to organize and present data:
ordered array
stem-and-leaf display
frequency distribution, polygon, and ogive
histogram
scatter diagrams
bar charts, pie charts


Chapter 2 Numerical Descriptive Measures

Mean, median, mode

Range, variance, standard deviation, coefficient of variation

Five-number summary

Box-and-whisker plot

Correlation coefficient
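
The descriptive measures listed for this chapter can be sketched in a few lines. The course itself uses R and Excel; the Python below is only an illustrative stand-in, and the data values are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)            # arithmetic mean
median = statistics.median(data)        # middle value of the ordered data
mode = statistics.mode(data)            # most frequent value
data_range = max(data) - min(data)      # largest minus smallest
variance = statistics.variance(data)    # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)        # sample standard deviation
cv = std_dev / mean * 100               # coefficient of variation, in percent

# Quartiles. statistics.quantiles uses its "exclusive" method by default;
# textbooks and software packages differ slightly in their quartile rules.
q1, q2, q3 = statistics.quantiles(data, n=4)
five_number = (min(data), q1, q2, q3, max(data))

print(mean, median, mode, data_range, variance, round(std_dev, 3), round(cv, 1))
print(five_number)
```

The five-number summary is exactly what a box-and-whisker plot draws: the box spans the first and third quartiles, with the median marked inside it.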


Chapter 3 Important Discrete Probability Distributions

Define mean and standard deviation
Explain covariance and its application in finance
Binomial probability distribution
Poisson probability distribution
Hypergeometric probability distribution
Negative binomial distribution, geometric distribution, multinomial distribution
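
As a rough illustration of two of the distributions named above, the sketch below evaluates the binomial and Poisson probability mass functions from their standard formulas. The function names and parameter values are assumptions made for this example; the course itself works in R and Excel.

```python
from math import comb, exp, factorial, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

# Hypothetical parameters, for illustration only.
n, p = 10, 0.3
binom_mean = n * p                   # mean of a binomial: n * p
binom_sd = sqrt(n * p * (1 - p))     # standard deviation: sqrt(n * p * (1 - p))

print(binomial_pmf(3, n, p))         # probability of exactly 3 successes in 10 trials
print(poisson_pmf(2, 2.0))           # probability of exactly 2 events when the mean is 2
print(binom_mean, binom_sd)
```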


Chapter 4 Important Continuous Distributions

Continuous probability distribution

Characteristics of the normal distribution

Using a normal distribution table

Evaluate the normality assumption

Uniform and exponential distributions

Gamma and Weibull distributions
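
Looking up a normal distribution table amounts to evaluating the cumulative distribution function. A minimal sketch, using Python's standard library as a stand-in for the course's R and Excel and a hypothetical Normal(100, 15) variable:

```python
from math import exp
from statistics import NormalDist

z = NormalDist()                       # standard normal: mean 0, sd 1
p_below = z.cdf(1.96)                  # P(Z <= 1.96), the usual table lookup
p_within_1sd = z.cdf(1) - z.cdf(-1)    # P(-1 < Z < 1)
z_95 = z.inv_cdf(0.975)                # reverse lookup: the z value with 97.5% below it

# Hypothetical example: X ~ Normal(mean=100, sd=15).
x = NormalDist(mu=100, sigma=15)
p_above_130 = 1 - x.cdf(130)           # equals P(Z > 2) after standardizing

def exponential_cdf(t, rate):
    """P(X <= t) for an exponential distribution with the given rate."""
    return 1 - exp(-rate * t)

print(round(p_below, 4), round(p_within_1sd, 4), round(z_95, 3))
print(round(p_above_130, 4), round(exponential_cdf(1.0, 1.0), 4))
```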


Chapter 5 Sampling and Sampling Distributions

Types of sampling methods

Sampling distributions

Sampling distribution of the mean

Sampling distribution of the proportion

Central Limit Theorem
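
The Central Limit Theorem can be checked informally by simulation. The sketch below is an assumption-laden illustration, not course material: it draws repeated samples from a skewed exponential population (mean 1, standard deviation 1) and compares the spread of the sample means with the theoretical standard error sigma / sqrt(n). The sample size, number of repetitions, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)     # fixed seed so the run is repeatable
n = 30                     # size of each sample
reps = 5000                # number of samples drawn

# Each entry is the mean of one sample of size n from an exponential population.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to the population mean, 1
sd_of_means = statistics.stdev(sample_means)     # should be close to sigma / sqrt(n)
theoretical_se = 1.0 / n ** 0.5

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_se, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the underlying population is strongly skewed.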


Chapter 6 Confidence Interval Estimation

Point estimate

Confidence interval estimate

Confidence interval for a population mean

Confidence interval for a population proportion

Determine the required sample size
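
A minimal sketch of two items above, under the simplifying assumption that the population standard deviation is known (so the normal distribution applies rather than the t distribution). The function names and numbers are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value, about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def required_sample_size(sigma, error, confidence=0.95):
    """Smallest n whose margin of error is at most `error`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error) ** 2)

# Hypothetical numbers: sample mean 50, sigma 10, n = 100.
low, high = z_interval(50, 10, 100)
print(round(low, 2), round(high, 2))
print(required_sample_size(sigma=10, error=2))
```

The sample size is always rounded up, since rounding down would leave the margin of error slightly larger than requested.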


Chapter 7 Hypothesis Testing: One Sample Tests

Null and alternative hypotheses

A decision rule for testing a hypothesis

Hypothesis testing

Type I and Type II errors
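
One way to make the decision rule concrete is a two-sided Z test for a mean with known population standard deviation. This is a sketch under that assumption; the function name and the figures are hypothetical, and alpha is the probability of a Type I error the tester is willing to accept.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided Z test of H0: mu = mu0 against H1: mu != mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))              # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
    reject = p_value < alpha                          # decision rule
    return z, p_value, reject

# Hypothetical data: sample mean 52, hypothesized mean 50, sigma 10, n = 100.
z, p_value, reject = one_sample_z_test(52, 50, 10, 100)
print(round(z, 2), round(p_value, 4), reject)
```

Rejecting a true null hypothesis is a Type I error; failing to reject a false one is a Type II error.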


Chapter 8 Two-Sample Tests

Test the difference between two independent population means

Test two means from related samples

Test the difference between two proportions

F test for the difference between two variances
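
Of the tests listed, the difference between two proportions is the simplest to sketch with the standard library alone, using the pooled-proportion Z statistic. The function name and counts below are hypothetical, and the normal approximation assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided Z test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                          # pooled estimate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))    # standard error of p1 - p2
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(round(z, 3), round(p_value, 4))
```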


Chapter 9 Chi-Square Tests and Nonparametric Tests

Chi-square test for the difference between two proportions

Chi-square test for differences in more than two proportions

Chi-square test for independence

The Wilcoxon rank sum test for two population medians

The Kruskal-Wallis H-test for multiple population medians
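
The chi-square statistic for a contingency table compares observed counts with the counts expected under independence. The sketch below is illustrative only: the table is hypothetical, and the p-value shortcut applies to the 2 x 2 case (one degree of freedom), where the chi-square tail probability can be written with the complementary error function.

```python
from math import erfc, sqrt

def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table: rows are groups, columns are success and failure.
observed = [[60, 40], [45, 55]]
chi2, df = chi_square_statistic(observed)

# Valid only when df == 1: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(round(chi2, 4), df, round(p_value, 4))
```

For a 2 x 2 table, this statistic equals the square of the pooled two-proportion Z statistic computed from the same counts, so the two tests give the same p-value.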