12a. Regression Analysis, Part 1
CSCI N207 Data Analysis Using Spreadsheet
Lingma Acheson, linglu@iupui.edu
Department of Computer and Information Science, IUPUI

Multivariate Analysis - Correlation

Student   Reading Aptitude   Reading Hours
1         20                 5
2         5                  1
3         5                  2
4         35                 7
5         30                 8
6         35                 8
7         10                 3
8         5                  2
9         15                 5
10        40                 9

Scatter chart with a trend line:
[Figure: "Reading Aptitude and Reading Hours"; x-axis: Aptitude (0 to 45), y-axis: Hours (0 to 10)]

Multivariate Analysis - Correlation

Scatter chart with a trend line:
[Figure: "Reading Aptitude and Reading Hours"; x-axis: Aptitude (0 to 45), y-axis: Hours (0 to 10)]

• With a trend line, are we able to roughly estimate the reading aptitude if a person reads 6 hours a week? If so, what is the estimation?

Student   Reading Aptitude   Reading Hours
1         20                 5
2         5                  1
3         5                  2
4         35                 7
5         30                 8
6         35                 8
7         10                 3
8         5                  2
9         15                 5
10        40                 9
11        25                 6
12        33                 7.8
13        46                 10
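
The estimate can also be computed instead of read off the chart. The sketch below is an illustration in Python and is not part of the original slides, which use a spreadsheet. It assumes the chart's trend line is an ordinary least-squares fit with Aptitude on the X axis and Hours on the Y axis, fits that line to students 1 to 10, and then solves the line for the aptitude at 6 hours. The names `fit_line`, `aptitude`, `hours`, and `estimated_aptitude` are made up for this example.

```python
# Data for students 1-10 from the table: reading aptitude and reading hours.
aptitude = [20, 5, 5, 35, 30, 35, 10, 5, 15, 40]
hours = [5, 1, 2, 7, 8, 8, 3, 2, 5, 9]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Trend line oriented as in the chart: Hours = m * Aptitude + b.
m, b = fit_line(aptitude, hours)

# Solve Hours = m * Aptitude + b for Aptitude at 6 hours per week.
estimated_aptitude = (6 - b) / m

print(f"Hours = {m:.3f} * Aptitude + {b:.3f}")
print(f"Estimated aptitude at 6 hours: {estimated_aptitude:.1f}")
```

Under these assumptions the estimate comes out close to 25, which agrees with student 11 in the table (aptitude 25, 6 hours).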

Regression and Prediction
• Regression refers to a mathematical method for determining the best equation to reproduce a data set.
• Linear regression is a regression method that applies a straight line (linear model) for analysis.
• How do we generate a formula that represents a line, so that we can predict a data value without having to use a chart?
• We use regression analysis to …
  – … predict new X and Y values
  – … aid our understanding of data behavior

Reviewing the Linear Equation
• The equation for a line is:
  Y = mX + b
  – Y: dependent variable
  – X: independent variable
  – m: slope
  – b: y-intercept

Slope and y-intercept
[Figure: three lines plotted on one chart; x-axis: X (0 to 25), y-axis: Y (0 to 12)]
• Y = 0.4X + 2
• Y = 0.8X + 4
• Y = 0X + 5
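
To see how the slope and the y-intercept shape each of the three lines, their values can be tabulated at a few X positions. This is a minimal Python sketch added for illustration. The helper name `line` and the chosen X values (0, 1, 10) are assumptions, not part of the slides.

```python
def line(m, b, x):
    """Evaluate Y = mX + b for slope m, y-intercept b, and a given X."""
    return m * x + b


# The three lines from the chart, stored as (slope, y-intercept) pairs.
lines = {
    "Y = 0.4X + 2": (0.4, 2),
    "Y = 0.8X + 4": (0.8, 4),
    "Y = 0X + 5": (0, 5),
}

for label, (m, b) in lines.items():
    # At X = 0 the result is the y-intercept; each step of 1 in X adds m.
    values = [round(line(m, b, x), 2) for x in (0, 1, 10)]
    print(label, values)
```

The first value printed for each line is its y-intercept, and a slope of 0 gives the same Y for every X, which is a horizontal line.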

m and b
• m, the slope, is a ratio, defined as:
  m = rise / run, or as m = ΔY / ΔX
• Δ: change of

Example – Determining Slope

Data Points   Value
X1            1
Y1            2.4
X2            20
Y2            10

m = ΔY / ΔX
m = (Y2 − Y1) / (X2 − X1)
m = (10 − 2.4) / (20 − 1)
m = 7.6 / 19
m = 0.4
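
The same slope calculation can be written as a small function. This is an illustrative Python sketch, not something from the slides. The function name `slope` and the check for a vertical line are assumptions added for this example.

```python
def slope(x1, y1, x2, y2):
    """Slope m = (Y2 - Y1) / (X2 - X1), i.e. rise over run."""
    if x2 == x1:
        raise ValueError("slope is undefined for a vertical line (X1 == X2)")
    return (y2 - y1) / (x2 - x1)


# Data points from the example: (X1, Y1) = (1, 2.4), (X2, Y2) = (20, 10).
m = slope(1, 2.4, 20, 10)
print(round(m, 4))  # 7.6 / 19, approximately 0.4
```

Swapping the two points changes the sign of both the rise and the run, so the slope stays the same.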

Example of Determining Y-Intercept
• X1=1, Y1=2.4, X2=20, Y2=10, m=0.4

Example 1 (using X1, Y1):
Y = mX + b
Y1 = mX1 + b
b = Y1 − mX1
b = 2.4 − 0.4(1)
b = 2.4 − 0.4 = 2

Example 2 (using X2, Y2):
Y = mX + b
Y2 = mX2 + b
b = Y2 − mX2
b = 10 − 0.4(20)
b = 10 − 8 = 2

Equation: Y = 0.4X + 2
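
Both examples rearrange Y = mX + b into b = Y − mX and plug in one known point. A short Python sketch of that step follows, added for illustration. The names `y_intercept`, `b_from_p1`, and `b_from_p2` are assumptions for this example.

```python
def y_intercept(m, x, y):
    """Rearrange Y = mX + b to b = Y - mX for a point (x, y) on the line."""
    return y - m * x


m = 0.4  # slope given in the example

b_from_p1 = y_intercept(m, 1, 2.4)   # uses (X1, Y1)
b_from_p2 = y_intercept(m, 20, 10)   # uses (X2, Y2)

# Both points lie on the same line, so both give (approximately) the same b.
print(round(b_from_p1, 4), round(b_from_p2, 4))
```

Either point can be used, because every point on the line satisfies the same equation.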

Practice
• Find the equation for the line below. p1(5,1), p2(10,3)
[Figure: chart titled "Reading Aptitude and Reading Hours" showing the line through p1 and p2; x-axis 4 to 11, y-axis 0 to 3.5; trend line label: f(x) = 0.4x − 1]
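
One way to check an answer to the practice problem is to combine the slope and y-intercept steps in code. The Python sketch below is an illustration under that approach. The function name `line_through` is hypothetical and not from the slides.

```python
def line_through(p1, p2):
    """Return (m, b) for the line Y = mX + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope-intercept form")
    m = (y2 - y1) / (x2 - x1)  # slope: rise over run
    b = y1 - m * x1            # y-intercept: rearranged Y = mX + b
    return m, b


# Practice points p1(5, 1) and p2(10, 3).
m, b = line_through((5, 1), (10, 3))
print(f"Y = {m:g}X + ({b:g})")
```

The result should agree with the trend line label on the chart, f(x) = 0.4x − 1: the slope is (3 − 1) / (10 − 5) = 0.4 and the y-intercept is 1 − 0.4(5) = −1.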