Correlation
WHICH NUTRITIONAL VALUE AFFECTS THE AMOUNT OF CALORIES IN CANDY THE MOST?
OR WHY YOU GET SO MANY CALORIES FROM CANDY HIGH IN TOTAL FAT
Prepared by: Alexander Voronin, Bonnie Pang, Kayla Mina, Stephanie Heintzman, Vladyslav Akimenko
Image source: Google Images
EXECUTIVE SUMMARY
Calories in candy products are strongly correlated with Total Fat (r = 0.81).
Cholesterol levels in candies are only moderately related to Saturated Fat content (r = 0.47).
TABLE OF CONTENTS
1. Our Data
2. Data Assumptions
   Correlation coefficient as method of research
   Using SAS to get insights about the data
3. Conclusions about the data
4. How you can reach us
BACKGROUND
Our data set consists of 75 candies.
For each candy, the following information is available: Servings, Weight, Calories, Total Fat, Saturated Fat, Cholesterol, Sodium, Carbohydrates, Fiber, Sugars, Protein, Vitamin A, Vitamin C, Calcium, and Iron.
WE HYPOTHESIZE THAT…
1. The more Saturated Fat there is, the more Cholesterol there is.
2. Calories are impacted more by Total Fat than by Sugar.
METHOD
We will use the correlation coefficient (r) to measure the strength of the relationship between each pair of variables.
This will be done in the analytics software SAS Enterprise Guide.
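For reference, the standard Pearson product-moment formula for r is shown below. We assume this is the coefficient the SAS correlation task reports here, since Pearson is the default in SAS's correlation procedure. The symbols x_i, y_i, x̄, ȳ and n are our own notation: x_i and y_i are two nutrient values (for example Saturated Fat and Cholesterol) for candy i, and n = 75 if every candy has both values.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```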
HOW DO WE USE THE CORRELATION COEFFICIENT?
The correlation coefficient measures the strength and direction of a linear relationship between two variables. It ranges from -1 to 1; values closer to -1 or 1 indicate a stronger relationship.
When the correlation is positive (r > 0), as the value of one variable increases, the other tends to increase as well.
When the correlation is negative (r < 0), as one variable increases, the other tends to decrease. This indicates an inverse relationship between the two variables.
[Shen, David. "Computation of Correlation Coefficient and Its Confidence Interval in SAS." Sas.com. Web.]
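A minimal sketch of how r is computed, assuming the Pearson formula and using only the Python standard library. The function name `pearson_r` and the toy values are our own illustration, not taken from the candy dataset or from SAS; they simply show that r is positive when two variables rise together and negative when one falls as the other rises.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations: positive when x and y move in the same direction.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Spread of each variable around its own mean.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Toy values for illustration only (NOT the candy dataset).
fat = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 5]    # tends to rise with fat -> r > 0
y_falling = [9, 7, 6, 4, 2]   # tends to fall as fat rises -> r < 0

print(round(pearson_r(fat, y_rising), 3))   # 0.775
print(round(pearson_r(fat, y_falling), 3))  # -0.995
```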
STEP 2: A window opens.
Drag the indicated variables from “Variables to assign” to the Analysis variable role (labeled “<variable required>”) under “Task roles”.
STEP 3: On the Results tab on the left, check “Create a scatter plot for each correlation pair” and uncheck “Show significance probabilities associated with correlations”.
Finally, run the correlation task.
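Outside SAS, the same pairwise table can be sketched in plain Python. This is an illustration under our own assumptions, not the SAS implementation: the helper names `pearson_r` and `correlation_matrix` are hypothetical, and the column values are made-up toy numbers, not the 75-candy dataset. Each entry maps a pair of variable names to r, with 1.0 on the diagonal and the same value for (a, b) and (b, a).

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Map every ordered pair of column names to r (hypothetical helper)."""
    # Diagonal: every variable is perfectly correlated with itself.
    matrix = {(name, name): 1.0 for name in columns}
    for a, b in combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        matrix[(a, b)] = matrix[(b, a)] = r  # the matrix is symmetric
    return matrix


# Hypothetical toy columns (NOT the candy data), one value per candy.
columns = {
    "Calories": [220, 250, 160, 210, 90],
    "TotalFat": [11, 13, 4, 10, 0],
    "Sugars": [20, 27, 30, 21, 18],
}

m = correlation_matrix(columns)
for (a, b), r in m.items():
    if a < b:  # print each off-diagonal pair once
        print(f"{a} & {b}: r = {r:.3f}")
```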
RESULTS
Each cell in the following output shows the correlation coefficient between the variables in its row and column.
The larger the absolute value of a number, the stronger the relationship; the sign shows its direction.
Numbers in the orange box relate to Hypothesis 1; numbers in the red box relate to Hypothesis 2.
The highest number outside the diagonal (where each variable is paired with itself and r = 1) marks the strongest relationship in the dataset.
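Picking out the strongest relationship can be sketched as finding the off-diagonal entry with the largest absolute value, since a strong negative r is as strong as a strong positive one. The function name `strongest_pair` is our own, and the input holds only the three coefficients reported in these results; the rest of the SAS matrix is not reproduced here.

```python
def strongest_pair(matrix):
    """Return (pair, r) with the largest |r|, ignoring the diagonal."""
    off_diagonal = {pair: r for pair, r in matrix.items() if pair[0] != pair[1]}
    return max(off_diagonal.items(), key=lambda item: abs(item[1]))


# The three coefficients reported in the results (plus one diagonal entry).
reported = {
    ("Calories", "Calories"): 1.0,
    ("Calories", "Total Fat"): 0.80707,
    ("Calories", "Sugars"): 0.41692,
    ("Saturated Fat", "Cholesterol"): 0.47270,
}

pair, r = strongest_pair(reported)
print(pair, r)  # ('Calories', 'Total Fat') 0.80707
```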
CONCLUSIONS
Hypothesis 1: The more Saturated Fat there is, the more Cholesterol there is.
r = 0.47270 (Saturated Fat & Cholesterol)
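The correlation between Saturated Fat and Cholesterol is r = 0.47. This is only a moderate positive correlation: Saturated Fat accounts for only about 22% of the variation in Cholesterol (r² ≈ 0.22).
This means that a candy's Cholesterol level cannot be reliably predicted from its Saturated Fat content, and vice versa.
Therefore we do not find strong support for this hypothesis.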
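Two quick checks on this coefficient, as a sketch: r² gives the share of variation explained, and the usual t statistic, t = r·√(n − 2) / √(1 − r²), tests whether r differs from zero. We assume n = 75, i.e. that no candy is missing either value; the function names are our own. With t ≈ 4.58 on 73 degrees of freedom (well above the two-sided 5% critical value of about 2), the correlation is very unlikely to be zero, yet it still explains only about 22% of the variation. Statistical significance and strength of a relationship are separate questions.

```python
import math


def r_squared(r):
    """Share of variance in one variable explained by a linear fit on the other."""
    return r ** 2


def t_statistic(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.47270  # Saturated Fat & Cholesterol, from the results
n = 75       # assumption: all 75 candies have both values

print(f"r^2 = {r_squared(r):.3f}")     # r^2 = 0.223
print(f"t = {t_statistic(r, n):.2f}")  # t = 4.58
```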
Hypothesis 2: Calories are more strongly related to Total Fat than to Sugar.
r = 0.80707 (Calories & Total Fat)
r = 0.41692 (Calories & Sugar)
The correlation between Calories and Total Fat (r = 0.81) is about twice as strong as that between Calories and Sugar (r = 0.42). In terms of variance explained, Total Fat accounts for about 65% of the variation in Calories, versus about 17% for Sugar.
This means that a candy's Calories are much more closely associated with its Total Fat than with its Sugar.
Therefore the data support this hypothesis.
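How much "stronger" the Total Fat relationship is depends on what is compared. A short sketch using the two reported coefficients: the coefficients themselves differ by a factor of about 1.9, while the shares of variance explained (r²) differ by a factor of about 3.7. The variable names are our own.

```python
r_fat = 0.80707    # Calories & Total Fat
r_sugar = 0.41692  # Calories & Sugar

ratio_r = r_fat / r_sugar                 # ratio of the coefficients
ratio_r2 = (r_fat ** 2) / (r_sugar ** 2)  # ratio of variance explained

print(f"r:   {r_fat:.2f} vs {r_sugar:.2f} -> {ratio_r:.2f}x")            # r:   0.81 vs 0.42 -> 1.94x
print(f"r^2: {r_fat ** 2:.2f} vs {r_sugar ** 2:.2f} -> {ratio_r2:.2f}x")  # r^2: 0.65 vs 0.17 -> 3.75x
```

Both readings favor Total Fat; the r² ratio is the one that describes how much more of the variation in Calories each nutrient accounts for.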
HOW YOU CAN REACH US
Our email address: [email protected]
Feel free to contact us for data research using other analytical tools and approaches.
Using data analysis, we can also find other correlations in your dataset.