crayroot.weebly.com… · Web view 3) Construct a 98% confidence interval estimate for the...

Root 1

Christopher Root

Maw-MATH 1040 Fall 2015 Online

Term Project Part 6 - ePortfolio Posting

29 November 2015

Introduction

This project is the culmination of an entire semester’s worth of knowledge in relation to

analyzing a sample wherein each member of the class contributed individually as well as part of

a group. Each student was instructed to purchase a package of Skittles and count the total

number of each color as well as the total number of candies. There were three phases of the

group and individual project. First, we analyzed the proportions and took guesses as to how our

sample would compare to the rest of the class and made statistical charts (both pie and Pareto)

representing the full classroom sample of 46 students. Second, we put together a Histogram chart

and a Boxplot to show the mean number of Skittles per bag as well as standard deviation. This

allowed us to view outliers. The final portion of the project was to construct confidence intervals

for the population proportion of yellow candies, the mean number of candies per bag, and the

population standard deviation. This allowed us to take our first taste of inferential Statistics and

state with some level of confidence what these population values would be if sampling the

entire population were possible. While the idea of sampling Skittles seemed silly at first, I found

it to be an extremely effective tool for teaching as it was relatable, easy to acquire, and delicious.

I will include both the individual and group portions I had completed for each of these

sections, beginning with the initial analysis of my purchased bag of Skittles on the next page of

this document. The first document is the result of counting the number of candies in the bag I

had purchased and creating a five number summary alongside the mean and standard deviation.

The first several chapters focused on the proper methods of sample collection and how to

minimize bias. Then, we focused on how to read charts and how to avoid making them

misleading. This led to the second portion of the project, putting together some beautiful charts

representing the sample color proportions for the entire classroom.

Chris, Kerra, Kira, Courtney

Discussion Group 8

Skittles Project Part 2 – Group

Maw – MATH 1040 Online

FIRST: Guess!  What do you expect the proportions to be? Why?

SECOND: Now open the data set and compute the proportions of Red, Orange, Yellow, Green, and Purple candies in the class data set. Note that the sample size is the total number of candies collected by the class.

Christopher’s initial guess was that the proportion of candies would be fairly uniform even though his bag of Skittles had a large proportion of Red and Purple. Kerra guessed that Red and Yellow would have the largest proportion due to them being primary colors and being more cost efficient for mass production since the other colors would require mixing. Kira assumed that the proportion would be uniform even before opening her bag! Courtney guessed that Red Skittles would be the highest proportion, possibly because they are her favorite.

To discover the actual proportions, take the total number of candies in each color and divide by the total number of candies in the class data set. The proportion was divided fairly evenly. I rounded to the nearest thousandth like in the homework.

It appears that Christopher and Kira’s initial guess was correct, as the proportions look fairly uniform. Kerra acknowledged that her guess was wrong and suggested that perhaps cost and time were not an issue. Courtney’s Red was, unfortunately, not the most represented color in our class data.

We all seem to agree that the class data represents a very good random sample, based on the fact that the bags were all purchased at completely random times from different locations, and that it will represent the population very well.

As for the population, Christopher initially thought that the population would be all Skittles of all colors in the entire world. Kira caught that it would probably make more sense if the population was only Original Skittles as there are a variety of delicious flavors and types that have many different colors and mixes of colors, and this sample would not represent anything but the Original. Kerra and Christopher agreed that this was true.

Pie Chart

Pareto Chart

After the basics of taking a sample and creating charts representing the sample data were

covered, we moved forward with number summaries about the sample. We discussed deviation

(spread), measures of center (mean, mode, median), proportions, and frequency distributions. We

also differentiated between qualitative (categorical) and quantitative (numeric) data, as well as

discussing what types of charts should be used for those types of data.

Christopher Root

Alia Maw – MATH 1040 Online Fall 2015

20 September 2015

Skittles Group Project Part 3 – Individual Portion

For the variable ‘Total Candies in each bag’, the shape of the distribution is bell-shaped.

This means that while there are some bags with a few extra and some that are short, most of them

have similar amounts. As far as surprises versus expectations, I found it kind of interesting that

the range was pretty wide even if most of the amounts are similar. It’s interesting to see that

some people got fewer than 55 candies and some people got more than 65. That’s a pretty

significant number to be shorted when you are paying the same amount. However, since the bags

are filled by weight, a lower count doesn’t mean you actually got less candy. My bag had 59

pieces which is almost exactly the average for the entire class, which is 59.87. I guess I’m pretty

average in that respect.

The difference between categorical (qualitative) and quantitative data is that categorical

refers to variables that can be assigned to groups by some common attribute (such as blood type,

sex, age group, etc.) whereas quantitative refers to data that is numeric. For qualitative data, it is

impossible to subtract, say, one color from another. With quantitative data, you can break it down

into quartiles, measures of central tendency, deviations from the mean, and dispersion. To do this

with qualitative data, you must group and create frequencies of occurrences within those

groupings.

For quantitative variables, the best types of graphs are those that allow you to easily visualize

the information represented in the sample. For example, stem and leaf graphs allow you to easily

see where a majority of the values lie but also allow you to translate each value. Histograms are

valuable because you can visualize the information fairly easily and you can tell whether it is

bell-shaped, uniform, or skewed in some direction. Frequency polygons are good to see changes

in data, while box plots offer a way to visualize outliers, as they are drawn from the quartiles,

the median, and the fences. Bar charts are the tried and true standard of industry as they

allow anyone to understand trends in data, but they can easily mislead when the axis starts

at a value other than zero. Knowing this, however, allows you to be aware of misleading

information. Line graphs are an excellent way to view trends and dot plots are good for easily

seeing where data groupings occur.

The best methods for graphing qualitative variables include creating frequency tables

for groupings of information. That way, you can create visualizations of the data in the form of

pie charts and bar graphs. My personal favorite is the pie chart for smaller amounts of data

because it can show differences through color and is pleasing to the eye. However,

since the slices taper down to the center of a circle, it is often not a good idea to use this chart when

having to deal with specific numbers, but rather, when utilizing proportions.

When representing visual data, it is important to note that data can be easily

misrepresented in various ways. The most common, seemingly, is to adjust the starting point of an axis,

creating a ‘zoom’ on the data making it seem as if there are monstrous differences in the results.

Another method would be to compare two sets of data along different timelines or in different

groupings. This makes it seem like there are discrepancies that do not exist. Pictographs can also

be misleading because they are used for effect or humor rather than to represent the actual

amount of the data, and 3-D graphs can distort the ability to distinguish true variations.

Christopher, Kerra, Kira, Courtney

21 September 2015

MATH 1040 Group Project 3

Final Draft

Skittles per Bag Frequency Histogram

Skittles per Bag Boxplot

i. µ (population mean) = 59.9 (rounded to one decimal)

ii. σ (population standard deviation) = 2.6, s (sample standard deviation) = 2.6

iii. Five Number Summary: Minimum: 53 Q1: 59 Median: 60 Q3: 61 Max: 66
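As a quick check on the boxplot, the five-number summary above can be used to compute the fences that flag outliers. The sketch below assumes the common 1.5 × IQR rule, which the text does not state explicitly, and the variable names are only illustrative.

```python
# Five-number summary reported for the class data (candies per bag).
minimum, q1, median, q3, maximum = 53, 59, 60, 61, 66

# Interquartile range and fences, assuming the usual 1.5 * IQR rule.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# A bag count beyond either fence would be drawn as an outlier.
has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"low outlier: {has_low_outlier}, high outlier: {has_high_outlier}")
```

Under that assumed rule, both the minimum of 53 and the maximum of 66 fall outside the fences, which fits the earlier remark that the boxplot allowed us to view outliers.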

The final portion of the project was our first taste of inferential statistics. It was the culmination

of an entire semester’s knowledge to create confidence intervals. I personally found this to be the most

interesting and challenging section, as everything we had previously learned was utilized. We were able

to make predictions about an entire population from an extremely small sample with high levels of

confidence. This is where the true power of statistics was revealed, as its practical application to

real-world scenarios was first introduced. In our case, we analyzed the proportion of yellow candies, the mean number

of candies per bag, and the population standard deviation. All of this simply by counting Skittles!

Christopher, Kira, Kerra, Courtney

MATH 1040 – Fall 2015 Online

Term Project Part 4 – Confidence Interval Estimates/Group

18 November 2015

1) Construct a 99% confidence interval estimate for the population proportion of yellow candies. Show your work, including the computations for the margin of error and the critical value.

p̂ = total yellow/total Skittles = 577/2754 = .2095

First, verify np̂(1 − p̂) ≥ 10: (2754)(.2095)(1 − .2095) = 456.09, so the condition is met.

Find the critical value zα/2: α = 1 − .99 = .01, α/2 = .005, so zα/2 = invNorm(.995) = 2.576

Use the following equation for margin of error, E:

E = zα/2 · √(p̂(1 − p̂)/n) ≈ .01997

Then find the lower and upper bounds using:

Lower bound = p̂ − E = .18953, Upper bound = p̂ + E = .22947

Confidence interval 99% estimation of the population proportion of yellow candies:

(.18953, .22947)
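The same steps can be reproduced in a few lines of Python as a check on the calculator work. This is only a sketch: the variable names are illustrative, and `NormalDist().inv_cdf` from the standard library stands in for the calculator's invNorm.

```python
from math import sqrt
from statistics import NormalDist

yellow, total = 577, 2754          # class totals reported above
confidence = 0.99

p_hat = yellow / total             # sample proportion of yellow candies

# Requirement check: n * p_hat * (1 - p_hat) should be at least 10.
requirement = total * p_hat * (1 - p_hat)

# Critical value z_(alpha/2); inv_cdf plays the role of invNorm here.
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error and interval bounds for the proportion.
margin = z * sqrt(p_hat * (1 - p_hat) / total)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, E = {margin:.5f}")
print(f"99% CI: ({lower:.5f}, {upper:.5f})")
```

Because the script keeps p̂ unrounded, its bounds can differ from the hand-computed ones in the last decimal place or two.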

2) Construct a 95% confidence interval estimate for the population mean number of candies per bag. Show your work, including the computations for the margin of error and the critical value.

Satisfy three conditions: the sample comes from a randomized experiment, the sample is small relative to the population size, and the data come from a normally distributed population.

Calculator – Stat – Edit – Enter Data in L1 – Stat – Calc – 1-Var Stats – List: L1 – Calculate

s = 2.6213542, x̄ = 59.869565, n = 46 (total sample)

α/2 = (1 − .95)/2 = .025 and 1 − .025 = .975, so tα/2 = invT(.975, 45) = 2.014

E = tα/2 · s/√n = 2.014 · 2.6213542/√46 = .7784

Lower bound = x̄ − E = 59.869565 − .7784 = 59.091

Upper bound = x̄ + E = 59.869565 + .7784 = 60.648

95% Confidence Interval for population mean number of candies per bag:

(59.091,60.648)
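The arithmetic for this interval can be sketched in Python as well. The critical value is copied from the invT step above rather than computed, since the Python standard library has no inverse t function; the variable names are illustrative.

```python
from math import sqrt

# Summary statistics reported above for the 46 bags.
n = 46
x_bar = 59.869565
s = 2.6213542

# Critical value taken from the calculator step invT(.975, 45).
# Hard-coded on purpose: the standard library cannot compute it.
t_crit = 2.014

# Margin of error E = t * s / sqrt(n), then the two bounds.
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"E = {margin:.4f}")
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```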

3) Construct a 98% confidence interval estimate for the population standard deviation of the number of candies per bag. Show your work, including the computations and the critical values.

n = 46, df = n − 1 = 45, s² = 6.869641

First, determine which values need to be looked up in the chi-square table. To do this, subtract 98/100 from 1 to get α = .02. Divide by 2 to get α/2 = .01, the area for the right-tail critical value. Then subtract this value from one to get 1 − α/2 = .99, the area for the left-tail critical value. The degrees of freedom are 45 because that is the sample size minus one, (n − 1) or (46 − 1).

The critical values for 40 and 50 degrees of freedom were averaged (added together and divided by two) since 45 was not on the table provided.

So, our 98% confidence interval estimate for population standard deviation is:

(2.103,3.452)
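The intermediate computation for this interval is not reproduced in the text, so the sketch below is a reconstruction under assumptions. The four chi-square entries are the values commonly printed in textbook tables for 40 and 50 degrees of freedom; they are assumed here, not taken from the project, although averaging them does reproduce the reported bounds.

```python
from math import sqrt

df = 45
s_squared = 6.869641                 # sample variance used above

# ASSUMED table entries (not given in the text) for 40 and 50 df,
# averaged to approximate 45 df as described above.
chi2_right = (63.691 + 76.154) / 2   # area .01 to the right
chi2_left = (22.164 + 29.707) / 2    # area .99 to the right

# Bounds for the variance are df * s^2 / chi2; square roots give
# the bounds for the standard deviation.
lower = sqrt(df * s_squared / chi2_right)
upper = sqrt(df * s_squared / chi2_left)

print(f"98% CI for the standard deviation: ({lower:.3f}, {upper:.3f})")
```

Software that evaluates the chi-square distribution at exactly 45 degrees of freedom would give slightly different critical values than this table average, so small differences from the interval above are to be expected.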

4) Discuss and interpret (with complete sentences) the results of each of your three interval estimates.

The first problem was estimating the population proportion of yellow candies using a confidence level of 99%. We found that the lower and upper bounds are .18953 and .22947, respectively, by using the margin of error formula, which requires a point estimate. The point estimate was found by taking the total number of yellow candies and dividing by the total number of candies. We can say with 99% confidence that the proportion of yellow candies in the population of Skittles lies between these two bounds, with a margin of error of .0199745.

The second problem was to construct a 95% confidence interval for the population mean number of candies per bag. We had to satisfy several conditions: 1) the experiment is randomized, 2) the sample is small relative to the population size (n < .05N), and 3) the data come from a normally distributed population. As these conditions were satisfied, we were able to move forward. We first had to find the t critical value, since we would be using the sample standard deviation, which requires degrees of freedom. Once you have the Student's t critical value, you can substitute it into the formula, which subtracts the margin of error from the sample mean for the lower bound and adds it for the upper bound. The margin of error is the t-value multiplied by the sample standard deviation divided by the square root of the sample size (in our case, 46, one bag per student). We are 95% confident that the true population mean number of Skittles per bag is between 59.091 and 60.648.

Finally, we had to construct a 98% confidence interval for the population standard deviation. This was done by using a chi-square table, looking up a separate critical value for each tail because the chi-square distribution is not symmetric. First, you take the confidence level, subtract it from one, and then divide by two. Then, using the degrees of freedom, locate the value on the chi-square table. Do the same for the other tail by subtracting the divided value from one. You then put the chi-square values into the formula, which takes variance into consideration. Degrees of freedom multiplied by the sample variance, divided by each chi-square value, gives the lower and upper bounds for the variance, and taking the square root of each gives the bounds for the standard deviation. By completing these calculations, we can say with 98% confidence that the population standard deviation lies between 2.103 and 3.452, describing the spread in the number of Skittles per bag.

In all of these estimates, there is not 100% certainty that the true population values land within our bounds. It is possible that the true proportion of yellow Skittles, mean candies per bag, and standard deviation lie outside the bounds we have discovered. However, since our level of confidence is quite high in each of these areas, it is not very likely that this would occur.

Christopher Root

Maw – MATH 1040 – Fall 2015 Online

Term Project Part 4 Confidence Intervals – Individual Portion

18 November 2015

Confidence intervals are an extremely fascinating method of estimating a true population

parameter based on a sample proportion, sample mean, or sample standard deviation. They give an

estimated range of values with a margin of error that basically says if you were to repeat a

similar experiment many times on a particular population, a certain percentage of the intervals

calculated would contain the true population value. As the

sample size increases, you obviously have a better estimate of what the parameter of a population

would be as you have more information available. However, you can just as easily increase the

confidence level by widening the interval, therefore allowing a much wider range of values (and

therefore, a much larger margin of error) in your estimates.
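The trade-off described here can be illustrated with the class proportion of yellow candies. The sketch below is only an illustration: `margin_of_error` is a hypothetical helper, and the larger sample size is made up to show the effect of more data.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 577 / 2754                 # class proportion of yellow candies


def margin_of_error(confidence, n, p=p_hat):
    """Margin of error for a proportion at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


# Same sample, higher confidence level: the interval gets wider.
by_level = {c: margin_of_error(c, 2754) for c in (0.90, 0.95, 0.99)}

# Same confidence level, hypothetical sample four times as large:
# the margin of error is cut in half.
by_size = {n: margin_of_error(0.99, n) for n in (2754, 4 * 2754)}

for level, e in by_level.items():
    print(f"confidence {level:.2f}: E = {e:.5f}")
for n, e in by_size.items():
    print(f"n = {n}: E = {e:.5f}")
```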

I found this chapter to be the most exciting and interesting thus far. We utilized almost

every tool we have been taught in Statistics to make inferences on data that would otherwise be

impossible to measure. This allows for limitless possibilities in regard to studying vast amounts

of data and making predictions about the outcomes. Many people say that they never see a use for

math once they leave a classroom, but I can certainly see how this type of inferring could be

utilized in the field of information technology to map risk, security breaches, and a whole host of

different applications.

Conclusion

While it may have seemed silly to be counting Skittles initially, the overall project turned

into an extremely effective method for teaching Statistics step by step. From the progression of analyzing

quantitative data, spread, and measures of center to inferring upon an entire population, this

project truly gave everyone who actively participated the ability to actually use the

information we were learning in class for something interesting. I found this particularly helpful

in an online course because oftentimes you can feel disconnected. In this case, being forced to

work as a group and having a second, third, or even fourth set of eyes looking at your work and

providing feedback was extremely valuable in my learning. I feel this has endless applications in

the real world, and the logical application of this form of mathematics felt much more tangible to

me in my occupation as a threat analyst in information technology.