Lecture1 Data

MAT 271 E Probability and Statistics Dr. Özge Kürkçüoğlu-Levitas [email protected]

Transcript of Lecture1 Data

Page 1: Lecture1 Data

MAT 271 E Probability and Statistics

Dr. Özge Kürkçüoğlu-Levitas

[email protected]

Page 2: Lecture1 Data

Syllabus

• Purpose of this course:

– To introduce counting techniques

– To introduce the concept of probability

– To introduce the basic elements of probability

– To make students aware of the use of probability in statistics

• Contents:

– Counting Techniques, Concept of Probability, Probability Function, Probability Density Function, Bernoulli, Binomial and Poisson Distributions, Exponential, Gamma and Normal Density Functions, Random Variables of Multiple Dimensions, The Concept of an Estimator and Properties of Estimators, Maximum Likelihood Function, Tests of Hypotheses, Chi-Square Test, t-test, F-test.

Page 3: Lecture1 Data

• Textbooks:

– Şaşmaz, D. Ali, İstatistik ve Olasılık, İTÜ Kimya-Metalürji Fakültesi, Kimya Mühendisliği Bölümü, 2011

– Freund, J. E., Modern Elementary Statistics, 7th Edition, Prentice Hall, 1988

• Technology Resources:

– Matlab

– MS Excel

• Grading:

– Midterm I 25%

– Midterm II 25%

– Homework 10%

– Final 40%

• Attendance: minimum 70%

– Attendance closer to 100% gives a better chance of raising your letter grade; attendance below 70% lowers your letter grade.

Page 4: Lecture1 Data

Statistics is the science of

o Collecting, organizing, analyzing and interpreting data to draw valid conclusions

o Predicting and forecasting using data and statistical models

It is widely used in the natural sciences, economics, business, psychology and medicine, for example in:

• Weather forecasts

• Election polls

• Accident statistics

• Health statistics

• Stock markets

Page 5: Lecture1 Data

Definitions

• Data: a collection of observations (measurements, genders, survey responses).

• Population (or Universe): the complete collection of all individuals (scores, people, measurements, etc.) to be studied.

• Sample: a subcollection of members selected from a population.

Example:

– A survey on elections has 5 million respondents --> sample

– People eligible to vote in Turkey in 2011: 52 million --> population
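The relation between a population and a sample can be sketched in a few lines of Python. Everything numeric here is an assumption made for illustration: the population is scaled down to 52,000 voters, and the 40% support for a made-up candidate is invented, not taken from any real poll.

```python
import random

random.seed(271)  # fixed seed so the sketch is repeatable

# Hypothetical, scaled-down population: 52,000 voters standing in for 52 million.
# 1 = supports a made-up candidate, 0 = does not (40% support is an assumption).
population_size = 52_000
supporters = 20_800
population = [1] * supporters + [0] * (population_size - supporters)

# Simple random sample without replacement:
# every voter is equally likely to be selected.
sample_size = 5_000
sample = random.sample(population, sample_size)

population_share = sum(population) / population_size
sample_share = sum(sample) / sample_size

print(f"share in population: {population_share:.3f}")
print(f"share in sample:     {sample_share:.3f}")
```

The sample share is only an estimate of the population share; how close it is likely to be depends on the sample size and on how the sample was drawn.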

Page 6: Lecture1 Data

• Data is useful for:

– Investigating a (scientific) subject

– Observing the efficiency of a production facility

– Meeting existing standards

– Trying different approaches during the decision-making phase

– Personal curiosity

Page 7: Lecture1 Data

Statistical methods fall into two groups according to their purpose:

– Descriptive statistics describe the basic features of the data in a study: they simply show what the data contain, using charts and tables.

– Inferential statistics are used to reach conclusions and make generalizations that extend beyond the immediate data.

Page 8: Lecture1 Data

A statistical study involves:

– A context (description of the study, e.g. the weights of 100 male students interested in sports in high school A)

– Collecting data from the source

– Analyzing the data (collected with a sampling method)

– Deducing a result

Page 9: Lecture1 Data

Methods for collecting data

• Survey

– Questions to randomly selected people (sample) from a population.

– Questions, people, locations should be carefully determined

– Wrong data leads to wrong statistics

• Observation

– Collecting data without interfering

Page 10: Lecture1 Data

• Experiments

– Investigating the relationship between input and output.

– The observer should not interfere

• Projection

– Mostly used in psychology; it typically involves the personal thoughts, behaviors and emotions of patients.

Although there are different ways of collecting data, all majors analyze the data using the same statistical methods. Statistics is not customized for different majors; it consists of general methods.

Page 11: Lecture1 Data

Example 1:

• Education: In a random sample of 200 high school seniors in a large city, 137 said that they will go on to college. At the 0.05 level of significance, does this refute the claim that 60% of all high school seniors in this city will go on to college?

• Engineering: In a random sample of 200 transistors made by a given manufacturer, 137 passed an accelerated performance test. At the 0.05 level of significance, does this refute the claim that 60% of all transistors made by the manufacturer will pass the test?

• Food: In a random sample of 200 citrus trees exposed to a -7 °C frost, 137 showed damage to their fruit. At the 0.05 level of significance, does this refute the claim that 60% of all citrus trees exposed to a -7 °C frost will show some damage to their fruit?
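All three questions share the same numbers (137 out of 200, claimed proportion 60%, significance level 0.05), so one calculation answers all of them. The slide does not say which test is intended. What follows is a minimal sketch that assumes a two-sided one-sample z-test for a proportion with the normal approximation; the function name proportion_z_test is made up for this example.

```python
import math

def proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0, using the normal approximation."""
    p_hat = successes / n                    # observed sample proportion
    std_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / std_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

alpha = 0.05
z, p_value = proportion_z_test(successes=137, n=200, p0=0.60)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject the 60% claim" if p_value < alpha else "cannot reject the 60% claim")
```

Under these assumptions z is about 2.45 and the p-value is below 0.05, so the 60% claim is rejected in all three settings. The point of the slide is that the field of application changes but the statistical method does not.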

Page 12: Lecture1 Data

Should you believe a statistical study?

In Statistical Reasoning for Everyday Life, Jeff Bennett and Mario Triola give eight guidelines for critically evaluating a statistical study:

(1) Identify the goal of the study, the population considered and the type of study

(2) Consider the source, with regard to a possibility of bias

(3) Analyze the sampling method

(4) Look for problems in defining or measuring the variables

(5) Watch out for confounding variables that could invalidate conclusions

(6) Consider the setting and wording of any survey

(7) Check that graphs represent data fairly

(8) Consider whether the conclusions achieve the goals of the study, and whether they have practical significance.

Page 13: Lecture1 Data

Example 2: What is wrong with this survey?

In the USA, Newsweek magazine ran a survey about the Napster Web site (a former pirate site for free mp3 downloads). Readers were asked: “Will you still use Napster if you have to pay a fee?” Readers could register their responses on the magazine’s Web site.

Among the 1873 responses received, 19% said yes, it is still cheaper than buying CDs. Another 5% said yes, they felt more comfortable using it with a charge.

A voluntary response sample is one in which the respondents themselves decide whether to be included. Such a sample is biased and may not represent reality, because people with strong opinions are more likely to participate.
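The effect of voluntary response can be illustrated with a small expected-value calculation. All group shares, opinions and response rates below are invented for illustration; they are not estimates for the Newsweek survey.

```python
# Each hypothetical group: (share of population, fraction answering "yes",
#                           probability of responding to the survey)
groups = {
    "strong opinion, would pay":     (0.05, 1.0, 0.30),
    "strong opinion, would not pay": (0.15, 0.0, 0.30),
    "mild opinion":                  (0.80, 0.5, 0.02),
}

# True "yes" rate in the whole population.
true_yes_rate = sum(share * yes for share, yes, _ in groups.values())

# Expected "yes" rate among those who choose to respond.
responding = sum(share * respond for share, _, respond in groups.values())
responding_yes = sum(share * yes * respond for share, yes, respond in groups.values())
observed_yes_rate = responding_yes / responding

print(f"true yes rate:     {true_yes_rate:.3f}")
print(f"observed yes rate: {observed_yes_rate:.3f}")
```

Because the groups with strong opinions respond far more often in this made-up setting, the rate seen among respondents differs from the rate in the population even though nobody answered dishonestly.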

Page 14: Lecture1 Data

Nature of statistical data

• Continuous data can be measured. They come from physical measurement and can have units such as kg, m, etc.

– Ex: The amount of milk a cow produces can take any value over a continuous span (any value between 0 and 7000 L/year).

• Discrete data can only be counted. Only separate, countable values are possible.

– Ex: ‘head’ and ‘tail’ of a coin; number of eggs that hens lay.

Ordinary arithmetic (e.g. addition, subtraction) can be applied to both continuous and discrete data.

Page 15: Lecture1 Data

• Nominal data (in Latin, nomen means ‘name’): data that consist of names or labels.

– Ex: “What is your marital status?”

a) Married b) Single c) Divorced d) Widowed

– Nominal items can be coded as numbers for statistical analysis.
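As a minimal sketch of what "made numerical" can mean, the answers to the marital status question can be turned into integer codes or into indicator (one-hot) columns. The six responses below are hypothetical.

```python
from collections import Counter

categories = ["Married", "Single", "Divorced", "Widowed"]

# Hypothetical survey answers.
responses = ["Married", "Single", "Single", "Divorced", "Widowed", "Single"]

# Integer coding: the numbers are labels only; their order and size mean nothing.
code_of = {name: code for code, name in enumerate(categories, start=1)}
codes = [code_of[r] for r in responses]

# One-hot coding: one 0/1 indicator column per category.
one_hot = [[1 if r == c else 0 for c in categories] for r in responses]

# Counting is meaningful for nominal data; averaging the codes is not.
counts = Counter(responses)
mode = counts.most_common(1)[0][0]

print(codes)
print(one_hot[0])
print(mode, counts[mode])
```

The codes only identify categories, so frequencies and the mode are meaningful summaries, while a mean of the codes would not be.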

Page 16: Lecture1 Data

Interpretation of data

• Example 1: Analyze the overall performance of senior students Ayşe, Ahmet, Mehmet and Gül by ranking their scores.

          History   Mathematics   English
Ayşe         89          51          40
Ahmet        61          56          54
Mehmet       40          70          55
Gül          13          77          72

Page 17: Lecture1 Data

Case 1: Ranking based on total sum of grades

• This approach does not reflect the success of the students

          History   Mathematics   English   Total   Rank
Ayşe         89          51          40      180      1
Ahmet        61          56          54      171      2
Mehmet       40          70          55      165      3
Gül          13          77          72      162      4

Page 18: Lecture1 Data

Case 2: Ranking based on each subject

          History   Mathematics   English   Total   Rank
Ayşe          1           4           4        9      4
Ahmet         2           3           3        8      3
Mehmet        3           2           2        7      2
Gül           4           1           1        6      1
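Cases 1 and 2 can be reproduced with a short script. The scores come from the table in Example 1; the helper name rank_descending is made up for this sketch, and it assumes there are no ties (true for these data).

```python
scores = {
    "Ayşe":   {"History": 89, "Mathematics": 51, "English": 40},
    "Ahmet":  {"History": 61, "Mathematics": 56, "English": 54},
    "Mehmet": {"History": 40, "Mathematics": 70, "English": 55},
    "Gül":    {"History": 13, "Mathematics": 77, "English": 72},
}
subjects = ["History", "Mathematics", "English"]

def rank_descending(values):
    """Map each name to its rank, 1 = highest value (assumes no ties)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

# Case 1: rank by the total of the three grades.
totals = {name: sum(marks.values()) for name, marks in scores.items()}
rank_by_total = rank_descending(totals)

# Case 2: rank within each subject, then add up the ranks (lower sum is better).
subject_ranks = {
    s: rank_descending({name: scores[name][s] for name in scores}) for s in subjects
}
rank_sums = {name: sum(subject_ranks[s][name] for s in subjects) for name in scores}
ordered_by_rank_sum = sorted(rank_sums, key=rank_sums.get)
rank_by_rank_sum = {name: pos for pos, name in enumerate(ordered_by_rank_sum, start=1)}

for name in scores:
    print(name, totals[name], rank_by_total[name], rank_sums[name], rank_by_rank_sum[name])
```

The two rankings are exactly reversed: Ayşe's very high History grade dominates the total, while the rank sum counts each subject equally regardless of the size of the gaps.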

Page 19: Lecture1 Data

Case 3: Ranking based on weights of subjects

• Same result as in Case 2.

          History   Mathematics   English   Total   Rank
Weight        3           6           5
Ayşe      1*3=3      4*6=24      4*5=20      47      4
Ahmet     2*3=6      3*6=18      3*5=15      39      3
Mehmet    3*3=9      2*6=12      2*5=10      31      2
Gül       4*3=12     1*6=6       1*5=5       23      1

Page 20: Lecture1 Data

Case 4: Ranking based on weights of subjects (2)

• Same result as in Cases 2 and 3.

          History      Mathematics   English      Total   Rank
Weight        3             6            5
Ayşe      89*3=267     51*6=306     40*5=200      773      4
Ahmet     61*3=183     56*6=336     54*5=270      789      3
Mehmet    40*3=120     70*6=420     55*5=275      815      2
Gül       13*3=39      77*6=462     72*5=360      861      1
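The weighted rankings of Cases 3 and 4 can be checked the same way. The weights 3, 6 and 5 and the scores are taken from the tables; the helper subject_rank is a made-up name and assumes no ties.

```python
scores = {
    "Ayşe":   {"History": 89, "Mathematics": 51, "English": 40},
    "Ahmet":  {"History": 61, "Mathematics": 56, "English": 54},
    "Mehmet": {"History": 40, "Mathematics": 70, "English": 55},
    "Gül":    {"History": 13, "Mathematics": 77, "English": 72},
}
weights = {"History": 3, "Mathematics": 6, "English": 5}

def subject_rank(name, subject):
    """Rank of a student within one subject, 1 = highest score (assumes no ties)."""
    own = scores[name][subject]
    return 1 + sum(other[subject] > own for other in scores.values())

# Case 3: weight the per-subject ranks (lower weighted sum is better).
weighted_rank_sums = {
    name: sum(weights[s] * subject_rank(name, s) for s in weights) for name in scores
}

# Case 4: weight the raw scores (higher weighted sum is better).
weighted_scores = {
    name: sum(weights[s] * scores[name][s] for s in weights) for name in scores
}

best_by_rank_sum = min(weighted_rank_sums, key=weighted_rank_sums.get)
best_by_score = max(weighted_scores, key=weighted_scores.get)

print(weighted_rank_sums)
print(weighted_scores)
print(best_by_rank_sum, best_by_score)
```

With these weights both weighted methods put Gül first and Ayşe last, because Mathematics and English count more than History. A different choice of weights could change the order, which is why the weighting has to be justified before the ranking is interpreted.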

Page 21: Lecture1 Data

• Example 2: “Drought in Turkey!: The newspaper claims that precipitation in Turkey has significantly diminished in the last 6 months.”

Center            Average precipitation    Average over    Average for
                  in 6 months (mm)         years (mm)      last year (mm)
Balıkesir               476                    419              363
Bursa                   461                    471              574
Tekirdağ                241                    389              434
Edirne                  399                    346              268
Kırklareli              298                    349              270
Trabzon                 385                    469              574
Giresun                 928                    727              843
Samsun                  566                    411              551
Rize                   1539                   1315             1497
Konya                   111                    201              197
Karaman                 136                    223              233
Kırşehir                218                    239              145
Niğde                   205                    194              176
Afyonkarahisar          180                    234              239
Aydın                   459                    505              609
Denizli                 282                    406              435
İzmir                   682                    579              477
Manisa                  537                    603              507
Adana                   361                    515              582
Gaziantep               492                    451              537
Şanlıurfa               356                    392              357
Iğdır                   123                    104              152
Malatya                 207                    260              278
Muş                     508                    536              720

Page 22: Lecture1 Data

• If we look closely enough, we cannot claim that precipitation has significantly diminished. When the figures from the three data sets are analyzed with the ANOVA technique (ANalysis Of VAriance), there is no hint of a decrease in precipitation. Moreover, in order to speak for all regions of Turkey, the data should be expanded to include all cities.
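The slide does not state which ANOVA design was used. As a minimal sketch, the three columns can be treated as three independent groups in a one-way ANOVA. That is a simplifying assumption, because each city appears in all three columns. The function name one_way_anova is made up for this example.

```python
# Columns of the precipitation table (mm), in the table's city order.
six_months = [476, 461, 241, 399, 298, 385, 928, 566, 1539, 111, 136, 218,
              205, 180, 459, 282, 682, 537, 361, 492, 356, 123, 207, 508]
long_term = [419, 471, 389, 346, 349, 469, 727, 411, 1315, 201, 223, 239,
             194, 234, 505, 406, 579, 603, 515, 451, 392, 104, 260, 536]
last_year = [363, 574, 434, 268, 270, 574, 843, 551, 1497, 197, 233, 145,
             176, 239, 609, 435, 477, 507, 582, 537, 357, 152, 278, 720]

def one_way_anova(*groups):
    """Return (F statistic, df between groups, df within groups)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_between, df_within = one_way_anova(six_months, long_term, last_year)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

The F statistic comes out below 1, while the tabulated critical value at the 0.05 level for these degrees of freedom is roughly 3.13, so under this sketch the differences between the three column means are small compared with the variation between cities.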