Post on 07-Oct-2020

Statistics analysis with GNU PSPP

Asst. Prof. Dr. Supakit Nootyaskool

Information Technology, KMITL

Version: 3

PSPP

• Statistical analysis tool
• Supports input data files from SPSS
• Syntax and menus are similar to SPSS
– Cannot create data graphs or charts; use gnuplot software instead
• Free software licensed under GPLv3
• Runs on various operating systems
– Linux and BSD (OpenBSD, NetBSD, FreeBSD)
– Mac
– Windows
• www.gnu.org/software/pspp

Processing data

• Data from the questionnaire

• Data conversion

• Data analysis and discussion

• Interpreting the results

• Writing a summary of the results

Questionnaire

Data conversion and data entering

PSPP: Data View & Variable View

Data type and data size

Frequency analysis and plotting histogram chart

Skewness: asymmetry of the distribution. Kurtosis: peakedness of the distribution. S.E. mean: standard error of the mean.

Output

Descriptive: Crosstab

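• Pearson's chi-squared test checks whether two categorical variables are associated, by comparing the observed counts with the counts expected if the variables were independent.
• Suited to large data sets and unpaired data
• PSPP sets the confidence level at 95%

df = degrees of freedom: the number of observations minus the number of values fixed by sampling. Example: with n = 2, 3, 3, 2, 1 the total number of items is 5; if we sample 4, df = 5 – 4 = 1. For a crosstab with r rows and c columns, df = (r – 1)(c – 1).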
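A minimal sketch of how the Pearson chi-squared statistic and its degrees of freedom are computed for a crosstab. The 2x2 counts are hypothetical (the slides give no crosstab data), and 3.841 is the standard table value for df = 1 at alpha = 0.05.

```python
# Hypothetical 2x2 crosstab; the counts are illustrative, not from the slides.
observed = [
    [10, 20],
    [30, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

# Degrees of freedom for a crosstab: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Standard chi-squared critical value for df = 1, alpha = 0.05.
CRITICAL_DF1_ALPHA_05 = 3.841
reject_h0 = chi2 > CRITICAL_DF1_ALPHA_05

print(f"chi2 = {chi2:.4f}, df = {df}, reject H0: {reject_h0}")
```

With these illustrative counts the statistic stays below the critical value, so independence would not be rejected.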

Comparative Mean

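Difficult to interpret?

• Age/Status
– mean age of 30.5 years in regular study
• Sex/Status
– mean sex code of 1.45 in regular study
• 1 = man, 2 = woman, so 1.45 is only an average of category codes and has no direct interpretation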

Transform: Compute

PSPP: Part 2

Variable in Research

• Independent variable: the variable that is set or chosen by the researcher.

• Dependent variable: its value changes depending on (is affected by) the independent variable.

Statistic

Descriptive

• freq.

• mean

• mode

• median

• variance

• standard deviation

Inference

• one-sample t-test

• independent sample t-test

• paired-sample t-test

• one-way ANOVA

• chi-square test

• correlation analysis

• regression analysis

Example1: snack foods

• A company that produces snack foods uses a machine to package bags, each with a target weight of 454 g.

• The quality-assurance (QA) team takes a random sample of 24 bags.

• Data: 465, 456, 438, 454, 447, 449, 442, 449, 446, 447, 468, 433, 454, 463, 450, 446, 447, 456, 452, 444, 447, 456, 456, 435

Descriptive statistic

• Data 465, 456, 438, 454, 447, 449, 442, 449, 446, 447, 468, 433, 454, 463, 450, 446, 447, 456, 452, 444, 447, 456, 456, 435

• Analysis

– Freq.

– Mean, Mode, Median

– Variance

– Standard deviation
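The descriptive statistics listed above can be cross-checked outside PSPP. The following is a sketch using only Python's standard `statistics` module on the 24 bag weights from the slide; the variable names are illustrative.

```python
import statistics
from collections import Counter

# Bag weights in grams, copied from the slide.
weights = [465, 456, 438, 454, 447, 449, 442, 449, 446, 447, 468, 433,
           454, 463, 450, 446, 447, 456, 452, 444, 447, 456, 456, 435]

frequency = Counter(weights)             # value -> number of occurrences
mean = statistics.mean(weights)
median = statistics.median(weights)
modes = statistics.multimode(weights)    # every value tied for most frequent
variance = statistics.variance(weights)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(weights)      # sample standard deviation

print("mean:", mean, "median:", median, "modes:", modes)
print(f"variance: {variance:.2f}, standard deviation: {std_dev:.2f}")
for value, count in sorted(frequency.items()):
    print(value, count)
```

Note that the data have two modes (447 and 456 each occur four times), which is why `multimode` is used rather than `mode`.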

T-test

• A t-test is a significance test that compares mean values.

• It is used to check whether the difference between two data sets (or between one data set and a constant) is significant.

• It is used for both

– Independent samples

– Dependent (paired) samples

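Inference

[Diagram: decision tree for choosing an inference test according to the factor of the dependent variable. Node labels: one group (compared with a constant); two groups (independent groups, dependent groups); more than two groups; correlation; qualitative; quantitative.]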

One-sample T test

Five steps for testing hypothesis

1. State null (H0) and alternate (Ha) hypotheses

2. Select a level of significance; traditionally,

level of 0.05 (95% confidence) for consumer research projects

level of 0.01 (99% confidence) for quality assurance

level of 0.10 (90% confidence) for political polling

Null hypothesis    Do not reject H0    Reject H0
H0 is TRUE         Correct decision    Type I error
H0 is FALSE        Type II error       Correct decision

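Example. H0: Peter has gastritis.

• Do not reject H0 = the doctor concludes Peter has gastritis; reject H0 = the doctor concludes he does not.
• H0 is TRUE, do not reject H0: correct decision. Peter gets treatment.
• H0 is TRUE, reject H0: Type I error. Peter has gastritis, but the doctor says he does not, so he gets no treatment.
• H0 is FALSE, do not reject H0: Type II error. Peter does not have gastritis, but the doctor prescribes treatment anyway.
• H0 is FALSE, reject H0: correct decision. Peter does not have gastritis and gets no treatment.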

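Example. H0: we need a high-speed railway.

• Do not reject H0 = conclude the railway is needed; reject H0 = conclude it is not necessary.
• H0 is TRUE, do not reject H0: correct decision. The railway is built.
• H0 is TRUE, reject H0: Type I error. The railway is wanted and really needed, but it is not built.
• H0 is FALSE, do not reject H0: Type II error. The railway is not really needed, but the result says to build it, so it is built.
• H0 is FALSE, reject H0: correct decision. The railway is not built, and it really was not necessary.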

Five steps for testing hypothesis

3. Select the test statistic

z-test

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

t-test

$\bar{X}$ = sample mean; $\mu$ = population mean assumed under H0; $\sigma$ = population standard deviation; $n$ = sample size
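The slide names the t-test without showing its formula. For reference, the standard one-sample t statistic has the same form as the z statistic, with the unknown population standard deviation replaced by the sample standard deviation $s$ (a symbol not used on the slide) and with $n - 1$ degrees of freedom:

```latex
t = \frac{\bar{X} - \mu}{s / \sqrt{n}},
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2},
\qquad
df = n - 1
```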

Five steps for testing hypothesis

4. Formulate the decision rule

Compare the z value with the critical value.

[Figure: sampling distribution with the critical value separating the "do not reject H0" region from the region of rejection.]
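The decision rule can be sketched in a few lines. This is an illustrative helper, not PSPP functionality: `z_decision` and its `tail` argument are hypothetical names, and the critical value is taken from the standard normal distribution via `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

def z_decision(z, alpha=0.05, tail="two-sided"):
    """Return (critical value, reject H0?) for a z statistic.

    tail: "two-sided" (Ha: mu <> mu0), "less" (Ha: mu < mu0),
          or "greater" (Ha: mu > mu0).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        return critical, abs(z) > critical
    critical = std_normal.inv_cdf(1 - alpha)
    if tail == "greater":
        return critical, z > critical
    if tail == "less":
        return -critical, z < -critical
    raise ValueError(f"unknown tail: {tail!r}")

# Hypothetical z value, used only to exercise the rule.
critical, reject = z_decision(2.10)
print(f"critical value = {critical:.3f}, reject H0: {reject}")
```

For a two-sided test at alpha = 0.05 the critical value is about 1.96, so any z further from zero than that falls in the region of rejection.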

Hypothesis test for one population mean

Null hypothesis    Alternative hypothesis
H0: μ = μ0         Ha: μ <> μ0
H0: μ >= μ0        Ha: μ < μ0
H0: μ <= μ0        Ha: μ > μ0

μ0: constant

H0 is read "H sub zero" (the null hypothesis).

Ha, also written H1 and read "H sub one", is the alternate hypothesis.

Key difference between H0 and Ha

• Null hypothesis – the main hypothesis – the normal condition or situation

• Alternative hypothesis – the different claim – what the researcher thinks

Example: the electrical supply is nominally 220 V and the tester thinks it is not 220 V. H0: μ = 220 V; Ha: μ <> 220 V

Example2: Weight of snack food

• H0: μ = 454 grams
– Meaning: the packaging machine works accurately (the population mean μ equals 454 grams).
• Ha: μ <> 454 grams
– Meaning: the packaging machine does not work accurately.
• Significance level
– alpha = 0.05 (95% confidence level)

• alpha = 0.05, p = 0.033
• [alpha > p] = True (0.05 > 0.033), so reject H0.
• Conclusion: at the 0.05 significance level, the packaging machine does not pack bags at a mean weight of 454 grams.
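The p-value reported by PSPP can be approximated by hand. The sketch below computes the one-sample t statistic for the 24 bag weights and obtains the two-sided p-value by numerically integrating the Student's t density with Simpson's rule; `t_cdf` is a hypothetical helper written for this illustration, not a PSPP or library function.

```python
import math
import statistics

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                   # P(0 <= T <= |t|)
    return 0.5 + math.copysign(area, t)

# Bag weights in grams, copied from the slide.
weights = [465, 456, 438, 454, 447, 449, 442, 449, 446, 447, 468, 433,
           454, 463, 450, 446, 447, 456, 452, 444, 447, 456, 456, 435]
mu0 = 454      # hypothesised population mean (H0)
alpha = 0.05

n = len(weights)
std_err = statistics.stdev(weights) / math.sqrt(n)
t_stat = (statistics.mean(weights) - mu0) / std_err
df = n - 1
p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))
reject_h0 = alpha > p_two_sided

print(f"t = {t_stat:.3f}, df = {df}, p = {p_two_sided:.3f}, reject H0: {reject_h0}")
```

The result should land close to the p = 0.033 shown on the slide, leading to the same decision to reject H0.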

Example3: Calcium levels

• A nutritionist thinks the average person with an income below the poverty level gets less than 800 mg of calcium.
• Sample: 18 people with incomes below the poverty level
• Data
– 686, 433, 743, 647, 734, 641, 993, 620, 574, 634, 850, 858, 992, 775, 1113, 672, 879, 609
• Question: do the data provide sufficient evidence to conclude that the mean calcium intake of all people with income below the poverty level is less than 800 mg?

poverty (n): the state of being poor; intake (n): the amount consumed

Example3: Calcium levels

• H0: μ >= 800mg

• Ha: μ < 800mg

• Significance level

– alpha = 0.05

• data

– 686, 433, 743, 647, 734, 641, 993, 620, 574, 634, 850, 858, 992, 775, 1113, 672, 879, 609

• H0: μ >= 800 mg; Ha: μ < 800 mg
• alpha = 0.05
• p = 0.212 (two-tailed) / 2 = 0.106 (one-tailed)
• [alpha > p] = False (0.05 < 0.106), so do not reject H0

• Conclusion: at the 0.05 significance level, the data do not provide sufficient evidence that the mean calcium intake of people below the poverty level is less than 800 mg.
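The one-tailed p-value for this example can be approximated directly rather than by halving the two-tailed value. This is a sketch under the same assumptions as a standard one-sample t-test; `t_cdf` is a hypothetical helper that integrates the Student's t density numerically.

```python
import math
import statistics

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps                      # Simpson's rule, even number of steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + math.copysign(total * h / 3, t)

# Calcium intake in mg, copied from the slide.
calcium = [686, 433, 743, 647, 734, 641, 993, 620, 574, 634, 850, 858,
           992, 775, 1113, 672, 879, 609]
mu0 = 800
alpha = 0.05

n = len(calcium)
t_stat = (statistics.mean(calcium) - mu0) / (statistics.stdev(calcium) / math.sqrt(n))
df = n - 1

# Ha: mu < mu0, so the p-value is the lower tail P(T <= t).
p_one_sided = t_cdf(t_stat, df)
reject_h0 = alpha > p_one_sided

print(f"t = {t_stat:.3f}, df = {df}, one-tailed p = {p_one_sided:.3f}, reject H0: {reject_h0}")
```

Because Ha points to the lower tail and the t statistic is negative, the lower-tail probability equals half of the two-tailed p-value, which is why the halving used on the slide is valid here.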

Independent sample T-test

T-test: Two groups

Null hypothesis    Alternative hypothesis
H0: μ1 = μ2        Ha: μ1 <> μ2
H0: μ1 >= μ2       Ha: μ1 < μ2
H0: μ1 <= μ2       Ha: μ1 > μ2

Example4: Hospital costs

• Sampled costs per day at public hospitals and private hospitals

• public hospital

– 633, 616, 659, 535, 666, 675, 524, 746, 585, 748, 696, 609

• private hospital

– 790, 587, 997, 735, 852, 686, 839, 545, 724, 554, 889, 797, 722, 484, 579

Levene's test for equality of variances

• H0: σ1² = σ2²; Ha: σ1² <> σ2²

• alpha = 0.05

• p = 0.031

• [alpha > p] = True, so reject H0: the variances are not assumed equal

Equal variances not assumed
• H0: μ1 >= μ2; Ha: μ1 < μ2
• alpha = 0.05
• p = 0.086/2 = 0.043
• [alpha > p] = True, so reject H0
• Conclusion: the average cost of public hospitals is lower than that of private hospitals at the 0.05 significance level.
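The "equal variances not assumed" row corresponds to Welch's t-test. The sketch below reproduces it for the hospital data with the Welch-Satterthwaite degrees of freedom; `t_cdf` is a hypothetical helper that integrates the Student's t density numerically, so the p-value is an approximation.

```python
import math
import statistics

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t; df may be non-integer."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps                      # Simpson's rule, even number of steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + math.copysign(total * h / 3, t)

# Costs per day, copied from the slide.
public = [633, 616, 659, 535, 666, 675, 524, 746, 585, 748, 696, 609]
private = [790, 587, 997, 735, 852, 686, 839, 545, 724, 554, 889, 797,
           722, 484, 579]
alpha = 0.05

# Squared standard error contributed by each group.
v1 = statistics.variance(public) / len(public)
v2 = statistics.variance(private) / len(private)

t_stat = (statistics.mean(public) - statistics.mean(private)) / math.sqrt(v1 + v2)

# Welch-Satterthwaite approximation of the degrees of freedom.
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(public) - 1) + v2 ** 2 / (len(private) - 1))

# Ha: mu1 < mu2, so the p-value is the lower tail P(T <= t).
p_one_sided = t_cdf(t_stat, df)
reject_h0 = alpha > p_one_sided

print(f"t = {t_stat:.3f}, df = {df:.1f}, one-tailed p = {p_one_sided:.3f}, reject H0: {reject_h0}")
```

The one-tailed p-value should come out close to the 0.043 on the slide, just under alpha = 0.05.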

Paired-sample T test

Two dependent group

• The mean values of the two groups are dependent (related). Typical applications:

– Examination: Pre-test/Post-test

– Applying something: Before/After

– Similarity between someone or something: A/B

Example 5: running

• An exercise physiologist wants to determine whether a certain type of running program will reduce heart rates.

• Sample 15 people and record their heart rates before the program and after one year of exercise.

Person  Before  After
1       68      67
2       76      77
3       74      74
4       71      74
5       71      69
6       72      70
7       75      71
8       83      77
9       75      71
10      74      74
11      76      73
12      77      68
13      79      71
14      75      72
15      75      77

Example 5: running

• H0: u1 <= u2; Ha: u1 > u2

• u1: the heart rate of before-variable

• u2: the heart rate of after-variable

• Significance level

– alpha = 0.05

• p = 0.018/2 = 0.009

• [alpha > p] = True, so reject H0

• Conclusion: the running program reduces heart rate at the 0.05 significance level.
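A paired-sample t-test is a one-sample t-test on the per-person differences. The sketch below applies that idea to the before/after heart rates; `t_cdf` is a hypothetical helper that integrates the Student's t density numerically, so the p-value is an approximation.

```python
import math
import statistics

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps                      # Simpson's rule, even number of steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + math.copysign(total * h / 3, t)

# Heart rates for persons 1..15, copied from the slide.
before = [68, 76, 74, 71, 71, 72, 75, 83, 75, 74, 76, 77, 79, 75, 75]
after = [67, 77, 74, 74, 69, 70, 71, 77, 71, 74, 73, 68, 71, 72, 77]
alpha = 0.05

# Positive difference means the heart rate dropped after the program.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
df = n - 1

# Ha: mu1 > mu2 (before > after), so the p-value is the upper tail P(T >= t).
p_one_sided = 1 - t_cdf(t_stat, df)
reject_h0 = alpha > p_one_sided

print(f"mean difference = {statistics.mean(diffs):.1f}, t = {t_stat:.3f}, "
      f"one-tailed p = {p_one_sided:.3f}, reject H0: {reject_h0}")
```

Working on differences removes the person-to-person variation, which is what distinguishes the paired test from the independent-sample test.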

ANOVA (ANalysis Of Variance)

Acceptance region/ Rejection region

Example 6: Bearing Vibration

• A hard-disk company tests the vibration of bearings to be installed in its hard disks. Bearings from five brands were tested, with 6 items sampled per brand; the results are shown in the table below.

• The company wants to know: do the bearings differ between brands?

Brand1 Brand2 Brand3 Brand4 Brand5

13.1 16.3 13.7 15.7 13.5

15 15.7 13.9 13.7 13.4

14 17.2 12.4 14.4 13.2

14.4 14.9 13.8 16 12.7

14 14.4 14.9 13.9 13.4

11.6 17.2 13.3 14.7 12.3

Example 6: Bearing Vibration

• Hypothesis

– H0: u1 = u2 = u3 = u4 = u5

– Ha: at least one brand mean differs from the others

• Significance level

– Alpha = 0.05

1) [F > F critical value] = True
2) [alpha > p] = True
So reject H0.
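The F statistic behind this decision can be sketched as the ratio of between-brand to within-brand mean squares. The code uses the bearing data from the slide; the critical value 2.76 for F(4, 25) at alpha = 0.05 is taken from standard F tables rather than computed.

```python
import statistics

# Vibration measurements per brand, copied column by column from the slide.
brands = {
    "Brand1": [13.1, 15.0, 14.0, 14.4, 14.0, 11.6],
    "Brand2": [16.3, 15.7, 17.2, 14.9, 14.4, 17.2],
    "Brand3": [13.7, 13.9, 12.4, 13.8, 14.9, 13.3],
    "Brand4": [15.7, 13.7, 14.4, 16.0, 13.9, 14.7],
    "Brand5": [13.5, 13.4, 13.2, 12.7, 13.4, 12.3],
}

groups = list(brands.values())
all_values = [x for group in groups for x in group]
grand_mean = statistics.fmean(all_values)

# Between-groups sum of squares: spread of the brand means around the grand mean.
ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
# Within-groups sum of squares: spread of each value around its own brand mean.
ss_within = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1                # 5 brands - 1
df_within = len(all_values) - len(groups)   # 30 values - 5 brands

f_stat = (ss_between / df_between) / (ss_within / df_within)

# Critical value of F(4, 25) at alpha = 0.05, from standard F tables.
F_CRITICAL = 2.76
reject_h0 = f_stat > F_CRITICAL

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, reject H0: {reject_h0}")
```

A large F means the brand means differ by more than the variation inside each brand would explain; it does not say which brands differ.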

Pearson correlation Spearman correlation

Regression analysis

Summary

• T-test

• Z-test

• H0

• Ha

• Type I and Type II errors

• Significance level