SAS Workshop II
By Teguh Nugraha
Workshop II
12/12/2011, ComLabs USDI ITB
Topics covered: Creating and Printing a SAS Data Set; Cleaning and Validating Data; Computing Descriptive Statistics; Confidence Interval; Histogram, Boxplot, Stem-Leaf; Hypothesis Test.
Creating and Printing a SAS Data Set
Using the DATA Step and PROC PRINT
A Brief Look at SAS Data Sets
Variable types: every SAS variable is either numeric or character.
Reading Instream Data
With a DATA step, we can create a new data set by writing the data directly inside the SAS program:
DATA output-SAS-data-set;
   INPUT variable <$> variable <$> ...;
   DATALINES;
instream data lines
;
RUN;
For character variables, write a dollar sign ($) after the variable name. The data lines end with a line that contains only a semicolon.
Reading Instream Data
Example (Nama = name, Berat_Badan = body weight):
DATA work.beratbadan;
   INPUT Nama $ Berat_Badan;
   DATALINES;
Teguh 60
Huda 100
Eric 55
Johannes 58
;
RUN;
Reading Instream Data
Specifying a LENGTH for a character variable and an informat for a variable (here Tanggal_Lahir, the date of birth, is read with the ddmmyy10. informat):
DATA work.beratbadan;
   LENGTH Nama $ 12;
   INPUT Nama $ Tanggal_Lahir :ddmmyy10. Berat_Badan;
   DATALINES;
Teguh 29/10/1991 60
Huda 01/01/1990 100
Eric 01/01/1991 55
Johannes 01/01/1992 58
;
RUN;
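Because Tanggal_Lahir is read with an informat, SAS stores it as a date value (the number of days since 1 January 1960), and PROC PRINT shows that raw number unless a format is attached. A minimal sketch, reusing the four rows above, adds a FORMAT statement with the standard ddmmyy10. format so the dates print back as day/month/year; nothing else in the step changes.

```sas
/* Same data as above. FORMAT only changes how the date is displayed; */
/* the stored value is still a number of days since 01JAN1960.        */
DATA work.beratbadan;
   LENGTH Nama $ 12;
   INPUT Nama $ Tanggal_Lahir :ddmmyy10. Berat_Badan;
   FORMAT Tanggal_Lahir ddmmyy10.;
   DATALINES;
Teguh 29/10/1991 60
Huda 01/01/1990 100
Eric 01/01/1991 55
Johannes 01/01/1992 58
;
RUN;

/* Dates now print as dd/mm/yyyy instead of raw day counts */
PROC PRINT data=work.beratbadan NOOBS;
RUN;
```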
Printing Data
Printing the whole data set (NOOBS suppresses the Obs column):
PROC PRINT data=work.beratbadan NOOBS;
RUN;
Printing selected variables only:
PROC PRINT data=work.beratbadan NOOBS;
   VAR Nama Berat_Badan;
RUN;
Printing selected observations only:
PROC PRINT data=work.beratbadan NOOBS;
   VAR Nama Berat_Badan;
   WHERE Berat_Badan>70;
RUN;
Validating and Cleaning Data for Better Analysis
Validating data: using PROC PRINT and PROC FREQ. Cleaning data: programmatically or using PROC SORT.
Missing Value
Numeric missing value: a period (.). Character missing value: a blank (' ').
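A small sketch of both kinds of missing value, using a hypothetical data set work.missing_demo that is not part of the workshop files. In list input, a single period stands for a missing value of either type, and the MISSING() function returns 1 for a missing value of either type, so one function covers both checks.

```sas
/* Hypothetical data: row 3 has a missing weight, row 4 a missing name. */
/* In list input a lone period is read as missing for both types.       */
DATA work.missing_demo;
   INPUT Nama $ Berat_Badan;
   DATALINES;
Teguh 60
Huda 100
Eric .
. 58
;
RUN;

DATA work.missing_check;
   SET work.missing_demo;
   /* MISSING() returns 1 for a numeric . or a blank character value */
   flag_weight = missing(Berat_Badan);
   flag_name   = missing(Nama);
RUN;

PROC PRINT data=work.missing_check NOOBS;
RUN;
```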
Validating Data
Not all of our data is necessarily valid, so we need to know what invalid data look like. Examples: missing values; a gender value that is neither male nor female; likely data-entry errors; illogical values; outliers; duplicate records.
Validating Data
Using PROC PRINT: the WHERE statement prints only the rows that break at least one validity rule.
PROC PRINT data=orion.nonsales;
   VAR Employee_ID Gender Salary Job_Title Country Birth_Date Hire_Date;
   WHERE Employee_ID = . or
         Gender not in ('F','M') or
         Salary not between 24000 and 500000 or
         Job_Title = ' ' or
         Country not in ('AU','US') or
         Birth_Date > Hire_Date or
         Hire_Date < '01JAN1974'd;
RUN;
Validating Data
Using PROC FREQ: the NLEVELS option reports, for each variable, the number of distinct values (levels), including the number of missing levels. The NOPRINT option on the TABLE statement suppresses the individual frequency table for every variable, so only the levels summary is shown.
PROC FREQ data=orion.nonsales NLEVELS;
   TABLE _all_ / noprint;
RUN;
Validating Data
To detect outliers or extreme values, we can look at descriptive statistics using PROC UNIVARIATE.
The procedures for displaying descriptive statistics are covered later.
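As a preview, one way to list the most extreme values is the Extreme Observations table of PROC UNIVARIATE. The sketch below uses a hypothetical data set work.salary_demo with one deliberately extreme salary: NEXTROBS= sets how many of the lowest and highest values are listed, the ID statement labels them by Employee_ID, and ODS SELECT ExtremeObs keeps only that table.

```sas
/* Hypothetical salaries; 950000 is deliberately extreme. */
DATA work.salary_demo;
   INPUT Employee_ID Salary;
   DATALINES;
1 26500
2 27100
3 28900
4 31000
5 950000
;
RUN;

/* Show only the Extreme Observations table:    */
/* the 2 lowest and 2 highest salaries, with IDs. */
ODS SELECT ExtremeObs;
PROC UNIVARIATE data=work.salary_demo NEXTROBS=2;
   VAR Salary;
   ID Employee_ID;
RUN;
```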
Cleaning Data Programmatically
The DATA step below corrects the invalid values in orion.nonsales and stores the result in work.clean. Country=upcase(Country) converts country codes to uppercase (for example, 'au' becomes 'AU').
DATA work.clean;
   SET orion.nonsales;
   Country=upcase(Country);
   if Employee_ID=120106 then Salary=26960;
   else if Employee_ID=120115 then Salary=26500;
   else if Employee_ID=120191 then Salary=24015;
   else if Employee_ID=120107 then Hire_Date='21JAN1995'd;
   else if Employee_ID=12011 then Hire_Date='01NOV1978'd;
   else if Employee_ID=121011 then Hire_Date='01JAN1998'd;
RUN;
Cleaning Data
Removing Duplicates Using PROC SORT: the step below removes duplicate Employee_ID values from orion.nonsalesdupes and stores the result in work.sorted. NODUPKEY keeps only the first observation for each BY value.
PROC SORT data=orion.nonsalesdupes out=work.sorted nodupkey;
   BY Employee_ID;
RUN;
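To see which rows NODUPKEY dropped, PROC SORT also accepts a DUPOUT= data set. A minimal sketch on a hypothetical four-row data set work.dupes, standing in for orion.nonsalesdupes, in which Employee_ID 120102 appears twice:

```sas
/* Hypothetical data with one duplicated Employee_ID (120102). */
DATA work.dupes;
   INPUT Employee_ID Job_Title $;
   DATALINES;
120101 Director
120102 Manager
120102 Manager
120103 Analyst
;
RUN;

/* NODUPKEY keeps the first row per BY value;          */
/* DUPOUT= saves the removed rows for later inspection. */
PROC SORT data=work.dupes out=work.sorted
          NODUPKEY DUPOUT=work.removed;
   BY Employee_ID;
RUN;

PROC PRINT data=work.removed NOOBS;
RUN;
```

work.sorted keeps three rows, and work.removed holds the single dropped duplicate.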
Computing Descriptive Statistics
Using PROC MEANS and PROC UNIVARIATE
Descriptive Statistics
The goal is to obtain summary measures of a numeric variable in the data: mean, median, mode; standard deviation, standard error; coefficient of variation; sum, sum of weights; minimum, maximum, range; skewness, kurtosis; N and number of missing values.
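For reference, the textbook definitions of several of these statistics for a sample x_1, ..., x_n are given below. PROC MEANS and PROC UNIVARIATE use the n − 1 divisor for the variance by default (VARDEF=DF) and report the coefficient of variation as a percentage.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s = \sqrt{s^2}

\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
\mathrm{CV} = 100 \cdot \frac{s}{\bar{x}}, \qquad
\text{Range} = x_{\max} - x_{\min}
```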
Example Case
Suppose we want the descriptive statistics of the body weight (WEIGHT) of people who do fitness training.
Run data_fitness.sas to create the new data set orion.fitness.
PROC MEANS
Typically used to obtain one or a few descriptive statistics:
PROC MEANS data=orion.fitness
     n mean median mode std var q1 q3 qrange;
   VAR Weight;
RUN;
By category of a given variable (e.g., Country, Gender):
PROC MEANS data=orion.fitness
     n mean median mode std var q1 q3 qrange;
   CLASS Gender;
   VAR Weight;
RUN;
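The statistics can also be saved to a data set instead of only being printed. A minimal sketch, assuming a small hypothetical stand-in work.fitness_demo for orion.fitness (which is created by data_fitness.sas and not shown here): the OUTPUT statement stores the N, mean, and standard deviation of Weight for each Gender, and NWAY drops the overall (all-genders) row.

```sas
/* Hypothetical stand-in for orion.fitness (not the real data). */
DATA work.fitness_demo;
   INPUT Name $ Gender $ Weight;
   DATALINES;
Ani F 55
Budi M 70
Citra F 60
Dedi M 80
;
RUN;

/* NOPRINT: no printed table; OUTPUT OUT= saves one row per Gender */
PROC MEANS data=work.fitness_demo NOPRINT NWAY;
   CLASS Gender;
   VAR Weight;
   OUTPUT OUT=work.stat_gender n=N_Weight mean=Mean_Weight std=SD_Weight;
RUN;

PROC PRINT data=work.stat_gender NOOBS;
RUN;
```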
PROC UNIVARIATE
More complete than PROC MEANS; typically used to analyze the characteristics of the data as a whole.
PROC UNIVARIATE data=orion.fitness;
   VAR Weight;
   CLASS Gender;
RUN;
To display all analyses, add the ALL option:
PROC UNIVARIATE data=orion.fitness ALL;
   VAR Weight;
   CLASS Gender;
RUN;
Confidence Interval
Using PROC MEANS or PROC UNIVARIATE
Confidence Interval
Computes interval estimates for the population mean, standard deviation, and variance.
The interval depends on the confidence level (1 − α). For example, suppose we want an interval estimate of the mean weight of the fitness participants at a 95% confidence level (α = 0.05).
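For the mean, the interval reported by PROC MEANS (CLM) and by PROC UNIVARIATE (CIBASIC) is the usual t-based interval, which assumes the data are approximately normal or the sample is reasonably large:

```latex
\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
```

With α = 0.05, the multiplier is the 0.975 quantile of the t distribution with n − 1 degrees of freedom.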
Using PROC MEANS
Use the CLM and ALPHA= options:
PROC MEANS data=orion.fitness CLM ALPHA=0.05;
   VAR Weight;
   TITLE '95% Confidence Interval for Weight';
RUN;
TITLE;
We can change the value of ALPHA to obtain a different confidence level.
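A minimal sketch, on a hypothetical stand-in data set work.fitness_demo rather than the real orion.fitness, that sets ALPHA=0.10 for a 90% interval and saves the limits with the LCLM= and UCLM= output keywords, so the interval can be compared with the t-based formula.

```sas
/* Hypothetical stand-in for orion.fitness (not the real data). */
DATA work.fitness_demo;
   INPUT Name $ Gender $ Weight;
   DATALINES;
Ani F 55
Budi M 70
Citra F 60
Dedi M 80
;
RUN;

/* ALPHA=0.10 gives 90% limits; LCLM/UCLM store them in work.ci90 */
PROC MEANS data=work.fitness_demo NOPRINT ALPHA=0.10;
   VAR Weight;
   OUTPUT OUT=work.ci90 n=N_Weight mean=Mean_Weight std=SD_Weight
          lclm=Lower uclm=Upper;
RUN;

PROC PRINT data=work.ci90 NOOBS;
   VAR N_Weight Mean_Weight Lower Upper;
RUN;
```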
Using PROC UNIVARIATE
Use the CIBASIC(ALPHA=...) option:
PROC UNIVARIATE data=orion.fitness cibasic(alpha=0.05);
   VAR Weight;
RUN;
See the results in the Basic Confidence Limits Assuming Normality table.
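Under the same normality assumption, the limits that CIBASIC reports for the variance and the standard deviation are based on the chi-square distribution with n − 1 degrees of freedom, where χ²_{p, n−1} denotes its p quantile:

```latex
\left( \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}},\;
       \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \right)
\quad \text{for } \sigma^2,
\qquad
\left( \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}},\;
       \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}} \right)
\quad \text{for } \sigma
```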
Histogram, Boxplot, Stem-Leaf
Using PROC UNIVARIATE and PROC SGPLOT
Creating a Histogram
With PROC UNIVARIATE, we can also display a histogram of the data:
PROC UNIVARIATE data=orion.fitness noprint;
   HISTOGRAM Weight / normal(mu=est sigma=est);
   INSET skewness kurtosis / position=ne;
RUN;
The histogram is compared with a normal distribution whose parameters are estimated from the data: μ = x̄ (the sample mean) and σ = s (the sample standard deviation).
Box Plot and Stem-and-Leaf Plot
We can display a stem-and-leaf plot and a box plot of the data by adding the PLOT option to PROC UNIVARIATE:
PROC UNIVARIATE data=orion.fitness plot;
   var Weight;
RUN;
The results appear in the Plot section of the output.
Box-plot
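To produce the box plot as a graphics image, we use PROC SGPLOT. HBOX and VBOX cannot be overlaid in one SGPLOT step, so each plot gets its own step:
TITLE "Box Plots of Weight";
PROC SGPLOT data=orion.fitness;
   HBOX Weight / datalabel=Name;
RUN;
PROC SGPLOT data=orion.fitness;
   VBOX Weight / datalabel=Name;
RUN;
TITLE;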
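As a rule of thumb for reading these plots, points are usually flagged as outliers when they fall outside the 1.5 × IQR fences below, where IQR = Q3 − Q1 is the qrange statistic from PROC MEANS. This is the common Tukey convention; the exact whisker rule a given procedure uses is worth confirming in its documentation. The datalabel=Name option labels such flagged points with their Name.

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
\text{lower fence} = Q_1 - 1.5\,\mathrm{IQR}, \qquad
\text{upper fence} = Q_3 + 1.5\,\mathrm{IQR}
```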
Hypothesis Test
Using PROC UNIVARIATE
Hypothesis Test
A hypothesis test checks whether the mean of a variable (population) agrees with our conjecture (hypothesis).
For example, suppose our null hypothesis (H0) is that the mean of weight in the data set orion.fitness equals its mode, 73.37, and we want to test this hypothesis at the 95% confidence level (α = 0.05).
After running the test, check the p-value (Pr): if Pr < α, reject H0; if Pr ≥ α, do not reject H0.
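For the mean, the Student's t row of the Tests for Location table is based on the one-sample t statistic below, with n − 1 degrees of freedom and a two-sided p-value; H0 is rejected when Pr < α.

```latex
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad
\Pr = 2\,P\left(T_{n-1} \ge |t|\right)
```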
Hypothesis Test
Add the mu0=73.37 option to PROC UNIVARIATE:
PROC UNIVARIATE data=orion.fitness mu0=73.37 alpha=0.05;
   VAR weight;
   TITLE 'Hypothesis Test: Is the Mean of Weight = 73.37?';
RUN;
TITLE;
See the results in the Tests for Location table.
Hypothesis Test
Look at the Tests for Location table. Because Pr < α, we reject the null hypothesis (mu0=73.37). So, at α = 0.05, we conclude that the mean of weight in orion.fitness is not 73.37.
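To see where the Student's t row comes from, a minimal sketch recomputes the statistic and its two-sided p-value in a DATA step with the PROBT function. It uses a hypothetical stand-in data set work.fitness_demo rather than orion.fitness, so its numbers will not match the workshop output; with these four made-up values, the test does not reject H0.

```sas
/* Hypothetical stand-in for orion.fitness (not the real data). */
DATA work.fitness_demo;
   INPUT Name $ Gender $ Weight;
   DATALINES;
Ani F 55
Budi M 70
Citra F 60
Dedi M 80
;
RUN;

/* Sample size, mean, and standard deviation of Weight */
PROC MEANS data=work.fitness_demo NOPRINT;
   VAR Weight;
   OUTPUT OUT=work.summary n=n mean=xbar std=s;
RUN;

DATA work.t_test;
   SET work.summary;
   mu0   = 73.37;                        /* hypothesized mean under H0 */
   alpha = 0.05;
   t = (xbar - mu0) / (s / sqrt(n));     /* one-sample t statistic */
   p = 2 * (1 - probt(abs(t), n - 1));   /* two-sided p-value, df = n - 1 */
   reject_H0 = (p < alpha);              /* 1 = reject H0, 0 = do not reject */
RUN;

PROC PRINT data=work.t_test NOOBS;
   VAR n xbar s t p reject_H0;
RUN;
```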
Hypothesis Test
To show only the Tests for Location table in the output, add an ODS SELECT statement as follows:
ODS select testsforlocation;
PROC UNIVARIATE data=orion.fitness mu0=73.37 alpha=0.05;
   VAR weight;
   TITLE 'Hypothesis Test: Is the Mean of Weight = 73.37?';
RUN;
TITLE;