Data Mining - Universitas Hasanuddinunhas.ac.id/amil/S1TIF/DM2020/04 DM 2020.pdf · Algoritma Data...

36
Data Mining http://www.unhas.ac.id/amil/S1TIF/DM2020/ L4 Amil Ahmad Ilham


Data Mining (DM) Algorithms

1. Estimation:
• Linear Regression, Neural Network, Support Vector Machine, etc.

2. Prediction/Forecasting:
• Linear Regression, Neural Network, Support Vector Machine, etc.

3. Classification:
• Naive Bayes, K-Nearest Neighbor, C4.5, ID3, CART, Linear Discriminant Analysis, Logistic Regression, etc.

4. Clustering:
• K-Means, K-Medoids, Self-Organizing Map (SOM), Fuzzy C-Means, etc.

5. Association:
• FP-Growth, Apriori, Coefficient of Correlation, Chi-Square, etc.


Regression Model Evaluation


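Regression Evaluation

Root Mean Squared Error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}$$

Popular because its value is on the same scale (and in the same units) as the response vector y.

Mean Absolute Error:

$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|y_j - \hat{y}_j\right|$$

Reflects the average size of the error.

Mean Squared Error:

$$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2$$

Popular because squaring gives more weight to large errors.

Here y_j is the actual value, ŷ_j is the predicted value, and n is the number of observations.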
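The three formulas map directly onto NumPy. The sketch below is minimal. The helper names mae, mse and rmse are illustrative, and the sample arrays are arbitrary values rather than course data.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y_j - y_hat_j|."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Mean Squared Error: average of (y_j - y_hat_j)^2."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE, in the same units as y."""
    return np.sqrt(mse(y_true, y_pred))

# Arbitrary illustrative values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mae(y_true, y_pred))   # 0.5
print(mse(y_true, y_pred))   # 0.375
print(rmse(y_true, y_pred))  # about 0.612
```

scikit-learn provides ready-made equivalents in sklearn.metrics (mean_absolute_error, mean_squared_error).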


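Evaluation Approach (Train/Test Split)

| Row | Engine Size | Cylinders | Fuel Consumption | CO2 Emissions | Set |
|-----|-------------|-----------|------------------|---------------|-------|
| 0 | 2.0 | 4 | 8.5 | 196 | Train |
| 1 | 2.4 | 4 | 9.6 | 221 | Train |
| 2 | 1.5 | 4 | 5.9 | 136 | Train |
| 3 | 3.5 | 6 | 11.1 | 255 | Train |
| 4 | 3.5 | 6 | 10.6 | 244 | Train |
| 5 | 3.5 | 6 | 10.0 | 230 | Train |
| 6 | 3.5 | 6 | 10.1 | 232 | Test |
| 7 | 3.7 | 6 | 11.1 | 255 | Test |
| 8 | 3.7 | 6 | 11.6 | 267 | Test |
| 9 | 2.4 | 4 | 9.2 | 212 | Test |

The model is trained on rows 0-5. For the test rows 6-9, its predicted values are compared with the actual values:

| Row | Actual CO2 Emissions | Predicted |
|-----|----------------------|-----------|
| 6 | 232 | 234 |
| 7 | 255 | 256 |
| 8 | 267 | 267 |
| 9 | 212 | 210 |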
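The slide does not say which model produced its predictions. The sketch below assumes a simple linear regression on Fuel Consumption alone, which is a choice not stated on the slide. It uses the rows from the table above to show the train/test mechanics with NumPy: fit on the training rows only, then measure the error on the unseen test rows. Its predictions will be close to the slide's numbers but not identical.

```python
import numpy as np

# Rows 0-9 from the table above: fuel consumption (feature) and CO2 emissions (target).
fuel = np.array([8.5, 9.6, 5.9, 11.1, 10.6, 10.0, 10.1, 11.1, 11.6, 9.2])
co2 = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267, 212], dtype=float)

# Split as on the slide: rows 0-5 for training, rows 6-9 for testing.
x_train, y_train = fuel[:6], co2[:6]
x_test, y_test = fuel[6:], co2[6:]

# Simple linear regression y = slope * x + intercept, fitted on the training rows only.
slope, intercept = np.polyfit(x_train, y_train, 1)
y_hat = slope * x_test + intercept

# Compare predicted and actual values on the test rows.
errors = y_test - y_hat
print("predicted:", np.round(y_hat, 1))
print("MAE :", np.mean(np.abs(errors)))
print("RMSE:", np.sqrt(np.mean(errors ** 2)))
```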


Evaluation Approach (Train/Test Split)

• Example:


Non-linear Models


Should we use linear regression?

GDP data, 1960-2014


Should we use linear regression?

GDP data, 1960-2014: the curve looks like an exponential or logistic function rather than a straight line.


Types of regression

Find the model function that best fits the data.


Linear vs. non-linear regression

• How can we tell whether a problem is linear or non-linear?

• Visual inspection of a scatter plot, or compute the correlation coefficient between the dependent and independent variables; an absolute value above 0.7 is a common rule of thumb for a linear relationship

• Based on accuracy: if a linear model fits poorly, the relationship probably cannot be captured with linear parameters

• How should we model the data if the scatter plot shows non-linear behavior?

• Polynomial regression

• Non-linear regression

• "Transforming" the data…
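Both checks can be tried on simulated data. This sketch assumes a noisy straight line and a noisy parabola; pearson_r and r_squared are illustrative helper names, not part of the course code. The 0.7 threshold is only a heuristic. The correlation coefficient measures *linear* association, so a value near zero does not mean there is no relationship, as the parabola shows.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
x = np.arange(-5.0, 5.0, 0.1)
y_lin = 2 * x + 3 + rng.normal(size=x.size)   # linear relationship plus noise
y_quad = x**2 + rng.normal(size=x.size)       # quadratic relationship plus noise

# 1) Correlation check: strong for the linear data, near zero for the U-shaped data.
print("r (linear data):   ", pearson_r(x, y_lin))
print("r (quadratic data):", pearson_r(x, y_quad))

# 2) Accuracy check: a straight line explains little of the quadratic data,
#    while a degree-2 polynomial explains most of it.
fit_line = np.polyval(np.polyfit(x, y_quad, 1), x)
fit_poly = np.polyval(np.polyfit(x, y_quad, 2), x)
print("R^2 straight line:", r_squared(y_quad, fit_line))
print("R^2 polynomial:   ", r_squared(y_quad, fit_poly))
```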


Tutorial


Linear Equation

• y = ax + b

• Example: y = 2x + 3

• Sketch the graph of y by hand for -5 <= x <= 5


Linear Equation

y = 2x + 3

Using Jupyter Notebook, plot y for -5 <= x <= 5:

    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
    x = np.arange(-5.0, 5.5, 0.5)   # stop value is exclusive, so 5.5 makes x = 5.0 the last point
    y = 2*(x) + 3
    plt.plot(x, y, 'r')             # red line
    plt.ylabel('y')
    plt.xlabel('x')
    plt.show()


Linear Equation

Suppose the data is generated with random noise:

    import numpy as np
    import matplotlib.pyplot as plt
    x = np.arange(-5.0, 5.0, 0.4)
    y = 2*(x) + 3
    y_random = 2 * np.random.normal(size=x.size)   # Gaussian noise, standard deviation 2
    ydata = y + y_random
    plt.plot(x, ydata, 'bo')   # noisy data as blue dots
    plt.plot(x, y, 'r')        # underlying line in red
    plt.ylabel('y')
    plt.xlabel('x')
    plt.show()
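Linear regression on these noisy points should approximately recover the line that generated them. This is a minimal sketch using np.polyfit. The seeded generator np.random.default_rng replaces np.random.normal here only so that the run is repeatable; the noise level matches the slide.

```python
import numpy as np

rng = np.random.default_rng(42)          # fixed seed so the run is repeatable
x = np.arange(-5.0, 5.0, 0.4)
y = 2 * x + 3                            # true line from the slide
ydata = y + 2 * rng.normal(size=x.size)  # same noise level as the slide (std. dev. 2)

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept].
slope, intercept = np.polyfit(x, ydata, 1)
print(f"estimated: y = {slope:.2f}x + {intercept:.2f}")
```

With only 25 noisy points, the estimates land near 2 and 3 but not exactly on them.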


Non-linear Equations


Non-linear Equation (Polynomial)

• y = ax³ + bx² + cx + d

• Example: y = x³ + 2x² + 3x + 4

• Sketch the graph of y by hand for -5 <= x <= 5


Non-linear Equation (Polynomial): y = x³ + 2x² + 3x + 4

• Suppose the data is generated with random noise:
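The slide's code for this step was not preserved. This sketch assumes the same pattern as the linear example, with an arbitrarily chosen noise level. It also shows why polynomial regression counts as a linear model: the curve is non-linear in x but linear in the coefficients, so an ordinary least-squares fit recovers them.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(-5.0, 5.0, 0.1)
y = x**3 + 2 * x**2 + 3 * x + 4           # true cubic from the slide
ydata = y + 5 * rng.normal(size=x.size)   # noise level (std. dev. 5) is an assumption

# Least-squares fit of a degree-3 polynomial; coefficients come highest power first.
a, b, c, d = np.polyfit(x, ydata, 3)
y_fit = np.polyval([a, b, c, d], x)
print(f"y ≈ {a:.2f}x³ + {b:.2f}x² + {c:.2f}x + {d:.2f}")
```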


Non-linear Equation (Quadratic): y = x²

• Suppose the data is generated with random noise:
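This sketch assumes additive Gaussian noise with standard deviation 2, since the slide's generating code was not preserved. It illustrates the "transforming the data" idea. Once x² is used as the input feature, y = a·x² + b becomes an ordinary linear regression, solved here with np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(-5.0, 5.0, 0.1)
y = np.power(x, 2)                          # true curve y = x²
ydata = y + 2 * rng.normal(size=x.size)     # assumed noise level (std. dev. 2)

# Transform the feature: columns [x², 1] turn the quadratic into a linear model.
X = np.column_stack([x**2, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(X, ydata, rcond=None)
print(f"y ≈ {a:.2f}x² + {b:.2f}")
```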


Non-linear Equation (Exponential): y = e^x

• Suppose the data is generated with random noise:
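Exponential data can also be handled by transforming it. If y = a·e^(bx), then ln y = ln a + b·x, so a straight-line fit on ln y estimates b and ln a. This sketch assumes *multiplicative* noise so that every generated value stays positive and can be logged; the slide does not specify the noise.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(-5.0, 5.0, 0.1)
y = np.exp(x)                                      # true curve y = e^x (a = 1, b = 1)
ydata = y * np.exp(0.1 * rng.normal(size=x.size))  # assumed multiplicative noise keeps y > 0

# Linearize: ln(y) = ln(a) + b*x, then fit a straight line.
b, ln_a = np.polyfit(x, np.log(ydata), 1)
a = np.exp(ln_a)
print(f"y ≈ {a:.2f} * e^({b:.2f}x)")
```

Fitting in log space weights relative errors rather than absolute ones. When absolute errors matter more, a direct non-linear fit (for example scipy.optimize.curve_fit) is the usual alternative.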


Non-linear Equation (Logarithmic): y = log x

• Suppose the data is generated with random noise:
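log x is only defined for x > 0. This sketch therefore samples x from 1 to 10 and uses the natural logarithm np.log; both the range and the noise level are assumptions, because the slide's values were not preserved. Since y = a + b·log x is linear in log x, regressing y on log x recovers a and b.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.arange(1.0, 10.0, 0.1)                # assumed range; log x needs x > 0
y = np.log(x)                                # true curve y = ln x (a = 0, b = 1)
ydata = y + 0.1 * rng.normal(size=x.size)    # assumed noise level (std. dev. 0.1)

# Transform the feature: fit y = a + b*ln(x) as a straight line in ln(x).
b, a = np.polyfit(np.log(x), ydata, 1)
print(f"y ≈ {a:.2f} + {b:.2f} ln x")
```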


Non-linear Equation (Sigmoidal/Logistic)

• Suppose the data is generated with random noise:
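The logistic equation itself did not survive in the text. This sketch assumes one common parameterization that matches the beta_1 and beta_2 names used later in the case study: y = 1 / (1 + e^(-beta_1 (x - beta_2))). In this form, beta_1 sets the steepness and beta_2 is the x-position of the midpoint, where y = 0.5. The parameter values and noise level below are assumptions.

```python
import numpy as np

def sigmoid(x, beta_1, beta_2):
    """Logistic curve: beta_1 = steepness, beta_2 = x-position where y = 0.5."""
    return 1.0 / (1.0 + np.exp(-beta_1 * (x - beta_2)))

rng = np.random.default_rng(4)
x = np.arange(-5.0, 5.0, 0.1)
y = sigmoid(x, 1.0, 0.0)                     # assumed parameters: beta_1 = 1, beta_2 = 0
ydata = y + 0.05 * rng.normal(size=x.size)   # assumed noise level (std. dev. 0.05)

# The curve rises monotonically from near 0 toward 1 and passes 0.5 at x = beta_2.
print(sigmoid(np.array([-5.0, 0.0, 5.0]), 1.0, 0.0))
```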


Non-linear Regression Case Study

• Download the file china_gdp.csv from http://www.unhas.ac.id/amil/S1TIF/DM2020/

• Right-click the file => Save Link As => Save as type: All Files

• Open a new file in Jupyter Notebook


Viewing the dataset

• Run a new Jupyter notebook
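This is a minimal sketch for loading the downloaded file with pandas. It assumes china_gdp.csv sits in the notebook's working directory. It also assumes the file has Year and Value columns, which is the layout of the commonly distributed copy of this dataset; check df.columns if your copy differs. load_gdp is an illustrative helper name.

```python
import os
import pandas as pd

CSV_PATH = "china_gdp.csv"   # assumed location: same folder as the notebook

def load_gdp(path):
    """Read the GDP CSV and return (DataFrame, years, values)."""
    df = pd.read_csv(path)
    x_data = df["Year"].to_numpy(dtype=float)    # assumed column name
    y_data = df["Value"].to_numpy(dtype=float)   # assumed column name
    return df, x_data, y_data

if os.path.exists(CSV_PATH):
    df, x_data, y_data = load_gdp(CSV_PATH)
    print(df.head(10))   # first 10 rows
```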


Plotting the dataset

Choosing a model that fits the dataset (?)

(Figure panels: Dataset; Exponential; Sigmoidal/Logistic.)


Choosing a model that fits the dataset (?)

(Figure panels: Dataset; Sigmoidal/Logistic.)


Building the Model (Sigmoidal/Logistic)


Testing Model (Optional)


Data normalization
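The slide's normalization code was not preserved. One simple option, assumed here, is to divide each array by its maximum so both the years and the GDP values fall in (0, 1]. The sigmoid's output lies between 0 and 1, so the GDP values need rescaling into that range before fitting; scaling the years as well tends to keep the parameters at magnitudes the optimizer handles comfortably. The maximum is returned so predictions can be scaled back later. normalize_by_max is an illustrative name.

```python
import numpy as np

def normalize_by_max(values):
    """Scale values into (0, 1] by dividing by the maximum; return (scaled, max)."""
    values = np.asarray(values, dtype=float)
    max_value = values.max()
    return values / max_value, max_value

# Usage with hypothetical arrays x_data (years) and y_data (GDP values):
# xdata, x_max = normalize_by_max(x_data)
# ydata, y_max = normalize_by_max(y_data)
```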


Determining the values of beta_1 and beta_2
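A common way to determine beta_1 and beta_2 is non-linear least squares, for example with scipy.optimize.curve_fit. It adjusts the parameters to minimize the sum of squared residuals between the model and the data. This sketch assumes the logistic form y = 1 / (1 + e^(-beta_1 (x - beta_2))) and normalized inputs, because the slide's own code was not preserved. fit_sigmoid is an illustrative helper name.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, beta_1, beta_2):
    """Logistic model: beta_1 = steepness, beta_2 = midpoint."""
    return 1.0 / (1.0 + np.exp(-beta_1 * (x - beta_2)))

def fit_sigmoid(xdata, ydata, p0=None):
    """Least-squares estimate of (beta_1, beta_2); p0 is an optional starting guess."""
    popt, _ = curve_fit(sigmoid, xdata, ydata, p0=p0)
    return popt[0], popt[1]

# Usage with normalized arrays xdata and ydata (hypothetical names):
# beta_1, beta_2 = fit_sigmoid(xdata, ydata)
```

If the optimizer fails to converge on real data, a rough starting guess passed through p0 usually helps.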


Computing y_prediksi
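Once beta_1 and beta_2 are known, y_prediksi is the sigmoid evaluated at the normalized years. This sketch assumes divide-by-max normalization. Multiplying by the maximum GDP value converts the predictions back to the original units, and they can then be scored with the error measures from the regression evaluation section. predict_gdp and rmse are illustrative helper names.

```python
import numpy as np

def sigmoid(x, beta_1, beta_2):
    return 1.0 / (1.0 + np.exp(-beta_1 * (x - beta_2)))

def predict_gdp(years, beta_1, beta_2, x_max, y_max):
    """Return y_prediksi in original GDP units for the given years.

    x_max and y_max are the maxima used for normalization (assumed divide-by-max scaling).
    """
    x_norm = np.asarray(years, dtype=float) / x_max
    return sigmoid(x_norm, beta_1, beta_2) * y_max

def rmse(y_true, y_pred):
    """Root Mean Squared Error between actual and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(diff ** 2))

# Usage (hypothetical variables from the previous steps):
# y_prediksi = predict_gdp(x_data, beta_1, beta_2, x_max, y_max)
# print("RMSE:", rmse(y_data, y_prediksi))
```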


Plotting the Non-linear Regression Result


Assignment

• Write a program that produces:

(Figure panels: China GDP 1960-2014; predicted China GDP 2015-2030.)
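One possible starting point for the assignment is sketched below, under the same assumptions as before: a logistic model with divide-by-max normalization. It reuses the fitted beta_1 and beta_2, normalizes the future years 2015-2030 with the same x_max used in training, and rescales the result with y_max. forecast_gdp is an illustrative name. Be aware of one consequence of this setup: the sigmoid never exceeds 1, so these forecasts can never exceed y_max, the largest GDP value in the training data. Predicting further growth would need a model with a free upper-limit parameter.

```python
import numpy as np

def sigmoid(x, beta_1, beta_2):
    return 1.0 / (1.0 + np.exp(-beta_1 * (x - beta_2)))

def forecast_gdp(beta_1, beta_2, x_max, y_max, start=2015, end=2030):
    """Predict GDP for each year from start to end (inclusive) with the fitted logistic model."""
    years = np.arange(start, end + 1, dtype=float)
    y_future = sigmoid(years / x_max, beta_1, beta_2) * y_max   # same scaling as in training
    return years, y_future

# Usage (hypothetical fitted values and maxima from the previous steps):
# years, y_future = forecast_gdp(beta_1, beta_2, x_max, y_max)
# plt.plot(years, y_future)   # second panel of the assignment
```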