Supervised Learning

A defining characteristic of supervised learning is that it uses training data to make predictions. Below is a comparison of how several supervised learning algorithms are written in R and Python.

k-Nearest Neighbor
k-Nearest Neighbor (k-NN) predicts the class of an object from the classes of its nearest neighbors in the training data. Applying k-NN requires several inputs:

  • number of nearest neighbors (e.g., k = 3)
  • training data = X_train
  • test data = X_test
  • training outcome variable = y_train
  • test outcome variable = y_test
# R
library(class)
knn(train = X_train, test = X_test, cl = y_train, k = 3)
# Python
from sklearn.neighbors import KNeighborsClassifier
knn_result = KNeighborsClassifier(n_neighbors=3)
knn_result.fit(X_train, y_train)
predictions = knn_result.predict(X_test)
predictions
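
The snippets above assume X_train, X_test, y_train, and y_test already exist. A self-contained sketch, using scikit-learn's built-in iris dataset (the variable names and the 80/20 split here are illustrative, not from the original text):

```python
# Minimal end-to-end k-NN sketch on the iris dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit k-NN with 3 neighbors and predict the test data
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(accuracy_score(y_test, y_pred))
```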

Naive Bayes Classifier
This algorithm classifies an object by assigning it to the class with the highest posterior probability.

## R code
library(naivebayes)

# Split into training vs. test data
set.seed(123)
ind <- sample(x = 2, size = nrow(dt), replace = TRUE, prob = c(0.8, 0.2))
train <- dt[ind == 1, ]
test <- dt[ind == 2, ]

# Fit the model
model <- naive_bayes(admit ~ ., data = train)
predict(model, test)
# Python code
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
print(y_pred)
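
The R example above depends on a data frame `dt` that is defined elsewhere. As a self-contained sketch on the iris dataset (names are illustrative), the following also prints the per-class probabilities that the "highest probability" prediction is based on:

```python
# Gaussian Naive Bayes: prediction = class with the highest posterior probability
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

gnb = GaussianNB().fit(X_train, y_train)
proba = gnb.predict_proba(X_test[:3])  # posterior probability per class
print(proba)
print(proba.argmax(axis=1))            # index of the most probable class
print(gnb.predict(X_test[:3]))         # predict() returns those same classes
```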

Support Vector Machine
The principle behind this method is to find a hyperplane (separating boundary) between categories such that the margin between the categories is maximized (find the widest gap). It requires several inputs:

  • training data = X_train
  • test data = X_test
  • training outcome variable = y_train
  • test outcome variable = y_test
## Python Code
# Libraries for analysis
import numpy as np
import pandas as pd

# Libraries for visuals
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(font_scale=1.2)

# Split data into train vs. test sets (X and y are assumed to be defined)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit the model and score it on the test data
from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
model.score(X_test, y_test)
## R code
# Load libraries and data
library(e1071)
dt <- iris
head(dt)

# Split data into train vs. test sets
library(caTools)
set.seed(123)
dt$status <- sample.split(dt$Species, SplitRatio = 0.80)

# Get the training and test data
train <- subset(dt, status == TRUE)
test <- subset(dt, status == FALSE)

# Fit the model
model <- svm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data = train, kernel = 'linear')
print(model)

# Make predictions
pred1 <- predict(model, train)
pred2 <- predict(model, test)

# Evaluation
# vs. training data
full_data1 <- cbind(train, pred1)
head(full_data1)
table(full_data1$Species, full_data1$pred1)

# vs. test data
full_data2 <- cbind(test, pred2)
table(full_data2$Species, full_data2$pred2)
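
The R code above evaluates the model with contingency (confusion) tables. The same evaluation can be sketched in Python; this self-contained example assumes the iris dataset and a linear kernel, mirroring the R version (variable names are illustrative):

```python
# Linear SVM on iris, evaluated with a confusion matrix like the R tables above
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

model = SVC(kernel='linear')
model.fit(X_train, y_train)

# Confusion matrix on the test data (rows: true class, columns: predicted class)
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
print(model.score(X_test, y_test))  # overall test accuracy
```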