A defining characteristic of supervised learning is that it uses training data (a train set) to make predictions. Below is a comparison of how several supervised learning algorithms are written in R and Python.
k-Nearest Neighbor
k-Nearest Neighbor predicts the class of an object (classification). Applying k-NN requires several inputs:
- number of nearest neighbors (e.g. k = 3)
- training data = X_train
- test data = X_test
- training outcome variable = y_train
- test outcome variable = y_test
```r
# R
library(class)
knn(train = X_train, test = X_test, cl = y_train, k = 3)
```
```python
# Python
from sklearn.neighbors import KNeighborsClassifier

knn_result = KNeighborsClassifier(n_neighbors=3)
knn_result.fit(X_train, y_train)
predict = knn_result.predict(X_test)
predict
```
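The two snippets above assume `X_train`, `X_test`, `y_train`, and `y_test` already exist. A self-contained sketch of the Python version, using sklearn's built-in iris data (the dataset and split parameters are assumptions for illustration):

```python
# Minimal k-NN sketch; the iris dataset and 80/20 split are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 nearest neighbors
knn.fit(X_train, y_train)
pred = knn.predict(X_test)       # predicted class labels for the test set
acc = knn.score(X_test, y_test)  # mean accuracy on the test set
```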
Naive Bayes Classifier
This algorithm classifies an object by assigning it to the class with the highest probability.
```r
## R code
library(naivebayes)

# Split into train vs test
set.seed(123)
ind <- sample(x = 2, size = nrow(dt), replace = TRUE, prob = c(0.8, 0.2))
train <- dt[ind == 1, ]
test <- dt[ind == 2, ]

# Fit the model
model <- naive_bayes(admit ~ ., data = train)
predict(model, test)
```
```python
# Python code
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
print(y_pred)
```
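As with k-NN, the Python snippet assumes pre-split data. A self-contained Gaussian Naive Bayes sketch on the same iris data (the dataset choice is an assumption), which also shows the "highest probability wins" rule via `predict_proba`:

```python
# Self-contained Gaussian Naive Bayes sketch; dataset and split are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

# predict_proba exposes the per-class probabilities; the predicted class
# is the one with the highest probability.
proba = gnb.predict_proba(X_test)
```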
Support Vector Machine
The principle of this method is to find a hyperplane (separating boundary) between the classes such that the margin between them is maximized (find the widest gap). We need several inputs:
- training data = X_train
- test data = X_test
- training outcome variable = y_train
- test outcome variable = y_test
```python
## Python code
# Libraries for analysis
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Libraries for visuals
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(font_scale=1.2)

# Split into train vs test, fit, and score
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = SVC()
model.fit(X_train, y_train)
model.score(X_test, y_test)
```
```r
## R code
library(e1071)    # for svm()
library(caTools)  # for sample.split()

# Load data
dt <- iris
head(dt)

# Split data into train vs test
set.seed(123)
dt$status <- sample.split(dt$Species, SplitRatio = 0.80)
train <- subset(dt, status == TRUE)
test <- subset(dt, status == FALSE)

# Fit the model
model <- svm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data = train, kernel = "linear")
print(model)

# Make predictions
pred1 <- predict(model, train)
pred2 <- predict(model, test)

# Evaluation: confusion matrices
# vs training data
full_data1 <- cbind(train, pred1)
head(full_data1)
table(full_data1$Species, full_data1$pred1)
# vs test data
full_data2 <- cbind(test, pred2)
table(full_data2$Species, full_data2$pred2)
```
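For symmetry with the R example above, here is a self-contained Python sketch fitting a linear-kernel SVM on the same iris data (the seed and split ratio are assumptions for illustration):

```python
# Self-contained linear-kernel SVM sketch mirroring the R iris example.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)

model = SVC(kernel="linear")  # linear kernel, matching the R example
model.fit(X_train, y_train)
score = model.score(X_test, y_test)  # mean accuracy on held-out data
```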