Machine Learning Algorithms

**1. Prediction**

Use the generic **predict** function on a fitted model object.

e.g.

> predicted_values <- predict(lm_model, newdata=as.data.frame(cbind(x1_test, x2_test)))
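A minimal self-contained sketch of the command above, using simulated data (the variable names `x1_test` and `x2_test` mirror the command and are illustrative). Note that the column names of `newdata` must match the predictor names used in the model formula:

```r
# Fit a linear model on simulated data, then predict on new data.
set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100)
y  <- 2 * x1 - x2 + rnorm(100, sd = 0.1)
lm_model <- lm(y ~ x1 + x2, data = data.frame(y, x1, x2))

x1_test <- rnorm(10); x2_test <- rnorm(10)
# Column names must match the formula variables (x1, x2).
predicted_values <- predict(lm_model,
                            newdata = data.frame(x1 = x1_test, x2 = x2_test))
```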

**2. Apriori**

Install the *arules* package.

The dataset must be a binary incidence matrix.

e.g.

> dataset <- read.csv("C:\\Datasets\\mushroom.csv", header=TRUE)

> mushroom_rules <- apriori(as.matrix(dataset), parameter = list(supp = 0.8, conf = 0.9))

> summary(mushroom_rules)

> inspect(mushroom_rules)
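Since the CSV path above is machine-specific, here is a sketch that runs on a small hand-made binary incidence matrix instead (the item names A, B, C and the thresholds are illustrative):

```r
# Mine association rules from a small logical matrix with apriori().
library(arules)  # assumes the package is installed
set.seed(1)
m <- matrix(sample(c(TRUE, FALSE), 60, replace = TRUE, prob = c(0.8, 0.2)),
            ncol = 3, dimnames = list(NULL, c("A", "B", "C")))
rules <- apriori(m, parameter = list(supp = 0.5, conf = 0.8))
inspect(rules)  # print the mined rules with support and confidence
```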

**3. Logistic Regression**

No extra package is needed.

> glm_mod <- glm(y ~ x1 + x2, family=binomial(link="logit"), data=as.data.frame(cbind(y,x1,x2)))
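A self-contained sketch on simulated 0-1 data (variable names are illustrative). Passing `type = "response"` to **predict** returns fitted class probabilities rather than log-odds:

```r
# Logistic regression on simulated binary data.
set.seed(1)
x1 <- rnorm(200); x2 <- rnorm(200)
y  <- rbinom(200, 1, plogis(x1 - x2))
glm_mod <- glm(y ~ x1 + x2, family = binomial(link = "logit"),
               data = data.frame(y, x1, x2))

# Predicted probabilities (not log-odds) via type = "response".
p_hat <- predict(glm_mod, type = "response")
```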

**4. K-Means Clustering**

No extra package is needed.

If X is the data matrix and m is the number of clusters, then the command is:

> kmeans_model <- kmeans(x=X, centers=m)
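A sketch with a simulated data matrix X and m = 2 clusters (both values are illustrative):

```r
# k-means on two well-separated simulated groups of points.
set.seed(1)
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
m <- 2
kmeans_model <- kmeans(x = X, centers = m)

kmeans_model$cluster  # cluster assignment for each row of X
kmeans_model$centers  # the m cluster centres
```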

**5. k-Nearest Neighbor Classification**

Install the *class* package.

Let X_train and X_test be matrices of the training and test data respectively, and labels be a binary vector of class attributes for the training examples.

For k equal to K, the command is:

> knn_model <- knn(train=X_train, test=X_test, cl=as.factor(labels), k=K)

Then knn_model is a factor vector of class attributes for the test set.
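A self-contained sketch with simulated two-class data (the sizes and K = 3 are illustrative):

```r
# k-nearest-neighbor classification with knn() from the *class* package.
library(class)  # assumes the package is installed
set.seed(1)
X_train <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                 matrix(rnorm(40, mean = 3), ncol = 2))
labels  <- rep(c(0, 1), each = 20)        # one label per training row
X_test  <- rbind(matrix(rnorm(10, mean = 0), ncol = 2),
                 matrix(rnorm(10, mean = 3), ncol = 2))
K <- 3
knn_model <- knn(train = X_train, test = X_test, cl = as.factor(labels), k = K)
```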

**6. Naive Bayes**

Install *e1071* package

> nB_model <- naiveBayes(y ~ x1 + x2, data=as.data.frame(cbind(y,x1,x2)))
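A sketch on a simulated data frame (variable names illustrative). Note that the class variable should be a factor for classification:

```r
# Naive Bayes with naiveBayes() from the *e1071* package.
library(e1071)  # assumes the package is installed
set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100)
y  <- as.factor(ifelse(x1 + x2 > 0, "pos", "neg"))  # factor class labels
d  <- data.frame(y, x1, x2)
nB_model <- naiveBayes(y ~ x1 + x2, data = d)
pred <- predict(nB_model, newdata = d)  # predicted classes
```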

**7. Decision Trees (CART)**

CART is implemented in the *rpart* package.

> cart_model <- rpart(y ~ x1 + x2, data=as.data.frame(cbind(y, x1, x2)), method="class")
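A sketch on simulated data (names illustrative), including class predictions from the fitted tree via `type = "class"`:

```r
# A classification tree with rpart, then class predictions.
library(rpart)  # assumes the package is installed
set.seed(1)
x1 <- rnorm(200); x2 <- rnorm(200)
y  <- as.factor(ifelse(x1 > x2, "a", "b"))
d  <- data.frame(y, x1, x2)
cart_model <- rpart(y ~ x1 + x2, data = d, method = "class")
pred <- predict(cart_model, newdata = d, type = "class")
```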

**8. AdaBoost**

There are a number of different boosting functions in R.

One implementation uses decision trees as base classifiers, so the *rpart* package should also be loaded.

The boosting function **ada** is in the *ada* package.

Let X be the matrix of features, and labels be a vector of 0-1 class labels.

> boost_model <- ada(x=X, y=labels)
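A self-contained sketch on simulated data (names illustrative). Using a data frame for X keeps column names consistent between fitting and prediction:

```r
# AdaBoost with ada() from the *ada* package (rpart must also be available).
library(rpart)
library(ada)    # assumes the package is installed
set.seed(1)
X <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
labels <- ifelse(X$x1 + X$x2 > 0, 1, 0)   # 0-1 class labels
boost_model <- ada(x = X, y = labels)
pred <- predict(boost_model, newdata = X)  # predicted classes
```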

**9. Support Vector Machines (SVM)**

Install the *e1071* package.

Let X be the matrix of features, and labels be a vector of 0-1 class labels.

Let the regularization parameter be C.

> svm_model <- svm(x=X, y=as.factor(labels), kernel = "radial", cost=C)

> summary(svm_model)
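A self-contained sketch with simulated two-class data and C = 1 (both are illustrative choices):

```r
# An RBF-kernel SVM with svm() from the *e1071* package.
library(e1071)  # assumes the package is installed
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
labels <- rep(c(0, 1), each = 20)
C <- 1                                   # regularization parameter
svm_model <- svm(x = X, y = as.factor(labels), kernel = "radial", cost = C)
pred <- predict(svm_model, X)            # predicted classes for the rows of X
```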