Machine Learning

Machine Learning in R, Packages

Machine Learning Algorithms

1. Prediction 

predict function

e.g.

> predicted_values <- predict(lm_model, newdata=as.data.frame(cbind(x1_test, x2_test)))

2. Apriori 

install arules package

the dataset must be a binary incidence matrix

e.g.

> dataset <- read.csv(“C:\\Datasets\mushroom.csv”, header=TRUE)

> mushroom_rules <- apriori(as.matrix(dataset), parameter = list(supp = 0.8, conf = 0.9))

> summary(mushroom_rules)

> inspect(mushroom_rules)

3. Logistic Regression

No extra package is needed.

>glm_mod <- glm(y~ x1+x2, family=binomial(link=”logit”), data=as.data.frame(cbind(y,x1,x2)))

4. K-Means Clustering

No extra package is needed.

If X is the data matrix and m is the number of clusters, then the command is:

> kmeans_model <- kmeans(x=X, centers=m)

5. k-Nearst Neighbor Classification

intall class package

Let X_train and X_test be matrices of the training and test data respectively, and labels be a binary vector of class attributes for the training examples.

For k equal to K, the command is:

> knn_model <- kn(train=X_train, test=X_test, cl=as.factor(labels), k=K)

Then knn_model is a factor vector of class attributes for the test set.

6. Naive Bayes

Install e1071 package

> nB_model <- naiveBayes(y~ x1 + x2, data=as.data.frame(cbind(y,x1,x2)))

7. Decision Trees (CART)

CART is implemented in the rpart package.

> cart_model <- rpart(y ~ x2 + x2, data=as.data.frame(cbind(y, x1, x2)), method=”class”)

8. AdaBoost

There are a number of different boosting functions in R.

One implementation that uses decision trees as base classifiers. Thus the rpart package should be loaded.

The boosting function ada is in the ada package.

Let X be the matrix of features, and labels be a vector of 0-1 class labels.

> boost_model <- ada(x=X, y=labels)

9. Support Vector Machines (SVM)

e1071 package

Let X be the matrix features, and labels be a vector of 0-1 class labels.

Let the regularization parameter be C.

> svm_model <- sum(x=X, y=as.factor(labels), kernel = “radial”, cost=c)

> summary(svm_model)

Key terminology in Machine Learning

Features/ attributes/ features: features or attributes are the individual measurements that, when combined with other features, make up a training example. 

Instance: made up of features

Training set: the set of training examples we’ll use to train our machine learning algorithms.

Target variable: what we’ll be trying to predict with our machine learning algorithms.

Knowledge representation, what the machine has learned. May be in the form of a set of rules; it may be a probability distribution or an example from the training set.