Programming Language

Machine Learning in R, Packages

Machine Learning Algorithms

1. Prediction 

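Use the predict function. The column names of newdata must match the predictor names used when the model was fitted (x1 and x2 here).

e.g.

> predicted_values <- predict(lm_model, newdata=data.frame(x1=x1_test, x2=x2_test))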
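A minimal, self-contained sketch of fitting and predicting with lm. The simulated data, the coefficients, and the names train and test are made up for illustration; the point is that newdata carries the same column names as the formula.

```r
set.seed(1)                                   # reproducible simulated data
train <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
train$y <- 2 + 3 * train$x1 - train$x2 + rnorm(50, sd = 0.1)

lm_model <- lm(y ~ x1 + x2, data = train)     # fit the linear model

# newdata must use the same column names as the formula (x1, x2)
test <- data.frame(x1 = c(0, 1), x2 = c(0, 1))
predicted_values <- predict(lm_model, newdata = test)
print(predicted_values)
```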

2. Apriori 

install arules package

the dataset must be a binary incidence matrix

e.g.

> dataset <- read.csv("C:\\Datasets\\mushroom.csv", header=TRUE)

> mushroom_rules <- apriori(as.matrix(dataset), parameter = list(supp = 0.8, conf = 0.9))

> summary(mushroom_rules)

> inspect(mushroom_rules)
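To illustrate what a binary incidence matrix looks like, here is a small base-R sketch that builds one from a list of transactions. The toy baskets and the names transactions, items, and incidence are hypothetical. A logical matrix of this shape (one row per transaction, one column per item) is, as far as these notes assume, the kind of input apriori can coerce.

```r
# Hypothetical toy transactions: each element lists the items in one basket
transactions <- list(c("bread", "milk"),
                     c("bread", "butter"),
                     c("milk"))
items <- sort(unique(unlist(transactions)))

# One row per transaction, one column per item; TRUE if the item is present
incidence <- t(sapply(transactions, function(tr) items %in% tr))
colnames(incidence) <- items
print(incidence)
```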

3. Logistic Regression

No extra package is needed.

> glm_mod <- glm(y ~ x1 + x2, family=binomial(link="logit"), data=as.data.frame(cbind(y, x1, x2)))
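A short sketch of the full fit-and-predict cycle for logistic regression, assuming simulated data. The coefficients, the 0.5 threshold, and the names dat, prob, and pred_class are illustrative choices, not part of the original notes.

```r
set.seed(2)
x1 <- rnorm(200)
x2 <- rnorm(200)
# simulate a 0/1 outcome whose log-odds depend on x1 and x2
y <- rbinom(200, size = 1, prob = plogis(1.5 * x1 - x2))
dat <- data.frame(y, x1, x2)

glm_mod <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = dat)

# type = "response" returns fitted probabilities rather than log-odds
prob <- predict(glm_mod, newdata = dat, type = "response")
pred_class <- as.integer(prob > 0.5)          # threshold chosen for illustration
print(mean(pred_class == y))                  # training accuracy
```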

4. K-Means Clustering

No extra package is needed.

If X is the data matrix and m is the number of clusters, then the command is:

> kmeans_model <- kmeans(x=X, centers=m)
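A small sketch of the command above on simulated data, assuming two well-separated groups of points. The data and the nstart = 10 setting (several random starts, to reduce sensitivity to initialization) are illustrative assumptions.

```r
set.seed(3)
# two well-separated groups of 20 two-dimensional points each
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
m <- 2

kmeans_model <- kmeans(x = X, centers = m, nstart = 10)
print(kmeans_model$centers)          # one row of coordinates per cluster
print(table(kmeans_model$cluster))   # number of points assigned to each cluster
```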

5. k-Nearest Neighbor Classification

Install class package

Let X_train and X_test be matrices of the training and test data respectively, and labels be a binary vector of class attributes for the training examples.

For k equal to K, the command is:

> knn_model <- knn(train=X_train, test=X_test, cl=as.factor(labels), k=K)

Then knn_model is a factor vector of class attributes for the test set.

6. Naive Bayes

Install e1071 package

> nB_model <- naiveBayes(y~ x1 + x2, data=as.data.frame(cbind(y,x1,x2)))

7. Decision Trees (CART)

CART is implemented in the rpart package.

> cart_model <- rpart(y ~ x1 + x2, data=as.data.frame(cbind(y, x1, x2)), method="class")

8. AdaBoost

There are a number of different boosting functions in R.

One implementation uses decision trees as base classifiers, so the rpart package should be loaded.

The boosting function ada is in the ada package.

Let X be the matrix of features, and labels be a vector of 0-1 class labels.

> boost_model <- ada(x=X, y=labels)

9. Support Vector Machines (SVM)

Install e1071 package

Let X be the matrix of features, and labels be a vector of 0-1 class labels.

Let the regularization parameter be C.

> svm_model <- svm(x=X, y=as.factor(labels), kernel="radial", cost=C)

> summary(svm_model)

JHU R Programming Course Notes

Gathering a long list of file names:

files_list <- list.files(directory, full.names=TRUE)

Construct a data frame by row-binding each file inside a loop over i (dat starts as an empty data.frame()):

dat <- rbind(dat, read.csv(files_list[i]))
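The two steps above can be sketched end to end as follows. The temporary directory, the file names, and the columns id and value are made up so the example runs on its own. It also uses do.call(rbind, lapply(...)) as an alternative to growing dat inside a loop; this is a design choice, not the course's method.

```r
# create a temporary directory with three small CSV files (illustration only)
directory <- file.path(tempdir(), "csv_demo")
dir.create(directory, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(directory, paste0("file", i, ".csv")),
            row.names = FALSE)
}

# full.names = TRUE returns paths that read.csv can open directly
files_list <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)

# read every file, then bind all rows in a single call
dat <- do.call(rbind, lapply(files_list, read.csv))
print(dat)
```

Binding once at the end avoids repeatedly copying a growing data frame, which matters when the file list is long.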

#Understanding lexical scoping
#<- operator vs. <<- operator
crazy <- function() {
  x <<- 3.14   #x in the containing environment is set to 3.14
  print(x)     #no local x exists inside crazy() yet, so R searches the containing environments
  {
    print(x)
    x <- 42    #creates a local x with the value 42, which masks the x in the containing environment
    print(x)
  }
  print(x)     #the local x now exists, so there is no need to search the containing environment
}

x #once crazy() has been called, x outside the function holds 3.14, the value set by the first statement in crazy()
#the super-assignment operator does not touch a variable of the same name in the function's own environment;
#it assigns in a containing environment, and code inside the function sees that value until a local variable
#of the same name is created, as x <- 42 does. Braces alone do not open a new scope in R, so x <- 42 is
#local to crazy() as a whole.

crazy <- function() {
x <- 42
x <<- 3.14
print(x)
}

#> x <-0
#> x
#[1] 0
#> crazy()
#[1] 42
#> x
#[1] 3.14

#Declare and define a function named crazy()
crazy <- function() {
x <- 3.14 #assigns the value 3.14 to the local variable x, not to the variable x in the containing environment
print(x)

{print(x);
x <<- 42; # assigns the value 42 to variable x in the containing environment
print(x)
}
print(x)
}

#> x <-0
#> x
#[1] 0
#> crazy()
#[1] 3.14
#[1] 3.14
#[1] 3.14
#[1] 3.14
#> x
#[1] 42

makeVector <- function(x = numeric()){
  m <- NULL
  set <- function(y){
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set=set, get=get,
       setmean=setmean,
       getmean=getmean)
}

cachemean <- function(x, ...){
  m <- x$getmean()
  if(!is.null(m)){
    message("getting cached data")
    return(m)
  }
  data <- x$get()
  m <- mean(data, ...)
  x$setmean(m)
  m
}

Split and Lapply

        bystate <- split(rawData, rawData$State)

        result <- lapply(bystate, function(x) x[order(x[,11], x[,2]),])

        hospital<- lapply(result, function(x) x[num,c(2,7)])

        rank <- do.call(rbind, hospital)
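A self-contained sketch of the same split / lapply / do.call pattern on a toy data frame. The data, the column names Hospital, State, and Rate, and the value of num are hypothetical stand-ins for the numbered columns used above.

```r
# Hypothetical toy data: hospital name, state, and an outcome rate
rawData <- data.frame(
  Hospital = c("A", "B", "C", "D", "E"),
  State    = c("TX", "TX", "NY", "NY", "NY"),
  Rate     = c(12.1, 10.4, 9.8, 11.0, 9.8),
  stringsAsFactors = FALSE
)
num <- 1  # rank to extract within each state

# one data frame per state
bystate  <- split(rawData, rawData$State)
# sort each state's rows by rate, breaking ties by hospital name
result   <- lapply(bystate, function(x) x[order(x$Rate, x$Hospital), ])
# keep the num-th row of each sorted data frame
hospital <- lapply(result, function(x) x[num, c("Hospital", "State")])
# stack the per-state rows back into one data frame
ranked   <- do.call(rbind, hospital)
print(ranked)
```

The result is named ranked rather than rank here only to avoid masking the base function rank().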