Machine Learning in R, Packages

Machine Learning Algorithms

1. Prediction 

predict function

e.g.

> predicted_values <- predict(lm_model, newdata=data.frame(x1=x1_test, x2=x2_test))
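
For context, a minimal sketch of fitting the linear model assumed above (y, x1, x2 are placeholder vectors; the newdata column names must match the formula variables):

> lm_model <- lm(y ~ x1 + x2, data=as.data.frame(cbind(y, x1, x2)))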

2. Apriori 

Install the arules package.

The dataset must be a binary incidence matrix.

e.g.

> dataset <- read.csv("C:\\Datasets\\mushroom.csv", header=TRUE)

> mushroom_rules <- apriori(as.matrix(dataset), parameter = list(supp = 0.8, conf = 0.9))

> summary(mushroom_rules)

> inspect(mushroom_rules)

3. Logistic Regression

No extra package is needed.

> glm_mod <- glm(y ~ x1 + x2, family=binomial(link="logit"), data=as.data.frame(cbind(y, x1, x2)))
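
To get predicted probabilities from the fitted model, a hedged sketch (x1_test and x2_test are placeholder test vectors; type="response" returns probabilities rather than log-odds):

> predicted_probs <- predict(glm_mod, newdata=data.frame(x1=x1_test, x2=x2_test), type="response")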

4. K-Means Clustering

No extra package is needed.

If X is the data matrix and m is the number of clusters, then the command is:

> kmeans_model <- kmeans(x=X, centers=m)
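
The returned object holds the cluster assignments and centers; a quick way to inspect the result:

> kmeans_model$cluster # cluster assignment for each row of X
> kmeans_model$centers # coordinates of the m cluster centers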

5. k-Nearest Neighbor Classification

Install the class package.

Let X_train and X_test be matrices of the training and test data respectively, and labels be a binary vector of class attributes for the training examples.

For k equal to K, the command is:

> knn_model <- knn(train=X_train, test=X_test, cl=as.factor(labels), k=K)

Then knn_model is a factor vector of class attributes for the test set.
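
If true test labels are available, a confusion matrix gives a quick accuracy check (a sketch; test_labels is a placeholder for the true classes):

> table(knn_model, test_labels) # confusion matrix
> mean(knn_model == test_labels) # overall accuracy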

6. Naive Bayes

Install the e1071 package.

> nB_model <- naiveBayes(y~ x1 + x2, data=as.data.frame(cbind(y,x1,x2)))
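
Predicted classes, or posterior probabilities with type="raw", follow from predict; a sketch with placeholder test vectors x1_test and x2_test:

> predict(nB_model, newdata=data.frame(x1=x1_test, x2=x2_test))
> predict(nB_model, newdata=data.frame(x1=x1_test, x2=x2_test), type="raw") # class probabilities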

7. Decision Trees (CART)

CART is implemented in the rpart package.

> cart_model <- rpart(y ~ x1 + x2, data=as.data.frame(cbind(y, x1, x2)), method="class")
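
Class predictions and the complexity-parameter table (useful for pruning) can then be obtained; a sketch with placeholder test vectors:

> predict(cart_model, newdata=data.frame(x1=x1_test, x2=x2_test), type="class")
> printcp(cart_model) # cross-validated error for each tree size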

8. AdaBoost

There are a number of different boosting functions in R.

One common implementation uses decision trees as base classifiers, so the rpart package should be loaded as well.

The boosting function ada is in the ada package.

Let X be the matrix of features, and labels be a vector of 0-1 class labels.

> boost_model <- ada(x=X, y=labels)
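
Predictions for new data follow the usual pattern; a sketch (X_test is a placeholder feature matrix, converted to a data frame as predict.ada expects):

> boost_pred <- predict(boost_model, newdata=as.data.frame(X_test))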

9. Support Vector Machines (SVM)

Install the e1071 package (the same one used for Naive Bayes).

Let X be the matrix of features, and labels be a vector of 0-1 class labels.

Let the regularization parameter be C.

> svm_model <- svm(x=X, y=as.factor(labels), kernel = "radial", cost=C)

> summary(svm_model)
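
Class predictions work as for the other models; a sketch with a placeholder test matrix X_test:

> svm_pred <- predict(svm_model, newdata=X_test)
> table(svm_pred, test_labels) # confusion matrix, if true labels are known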

Dispersion correction treatment in DFT

In DFT, approximations must be made for how electrons interact with each other.

Standard XC functionals include:

  • Local density approximation (LDA)
  • Generalized gradient approximation (GGA) functionals
  • Hybrid XC functionals

Standard XC functionals do not describe dispersion because:

  1. instantaneous density fluctuations are not considered
  2. they are “short-sighted” in that they consider only local properties to calculate the XC energy

Ground – binding with incorrect asymptotics

The ground-level methods (standard XC functionals) do not describe the long-range asymptotics: they give incorrect shapes of binding curves and underestimate the binding of well-separated molecules.

Results with LDA for dispersion-bonded systems have limited and inconsistent accuracy, and the asymptotic form of the interaction is incorrect.

Step one – Simple C6 corrections (DFT-D)

The basic requirement for a DFT-based dispersion scheme is that it yields a reasonable −1/r^6 asymptotic behavior for the interaction of particles in the gas phase, where r is the distance between the particles.

Approach: add an additional energy term which accounts for the missing long range attraction.

Four shortcomings:

  • The C6/r^6 dependence represents only the leading term of the correction and neglects both many-body dispersion effects and faster decaying terms such as the C8/r^8 or C10/r^10 contributions.
  • It is not clear where one should obtain the C6 coefficients. The reliance on experimental data (ionization potentials and polarizabilities) limits the set of elements that can be treated to those present in typical organic molecules.
  • The C6 coefficients are kept constant during the calculation, so effects of different chemical states of the atom or the influence of its environment are neglected.
  • The C6/r^6 function diverges for small separations (small r), and this divergence must be removed.

With the simple correction schemes the dispersion correction diverges at short inter-atomic separations and so must be “damped”. The damping function f(r_AB, A, B) is equal to one for large r and decreases E_disp to zero or to a constant for small r.
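
For reference, in the standard pairwise DFT-D form the correction can be written as (a sketch in the usual notation):

E_disp = −Σ_{A<B} f(r_AB, A, B) · C6^{AB} / r_AB^6

where the sum runs over all atom pairs A, B and f is the damping function described above.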

Issues with damping function:

  • The shape of the underlying binding curve is sensitive to the XC functional used and so the damping functions must be adjusted so as to be compatible with each exchange-correlation or exchange functional.
  • This fitting is also sensitive to the definition of atomic size and must be done carefully since the damping function can actually affect the binding energies even more than the asymptotic C6 coefficients.
  • The fitting also effectively includes the effects of C8/r^8 or C10/r^10 and higher contributions.

Step two – Environment-dependent corrections

The simple “DFT-D” schemes: the dispersion coefficients are predetermined and constant quantities. The errors introduced by this approximation can be large.

The unifying concept:

The dispersion coefficient of an atom in a molecule depends on the effective volume of the atom. When the atom is “squeezed”, its electron cloud becomes less polarizable leading to a decrease of the C6 coefficients.

Three step-two methods:

  • DFT-D3 of Grimme

Captures the environmental dependence of the C6 coefficients by considering the number of neighbors each atom has.

  • vdW(TS) of Tkatchenko and Scheffler

Relies on reference atomic polarizabilities and reference atomic C6 coefficients to calculate the dispersion energy.

During the calculation on the system of interest the electron density of a molecule is divided between the individual atoms and for each atom its corresponding density is compared to the density of a free atom.

  • BJ model of Becke and Johnson

Based on the fact that around an electron at r1 there will be a region of electron density depletion, the so-called XC hole. This creates asymmetric electron density and thus non-zero dipole and higher-order electrostatic moments, which causes polarization in other atoms to an extent given by their polarizability.

C6 coefficients are altered through two effects:

  1. The polarizabilities of atoms in molecules are scaled from their reference atom values according to their effective atomic volumes.
  2. The dipole moments respond to the chemical environment through changes of the exchange hole, although this effect seems to be difficult to quantify precisely.

Step three – Long-range density functionals

Approaches that do not rely on external input parameters but rather obtain the dispersion interaction directly from the electron density.

Termed non-local correlation functionals since they add non-local (i.e., long range) correlations to local or semi-local correlation functionals.

JHU R Programming Course Notes

Gathering a long list of file names:

files_list <- list.files(directory, full.names=TRUE)

Constructing a data frame by row-binding (a runnable sketch follows):

dat <- rbind(dat, read.csv(files_list[i]))
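
A runnable sketch of the full pattern (directory is a placeholder path; assumes every file listed is a CSV with identical columns):

files_list <- list.files(directory, full.names = TRUE)
dat <- data.frame()
for (i in seq_along(files_list)) {
    dat <- rbind(dat, read.csv(files_list[i])) # append each file's rows
}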

#Understanding lexical scoping
#<- operator vs. <<- operator
crazy <- function() {
    x <<- 3.14 #variable x in the containing environment is updated to be 3.14
    print(x) #no local variable x exists within function 'crazy', so R searches the containing environments
    {print(x);
    x <- 42; print(x) #a local variable x is declared and assigned the value 42; it shadows the variable x in
    } #the containing environment
    print(x) #since a local variable x now exists within the function, there is no need to search the containing
} #environment

x #the variable x outside the function holds the updated value after the first statement within crazy()
#The super-assignment operator does not update a variable of the same name inside an inner function; it updates
#the containing environment. The inner environment inherits any changes unless a local variable of the same
#name exists within the inner function, as demonstrated by x <- 42.

crazy <- function() {
    x <- 42    #declare a local variable x and assign 42
    x <<- 3.14 #updates x in the containing (global) environment, not the local x
    print(x)   #prints the local x: 42
}

#> x <- 0
#> x
#[1] 0
#> crazy()
#[1] 42
#> x
#[1] 3.14

#Declare and define a function named crazy()
crazy <- function() {
    x <- 3.14 #assigns the value 3.14 to a local variable x, not the variable x in the containing environment
    print(x)

    {print(x);
    x <<- 42; #assigns the value 42 to the variable x in the containing environment
    print(x)
    }
    print(x)
}

#> x <- 0
#> x
#[1] 0
#> crazy()
#[1] 3.14
#[1] 3.14
#[1] 3.14
#[1] 3.14
#> x
#[1] 42

makeVector <- function(x = numeric()) {
    m <- NULL
    set <- function(y) {
        x <<- y
        m <<- NULL
    }
    get <- function() x
    setmean <- function(mean) m <<- mean
    getmean <- function() m
    list(set = set, get = get,
         setmean = setmean,
         getmean = getmean)
}

cachemean <- function(x, ...) {
    m <- x$getmean()
    if (!is.null(m)) {
        message("getting cached data")
        return(m)
    }
    data <- x$get()
    m <- mean(data, ...)
    x$setmean(m)
    m
}
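
A quick usage sketch of the caching pair (console transcript in the same comment style as above):

#> v <- makeVector(1:10)
#> cachemean(v) # computes the mean and caches it
#[1] 5.5
#> cachemean(v) # second call returns the cached value
#getting cached data
#[1] 5.5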

Split and Lapply

        bystate <- split(rawData, rawData$State) #one data frame per state

        result <- lapply(bystate, function(x) x[order(x[, 11], x[, 2]), ]) #sort each state's rows by column 11, breaking ties by column 2

        hospital <- lapply(result, function(x) x[num, c(2, 7)]) #row at the desired rank 'num' (defined elsewhere), keeping columns 2 and 7

        rank <- do.call(rbind, hospital) #combine the per-state rows into a single data frame

Dispersion-corrected DFT theory

Commonly used DFs do not describe the long-range dispersion interactions correctly. Neither semilocal DFs nor conventional hybrid functionals can provide the correct asymptotic −C6/R^6 dependence of the dispersion interaction energy on the interatomic (molecular) distance R.

The failure of standard DFs comes from their inability to describe instantaneous electron correlations. In a more precise picture, electromagnetic zero-point energy fluctuations in the vacuum lead to ‘virtual’ excitations to allowed atomic or molecular electronic states. The corresponding densities interact electrostatically. They are not represented by conventional (hybrid) functionals, which only consider electron exchange and do not employ virtual orbitals.

The computationally most efficient basic approaches to account for London dispersion effects in DFT include:

  • Nonlocal vdW-DFs
  • ‘Pure’ semilocal (hybrid) DFs, which are highly parameterized forms of standard meta-hybrid approximations (e.g., the M0X family of functionals)
  • DFT-D methods (atom pairwise sum over −C6/R^6 potentials)
  • Dispersion-correcting atom-centered one-electron potentials (1ePOT, called DCACP or in local variants LAP or DCP)

vdW-DF and Related Methods

(vdW-DF (2004), vdW-DF (2010), VV09, and VV10)

Currently the most widely used form; a nonempirical way to compute the dispersion energy. A supermolecular calculation of the total energy of the complex and the fragments is performed to obtain the interaction energy.

Approximation:

Total exchange-correlation energy: E_xc = E_x^LDA/GGA + E_c^LDA/GGA + E_c^NL

Functionals of LDA (local density approximation) or GGA (semilocal) type are used for the short-ranged parts;

E_c^NL represents the nonlocal term describing the dispersion energy.
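
For reference, E_c^NL in the vdW-DF family has the standard nonlocal double-integral form (a sketch in the usual notation; the kernel φ depends on the density and its gradient at the two points r and r′):

E_c^NL = (1/2) ∫∫ n(r) φ(r, r′) n(r′) dr dr′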

Nonlocal vs Local (examination on the terminology)

In the DFT community, the dispersion energy is understood as an inherently nonlocal property. It must be described by a kernel which depends on two electron coordinates simultaneously.

In a WF picture, long-range dispersion has no nonlocal component.

Physically, dispersion is the Coulomb interaction between (local, fragment centered) transition densities. These can be plotted like ‘normal’ densities and have no ‘mysterious’ nonlocal character (except that virtual orbitals are needed for their construction).

Summary

  • Typically, E_c^NL is computed non-self-consistently, i.e., it is simply an add-on to the self-consistent field (SCF) DF energy, similar to DFT-D.
  • vdW-DF works better with short-range components that are basically repulsive such as Hartree-Fock.
  • Dispersion effects are naturally included via the charge density so that charge-transfer dependence of dispersion is automatically included in a physically sound manner. If performed self-consistently, the correction in turn also changes the density.
  • Whether double-counting effects of correlation at short range are present in the mentioned vdW-DFs is currently unknown.

Reference

Grimme, S., Density functional theory with London dispersion corrections. Wiley Interdisciplinary Reviews: Computational Molecular Science 2011, 1 (2), 211-228.

Key terminology in Machine Learning

Features/attributes: the individual measurements that, when combined with other features, make up a training example.

Instance: a single training example, made up of features.

Training set: the set of training examples we’ll use to train our machine learning algorithms.

Target variable: what we’ll be trying to predict with our machine learning algorithms.

Knowledge representation: what the machine has learned. It may be in the form of a set of rules, a probability distribution, or an example from the training set.


Social Psychology Lecture Notes 1.2

Hindsight bias:

Also known as the knew-it-all-along effect or creeping determinism, this is the inclination, after an event has occurred, to see the event as having been predictable, despite there having been little or no objective basis for predicting it prior to its occurrence.

Hindsight bias may cause memory distortion, where the recollection and reconstruction of content can lead to false theoretical outcomes. 

Change blindness:

Change blindness is a surprising perceptual phenomenon that occurs when a change in a visual stimulus is introduced and the observer does not notice it. 

Motion aftereffect:

MAE is a visual illusion experienced after viewing a moving visual stimulus for a time (tens of milliseconds to minutes) with stationary eyes, and then fixating a stationary stimulus. The stationary stimulus appears to move in the opposite direction to the original (physically moving) stimulus. The motion aftereffect is believed to be the result of motion adaptation. 

    Explanation:  Neurons coding a particular movement reduce their responses with time of exposure to a constantly moving stimulus; this is neural adaptation. Neural adaptation also reduces the spontaneous, baseline activity of these same neurons when responding to a stationary stimulus. One theory is that perception of stationary objects, for example rocks beside a waterfall, is coded as the balance among the baseline responses of neurons coding all possible directions of motion. Neural adaptation of neurons stimulated by downward movement reduces their baseline activity, tilting the balance in favor of upward movement. 

    An alternative explanation for the MAE is based on an increase in excitability of neurons having a preference for a direction that is opposite to the adapting direction. Adapting direction-selective neurons hyperpolarize due to long-duration intracellular sodium and calcium ion accumulation. This causes extracellular imbalances and an increase in brain tissue excitability, which spreads via ionic diffusion in extracellular space and glia-assisted mechanisms. This causes the opposite-direction neurons to spike when a stationary stimulus is presented, because these neurons have no hyperpolarizing intracellular imbalances but are surrounded by depolarizing extracellular imbalances.


Social Psychology Lecture Notes 1.1

Basic Types of Psychology: 

 Abnormal, Biological, Cognitive, Comparative, Cultural, Differential, Developmental, Evolutionary, Experimental, Mathematical, Neuropsychology, Personality, Positive, Quantitative, Social

Social psychology is the scientific study of how people’s thoughts, feelings, and behaviors are influenced by the actual, imagined, or implied presence of others.

The terms thoughts, feelings, and behaviors include all psychological variables that are measurable in a human being. The statement that others’ presence may be imagined or implied suggests that we are prone to social influence even when no other people are present, such as when watching television or following internalized cultural norms.

Social psychologists typically explain human behavior as a result of the interaction of mental states and immediate social situations. 

As a broad generalization, American researchers traditionally have focused more on the individual, whereas Europeans have paid more attention to group level phenomena. 

Intrapersonal Phenomena

  • Attitudes

Because people are influenced by the situation, general attitudes are not always good predictors of specific behavior. 

Attitudes that are well remembered and central to our self-concept, however, are more likely to lead to behaviors, and measures of general attitudes do predict patterns of behavior over time.

  • Persuasion

Persuasion is an active method of influence that attempts to guide people toward the adoption of an attitude, idea, or behavior by rational or emotive means. Persuasion relies on “appeals” rather than strong pressure or coercion.

  • Social cognition

Attributions are the explanations we make for people’s behavior, either our own behavior or the behavior of others. 

The self-serving bias: the tendency to attribute dispositional causes for successes, and situational causes for failure, particularly when self-esteem is threatened. 

Heuristics are cognitive shortcuts. Instead of weighing all the evidence when making a decision, people rely on heuristics to save time and energy.

  • Self-concept

Interpersonal Phenomena

  • Social influence

Conformity: the tendency to act or think like other members of a group.  Individual variation among group members plays a key role in the dynamic of how willing people will be to conform.

Compliance: any change in behavior that is due to request or suggestion from another person. 

Obedience: a change in behavior that is the result of a direct order or command from another person. 

Self-fulfilling prophecy: a prediction that, in being made, actually causes itself to become true. 

  • Group dynamics

Norms: implicit rules and expectations for group members to follow.

Roles: implicit rules and expectations for specific members within the group. 

Relations: patterns of liking within the group, and also differences in prestige or status. 

  • Relations with others

Hostile aggression

Instrumental aggression

  • Interpersonal attraction

According to social exchange theory, relationships are based on rational choice and cost-benefit analysis. If one partner’s costs begin to outweigh his or her benefits, that person may leave the relationship, especially if there are good alternatives available. With time, long term relationships tend to become communal rather than simply based on exchange. 


GETTING CLEAN DATA: Reading local flat files

Reading local CSV files

    if (!file.exists("data")) {
        dir.create("data")
    }

    fileUrl <- "https://web_address"

    download.file(fileUrl, destfile = "./data/cameras.csv", method = "curl")

    dateDownloaded <- date() # record when the file was downloaded

So now the data have been downloaded from the website and are sitting on my computer as local data.

The most common way that they’re loaded is with the read.table function:

Loading flat files – read.table():

  • The main function for reading data into R
  • Flexible and robust, but requires more parameters
  • Reads the data into RAM – big data can cause problems
  • Important parameters: file, header, sep, row.names, nrows
  • Related: read.csv(), read.csv2()

Example:

cameraData <- read.table("./data/cameras.csv", sep = ",", header = TRUE)

Some important parameters (a combined example follows this list):

  • quote: tell R whether there are any quoted values, quote = "" means no quotes
  • na.strings: set the character that represents a missing value
  • nrows: how many rows to read of the file
  • skip: number of lines to skip before starting to read
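
A hedged sketch combining these parameters on the camera file from the earlier example (the values shown are illustrative):

cameraData <- read.table("./data/cameras.csv", sep = ",", header = TRUE,
                         na.strings = "NA", # treat the string "NA" as missing
                         nrows = 100,       # read at most 100 rows of data
                         skip = 0)          # skip no lines before reading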

Reading Excel files

Download the Excel file to load:

    if (!file.exists("data")) {dir.create("data")}

    fileUrl <- "https://web_address"

    download.file(fileUrl, destfile = "./data/cameras.xlsx", method = "curl")

    dateDownloaded <- date()

The R library that is useful for this is the xlsx package.

    library(xlsx)

    cameraData <- read.xlsx("./data/cameras.xlsx", sheetIndex = 1, header = TRUE)

You can read specific rows and specific columns.

    colIndex <- 2:3

    rowIndex <- 1:4

    cameraDataSubset <- read.xlsx("./data/cameras.xlsx", sheetIndex = 1, colIndex = colIndex, rowIndex = rowIndex)

**Notes**

  • The write.xlsx function will write out an Excel file with similar arguments
  • read.xlsx2 is much faster than read.xlsx, but for reading subsets of rows it may be slightly unstable
  • The XLConnect package has more options for writing and manipulating Excel files
  • The XLConnect vignette is a good place to start for that package
  • In general it is advised to store your data in either a database or in comma-separated files (.csv) or tab-separated files (.tab/.txt) so they are easier to distribute


GETTING CLEAN DATA: Downloading files

Knowing your working directory:

getwd(): gets the working directory; tells you what directory you’re currently in

    setwd(): sets a different working directory that you might want to move to.

Checking for and creating directories:

file.exists("directoryName"): will check to see if the directory exists

dir.create("directoryName"): will create a directory if it doesn’t exist

Example (checking for a "data" directory and creating it if it doesn’t exist):

if (!file.exists("data")) {

    dir.create("data")

}

Getting data from the internet – download.file():

Downloads a file from the internet

parameters: url: the place that you’re going to be getting data from.

destfile: the destination file where the data is going to go.

method: needs to be specified particularly when dealing with https.

Useful for downloading tab-delimited files, CSV files, and Excel files.

Download a file from the web:

fileUrl <- "https://address"

download.file(fileUrl, destfile = "./data/cameras.csv", method = "curl")

list.files(“./data”)

**Notes**

  • If the url starts with http you can use download.file()
  • If the url starts with https on Mac you may need to set method = "curl"
  • If the file is big, this might take a while
  • Be sure to record when you downloaded