GETTING CLEAN DATA: Reading local flat files

Reading  local CSV files

  if (!file.exists(“data”)){

        dir.create(“data”)

    }

    fileUrl <- “https://web_address&#8221;

    download.file(fileUrl, destfile = “cameras,csv”, method = “curl”)

    dataDownloaded <- data()

So now the data have been downloaded from the website, and actually is sitting on my computer, it’s local data to my computer.

The most common way that they’re loaded is with the read.table function:

Loading flat files – read.table():

  • The main function for reading data into R
  • Flexible and robust but requires more parameter
  • Reads the data into RAM – big data can cause problems
  • Important parameter: file, header, sep, row.names, nrows
  • Related: read.csv(), read.csv2()

Example:

cameraData <- read.table(“./data/cameras.csv”, sep = “,”, header = TRUE)

some important parameters:

  • quote: tell R whether there are any quoted values, quote = “” means no quotes
  • na.strings: set the character that represents a missing value
  • nrows: how many rows to read of the file
  • skip: number of lines to skip before starting to read

Reading Excel files

Download the excel file to load:

    if(!file.exists(“data”)){dir.create(“data”)}

    fileUrl <- “https://web_address&#8221;

    download.file(fileUrl, destfile = “./data/cameras.xlsx”, method = “curl”)

    dateDownloaded <- data()

The R library that is useful for this is the xlsx package.

    library(xlsx)

    cameraData <- read.xlsx(“./data/cameras.xlsx”, sheetIndex = 1, header = TRUE)

You can read specific rows and specific columns.

colIndex <- 2:3

    rowIndex <- 1:4

    cameraDataSubset <- read.xlsx(“./data/cameras.xlsx”, sheetIndex = 1, colIndex = colIndex, rowIndex = rowIndex)

**Notes**

  • The write.xlsx function will write out an Excel file with similar arguments
  • read.xlsx2  is much faster than read.xlsx but for reading subsets of rows may be slightly unstable
  • The XLConnect package has more options for writing and manipulating Excel files
  • The XLConnect vignette is a good place to start for that package
  • In general it is advised to store your data in either  a database or in comma separated files (.csv) or tab separated files (.tab/.txt) se they are easier to distribute

 

Leave a comment