Using Data from other Sources

Andrew Shaughnessy, Christopher Prener, Elizabeth Hasenmueller

2017-12-17

While driftR is designed to work seamlessly with output from YSI Multiparameter V2 Sonde instruments, it can also be used to correct data from other sources. Only a few steps are needed to get such data into the tidy format that driftR expects. Below are example data after they have been imported using the dr_readSonde() function. This is the format that driftR requires, so data from other sources must be modified to match it.

# A tibble: 1,527 x 11
        Date     Time  Temp SpCond    pH  pHmV Chloride AmmoniumN NitrateN  Turbidity    DO
       <chr>    <chr> <dbl>  <dbl> <dbl> <dbl>    <dbl>     <dbl>    <int>      <dbl> <dbl>
 1 9/18/2015 12:10:49 14.76  0.754  7.18 -36.4    51.22      3.35        0        3.7 92.65
 2 9/18/2015 12:15:50 14.64  0.750  7.14 -34.1    49.62      6.29        0       -0.2 93.73
 3 9/18/2015 12:20:51 14.57  0.750  7.14 -33.9    49.75      7.84        0       -0.1 93.95
 4 9/18/2015 12:25:51 14.51  0.749  7.13 -33.9    50.32      7.67        0       -0.2 93.23
 5 9/18/2015 12:30:51 14.50  0.749  7.13 -33.6    50.74      7.13        0        0.0 92.74
 6 9/18/2015 12:35:51 14.63  0.749  7.13 -33.5    50.84      6.49        0        0.0 93.71
 7 9/18/2015 12:40:51 14.69  0.749  7.13 -33.6    50.66      5.78        0       -0.2 94.56
 8 9/18/2015 12:45:51 14.66  0.749  7.12 -33.3    50.23      5.32        0       -0.2 94.16
 9 9/18/2015 12:50:52 14.65  0.749  7.12 -33.3    50.49      4.89        0       -0.2 93.58
10 9/18/2015 12:55:51 14.69  0.749  7.12 -33.1    50.04      4.60        0       -0.2 93.80
# ... with 1,517 more rows

The sections below detail pre-processing steps that you may have to take to prepare your data for use with driftR.

Importing Data

Data come in a variety of formats, and importing them into R can occasionally be a challenge.

All of the example code below assumes that you have a data frame named waterData.
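As a minimal sketch of one common case, the code below reads comma-separated text with base R's read.csv(). The two-row sample and its column names are made up for illustration, and the text argument stands in for a file path. Setting stringsAsFactors = FALSE keeps the date and time columns as character vectors.

```r
# Hypothetical CSV content; in practice, pass a file path to read.csv()
# instead of using the text argument
rawText <- "Date,Time,Temp,pH
9/18/2015,12:10:49,14.76,7.18
9/18/2015,12:15:50,14.64,7.14"

# stringsAsFactors = FALSE keeps Date and Time as character vectors
waterData <- read.csv(text = rawText, stringsAsFactors = FALSE)

# Inspect the column types before moving on
str(waterData)
```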

Metadata

No metadata should be stored in the observations. If metadata are present, remove them using the following technique. (This example assumes that the metadata are stored in row 1):

waterData <- waterData[-1,]

If there are multiple lines of metadata, they can be removed like so:

waterData <- waterData[-c(1, 2),]
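One side effect is worth checking: when a metadata row (for example, a row of units) is read in along with the observations, R typically stores the affected columns as character. The sketch below uses a made-up data frame named exampleData to show the row being dropped and a measurement column converted back to numeric.

```r
# Hypothetical data where row 1 holds units rather than observations
exampleData <- data.frame(
  Date = c("M/D/Y", "9/18/2015", "9/18/2015"),
  Temp = c("C", "14.76", "14.64"),
  stringsAsFactors = FALSE
)

# Drop the metadata row
exampleData <- exampleData[-1, ]

# Temp was stored as character because of the units row, so convert it
exampleData$Temp <- as.numeric(exampleData$Temp)
```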

Tibbles

Given the typically large data sets produced by these instruments, we encourage (but do not require) storing data as tibbles. Tibbles are the tidyverse implementation of data frames. They print in a more organized manner and they behave in a more stable fashion. To convert your data to a tibble, use the function as_tibble():

library(tibble)

waterTibble <- as_tibble(waterData)

Variable Names

Variable names should be short and descriptive. We recommend using camelCase or snake_case to name variables. Use the rename() function from dplyr to accomplish this. The function accepts the data frame name followed by a comma and the new name set equal to the old name:

library(dplyr)
waterTibble <- rename(waterTibble, date = a.very.long.date.variable.name)

If you have a number of variables to rename, you can pipe them together:

waterTibble <- waterTibble %>%
  rename(date = a.very.long.date.variable.name) %>%
  rename(time = a.very.long.time.variable.name)
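rename() also accepts several new = old pairs in a single call, which avoids repeating the function. A minimal sketch, using a made-up data frame named exampleData with the same placeholder column names:

```r
library(dplyr)

# Hypothetical one-row data frame with unwieldy column names
exampleData <- data.frame(
  a.very.long.date.variable.name = "9/18/2015",
  a.very.long.time.variable.name = "12:10:49",
  stringsAsFactors = FALSE
)

# Rename both columns in one call
exampleData <- rename(
  exampleData,
  date = a.very.long.date.variable.name,
  time = a.very.long.time.variable.name
)

names(exampleData)
```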

Specific Variables

Date

driftR expects date data to be stored in a character vector. The date variable should be formatted as either mm/dd/yyyy or yyyy-mm-dd. To convert your date data to mm/dd/yyyy:

dates <- c("2016/12/30", "2016/12/31", "2017/01/01", "2017/01/02", "2017/01/03")
cleanDate <- as.Date(dates, format = "%Y/%m/%d")
cleanDate <- strftime(cleanDate, format = "%m/%d/%Y")

The format argument for as.Date() will need to be adapted based on the structure of the date data that you have. To convert your date data to yyyy-mm-dd, alter the third line of code:

dates <- c("2016/12/30", "2016/12/31", "2017/01/01", "2017/01/02", "2017/01/03")
cleanDate <- as.Date(dates, format = "%Y/%m/%d")
cleanDate <- as.character(cleanDate)

If you need to work with your date data further after processing, we strongly suggest using the lubridate package.
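As a brief sketch of what lubridate offers, its ymd() function parses year-month-day strings without a format string, and the result can be converted back to the character formats that driftR expects. This assumes the lubridate package is installed; the dates vector is illustrative only.

```r
library(lubridate)

dates <- c("2016/12/30", "2016/12/31", "2017/01/01")

# ymd() returns a Date vector
parsedDate <- ymd(dates)

# Convert back to a character vector in mm/dd/yyyy form
cleanDate <- format(parsedDate, "%m/%d/%Y")
```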

Time

driftR expects time data to be stored in a character vector. The time variable should use the hh:mm:ss format with a 24-hour clock. To convert 12-hour (AM/PM) time data to that format:

times <- c("10:06 AM", "3:24 PM", "1:08 PM", "12:00 PM", "3:38 AM")
cleanTime <- format(strptime(times, "%I:%M %p"), format="%H:%M:%S")
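If your source stores a single combined timestamp rather than separate date and time columns, the two pieces can be split apart with base R. A minimal sketch, assuming a made-up stamps vector in yyyy-mm-dd hh:mm:ss form (adapt the format string to your data):

```r
stamps <- c("2015-09-18 12:10:49", "2015-09-18 12:15:50")

# Parse once; fixing tz avoids any dependence on the local time zone
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Extract character vectors for the date and time columns
cleanDate <- format(parsed, "%m/%d/%Y")
cleanTime <- format(parsed, "%H:%M:%S")
```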

Temperature

driftR makes no direct use of the Temp data included in output. The weathermetrics package includes functions for conversions between Celsius and Fahrenheit.
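If you would rather not load an additional package, the Fahrenheit-to-Celsius conversion is simple arithmetic. The tempF vector below is made up for illustration:

```r
# Hypothetical temperature readings in degrees Fahrenheit
tempF <- c(32, 58.5, 212)

# Standard conversion to degrees Celsius
tempC <- (tempF - 32) * 5 / 9
```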

Other Variables

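Beyond date and time data, all variables should be stored as numeric values, either double or integer. Pick whichever conversion fits the variable:

# Use one of the following, not all three in sequence
waterTibble$measure <- as.numeric(waterTibble$measure)  # double precision
waterTibble$measure <- as.integer(waterTibble$measure)  # whole numbers; truncates decimals
waterTibble$measure <- as.double(waterTibble$measure)   # identical to as.numeric()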
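Values that cannot be parsed as numbers (for example, a text flag for a missing reading) become NA during conversion, with a warning. A short sketch for spotting them, using a made-up character vector:

```r
# Hypothetical readings with one unparseable entry
measure <- c("7.18", "7.14", "n/a")

# as.numeric() warns and returns NA for entries it cannot parse
converted <- suppressWarnings(as.numeric(measure))

# Positions that failed to convert
which(is.na(converted))
```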

Building Pipes with Date, Time, and Other Functions

You can integrate all of the non-tidyverse functions described here into mutate() function calls. The mutate() function is from the dplyr package.

waterTibble <- waterTibble %>%
  mutate(Temp = as.double(Temp)) %>%
  mutate(pH = as.double(pH)) %>%
  mutate(NitrateN = as.integer(NitrateN))
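The date and time conversions described earlier fit into mutate() in the same way. The sketch below is illustrative only: the exampleData data frame and its values are made up, and the format strings would need to match your own data.

```r
library(dplyr)

# Hypothetical raw data with dates, 12-hour times, and pH stored as text
exampleData <- data.frame(
  Date = c("2016/12/30", "2016/12/31"),
  Time = c("10:06 AM", "3:24 PM"),
  pH = c("7.18", "7.14"),
  stringsAsFactors = FALSE
)

cleanData <- exampleData %>%
  mutate(
    # yyyy/mm/dd text -> Date -> mm/dd/yyyy text
    Date = format(as.Date(Date, format = "%Y/%m/%d"), "%m/%d/%Y"),
    # 12-hour text -> 24-hour hh:mm:ss text
    Time = format(strptime(Time, "%I:%M %p"), format = "%H:%M:%S"),
    # text -> double
    pH = as.double(pH)
  )
```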

Removing Unnecessary Variables

Finally, if there are unnecessary variables left in your data set at the end of the pre-processing stage, you can use the select() function from dplyr to remove them. The function accepts the data frame name followed by a comma and a list of the variables to be removed inside -c(varlist):

waterTibble <- select(waterTibble, -c(a.very.long.pH.variable.name, a.very.long.chloride.variable.name))

Like all other dplyr functions, select() can be included in a pipe as well.
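A minimal sketch of select() used as the last step of a pipe, with a made-up data frame whose Site and Notes columns stand in for variables you do not need:

```r
library(dplyr)

# Hypothetical data with one measurement and two unneeded columns
exampleData <- data.frame(
  a.very.long.pH.variable.name = 7.18,
  Site = "A",
  Notes = "calm",
  stringsAsFactors = FALSE
)

exampleData <- exampleData %>%
  rename(pH = a.very.long.pH.variable.name) %>%
  select(-c(Site, Notes))

names(exampleData)
```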