While driftR is designed to work seamlessly with output from YSI Multiparameter V2 Sonde instruments, it can also be used to correct data from other sources. Only a few steps are needed to get data into a tidy driftR format. Below are example data after they have been imported using the dr_readSonde() function. This is the format that driftR expects, so data from other sources must be modified to match it.
# A tibble: 1,527 x 11
        Date     Time  Temp SpCond    pH  pHmV Chloride AmmoniumN NitrateN Turbidity    DO
       <chr>    <chr> <dbl>  <dbl> <dbl> <dbl>    <dbl>     <dbl>    <int>     <dbl> <dbl>
 1 9/18/2015 12:10:49 14.76  0.754  7.18 -36.4    51.22      3.35        0       3.7 92.65
 2 9/18/2015 12:15:50 14.64  0.750  7.14 -34.1    49.62      6.29        0      -0.2 93.73
 3 9/18/2015 12:20:51 14.57  0.750  7.14 -33.9    49.75      7.84        0      -0.1 93.95
 4 9/18/2015 12:25:51 14.51  0.749  7.13 -33.9    50.32      7.67        0      -0.2 93.23
 5 9/18/2015 12:30:51 14.50  0.749  7.13 -33.6    50.74      7.13        0       0.0 92.74
 6 9/18/2015 12:35:51 14.63  0.749  7.13 -33.5    50.84      6.49        0       0.0 93.71
 7 9/18/2015 12:40:51 14.69  0.749  7.13 -33.6    50.66      5.78        0      -0.2 94.56
 8 9/18/2015 12:45:51 14.66  0.749  7.12 -33.3    50.23      5.32        0      -0.2 94.16
 9 9/18/2015 12:50:52 14.65  0.749  7.12 -33.3    50.49      4.89        0      -0.2 93.58
10 9/18/2015 12:55:51 14.69  0.749  7.12 -33.1    50.04      4.60        0      -0.2 93.80
# ... with 1,517 more rows
The sections below detail pre-processing steps that you may have to take to prepare your data for use with driftR.
Data come in a variety of formats, and importing them into R can occasionally be a challenge.
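If your data are in a csv, tsv, txt, or another delimited file format, we recommend using the readr package.
If your data are in an Excel workbook, we recommend using the readxl package.
If your data are in an SPSS, Stata, or SAS file, we recommend using the haven package.
If your data are in a Microsoft Access database, we recommend using the RODBC package. This will require a Windows computer, 32-bit R, and either Microsoft Access or the appropriate drivers installed.
All of the example code below assumes that you have a data frame named waterData.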
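As a rough illustration of the delimited-file case, the sketch below writes a tiny csv to a temporary file and reads it back with readr. The column names and values are made up for the example, and the explicit col_types specification is one possible choice rather than a driftR requirement; it simply keeps Date and Time as character, which is what driftR expects.

```r
library(readr)

# Write a small delimited file to a temporary location so the example is
# self-contained. The columns and values are illustrative only.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Time,Temp,pH",
  "9/18/2015,12:10:49,14.76,7.18",
  "9/18/2015,12:15:50,14.64,7.14"
), tmp)

# Read Date and Time as character and the measurements as doubles.
waterData <- read_csv(tmp, col_types = cols(
  Date = col_character(),
  Time = col_character(),
  Temp = col_double(),
  pH   = col_double()
))
```

Declaring the column types up front avoids having readr guess that Time is a time object, which would then need converting back to character.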
No metadata should be stored in the observations. If metadata are present, remove them using the following technique (this example assumes that the metadata are stored in row 1):
waterData <- waterData[-1,]
If there are multiple lines of metadata, they can be removed like so:
waterData <- waterData[-c(1, 2),]
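One side effect worth checking: when a metadata row (for example, a row of units) is mixed in with the observations, R usually imports every column as character. The sketch below, built on a made-up three-row data frame, drops the metadata row and then uses base R's type.convert() to re-detect the column types. Treat it as one possible approach, not something driftR does for you.

```r
# Hypothetical raw import: the first observation row holds units, so every
# column was read as character.
waterData <- data.frame(
  Date = c("M/D/Y", "9/18/2015", "9/18/2015"),
  Time = c("hh:mm:ss", "12:10:49", "12:15:50"),
  Temp = c("C", "14.76", "14.64"),
  stringsAsFactors = FALSE
)

# Drop the units row, then let R re-detect the column types.
# as.is = TRUE keeps text columns as character instead of factors.
waterData <- waterData[-1, ]
waterData <- type.convert(waterData, as.is = TRUE)
```

After this step Date and Time remain character vectors, while Temp becomes numeric.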
Given the typically large data sets these instruments produce, we encourage (but do not require) storing data as tibbles. Tibbles are the tidyverse implementation of data frames. They print in a more organized manner and behave more predictably. To convert your data to a tibble, use the as_tibble() function:
library(tibble)
waterTibble <- as_tibble(waterData)
Variable names should be short and descriptive. We recommend using camelCase or snake_case to name variables. Use the rename() function from dplyr to accomplish this. The function accepts the data frame name followed by a comma and the new name set equal to the old name:
library(dplyr)
waterTibble <- rename(waterTibble, date = a.very.long.date.variable.name)
If you have a number of variables to rename, you can pipe them together:
waterTibble <- waterTibble %>%
  rename(date = a.very.long.date.variable.name) %>%
  rename(time = a.very.long.time.variable.name)
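As an alternative to chaining several calls, rename() also accepts more than one new = old pair in a single call. The sketch below builds a small tibble with the placeholder names used above (the values are invented for illustration) and renames both columns at once.

```r
library(dplyr)
library(tibble)

# Illustrative tibble using the placeholder column names from the text.
waterTibble <- tibble(
  a.very.long.date.variable.name = c("9/18/2015", "9/18/2015"),
  a.very.long.time.variable.name = c("12:10:49", "12:15:50")
)

# Rename both columns in one call: new name on the left, old name on the right.
waterTibble <- rename(waterTibble,
  date = a.very.long.date.variable.name,
  time = a.very.long.time.variable.name
)
```

Both forms give the same result; the single call is simply shorter when many variables need new names.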
driftR expects date data to be stored in a character vector. The date variable should be formatted as either mm/dd/yyyy or yyyy-mm-dd. To convert your date data to mm/dd/yyyy:
dates <- c("2016/12/30", "2016/12/31", "2017/01/01", "2017/01/02", "2017/01/03")
cleanDate <- as.Date(dates, format = "%Y/%m/%d")
cleanDate <- strftime(cleanDate, format = "%m/%d/%Y")
The format argument for as.Date() will need to be adapted based on the structure of your date data. To convert your date data to yyyy-mm-dd, alter the third line of code:
dates <- c("2016/12/30", "2016/12/31", "2017/01/01", "2017/01/02", "2017/01/03")
cleanDate <- as.Date(dates, format = "%Y/%m/%d")
cleanDate <- as.character(cleanDate)
If you need to work further with your date data after processing, we strongly suggest using the lubridate package.
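For readers who prefer lubridate for the parsing step as well, here is a minimal sketch of the same conversion. It assumes the raw dates are in year-month-day order, as in the example above; lubridate's ymd() parses them, and base R's format() then writes them back out as character strings.

```r
library(lubridate)

# Raw dates in year/month/day order, as in the earlier example.
dates <- c("2016/12/30", "2017/01/01")

# ymd() parses year-month-day strings regardless of the separator used.
parsed <- ymd(dates)

# Convert back to a character vector in mm/dd/yyyy form.
cleanDate <- format(parsed, "%m/%d/%Y")
```

If your raw dates are ordered differently, lubridate provides matching parsers such as mdy() and dmy().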
driftR expects time data to be stored in a character vector. The time variable should be formatted in hh:mm:ss format using a 24-hour clock. To convert your time data to hh:mm:ss in a 24-hour format:
times <- c("10:06 AM", "3:24 PM", "1:08 PM", "12:00 PM", "3:38 AM")
cleanTime <- format(strptime(times, "%I:%M %p"), format="%H:%M:%S")
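The format string passed to strptime() has to match your raw data. As a sketch of a second common case, the example below assumes times that are already on a 24-hour clock but lack seconds; the input values are invented for illustration.

```r
# Times already on a 24-hour clock, but without seconds.
times <- c("10:06", "15:24", "00:38")

# Parse hours and minutes, then write them back out with a seconds field.
# tz = "UTC" avoids any local daylight-saving adjustments during parsing.
cleanTime <- format(strptime(times, "%H:%M", tz = "UTC"), format = "%H:%M:%S")

# Values that do not match the format string become NA, so check for them.
stopifnot(!anyNA(cleanTime))
```

Checking for NA values after parsing is a cheap way to catch rows whose time was recorded in an unexpected layout.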
driftR makes no direct use of the Temp data included in the output. If you need to convert temperatures, the weathermetrics package includes functions for converting between Celsius and Fahrenheit.
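If you would rather not add a dependency for a single conversion, the arithmetic is short enough to write directly in base R. The sketch below converts made-up Fahrenheit readings to Celsius; the variable names are illustrative and the rounding to two decimals is an arbitrary choice.

```r
# Hypothetical temperature readings recorded in degrees Fahrenheit.
tempF <- c(58.57, 58.35, 58.23)

# Standard conversion: subtract 32, then scale by 5/9.
tempC <- round((tempF - 32) * 5 / 9, 2)
```

The reverse conversion is tempC * 9 / 5 + 32.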
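Beyond date and time data, all variables should be stored as either double, integer, or numeric values. Use whichever one of the following conversions fits the variable:
# as.numeric() and as.double() are equivalent in R; as.integer() drops any fractional part
waterTibble$measure <- as.numeric(waterTibble$measure)
waterTibble$measure <- as.integer(waterTibble$measure)
waterTibble$measure <- as.double(waterTibble$measure)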
You can integrate all of the non-tidyverse functions described here into mutate() calls. The mutate() function is from the dplyr package.
waterTibble <- waterTibble %>%
  mutate(temp = as.double(temp)) %>%
  mutate(pH = as.double(pH)) %>%
  mutate(NitrateN = as.integer(NitrateN))
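When several variables need the same conversion, the separate mutate() calls can be collapsed with across(). This sketch assumes dplyr 1.0.0 or later, where across() was introduced, and uses a small made-up tibble whose columns were all imported as character.

```r
library(dplyr)
library(tibble)

# Illustrative tibble in which every measurement was imported as character.
waterTibble <- tibble(
  Temp     = c("14.76", "14.64"),
  pH       = c("7.18", "7.14"),
  NitrateN = c("0", "0")
)

# Convert Temp and pH to double in one step, and NitrateN to integer.
waterTibble <- waterTibble %>%
  mutate(
    across(c(Temp, pH), as.double),
    NitrateN = as.integer(NitrateN)
  )
```

The result is the same as chaining one mutate() per variable; across() only saves repetition.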
Finally, if there are unnecessary variables left in your data set at the end of the pre-processing stage, you can use the select() function from dplyr to remove them. The function accepts the data frame name followed by a comma and a list of the variables to be removed inside -c(varlist):
waterTibble <- select(waterTibble, -c(a.very.long.pH.variable.name, a.very.long.chloride.variable.name))
Like all other dplyr functions, select() can be included in a pipe as well.
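A piped version might look like the sketch below, which also ends with a quick sanity check of the column types described in this document. The tibble contents are invented, and the check is only an illustrative habit, not a function provided by driftR.

```r
library(dplyr)
library(tibble)

# Illustrative tibble with one leftover column that is not needed.
waterTibble <- tibble(
  Date = c("9/18/2015", "9/18/2015"),
  Time = c("12:10:49", "12:15:50"),
  pH   = c(7.18, 7.14),
  a.very.long.chloride.variable.name = c(51.22, 49.62)
)

# Remove the unneeded column inside a pipe.
waterTibble <- waterTibble %>%
  select(-a.very.long.chloride.variable.name)

# Sanity check: Date and Time are character, everything else is numeric.
measures <- select(waterTibble, -Date, -Time)
stopifnot(
  is.character(waterTibble$Date),
  is.character(waterTibble$Time),
  all(vapply(measures, is.numeric, logical(1)))
)
```

Running a check like this at the end of pre-processing catches columns that were silently left as character.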