3 Import Data
3.1 Load Packages
Before we can work with our data, we need to load the relevant packages. One of the most important for this task is tidyverse, which we learned about yesterday. We’ll load it here:
library(tidyverse)
3.2 Read in Data
Next, we must import our data.
The most common function for this is read_csv from the readr package, which is loaded as part of the tidyverse. For our demonstrations, we will use the WQ_P8D7.csv file stored in the data folder.
When importing data, we must specify the filepath where the file is stored. There are multiple ways to do this. We could hard-code the full path starting from the drive, which is called an absolute path:
df_wq <- read_csv('C:/R/IntrotoR/data/WQ_P8D7.csv')
Error: 'C:/R/IntrotoR/data/WQ_P8D7.csv' does not exist.
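As the error shows, an absolute path is tied to one computer’s folder layout. The code breaks on any machine where the file lives somewhere else.
If we instead keep the data inside a Project, we can use a relative filepath. A relative path is interpreted starting from the working directory, which here is the Project folder. Relative paths are ideal because anyone who opens the Project can run the code unchanged: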
df_wq <- read_csv('data/WQ_P8D7.csv')
If you get an error here, it is likely because the chunks in your Rmd file are being evaluated in the folder that contains the Rmd file, rather than at the Project level. In that case, data/WQ_P8D7.csv can’t be found.
One way to fix this is to go to Tools > Global Options > R Markdown and set “Evaluate chunks in directory” to Project.
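You can also use the here package instead. Its here function builds file paths starting from the Project’s root folder, so the same code works whether chunks are evaluated in the Rmd’s folder or in the Project folder:
library(here)
df_wq <- read_csv(here('data/WQ_P8D7.csv'))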
Here, we used the read_csv function, which expects comma-separated (.csv) files. But what exactly is a csv?
“csv” stands for “comma-separated values”. The comma is called a delimiter: it tells the code where one data cell ends and the next begins. If your file uses a different delimiter, you can use the read_delim function (also from the readr package) and pass that character to its delim argument:
read_delim('data/delim_ex.txt', delim = '|') # data separated by |
# A tibble: 2 × 4
col headers are first
<chr> <chr> <chr> <chr>
1 here is an example
2 of a different delimiter
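To see that the delimiter alone decides where cells are split, here is a small self-contained sketch. It writes a made-up pipe-delimited file to a temporary location, and one of its values contains a comma.

```r
library(readr)

# Write a small pipe-delimited file (made-up example data) to a temp file
pipe_file <- tempfile(fileext = ".txt")
writeLines(c("station|notes",
             "P8|clear, calm",
             "D7|windy"), pipe_file)

# With delim = "|", only the pipes split cells, so the comma stays inside a value
pipe_df <- read_delim(pipe_file, delim = "|", show_col_types = FALSE)
pipe_df
```

Because the delimiter is "|", the result has two columns and the first notes value is the single cell "clear, calm". If the same file were read with read_csv, the comma would be treated as a split point instead.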
For tab-delimited data (a fairly common format), there’s read_tsv:
read_tsv('data/tab_ex.tsv')
# A tibble: 2 × 4
col headers are first
<chr> <chr> <chr> <chr>
1 here is an example
2 of a different delimiter
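read_tsv is essentially read_delim with the tab character ("\t") as the delimiter. The sketch below checks this on a small, made-up tab-delimited file written to a temporary location.

```r
library(readr)

# Write a small tab-delimited file ("\t" is the tab character); made-up data
tab_file <- tempfile(fileext = ".tsv")
writeLines(c("station\tchla", "P8\t0.64", "D7\t0.67"), tab_file)

from_tsv   <- read_tsv(tab_file, show_col_types = FALSE)
from_delim <- read_delim(tab_file, delim = "\t", show_col_types = FALSE)
```

Both calls produce the same column names and values. read_tsv is simply the shorter way to write the common case.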
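Excel files (.xlsx) are different. Rather than plain text split by a delimiter, they are workbooks that can hold multiple sheets and cell formatting. To import these, we use read_excel from the readxl package. readxl is installed along with the tidyverse but is not loaded by library(tidyverse), so we load it separately: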
library(readxl)
read_excel('data/excel_ex.xlsx', sheet = 'Sheet1') # read the first sheet (by name)
# A tibble: 2 × 4
col headers are first
<chr> <chr> <chr> <chr>
1 here is an example
2 of an excel file
read_excel('data/excel_ex.xlsx', sheet = 2) # read the second sheet (by number)
# A tibble: 2 × 4
col headers are first
<chr> <chr> <chr> <chr>
1 here is an example
2 of excel sheet 2
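If you don’t know a workbook’s sheet names ahead of time, readxl’s excel_sheets function lists them. The sketch below uses one of the example workbooks bundled with readxl (datasets.xlsx, located via readxl_example) as a stand-in for your own file. It reads every sheet into a named list.

```r
library(readxl)

# Example workbook shipped with readxl, standing in for your own .xlsx file
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
sheet_names <- excel_sheets(xlsx_path)
sheet_names

# Read every sheet into a named list of tibbles, one element per sheet
all_sheets <- lapply(sheet_names, function(s) read_excel(xlsx_path, sheet = s))
names(all_sheets) <- sheet_names
```

Each element of all_sheets can then be used like any other tibble, for example all_sheets$iris.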
We now have a data frame object called df_wq. Specifically, it is a tibble, the tidyverse’s version of a data frame. We can use head to see its first few rows (six by default):
head(df_wq)
# A tibble: 6 × 20
Station Date Chla Pheophytin TotAlkalinity DissAmmonia
<chr> <date> <dbl> <dbl> <dbl> <dbl>
1 P8 2020-01-16 0.64 0.5 98 0.15
2 D7 2020-01-22 0.67 0.87 82 0.21
3 P8 2020-02-14 1.46 0.69 81 0.25
4 D7 2020-02-20 2.15 0.5 86 0.14
5 P8 2020-03-03 1.4 0.56 80 0.11
6 D7 2020-03-06 1.89 1.13 93 0.22
# ℹ 14 more variables: DissNitrateNitrite <dbl>, DOC <dbl>, TOC <dbl>,
# DON <dbl>, TotPhos <dbl>, DissOrthophos <dbl>, TDS <dbl>, TSS <dbl>,
# TKN <dbl>, Depth <dbl>, Secchi <dbl>, Microcystis <dbl>,
# SpCndSurface <dbl>, WTSurface <dbl>
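head also takes an n argument for the number of rows to return, and its counterpart tail returns rows from the end. The sketch below uses a small made-up tibble rather than df_wq, so it runs on its own.

```r
library(tibble)

# A small made-up tibble standing in for df_wq
stations <- tibble(Station = rep(c("P8", "D7"), 5), Reading = 1:10)

first_six  <- head(stations)          # default: first 6 rows
first_two  <- head(stations, n = 2)   # n sets how many rows to return
last_three <- tail(stations, n = 3)   # tail() returns rows from the end instead
```

With df_wq, head(df_wq, n = 10) would show the first ten sampling events instead of six.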
And glimpse to see each column’s name, type, and first few values:
glimpse(df_wq)
Rows: 62
Columns: 20
$ Station <chr> "P8", "D7", "P8", "D7", "P8", "D7", "P8", "D7", "P8…
$ Date <date> 2020-01-16, 2020-01-22, 2020-02-14, 2020-02-20, 20…
$ Chla <dbl> 0.64, 0.67, 1.46, 2.15, 1.40, 1.89, 4.73, 1.74, 6.4…
$ Pheophytin <dbl> 0.50, 0.87, 0.69, 0.50, 0.56, 1.13, 1.25, 0.89, 0.8…
$ TotAlkalinity <dbl> 98.0, 82.0, 81.0, 86.0, 80.0, 93.0, 59.0, 78.0, 63.…
$ DissAmmonia <dbl> 0.150, 0.210, 0.250, 0.140, 0.110, 0.220, 0.050, 0.…
$ DissNitrateNitrite <dbl> 2.800, 0.490, 1.700, 0.480, 1.600, 0.380, 1.070, 0.…
$ DOC <dbl> 3.90, 0.27, 2.80, 0.39, 2.00, 0.19, 2.80, 1.20, 3.1…
$ TOC <dbl> 4.10, 0.32, 2.50, 0.41, 2.10, 0.20, 2.80, 1.20, 3.1…
$ DON <dbl> NA, NA, NA, NA, NA, NA, 0.30, 0.20, 0.30, 0.10, 0.5…
$ TotPhos <dbl> 0.310, 0.082, 0.130, 0.130, 0.190, 0.100, 0.188, 0.…
$ DissOrthophos <dbl> 0.200, 0.071, 0.130, 0.065, 0.140, 0.082, 0.177, 0.…
$ TDS <dbl> 380, 9500, 340, 5800, 290, 8700, 280, 7760, 227, 11…
$ TSS <dbl> 8.9, 38.0, 2.2, 18.0, 1.4, 28.0, 6.6, 35.6, 5.3, 23…
$ TKN <dbl> 0.520, 0.480, 0.430, 0.250, 0.400, 0.200, 0.400, 0.…
$ Depth <dbl> 28.9, 18.8, 39.0, 7.1, 39.0, 7.2, 37.1, 5.2, 36.7, …
$ Secchi <dbl> 116, 30, 212, 52, 340, 48, 100, 40, 160, 44, 120, 6…
$ Microcystis <dbl> 1, 1, 1, 1, 1, 1, 3, 2, 3, 2, 4, 2, 3, 2, 2, 1, 1, …
$ SpCndSurface <dbl> 667, 15532, 647, 11369, 530, 16257, 503, 12946, 404…
$ WTSurface <dbl> 9.67, 9.97, 11.09, 12.51, 13.97, 13.81, 23.46, 21.1…
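The same pieces of information that glimpse summarizes can also be pulled out individually with base R functions. The sketch below uses a small made-up tibble rather than df_wq.

```r
library(tibble)

# A small made-up tibble standing in for df_wq
wq <- tibble(Station = c("P8", "D7"),
             Date = as.Date(c("2020-01-16", "2020-01-22")),
             Chla = c(0.64, 0.67))

dim(wq)                           # number of rows and columns
names(wq)                         # column names
col_classes <- sapply(wq, class)  # class of each column, named by column
col_classes
```

Here col_classes reports "character", "Date", and "numeric", which correspond to the <chr>, <date>, and <dbl> labels that glimpse prints.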