Explore the environmental predictor variables for the drought traits analysis. We know that we are going to include annual total inflow as one of the predictors, so we need to determine if we are going to include one or more additional predictor variables. The possibilities are nitrate + nitrite, ammonia, ortho-phosphate, salinity, Secchi depth, and water temperature. We’ll first explore if any of these additional variables are strongly correlated with each other or with inflow. Next we’ll run a Principal Component Analysis (PCA) on all possible environmental variables to be used for quantitative variable selection.
# Load packages
library(tidyverse)
library(ggforce)
library(corrr)
library(vegan)
library(ggvegan)
library(here)
Display current versions of R and packages used for this analysis:
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.2.1 (2022-06-23 ucrt)
## os Windows 10 x64 (build 19044)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate English_United States.utf8
## ctype English_United States.utf8
## tz America/Los_Angeles
## date 2023-04-11
## pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.2.2)
## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.2.2)
## callr 3.7.3 2022-11-02 [1] CRAN (R 4.2.2)
## cli 3.6.0 2023-01-09 [1] CRAN (R 4.2.2)
## cluster 2.1.4 2022-08-22 [2] CRAN (R 4.2.1)
## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.2.2)
## corrr * 0.4.4 2022-08-16 [1] CRAN (R 4.2.2)
## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.2.1)
## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.2.1)
## digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.2)
## dplyr * 1.1.0 2023-01-29 [1] CRAN (R 4.2.2)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.1)
## evaluate 0.20 2023-01-17 [1] CRAN (R 4.2.2)
## fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.2)
## farver 2.1.1 2022-07-06 [1] CRAN (R 4.2.1)
## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.2.2)
## forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.2.2)
## fs 1.6.1 2023-02-06 [1] CRAN (R 4.2.2)
## generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.1)
## ggforce * 0.4.1 2022-10-04 [1] CRAN (R 4.2.1)
## ggplot2 * 3.4.1 2023-02-10 [1] CRAN (R 4.2.2)
## ggrepel 0.9.3 2023-02-03 [1] CRAN (R 4.2.2)
## ggvegan * 0.1-0 2023-03-02 [1] Github (gavinsimpson/ggvegan@99e0ad6)
## glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.1)
## gtable 0.3.1 2022-09-01 [1] CRAN (R 4.2.1)
## here * 1.0.1 2020-12-13 [1] CRAN (R 4.2.1)
## hms 1.1.2 2022-08-19 [1] CRAN (R 4.2.1)
## htmltools 0.5.4 2022-12-07 [1] CRAN (R 4.2.2)
## htmlwidgets 1.6.1 2023-01-07 [1] CRAN (R 4.2.2)
## httpuv 1.6.9 2023-02-14 [1] CRAN (R 4.2.2)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.1)
## jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.2.2)
## knitr 1.42 2023-01-25 [1] CRAN (R 4.2.2)
## later 1.3.0 2021-08-18 [1] CRAN (R 4.2.1)
## lattice * 0.20-45 2021-09-22 [1] CRAN (R 4.2.1)
## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.1)
## lubridate * 1.9.2 2023-02-10 [1] CRAN (R 4.2.2)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.1)
## MASS 7.3-58.2 2023-01-23 [2] CRAN (R 4.2.2)
## Matrix 1.5-3 2022-11-11 [1] CRAN (R 4.2.2)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.1)
## mgcv 1.8-42 2023-03-02 [1] CRAN (R 4.2.2)
## mime 0.12 2021-09-28 [1] CRAN (R 4.2.0)
## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.2.1)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.1)
## nlme 3.1-162 2023-01-31 [2] CRAN (R 4.2.2)
## permute * 0.9-7 2022-01-27 [1] CRAN (R 4.2.2)
## pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.1)
## pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.2.2)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.1)
## pkgload 1.3.2 2022-11-16 [1] CRAN (R 4.2.2)
## polyclip 1.10-4 2022-10-20 [1] CRAN (R 4.2.1)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.1)
## processx 3.8.0 2022-10-26 [1] CRAN (R 4.2.2)
## profvis 0.3.7 2020-11-02 [1] CRAN (R 4.2.1)
## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.2.1)
## ps 1.7.2 2022-10-26 [1] CRAN (R 4.2.2)
## purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.2.2)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.1)
## Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.2.2)
## readr * 2.1.4 2023-02-10 [1] CRAN (R 4.2.2)
## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.1)
## rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.1)
## rmarkdown 2.20 2023-01-19 [1] CRAN (R 4.2.2)
## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.1)
## rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1)
## sass 0.4.5 2023-01-24 [1] CRAN (R 4.2.2)
## scales 1.2.1 2022-08-20 [1] CRAN (R 4.2.1)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
## shiny 1.7.4 2022-12-15 [1] CRAN (R 4.2.2)
## stringi 1.7.12 2023-01-11 [1] CRAN (R 4.2.2)
## stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.2.2)
## tibble * 3.2.0 2023-03-08 [1] CRAN (R 4.2.2)
## tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.2.2)
## tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.1)
## tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.2.2)
## timechange 0.2.0 2023-01-11 [1] CRAN (R 4.2.2)
## tweenr 2.0.2 2022-09-06 [1] CRAN (R 4.2.1)
## tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.1)
## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.2.1)
## usethis 2.1.6 2022-05-25 [1] CRAN (R 4.2.1)
## utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.2)
## vctrs 0.5.2 2023-01-23 [1] CRAN (R 4.2.2)
## vegan * 2.6-4 2022-10-11 [1] CRAN (R 4.2.2)
## withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.1)
## xfun 0.37 2023-01-31 [1] CRAN (R 4.2.2)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.1)
## yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.2)
##
## [1] C:/R/win-library/4.2
## [2] C:/Program Files/R/R-4.2.1/library
##
## ──────────────────────────────────────────────────────────────────────────────
df_env <- read_csv(here("drought_variables/drought_variables.csv"))
Prepare the data by only including inflow, the additional
environmental predictor variables, and Year. Also, remove any rows with
NA
values. Note that we only have data for the additional
environmental predictor variables from 1975-2021, and we’re missing
ortho-phosphate data for 2021 for some reason, which we should probably
figure out at some point.
df_env_c <- df_env %>%
select(where(is.numeric) & !starts_with(c("drought", "wy"))) %>%
drop_na() %>%
rename(
Inflow = inflow_annual_cfs,
OrthoPhos = Phos
)
# View data frame
print(df_env_c, n = 50)
## # A tibble: 46 × 8
## year Inflow Nitrate Ammonia OrthoPhos Salinity Secchi Temperature
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1975 12491673 0.228 0.0562 0.0694 0.669 39.0 15.2
## 2 1976 5427799 0.289 0.0253 0.0905 2.90 55.3 16.4
## 3 1977 2801183 0.438 0.0746 0.109 5.35 60.1 16.4
## 4 1978 13804177 0.433 0.0771 0.0904 1.87 35.6 16.4
## 5 1979 8486821 0.342 0.0898 0.0812 2.05 41.0 16.2
## 6 1980 16738845 0.253 0.0723 0.0724 1.23 40.7 15.9
## 7 1981 7491449 0.326 0.108 0.0940 2.84 44.2 16.8
## 8 1982 23662483 0.270 0.0787 0.0633 0.503 39.4 15.2
## 9 1983 35750998 0.241 0.0646 0.0517 0.205 38.6 15.5
## 10 1984 16094830 0.271 0.0788 0.0623 1.18 47.7 16.2
## 11 1985 6785937 0.315 0.115 0.0869 2.73 60.8 15.9
## 12 1986 18181441 0.310 0.100 0.0773 1.68 44.1 16.3
## 13 1987 5715101 0.386 0.117 0.0830 3.33 56.4 16.7
## 14 1988 5687510 0.376 0.116 0.103 3.62 57.1 16.7
## 15 1989 7146348 0.351 0.131 0.0994 2.91 54.2 16.2
## 16 1990 5100082 0.366 0.132 0.106 3.59 62.3 16.8
## 17 1991 4325899 0.460 0.185 0.129 3.88 60.8 16.1
## 18 1992 4485048 0.439 0.156 0.119 3.66 65.8 17.2
## 19 1993 12737227 0.386 0.116 0.0714 1.61 50.3 16.2
## 20 1994 5248159 0.412 0.168 0.0840 2.85 66.3 16.2
## 21 1995 24319298 0.326 0.0958 0.0598 1.06 47.8 15.7
## 22 1996 15808885 0.377 0.101 0.0631 1.44 52.9 16.6
## 23 1997 20196669 0.413 0.0996 0.0612 1.76 47.2 17.0
## 24 1998 24662592 0.381 0.0891 0.0598 0.748 41.8 15.6
## 25 1999 13823025 0.392 0.113 0.0496 1.38 47.9 15.3
## 26 2000 12679620 0.369 0.126 0.0621 1.95 55.2 16.2
## 27 2001 6400003 0.483 0.177 0.0730 3.12 52.3 16.6
## 28 2002 7913068 0.471 0.165 0.0707 2.36 53.3 16.5
## 29 2003 10815877 0.405 0.164 0.0624 1.81 56.0 16.8
## 30 2004 11159980 0.441 0.168 0.0664 2.29 56.2 16.8
## 31 2005 11440260 0.380 0.118 0.0579 1.70 55.7 16.4
## 32 2006 24383128 0.298 0.0928 0.0447 1.39 54.2 16.1
## 33 2007 6440319 0.471 0.117 0.0554 2.72 68.4 16.4
## 34 2008 5697637 0.567 0.131 0.0700 2.91 62.6 16.3
## 35 2009 5891634 0.450 0.120 0.0626 3.01 76.0 16.4
## 36 2010 8227003 0.395 0.100 0.0565 2.29 70.2 16.0
## 37 2011 17587766 0.246 0.0686 0.0419 1.06 66.4 15.4
## 38 2012 6777160 0.377 0.113 0.0593 2.65 78.4 16.4
## 39 2013 7090023 0.424 0.107 0.0743 2.68 73.7 16.8
## 40 2014 3709976 0.520 0.149 0.0933 4.09 87.4 17.3
## 41 2015 4651121 0.506 0.109 0.0902 3.95 85.2 18.0
## 42 2016 8457060 0.358 0.102 0.0656 2.71 80.2 16.9
## 43 2017 28101886 0.290 0.0884 0.0643 1.28 60.2 16.2
## 44 2018 7522799 0.375 0.132 0.0791 2.55 86.5 16.6
## 45 2019 16907901 0.306 0.0939 0.0657 1.35 76.4 16.2
## 46 2020 5462397 0.367 0.130 0.0765 2.99 89.4 17.0
Since some of the relationships between variables may be non-linear, we’ll also log transform all variables and run a correlation analysis on them to compare.
df_env_log <- df_env_c %>% mutate(across(-year, log))
# View data frame
glimpse(df_env_log)
## Rows: 46
## Columns: 8
## $ year <dbl> 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984…
## $ Inflow <dbl> 16.34057, 15.50704, 14.84555, 16.44048, 15.95403, 16.63324…
## $ Nitrate <dbl> -1.4774099, -1.2404450, -0.8251284, -0.8374630, -1.0734005…
## $ Ammonia <dbl> -2.878292, -3.675562, -2.595685, -2.562602, -2.410605, -2.…
## $ OrthoPhos <dbl> -2.667965, -2.402899, -2.216439, -2.403575, -2.511163, -2.…
## $ Salinity <dbl> -0.40207632, 1.06482472, 1.67689579, 0.62691983, 0.7176610…
## $ Secchi <dbl> 3.664131, 4.012121, 4.095989, 3.571128, 3.714008, 3.705299…
## $ Temperature <dbl> 2.718686, 2.795736, 2.797777, 2.799385, 2.782706, 2.765021…
df_corr_orig <- df_env_c %>%
select(-year) %>%
correlate()
## Correlation computed with
## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'
fashion(df_corr_orig, leading_zeros = TRUE)
## term Inflow Nitrate Ammonia OrthoPhos Salinity Secchi Temperature
## 1 Inflow -0.59 -0.47 -0.59 -0.84 -0.52 -0.54
## 2 Nitrate -0.59 0.66 0.30 0.65 0.41 0.60
## 3 Ammonia -0.47 0.66 0.30 0.45 0.34 0.43
## 4 OrthoPhos -0.59 0.30 0.30 0.74 0.12 0.43
## 5 Salinity -0.84 0.65 0.45 0.74 0.54 0.66
## 6 Secchi -0.52 0.41 0.34 0.12 0.54 0.53
## 7 Temperature -0.54 0.60 0.43 0.43 0.66 0.53
df_corr_log <- df_env_log %>%
select(-year) %>%
correlate()
## Correlation computed with
## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'
fashion(df_corr_log, leading_zeros = TRUE)
## term Inflow Nitrate Ammonia OrthoPhos Salinity Secchi Temperature
## 1 Inflow -0.62 -0.36 -0.69 -0.88 -0.57 -0.56
## 2 Nitrate -0.62 0.66 0.33 0.69 0.43 0.62
## 3 Ammonia -0.36 0.66 0.21 0.45 0.37 0.41
## 4 OrthoPhos -0.69 0.33 0.21 0.63 0.12 0.47
## 5 Salinity -0.88 0.69 0.45 0.63 0.60 0.70
## 6 Secchi -0.57 0.43 0.37 0.12 0.60 0.53
## 7 Temperature -0.56 0.62 0.41 0.47 0.70 0.53
df_env_c %>%
select(-year) %>%
ggplot(aes(x = .panel_x, y = .panel_y)) +
geom_point(size = 1) +
facet_matrix(
vars(everything()),
layer.diag = FALSE
) +
theme(axis.text.x = element_text(angle = 90))
df_env_log %>%
select(-year) %>%
ggplot(aes(x = .panel_x, y = .panel_y)) +
geom_point(size = 1) +
facet_matrix(
vars(everything()),
layer.diag = FALSE
) +
theme(axis.text.x = element_text(angle = 90))
df_corr_orig %>%
rplot(print_cor = TRUE) +
ggtitle(
"Correlation Plot - Pearson's correlation coefficients",
subtitle = "Original values"
) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
df_corr_log %>%
rplot(print_cor = TRUE) +
ggtitle(
"Correlation Plot - Pearson's correlation coefficients",
subtitle = "log-transformed values"
) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Inflow has a negative correlation with all the additional environmental predictor variables, the strongest being salinity. For most of the variables besides salinity, there is a lot of variation at the lowest inflow values with decreasing variation as inflow increases. This is less so with the log-transformed values. Salinity also has fairly strong positive correlations with most other environmental variables. Looking at associations among the remaining variables, nitrate has strong positive correlations with ammonia and water temperature, while ortho-phosphate and Secchi depth have the weakest associations with all other variables.
Let’s run a PCA on all of the environmental predictor variables
including inflow using the vegan
package. Again, we’ll run
the analysis on the original and log-transformed values and compare
their results.
pca_orig <- df_env_c %>%
select(-year) %>%
rda(scale = TRUE)
summary(pca_orig, axes = 0)
##
## Call:
## rda(X = ., scale = TRUE)
##
## Partitioning of correlations:
## Inertia Proportion
## Total 7 1
## Unconstrained 7 1
##
## Eigenvalues, and their contribution to the correlations
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Eigenvalue 4.1326 0.9857 0.7807 0.46962 0.35659 0.19869 0.07608
## Proportion Explained 0.5904 0.1408 0.1115 0.06709 0.05094 0.02838 0.01087
## Cumulative Proportion 0.5904 0.7312 0.8427 0.90981 0.96075 0.98913 1.00000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:
The first two axes of the PCA represent 73% of the data with the first axis representing most of it at 59%. Let’s take a look at a scree plot and overlay a broken stick model.
screeplot(pca_orig, bstick = TRUE)
Only the first axis is has an eigenvalue greater than its corresponding length of the stick. This isn’t too surprising looking at the results above. Now let’s look at a bi-plot of the years and variables for the first two axes. We’ll scale the plot for the environmental variables which is the default.
# Pull out site scores and add original data back to use for bi-plot formatting
df_pca_orig_sites <- pca_orig %>%
fortify(axes = 1:2) %>%
as_tibble() %>%
filter(Score == "sites") %>%
bind_cols(df_env_c) %>%
left_join(
df_env %>% select(year, wy_index_sac, drought_category),
by = join_by(year)
) %>%
select(-c(Score, Label))
# Create bi-plot
autoplot(
pca_orig,
layers = "species",
title = "Original values, scaling for environmental variables"
) +
geom_text(
data = df_pca_orig_sites,
aes(x = PC1, y = PC2, label = year, color = year),
size = 3
) +
scale_color_viridis_c(name = "Year", option = "plasma", end = 0.9) +
theme_bw()
Let’s look at the same bi-plot but use the Sacramento Valley WY Index to color the years:
autoplot(
pca_orig,
layers = "species",
title = "Original values, scaling for environmental variables"
) +
geom_text(
data = df_pca_orig_sites,
aes(x = PC1, y = PC2, label = year, color = wy_index_sac),
size = 3
) +
scale_color_viridis_c(name = "Sac WY Index", option = "plasma", end = 0.9) +
theme_bw()
And finally the same bi-plot with the Drought Synthesis drought classifications for the color:
autoplot(
pca_orig,
layers = "species",
title = "Original values, scaling for environmental variables"
) +
geom_text(
data = df_pca_orig_sites,
aes(x = PC1, y = PC2, label = year, color = drought_category),
size = 3
) +
scale_color_viridis_d(name = "Drought Category", direction = -1) +
theme_bw()
pca_log <- df_env_log %>%
select(-year) %>%
rda(scale = TRUE)
summary(pca_log, axes = 0)
##
## Call:
## rda(X = ., scale = TRUE)
##
## Partitioning of correlations:
## Inertia Proportion
## Total 7 1
## Unconstrained 7 1
##
## Eigenvalues, and their contribution to the correlations
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Eigenvalue 4.2007 1.0446 0.7518 0.45476 0.33767 0.13291 0.07758
## Proportion Explained 0.6001 0.1492 0.1074 0.06497 0.04824 0.01899 0.01108
## Cumulative Proportion 0.6001 0.7493 0.8567 0.92169 0.96993 0.98892 1.00000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:
Similar to the PCA using the original values, the first two axes of the PCA represent 75% of the data with the first axis representing most of it at 60%.
screeplot(pca_log, bstick = TRUE)
Again, only the first axis is has an eigenvalue greater than its corresponding length of the stick.
Here is the bi-plot for the first two axes using the log-transformed values:
# Pull out site scores and add original data back to use for bi-plot formatting
df_pca_log_sites <- pca_log %>%
fortify(axes = 1:2) %>%
as_tibble() %>%
filter(Score == "sites") %>%
bind_cols(df_env_log) %>%
left_join(
df_env %>% select(year, wy_index_sac, drought_category),
by = join_by(year)
) %>%
select(-c(Score, Label))
# Create bi-plot
autoplot(
pca_log,
layers = "species",
title = "Log-transformed values, scaling for environmental variables"
) +
geom_text(
data = df_pca_log_sites,
aes(x = PC1, y = PC2, label = year, color = year),
size = 3
) +
scale_color_viridis_c(name = "Year", option = "plasma", end = 0.9) +
theme_bw()
Same bi-plot but use the Sacramento Valley WY Index for the colors:
autoplot(
pca_log,
layers = "species",
title = "Log-transformed values, scaling for environmental variables"
) +
geom_text(
data = df_pca_log_sites,
aes(x = PC1, y = PC2, label = year, color = wy_index_sac),
size = 3
) +
scale_color_viridis_c(name = "Sac WY Index", option = "plasma", end = 0.9) +
theme_bw()
And the same bi-plot with the Drought Synthesis drought classifications for the colors:
autoplot(
pca_log,
layers = "species",
title = "Log-transformed values, scaling for environmental variables"
) +
geom_text(
data = df_pca_log_sites,
aes(x = PC1, y = PC2, label = year, color = drought_category),
size = 3
) +
scale_color_viridis_d(name = "Drought Category", direction = -1) +
theme_bw()
The PCA using log-transformed values looks similar to the one using original values except its flipped upside down and Ammonia and Nitrate are a little further apart.
Overall, Inflow and Salinity are directly opposite each other which isn’t surprising and is confirmed by their strong negative relationship. It would be best to choose just one of these as an environmental predictor in our analysis. Temperature is also positioned somewhat opposite from Inflow, but their correlation coefficient isn’t as strong, so we could keep Temperature. For the remaining variables, we can try to include all of them for now. Looking at the PCA bi-plot using the original units, Ammonia and Nitrate are fairly close to each other and the Nitrate arrow is a little longer, suggesting that we should include Nitrate and drop Ammonia. However, Nitrate and Ammonia are positioned a little further apart in the PCA bi-plot using log-transformed values, so maybe we keep both as predictors for now.