Purpose

Explore the environmental predictor variables for the drought traits analysis. We know that we are going to include annual total inflow as one of the predictors, so we need to determine if we are going to include one or more additional predictor variables. The possibilities are nitrate + nitrite, ammonia, ortho-phosphate, salinity, Secchi depth, and water temperature. We’ll first explore if any of these additional variables are strongly correlated with each other or with inflow. Next we’ll run a Principal Component Analysis (PCA) on all possible environmental variables to be used for quantitative variable selection.

Global code and functions

# Load packages
library(tidyverse)
library(ggforce)
library(corrr)
library(vegan)
library(ggvegan)
library(here)

Display current versions of R and packages used for this analysis:

devtools::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.1 (2022-06-23 ucrt)
##  os       Windows 10 x64 (build 19044)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_United States.utf8
##  ctype    English_United States.utf8
##  tz       America/Los_Angeles
##  date     2023-04-11
##  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version  date (UTC) lib source
##  bslib         0.4.2    2022-12-16 [1] CRAN (R 4.2.2)
##  cachem        1.0.7    2023-02-24 [1] CRAN (R 4.2.2)
##  callr         3.7.3    2022-11-02 [1] CRAN (R 4.2.2)
##  cli           3.6.0    2023-01-09 [1] CRAN (R 4.2.2)
##  cluster       2.1.4    2022-08-22 [2] CRAN (R 4.2.1)
##  colorspace    2.1-0    2023-01-23 [1] CRAN (R 4.2.2)
##  corrr       * 0.4.4    2022-08-16 [1] CRAN (R 4.2.2)
##  crayon        1.5.2    2022-09-29 [1] CRAN (R 4.2.1)
##  devtools      2.4.5    2022-10-11 [1] CRAN (R 4.2.1)
##  digest        0.6.31   2022-12-11 [1] CRAN (R 4.2.2)
##  dplyr       * 1.1.0    2023-01-29 [1] CRAN (R 4.2.2)
##  ellipsis      0.3.2    2021-04-29 [1] CRAN (R 4.2.1)
##  evaluate      0.20     2023-01-17 [1] CRAN (R 4.2.2)
##  fansi         1.0.4    2023-01-22 [1] CRAN (R 4.2.2)
##  farver        2.1.1    2022-07-06 [1] CRAN (R 4.2.1)
##  fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.2.2)
##  forcats     * 1.0.0    2023-01-29 [1] CRAN (R 4.2.2)
##  fs            1.6.1    2023-02-06 [1] CRAN (R 4.2.2)
##  generics      0.1.3    2022-07-05 [1] CRAN (R 4.2.1)
##  ggforce     * 0.4.1    2022-10-04 [1] CRAN (R 4.2.1)
##  ggplot2     * 3.4.1    2023-02-10 [1] CRAN (R 4.2.2)
##  ggrepel       0.9.3    2023-02-03 [1] CRAN (R 4.2.2)
##  ggvegan     * 0.1-0    2023-03-02 [1] Github (gavinsimpson/ggvegan@99e0ad6)
##  glue          1.6.2    2022-02-24 [1] CRAN (R 4.2.1)
##  gtable        0.3.1    2022-09-01 [1] CRAN (R 4.2.1)
##  here        * 1.0.1    2020-12-13 [1] CRAN (R 4.2.1)
##  hms           1.1.2    2022-08-19 [1] CRAN (R 4.2.1)
##  htmltools     0.5.4    2022-12-07 [1] CRAN (R 4.2.2)
##  htmlwidgets   1.6.1    2023-01-07 [1] CRAN (R 4.2.2)
##  httpuv        1.6.9    2023-02-14 [1] CRAN (R 4.2.2)
##  jquerylib     0.1.4    2021-04-26 [1] CRAN (R 4.2.1)
##  jsonlite      1.8.4    2022-12-06 [1] CRAN (R 4.2.2)
##  knitr         1.42     2023-01-25 [1] CRAN (R 4.2.2)
##  later         1.3.0    2021-08-18 [1] CRAN (R 4.2.1)
##  lattice     * 0.20-45  2021-09-22 [1] CRAN (R 4.2.1)
##  lifecycle     1.0.3    2022-10-07 [1] CRAN (R 4.2.1)
##  lubridate   * 1.9.2    2023-02-10 [1] CRAN (R 4.2.2)
##  magrittr      2.0.3    2022-03-30 [1] CRAN (R 4.2.1)
##  MASS          7.3-58.2 2023-01-23 [2] CRAN (R 4.2.2)
##  Matrix        1.5-3    2022-11-11 [1] CRAN (R 4.2.2)
##  memoise       2.0.1    2021-11-26 [1] CRAN (R 4.2.1)
##  mgcv          1.8-42   2023-03-02 [1] CRAN (R 4.2.2)
##  mime          0.12     2021-09-28 [1] CRAN (R 4.2.0)
##  miniUI        0.1.1.1  2018-05-18 [1] CRAN (R 4.2.1)
##  munsell       0.5.0    2018-06-12 [1] CRAN (R 4.2.1)
##  nlme          3.1-162  2023-01-31 [2] CRAN (R 4.2.2)
##  permute     * 0.9-7    2022-01-27 [1] CRAN (R 4.2.2)
##  pillar        1.8.1    2022-08-19 [1] CRAN (R 4.2.1)
##  pkgbuild      1.4.0    2022-11-27 [1] CRAN (R 4.2.2)
##  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.2.1)
##  pkgload       1.3.2    2022-11-16 [1] CRAN (R 4.2.2)
##  polyclip      1.10-4   2022-10-20 [1] CRAN (R 4.2.1)
##  prettyunits   1.1.1    2020-01-24 [1] CRAN (R 4.2.1)
##  processx      3.8.0    2022-10-26 [1] CRAN (R 4.2.2)
##  profvis       0.3.7    2020-11-02 [1] CRAN (R 4.2.1)
##  promises      1.2.0.1  2021-02-11 [1] CRAN (R 4.2.1)
##  ps            1.7.2    2022-10-26 [1] CRAN (R 4.2.2)
##  purrr       * 1.0.1    2023-01-10 [1] CRAN (R 4.2.2)
##  R6            2.5.1    2021-08-19 [1] CRAN (R 4.2.1)
##  Rcpp          1.0.10   2023-01-22 [1] CRAN (R 4.2.2)
##  readr       * 2.1.4    2023-02-10 [1] CRAN (R 4.2.2)
##  remotes       2.4.2    2021-11-30 [1] CRAN (R 4.2.1)
##  rlang         1.0.6    2022-09-24 [1] CRAN (R 4.2.1)
##  rmarkdown     2.20     2023-01-19 [1] CRAN (R 4.2.2)
##  rprojroot     2.0.3    2022-04-02 [1] CRAN (R 4.2.1)
##  rstudioapi    0.14     2022-08-22 [1] CRAN (R 4.2.1)
##  sass          0.4.5    2023-01-24 [1] CRAN (R 4.2.2)
##  scales        1.2.1    2022-08-20 [1] CRAN (R 4.2.1)
##  sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.2.1)
##  shiny         1.7.4    2022-12-15 [1] CRAN (R 4.2.2)
##  stringi       1.7.12   2023-01-11 [1] CRAN (R 4.2.2)
##  stringr     * 1.5.0    2022-12-02 [1] CRAN (R 4.2.2)
##  tibble      * 3.2.0    2023-03-08 [1] CRAN (R 4.2.2)
##  tidyr       * 1.3.0    2023-01-24 [1] CRAN (R 4.2.2)
##  tidyselect    1.2.0    2022-10-10 [1] CRAN (R 4.2.1)
##  tidyverse   * 2.0.0    2023-02-22 [1] CRAN (R 4.2.2)
##  timechange    0.2.0    2023-01-11 [1] CRAN (R 4.2.2)
##  tweenr        2.0.2    2022-09-06 [1] CRAN (R 4.2.1)
##  tzdb          0.3.0    2022-03-28 [1] CRAN (R 4.2.1)
##  urlchecker    1.0.1    2021-11-30 [1] CRAN (R 4.2.1)
##  usethis       2.1.6    2022-05-25 [1] CRAN (R 4.2.1)
##  utf8          1.2.3    2023-01-31 [1] CRAN (R 4.2.2)
##  vctrs         0.5.2    2023-01-23 [1] CRAN (R 4.2.2)
##  vegan       * 2.6-4    2022-10-11 [1] CRAN (R 4.2.2)
##  withr         2.5.0    2022-03-03 [1] CRAN (R 4.2.1)
##  xfun          0.37     2023-01-31 [1] CRAN (R 4.2.2)
##  xtable        1.8-4    2019-04-21 [1] CRAN (R 4.2.1)
##  yaml          2.3.7    2023-01-23 [1] CRAN (R 4.2.2)
## 
##  [1] C:/R/win-library/4.2
##  [2] C:/Program Files/R/R-4.2.1/library
## 
## ──────────────────────────────────────────────────────────────────────────────

Import and Prepare Data

df_env <- read_csv(here("drought_variables/drought_variables.csv"))

Prepare the data by only including inflow, the additional environmental predictor variables, and Year. Also, remove any rows with NA values. Note that we only have data for the additional environmental predictor variables from 1975-2021, and we’re missing ortho-phosphate data for 2021 for some reason, which we should probably figure out at some point.

df_env_c <- df_env %>% 
  select(where(is.numeric) & !starts_with(c("drought", "wy"))) %>% 
  drop_na() %>% 
  rename(
    Inflow = inflow_annual_cfs,
    OrthoPhos = Phos
  )

# View data frame
print(df_env_c, n = 50)

## # A tibble: 46 × 8
##     year   Inflow Nitrate Ammonia OrthoPhos Salinity Secchi Temperature
##    <dbl>    <dbl>   <dbl>   <dbl>     <dbl>    <dbl>  <dbl>       <dbl>
##  1  1975 12491673   0.228  0.0562    0.0694    0.669   39.0        15.2
##  2  1976  5427799   0.289  0.0253    0.0905    2.90    55.3        16.4
##  3  1977  2801183   0.438  0.0746    0.109     5.35    60.1        16.4
##  4  1978 13804177   0.433  0.0771    0.0904    1.87    35.6        16.4
##  5  1979  8486821   0.342  0.0898    0.0812    2.05    41.0        16.2
##  6  1980 16738845   0.253  0.0723    0.0724    1.23    40.7        15.9
##  7  1981  7491449   0.326  0.108     0.0940    2.84    44.2        16.8
##  8  1982 23662483   0.270  0.0787    0.0633    0.503   39.4        15.2
##  9  1983 35750998   0.241  0.0646    0.0517    0.205   38.6        15.5
## 10  1984 16094830   0.271  0.0788    0.0623    1.18    47.7        16.2
## 11  1985  6785937   0.315  0.115     0.0869    2.73    60.8        15.9
## 12  1986 18181441   0.310  0.100     0.0773    1.68    44.1        16.3
## 13  1987  5715101   0.386  0.117     0.0830    3.33    56.4        16.7
## 14  1988  5687510   0.376  0.116     0.103     3.62    57.1        16.7
## 15  1989  7146348   0.351  0.131     0.0994    2.91    54.2        16.2
## 16  1990  5100082   0.366  0.132     0.106     3.59    62.3        16.8
## 17  1991  4325899   0.460  0.185     0.129     3.88    60.8        16.1
## 18  1992  4485048   0.439  0.156     0.119     3.66    65.8        17.2
## 19  1993 12737227   0.386  0.116     0.0714    1.61    50.3        16.2
## 20  1994  5248159   0.412  0.168     0.0840    2.85    66.3        16.2
## 21  1995 24319298   0.326  0.0958    0.0598    1.06    47.8        15.7
## 22  1996 15808885   0.377  0.101     0.0631    1.44    52.9        16.6
## 23  1997 20196669   0.413  0.0996    0.0612    1.76    47.2        17.0
## 24  1998 24662592   0.381  0.0891    0.0598    0.748   41.8        15.6
## 25  1999 13823025   0.392  0.113     0.0496    1.38    47.9        15.3
## 26  2000 12679620   0.369  0.126     0.0621    1.95    55.2        16.2
## 27  2001  6400003   0.483  0.177     0.0730    3.12    52.3        16.6
## 28  2002  7913068   0.471  0.165     0.0707    2.36    53.3        16.5
## 29  2003 10815877   0.405  0.164     0.0624    1.81    56.0        16.8
## 30  2004 11159980   0.441  0.168     0.0664    2.29    56.2        16.8
## 31  2005 11440260   0.380  0.118     0.0579    1.70    55.7        16.4
## 32  2006 24383128   0.298  0.0928    0.0447    1.39    54.2        16.1
## 33  2007  6440319   0.471  0.117     0.0554    2.72    68.4        16.4
## 34  2008  5697637   0.567  0.131     0.0700    2.91    62.6        16.3
## 35  2009  5891634   0.450  0.120     0.0626    3.01    76.0        16.4
## 36  2010  8227003   0.395  0.100     0.0565    2.29    70.2        16.0
## 37  2011 17587766   0.246  0.0686    0.0419    1.06    66.4        15.4
## 38  2012  6777160   0.377  0.113     0.0593    2.65    78.4        16.4
## 39  2013  7090023   0.424  0.107     0.0743    2.68    73.7        16.8
## 40  2014  3709976   0.520  0.149     0.0933    4.09    87.4        17.3
## 41  2015  4651121   0.506  0.109     0.0902    3.95    85.2        18.0
## 42  2016  8457060   0.358  0.102     0.0656    2.71    80.2        16.9
## 43  2017 28101886   0.290  0.0884    0.0643    1.28    60.2        16.2
## 44  2018  7522799   0.375  0.132     0.0791    2.55    86.5        16.6
## 45  2019 16907901   0.306  0.0939    0.0657    1.35    76.4        16.2
## 46  2020  5462397   0.367  0.130     0.0765    2.99    89.4        17.0

Since some of the relationships between variables may be non-linear, we’ll also log transform all variables and run a correlation analysis on them to compare.

df_env_log <- df_env_c %>% mutate(across(-year, log))

# View data frame
glimpse(df_env_log)

## Rows: 46
## Columns: 8
## $ year        <dbl> 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984…
## $ Inflow      <dbl> 16.34057, 15.50704, 14.84555, 16.44048, 15.95403, 16.63324…
## $ Nitrate     <dbl> -1.4774099, -1.2404450, -0.8251284, -0.8374630, -1.0734005…
## $ Ammonia     <dbl> -2.878292, -3.675562, -2.595685, -2.562602, -2.410605, -2.…
## $ OrthoPhos   <dbl> -2.667965, -2.402899, -2.216439, -2.403575, -2.511163, -2.…
## $ Salinity    <dbl> -0.40207632, 1.06482472, 1.67689579, 0.62691983, 0.7176610…
## $ Secchi      <dbl> 3.664131, 4.012121, 4.095989, 3.571128, 3.714008, 3.705299…
## $ Temperature <dbl> 2.718686, 2.795736, 2.797777, 2.799385, 2.782706, 2.765021…

Correlations among variables

Pearson’s Correlation Coefficients

Original values

df_corr_orig <- df_env_c %>% 
  select(-year) %>% 
  correlate()

## Correlation computed with
## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'

fashion(df_corr_orig, leading_zeros = TRUE)

##          term Inflow Nitrate Ammonia OrthoPhos Salinity Secchi Temperature
## 1      Inflow          -0.59   -0.47     -0.59    -0.84  -0.52       -0.54
## 2     Nitrate  -0.59            0.66      0.30     0.65   0.41        0.60
## 3     Ammonia  -0.47    0.66              0.30     0.45   0.34        0.43
## 4   OrthoPhos  -0.59    0.30    0.30               0.74   0.12        0.43
## 5    Salinity  -0.84    0.65    0.45      0.74            0.54        0.66
## 6      Secchi  -0.52    0.41    0.34      0.12     0.54               0.53
## 7 Temperature  -0.54    0.60    0.43      0.43     0.66   0.53

log-transformed values

df_corr_log <- df_env_log %>% 
  select(-year) %>% 
  correlate()

## Correlation computed with
## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'

fashion(df_corr_log, leading_zeros = TRUE)

##          term Inflow Nitrate Ammonia OrthoPhos Salinity Secchi Temperature
## 1      Inflow          -0.62   -0.36     -0.69    -0.88  -0.57       -0.56
## 2     Nitrate  -0.62            0.66      0.33     0.69   0.43        0.62
## 3     Ammonia  -0.36    0.66              0.21     0.45   0.37        0.41
## 4   OrthoPhos  -0.69    0.33    0.21               0.63   0.12        0.47
## 5    Salinity  -0.88    0.69    0.45      0.63            0.60        0.70
## 6      Secchi  -0.57    0.43    0.37      0.12     0.60               0.53
## 7 Temperature  -0.56    0.62    0.41      0.47     0.70   0.53

Plots

Scatterplots - original values

df_env_c %>% 
  select(-year) %>% 
  ggplot(aes(x = .panel_x, y = .panel_y)) +
  geom_point(size = 1) +
  facet_matrix(
    vars(everything()), 
    layer.diag = FALSE
  ) +
  theme(axis.text.x = element_text(angle = 90))

Scatterplots - log-transformed values

df_env_log %>% 
  select(-year) %>%  
  ggplot(aes(x = .panel_x, y = .panel_y)) +
  geom_point(size = 1) +
  facet_matrix(
    vars(everything()), 
    layer.diag = FALSE
  ) +
  theme(axis.text.x = element_text(angle = 90))

Correlation plot - original values

df_corr_orig %>% 
  rplot(print_cor = TRUE) + 
  ggtitle(
    "Correlation Plot - Pearson's correlation coefficients", 
    subtitle = "Original values"
  ) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

Correlation plot - log-transformed values

df_corr_log %>% 
  rplot(print_cor = TRUE) + 
  ggtitle(
    "Correlation Plot - Pearson's correlation coefficients", 
    subtitle = "log-transformed values"
  ) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

Inflow has a negative correlation with all the additional environmental predictor variables, the strongest being salinity. For most of the variables besides salinity, there is a lot of variation at the lowest inflow values with decreasing variation as inflow increases. This is less so with the log-transformed values. Salinity also has fairly strong positive correlations with most other environmental variables. Looking at associations among the remaining variables, nitrate has strong positive correlations with ammonia and water temperature, while ortho-phosphate and Secchi depth have the weakest associations with all other variables.

Principle Component Analysis (PCA)

Let’s run a PCA on all of the environmental predictor variables including inflow using the vegan package. Again, we’ll run the analysis on the original and log-transformed values and compare their results.

Original values

pca_orig <- df_env_c %>% 
  select(-year) %>%  
  rda(scale = TRUE)

summary(pca_orig, axes = 0)

## 
## Call:
## rda(X = ., scale = TRUE) 
## 
## Partitioning of correlations:
##               Inertia Proportion
## Total               7          1
## Unconstrained       7          1
## 
## Eigenvalues, and their contribution to the correlations 
## 
## Importance of components:
##                          PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Eigenvalue            4.1326 0.9857 0.7807 0.46962 0.35659 0.19869 0.07608
## Proportion Explained  0.5904 0.1408 0.1115 0.06709 0.05094 0.02838 0.01087
## Cumulative Proportion 0.5904 0.7312 0.8427 0.90981 0.96075 0.98913 1.00000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:

The first two axes of the PCA represent 73% of the data with the first axis representing most of it at 59%. Let’s take a look at a scree plot and overlay a broken stick model.

screeplot(pca_orig, bstick = TRUE)

Only the first axis is has an eigenvalue greater than its corresponding length of the stick. This isn’t too surprising looking at the results above. Now let’s look at a bi-plot of the years and variables for the first two axes. We’ll scale the plot for the environmental variables which is the default.

# Pull out site scores and add original data back to use for bi-plot formatting
df_pca_orig_sites <- pca_orig %>% 
  fortify(axes = 1:2) %>% 
  as_tibble() %>% 
  filter(Score == "sites") %>% 
  bind_cols(df_env_c) %>% 
  left_join(
    df_env %>% select(year, wy_index_sac, drought_category), 
    by = join_by(year)
  ) %>% 
  select(-c(Score, Label))

# Create bi-plot
autoplot(
  pca_orig, 
  layers = "species", 
  title = "Original values, scaling for environmental variables"
) + 
geom_text(
  data = df_pca_orig_sites, 
  aes(x = PC1, y = PC2, label = year, color = year),
  size = 3
) +
scale_color_viridis_c(name = "Year", option = "plasma", end = 0.9) +
theme_bw()

Let’s look at the same bi-plot but use the Sacramento Valley WY Index to color the years:

autoplot(
  pca_orig, 
  layers = "species", 
  title = "Original values, scaling for environmental variables"
) + 
geom_text(
  data = df_pca_orig_sites, 
  aes(x = PC1, y = PC2, label = year, color = wy_index_sac),
  size = 3
) +
scale_color_viridis_c(name = "Sac WY Index", option = "plasma", end = 0.9) +
theme_bw()

And finally the same bi-plot with the Drought Synthesis drought classifications for the color:

autoplot(
  pca_orig, 
  layers = "species", 
  title = "Original values, scaling for environmental variables"
) + 
geom_text(
  data = df_pca_orig_sites, 
  aes(x = PC1, y = PC2, label = year, color = drought_category),
  size = 3
) +
scale_color_viridis_d(name = "Drought Category", direction = -1) +
theme_bw()

log-transformed values

pca_log <- df_env_log %>% 
  select(-year) %>%  
  rda(scale = TRUE)

summary(pca_log, axes = 0)

## 
## Call:
## rda(X = ., scale = TRUE) 
## 
## Partitioning of correlations:
##               Inertia Proportion
## Total               7          1
## Unconstrained       7          1
## 
## Eigenvalues, and their contribution to the correlations 
## 
## Importance of components:
##                          PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Eigenvalue            4.2007 1.0446 0.7518 0.45476 0.33767 0.13291 0.07758
## Proportion Explained  0.6001 0.1492 0.1074 0.06497 0.04824 0.01899 0.01108
## Cumulative Proportion 0.6001 0.7493 0.8567 0.92169 0.96993 0.98892 1.00000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:

Similar to the PCA using the original values, the first two axes of the PCA represent 75% of the data with the first axis representing most of it at 60%.

screeplot(pca_log, bstick = TRUE)

Again, only the first axis is has an eigenvalue greater than its corresponding length of the stick.

Here is the bi-plot for the first two axes using the log-transformed values:

# Pull out site scores and add original data back to use for bi-plot formatting
df_pca_log_sites <- pca_log %>% 
  fortify(axes = 1:2) %>% 
  as_tibble() %>% 
  filter(Score == "sites") %>% 
  bind_cols(df_env_log) %>% 
  left_join(
    df_env %>% select(year, wy_index_sac, drought_category), 
    by = join_by(year)
  ) %>% 
  select(-c(Score, Label))

# Create bi-plot
autoplot(
  pca_log, 
  layers = "species", 
  title = "Log-transformed values, scaling for environmental variables"
) + 
geom_text(
  data = df_pca_log_sites, 
  aes(x = PC1, y = PC2, label = year, color = year),
  size = 3
) +
scale_color_viridis_c(name = "Year", option = "plasma", end = 0.9) +
theme_bw()

Same bi-plot but use the Sacramento Valley WY Index for the colors:

autoplot(
  pca_log, 
  layers = "species", 
  title = "Log-transformed values, scaling for environmental variables"
) + 
geom_text(
  data = df_pca_log_sites, 
  aes(x = PC1, y = PC2, label = year, color = wy_index_sac),
  size = 3
) +
scale_color_viridis_c(name = "Sac WY Index", option = "plasma", end = 0.9) +
theme_bw()

And the same bi-plot with the Drought Synthesis drought classifications for the colors:

autoplot(
  pca_log, 
  layers = "species", 
  title = "Log-transformed values, scaling for environmental variables"
) + 
geom_text(
  data = df_pca_log_sites, 
  aes(x = PC1, y = PC2, label = year, color = drought_category),
  size = 3
) +
scale_color_viridis_d(name = "Drought Category", direction = -1) +
theme_bw()

The PCA using log-transformed values looks similar to the one using original values except its flipped upside down and Ammonia and Nitrate are a little further apart.

Overall, Inflow and Salinity are directly opposite each other which isn’t surprising and is confirmed by their strong negative relationship. It would be best to choose just one of these as an environmental predictor in our analysis. Temperature is also positioned somewhat opposite from Inflow, but their correlation coefficient isn’t as strong, so we could keep Temperature. For the remaining variables, we can try to include all of them for now. Looking at the PCA bi-plot using the original units, Ammonia and Nitrate are fairly close to each other and the Nitrate arrow is a little longer, suggesting that we should include Nitrate and drop Ammonia. However, Nitrate and Ammonia are positioned a little further apart in the PCA bi-plot using log-transformed values, so maybe we keep both as predictors for now.

Explore Environmental Predictors

Dave Bosworth

April 11, 2023

Purpose

Global code and functions

Import and Prepare Data

Correlations among variables

Pearson’s Correlation Coefficients

Original values

log-transformed values

Plots

Scatterplots - original values

Scatterplots - log-transformed values

Correlation plot - original values

Correlation plot - log-transformed values

Principle Component Analysis (PCA)

Original values

log-transformed values