Owners: | This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource. |
---|---|
Type: | Resource |
Storage: | 348.9 MB |
Created: | Jul 17, 2023 at 11:15 p.m. |
Last updated: | Jun 05, 2024 at 12:37 p.m. |
DOI: | 10.4211/hs.eddb06e91a914618a89a63bb2c2774e0 |
Content types: | Single File Content |
Sharing Status: | Published |
Abstract
Here we provide the data and R scripts needed to complete the analyses and create the figures presented in the manuscript "Solute export patterns across the contiguous United States" by Kincaid et al. 2024 in Hydrological Processes. Importantly, this resource contains paired solute concentration (C) and discharge (Q) data for 11 solutes from CAMELS-Chem (Sterle et al. 2024; https://doi.org/10.5194/hess-28-611-2024). This relational database was built upon the CAMELS dataset (https://doi.org/10.5194/hess-21-5293-2017), an existing dataset of catchment and hydroclimatic attributes for relatively undisturbed catchments across the contiguous United States. The version of CAMELS-Chem provided here contains US Geological Survey (USGS) National Water Information System (NWIS) C and Q data for 506 catchments. C and Q measurements span 1898 to 2020, with the first paired C-Q sample occurring in 1924. Solutes include aluminum (Al), calcium (Ca), chloride (Cl), dissolved organic C and N (DOC, DON), magnesium (Mg), nitrate (NO3), potassium (K), silica (Si), sodium (Na), and sulfate (SO4). Of note, a version of the CAMELS-Chem database that covers a shorter period (1980 to 2018) but includes more stream water quality constituents as well as atmospheric deposition data is described in CAMELS-Chem (Sterle et al. 2024; https://doi.org/10.5194/hess-28-611-2024) and is available for download via HydroShare (http://www.hydroshare.org/resource/841f5e85085c423f889ac809c1bed4ac).
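The analyses depend on paired C-Q observations. As a rough illustration of what pairing means, the Python sketch below matches each concentration sample to a daily discharge value recorded at the same site on the same date and drops samples without a match; it also shows how the first paired sample can post-date the first measurement. This is a minimal sketch under stated assumptions: the tuple/dictionary record layout, the same-day matching rule, the site name `SITE_A`, and all values are hypothetical and are not taken from CAMELS-Chem or from the R scripts in this resource (Python is used only for illustration).

```python
from datetime import date


def pair_cq(conc_samples, daily_discharge):
    """Pair each concentration sample with same-day discharge at the same site.

    conc_samples: list of (site_id, date, concentration) tuples (hypothetical layout)
    daily_discharge: dict mapping (site_id, date) -> daily discharge (hypothetical layout)
    Returns (site_id, date, C, Q) tuples; samples with no same-day discharge are dropped.
    """
    paired = []
    for site, day, conc in conc_samples:
        q = daily_discharge.get((site, day))
        if q is not None:
            paired.append((site, day, conc, q))
    return paired


# Hypothetical values: the 1898 sample has no discharge record for that day
conc_samples = [
    ("SITE_A", date(1898, 6, 1), 4.2),
    ("SITE_A", date(1924, 5, 3), 3.9),
]
daily_discharge = {("SITE_A", date(1924, 5, 3)): 12.5}

pairs = pair_cq(conc_samples, daily_discharge)
first_paired_day = min(day for _, day, _, _ in pairs)
print(first_paired_day)  # prints 1924-05-03
```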
The R scripts and data files provided in this resource are intended to allow users to replicate the tables and figures in the Kincaid et al. manuscript. Specifically, we provide all files needed to complete the analyses coded in the R script 9_analyses_figures_for_manuscript.R. The other R scripts and data files provided should also allow users to replicate intermediate steps in the analyses. The analyses in the R scripts (see the README file for details) include: modeling C-Q relationships with the power-law function using data-driven Bayesian segmented regression; conducting hierarchical clustering to group catchments based on catchment attributes; building random forest models to select catchment-attribute correlates of C-Q metrics; conducting flow-duration exceedance probability analyses; and general code for the figures, tables, and other statistics presented in the Kincaid et al. manuscript.
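The power-law C-Q model is C = aQ^b, which becomes a straight line in log space, log(C) = log(a) + b log(Q), so the slope b summarizes how concentration responds to changes in discharge. The Python sketch below fits that line by ordinary least squares and then searches for a single breakpoint by trying every split of the discharge-sorted data and keeping the split with the smallest total squared error. It is only an illustration: the resource itself fits these models with Bayesian segmented regression (JAGS via the mcp R package), whereas this sketch swaps in a frequentist grid search, fits the two segments independently (no continuity constraint at the breakpoint), and uses hypothetical function names and synthetic data.

```python
import math


def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope, sse)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct discharge values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse


def fit_power_law(q, c):
    """Fit C = a * Q**b via OLS on log(C) ~ log(Q); returns (a, b)."""
    intercept, b, _ = ols([math.log(v) for v in q], [math.log(v) for v in c])
    return math.exp(intercept), b


def fit_one_breakpoint(q, c, min_pts=3):
    """Grid-search one breakpoint in log-log space with two independent OLS segments.

    Returns (breakpoint_q, slope_below, slope_above, total_sse), where breakpoint_q
    is the lowest discharge in the upper segment.
    """
    pts = sorted((math.log(qi), math.log(ci)) for qi, ci in zip(q, c))
    if len(pts) < 2 * min_pts:
        raise ValueError("too few points for two segments")
    best = None
    for k in range(min_pts, len(pts) - min_pts + 1):
        lower, upper = pts[:k], pts[k:]
        _, b_lo, sse_lo = ols([p[0] for p in lower], [p[1] for p in lower])
        _, b_hi, sse_hi = ols([p[0] for p in upper], [p[1] for p in upper])
        if best is None or sse_lo + sse_hi < best[3]:
            best = (math.exp(upper[0][0]), b_lo, b_hi, sse_lo + sse_hi)
    return best


# Synthetic example (not CAMELS-Chem data): constant C up to Q = 8, dilution from Q = 16
q = [1, 2, 4, 8, 16, 32, 64, 128]
c = [10, 10, 10, 10] + [5 * (v / 16) ** -0.5 for v in q[4:]]
a, b = fit_power_law(q, c)
bp, b_lo, b_hi, sse = fit_one_breakpoint(q, c)
print(f"single power law: b = {b:.2f}; breakpoint near Q = {bp:.1f}, "
      f"slopes {b_lo:.2f} below and {b_hi:.2f} above")
```

Comparing the single-line fit with the two-segment fit (for example, by their residual errors) is the frequentist analogue of comparing a null and a segmented model; the Bayesian approach in the resource additionally yields posterior uncertainty for the breakpoint and slopes.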
The metadata for the CAMELS-Chem dataset (camels_chem_all_2022-02-25.csv) are available in camels_chem_metadata.csv.
Content
README.txt
R workflow for Kincaid et al. 2024 in Hydrological Processes

The code and data files provided in this repository are intended to allow users to replicate the tables and figures in the Kincaid et al. 2024 manuscript. Specifically, we provide all files needed to complete the analysis coded in 9_analyses_figures_for_manuscript.R. Using the other scripts and data files provided, users should also be able to replicate intermediate steps in the analysis. The metadata for the CAMELS-Chem dataset (camels_chem_all_2022-02-25.csv) are available in metadata/camels_chem_metadata.csv.

Table of contents:
1. Initial file directory structure
2. (How to) Fit C-Q using Bayesian Linear and Segmented Regression
3. (How to) Hierarchical clustering of CAMELS catchment attributes
4. (How to) Feature selection for correlates of C-Q model class/archetype and slope value
5. (How to) Flow-duration exceedance probability analysis
6. (How to) Figures, tables, and statistics for manuscript

1. Initial file directory structure
  * Note: these folders and files should be located within an R project
  * /code/
    o 1_summarize_camels_q.R
    o 2_prep_data_for_bayes_mcp.R
    o 3_fit_lm_bayes_mcp_high_n_sites.R
    o 3_fit_lm_bayes_mcp.R
    o 4_classify_cq_pattern.R
    o 5_hac_cluster_camels_catchments.R
    o 6_random_forest_prep_data.R
    o 7_random_forest_feature_select.R
    o 8_flow_duration_breakpoint_analysis.R
    o 9_analyses_figures_for_manuscript.R
  * /code/functions_fit_bayes/
    o functions_fit_bayes_with_mcp_high_n_sites.R
    o functions_fit_bayes_with_mcp.R
  * /data/
    o ansi_us_state_codes.csv
    o /camels_attributes/
      * 8 files, camels_clim.txt to camels_vege.txt
    o camels_chem_all_2022-02-25.csv
    o camels_cq_breakpoint_flowdur_perc.csv
    o camels_hac_clustered_all_attributes.csv
    o camels_modClasses_CQparams_imputedCAMELSattrs_forRF.csv
    o moatar_etal_2017_wrr_data.csv
    o table_of_camels_attributes.csv
    o USGS_gauge_info.csv
    o usgs_streamflow/
      * 18 folders with .txt files containing streamflow data
  * /fit_results/
    o All files ending in CQ_data_and_classification.csv in fit_results
      * 11 files, 1 for each solute in the manuscript analysis
      * 20 files, 1-2 for each solute in the manuscript analysis
  * /results_random_forest/
    o All files ending in rf_performance_metrics.csv
      * 11 files, 1 for each solute in the manuscript analysis
    o feature_importance/
      * All files ending in rf_feature_importance.csv
      * 11 files, 1 for each solute in the manuscript analysis

2. Fit C-Q using Bayesian Linear and Segmented Regression
  1. Summarize USGS discharge data to get the minimum and maximum discharge values for each gauge site
    a. R script:
      i. 1_summarize_camels_q.R
    b. Data files required:
      i. usgs_streamflow folder containing discharge data from the CAMELS website
        1. Downloaded from the CAMELS website (https://gdex.ucar.edu/dataset/camels/file.html) on 12/14/21
    c. Resulting file(s):
      i. USGS_dailyDischarge_range_all_sites.csv
  2. Prepare concentration-discharge data from the CAMELS-Chem dataset (one CSV per solute; the solute is indicated by XX in the resulting file name)
    a. R script:
      i. 2_prep_data_for_bayes_mcp.R
    b. Data files required:
      i. camels_chem_all_2022-02-25.csv
      ii. USGS_gauge_info.csv
      iii. ansi_us_state_codes.csv
      iv. USGS_dailyDischarge_range_all_sites.csv
    c. Resulting file(s):
      i. camels_chem_for_bayes_mcp_XX.csv
        1. Note: XX above is the solute abbreviation
  3. Repeat the previous step for each solute of interest
  4. Fit linear regressions of log(C) ~ log(Q) with Bayesian inference in JAGS, using the mcp R package as the front end
    a. R script:
      i. Primary R scripts:
        1. 3_fit_lm_bayes_mcp.R
          a. Try running this script on your data first. If a site has too many paired C-Q measurements (e.g., n > 1000), you may exceed the memory capacity of the computer you are using for the analysis. In that case, I run the high-n sites with a different R script, which runs the functions with fewer iterations and so decreases memory demand.
          b. Note: I often create a separate R script file for each solute I run and include the solute abbreviation in the file name
        2. 3_fit_lm_bayes_mcp_high_n_sites.R
          a. This is the alternative R script for sites with a large number of paired C-Q measurements (e.g., n > 1000).
      ii. Supporting R scripts with functions for fitting the linear regressions:
        1. functions_fit_bayes_with_mcp.R
          a. Use when running 3_fit_lm_bayes_mcp.R
          b. Should be in a folder called functions_fit_bayes
        2. functions_fit_bayes_with_mcp_high_n_sites.R
          a. Use when running 3_fit_lm_bayes_mcp_high_n_sites.R
          b. Should be in a folder called functions_fit_bayes
    b. Data files required:
      i. camels_chem_for_bayes_mcp_XX.csv
        1. Note: XX above is the solute abbreviation
    c. Resulting file(s):
      i. XX_fit_param_estimates.csv
      ii. XX_fit_comparison_metrics.csv
      iii. XX_fit_posterior_draws_subsample.csv
      iv. XX_fit_residuals.csv
      v. XX_plots_null_fit.pdf
      vi. XX_plots_null_chains.pdf
      vii. XX_plots_full_fit.pdf
      viii. XX_plots_full_chains.pdf
      ix. XX_plots_compare_null_to_full.pdf
        1. Note: If running the high_n R scripts, the file names above will include high_n_sites after XX_
  5. Repeat the previous step for each solute of interest
  6. Classify each C-Q pattern into 1 of 13 C-Q model classes or archetypes (see Underwood et al. 2017) based on the Bayesian regressions fit in the previous steps with 3_fit_lm_bayes_mcp.R
    a. R script:
      i. 4_classify_cq_pattern.R
        1. Note: I often create a separate R script file for each solute I run and include the solute abbreviation in the file name
    b. Data files required:
      i. XX_fit_param_estimates.csv
      ii. XX_fit_comparison_metrics.csv
      iii. camels_chem_for_bayes_mcp_XX.csv
    c. Resulting file(s):
      i. XX_plots_CQ_classifications.pdf
      ii. XX_CQ_data_and_classification.csv
  7. Repeat the previous step for each solute of interest

3. Hierarchical clustering of CAMELS catchment attributes
  1. Use hierarchical clustering to group CAMELS gauges/catchments by their CAMELS attributes
    a. R script:
      i. 5_hac_cluster_camels_catchments.R
    b. Data files required:
      i. All files ending in CQ_data_and_classification.csv in fit_results
      ii. All .txt files from the camels_attributes folder
        1. These files were downloaded from https://gdex.ucar.edu/dataset/camels/file.html in March 2022
      iii. table_of_camels_attributes.csv
      iv. camels_chem_all_2022-02-25.csv
    c. Resulting file(s):
      i. camels_hac_clustered_all_attributes.csv

4. Feature selection for correlates of C-Q model class/archetype and slope value
  1. Prepare data for random forest models
    a. R script:
      i. 6_random_forest_prep_data.R
    b. Data files required:
      i. All files ending in CQ_data_and_classification.csv in fit_results
      ii. All .txt files from the camels_attributes folder
        1. These files were downloaded from https://gdex.ucar.edu/dataset/camels/file.html in March 2022
      iii. table_of_camels_attributes.csv
      iv. camels_chem_all_2022-02-25.csv
    c. Resulting file(s):
      i. camels_modClasses_CQparams_imputedCAMELSattrs_forRF.csv
  2. Train random forest classification (C-Q model classes) and regression (C-Q slope) models to select the CAMELS variables most important for predicting these response variables
    a. R script:
      i. 7_random_forest_feature_select.R
        1. Note: I often create a separate R script file for each solute I run and include the solute abbreviation in the file name
    b. Data files required:
      i. camels_modClasses_CQparams_imputedCAMELSattrs_forRF.csv
    c. Resulting file(s):
      i. XX_rf_hyperparameters.csv
      ii. XX_rf_performance_metrics.csv
      iii. XX_rf_feature_importance.csv
  3. Repeat the previous step for each solute of interest

5. Flow-duration exceedance probability analysis
  1. Estimate the flow-duration exceedance probabilities at which thresholds/breakpoints occur in the C-Q relationships
    a. R script:
      i. 8_flow_duration_breakpoint_analysis.R
    b. Data files required:
      i. usgs_streamflow folder containing discharge data from the CAMELS website
        1. Downloaded from the CAMELS website (https://gdex.ucar.edu/dataset/camels/file.html) on 12/14/21
      ii. All files ending in CQ_data_and_classification.csv in fit_results
    c. Resulting file(s):
      i. camels_cq_breakpoint_flowdur_perc.csv

6. Figures, tables, and statistics for manuscript
  1. Create figures and do statistical analyses for the Kincaid et al. 2024 Hydrological Processes manuscript
    a. R script:
      i. 9_analyses_figures_for_manuscript.R
    b. Data files required:
      i. camels_chem_all_2022-02-25.csv
      ii. All files ending in CQ_data_and_classification.csv in fit_results
      iii. All files ending in metrics.csv in fit_results
      iv. All .txt files from the camels_attributes folder
        1. These files were downloaded from https://gdex.ucar.edu/dataset/camels/file.html in March 2022
      v. camels_hac_clustered_all_attributes.csv
      vi. table_of_camels_attributes.csv
      vii. USGS_gauge_info.csv
      viii. camels_cq_breakpoint_flowdur_perc.csv
      ix. camels_modClasses_CQparams_imputedCAMELSattrs_forRF.csv
      x. moatar_etal_2017_wrr_data.csv
        1. Moatar et al. 2017 WRR supporting information data
    c. Resulting file(s):
      i. table_s1_summary_conc_q_by_solute_and_cluster.csv
      ii. table_s3_fpc_q_thresholds.csv
      iii. table_s4_all_model_classifications.csv
      iv. table_s5_summary_gauges_per_cluster_by_modclass_and_slope.csv
      v. table_s5_summary_kw_slopes_by_cluster.csv
      vi. table_s6_summary_of_b_cv.csv
      vii. table_s7_summary_kw_slopes_by_cluster.csv
      viii. table_s8_summary_kw_cvratio_by_cluster.csv
      ix. fig_s2_usmap_cv_ratio.png
      x. fig_s2_cv_ratio_by_cluster.png
      xi. fig_1_prop_modclass_by_constit.png
      xii. fig_s4_cq_thresholds_boxplot.png
      xiii. fig_2_slopes_and_cv_example.png
      xiv. fig_3a_plot_map_clusters_5.png
      xv. fig_3b_all_zscores_clusters_5.png
      xvi. fig_63a_attribute_values_1.png
      xvii. fig_s6a_attribute_values_2.png
      xviii. fig_s6b_attribute_cat_dist.png
      xix. fig_4a_usmap_modclass.png
      xx. fig_4b_modclass_by_cluster_legend.png
      xxi. fig_4c_usmap_slope.png
      xxii. fig_4d_slope_by_cluster.png
      xxiii. fig_s7_mosaic_XX.png
        1. One for each solute
      xxiv. fig_s8_corr_slope_horiz.png
      xxv. fig_5_feat_impt.png
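Section 5 of the README locates C-Q thresholds/breakpoints on each gauge's flow-duration curve. A minimal Python sketch of the underlying calculation, assuming the breakpoint is expressed as a discharge value in the same units as the daily record: the exceedance probability is taken here as the percentage of daily discharges greater than or equal to that value, and the flow-duration curve uses Weibull plotting positions m/(n + 1). Both are common conventions chosen for illustration; the function names and example values are hypothetical, and 8_flow_duration_breakpoint_analysis.R may define exceedance differently.

```python
def exceedance_probability(daily_q, threshold_q):
    """Percent of days on which discharge equals or exceeds threshold_q."""
    if not daily_q:
        raise ValueError("daily_q must not be empty")
    n_exceed = sum(1 for q in daily_q if q >= threshold_q)
    return 100.0 * n_exceed / len(daily_q)


def flow_duration_curve(daily_q):
    """(exceedance percent, discharge) pairs using Weibull plotting positions m / (n + 1)."""
    ranked = sorted(daily_q, reverse=True)
    n = len(ranked)
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ranked, start=1)]


# Hypothetical daily record and C-Q breakpoint discharge (same units)
daily_q = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
breakpoint_q = 8
print(exceedance_probability(daily_q, breakpoint_q))  # prints 30.0
print(flow_duration_curve([3, 1, 2]))  # prints [(25.0, 3), (50.0, 2), (75.0, 1)]
```

The empirical proportion needs no interpolation along the curve; for long daily records it and the Weibull plotting position give nearly identical percentages.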
Credits
Funding Agencies
This resource was created using funding from the following sources:
Agency Name | Award Title | Award Number |
---|---|---|
National Science Foundation | NSF EAR-2012123 | |
National Science Foundation | NSF EAR-2012080 | |
National Science Foundation | NSF OIA-2033995 | |
How to Cite
This resource is shared under the Creative Commons Attribution 4.0 International license (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/