Improving a streamflow regression model for Wisconsin streams


Resource type: Composite Resource
Storage: The size of this resource is 2.4 GB
Created: Apr 21, 2021 at 3:48 p.m.
Last updated: Jul 06, 2021 at 10:39 p.m.
DOI: 10.4211/hs.1d78d40efa2844cb9db2c19b67be464d
Content types: Geographic Feature Content 
Sharing Status: Published

Abstract

Streamflows derived from hydrological models are widely used in decision-making processes in a broad array of natural resources applications. With an increase in computational power and data availability, data-driven modeling methods are becoming more powerful and popular. While it is well-recognized that reasonable model uncertainty is important to support good decision-making, there remain substantial challenges in quantifying uncertainty in hydrological models. One challenge is an inequality in data availability. While large amounts of data are available for well-monitored streams, the vast majority of streams globally are ungauged, with very limited or no streamflow monitoring. In this study, I evaluated the accuracy of a mixed-effects model for streamflow (flow-duration curves) across the state of Wisconsin, the Natural Community Model (NCM), trained on continuously monitored streamflow stations. The NCM is used as the basis for scientific studies and management decisions in Wisconsin, but uncertainty in the NCM has not been quantified yet, and performance has not been assessed formally except at continuously monitored streamflow stations. There are about 4,000 streamflow monitoring stations in Wisconsin, but about 3,500 have fewer than 5 sporadic streamflow measurements. I used an index gauge approach to estimate long-term streamflow percentiles (with uncertainty) from short-term or sporadic streamflow monitoring. I then used these estimates to estimate a flow-duration curve for each short-term or sporadic streamflow station (with uncertainty). These flow-duration targets formed the basis for an assessment of NCM accuracy in ungauged streams. I developed a random forest model for NCM error that provides a qualitative understanding of sources of error in the NCM as well as a quantitative way to correct the NCM using information from the sporadic/short-term streamflow stations that could not be included in the original NCM training set. 
The updated NCM has significantly reduced error, and I defined a reasonable level of uncertainty to be used with the updated NCM in decision-making and research applications.

Resource Level Coverage

Spatial

Coordinate System/Geographic Projection:
WGS 84 EPSG:4326
Coordinate Units:
Decimal degrees
North Latitude
46.8617°
East Longitude
-86.5405°
South Latitude
42.2465°
West Longitude
-93.2201°


Content

README.md

Improving a regression model for streamflow in Wisconsin streams using sporadic flow measurements

Streamflow monitoring in Wisconsin is performed by the USGS, Wisconsin Department of Natural Resources, University of Wisconsin, the Wisconsin State Geological Survey, and other local sources. For streamflow data requests, contact XXX at the Wisconsin Department of Natural Resources. This data repository contains the code used to generate the results and models described by Lapides et al. (2021), along with metadata for the streamflow resources. Available data in the Data directory in this repository include:

  • USGS_metadata.csv: Metadata about USGS gage sites in Wisconsin, USA.

  • USGS_gauge_longterm_stat_test: directory containing output from stream gauge length analysis. Types of output include:

    • sitenum_5yr.csv: Table of calculated percentiles (10, 25, 50, 75, 90) for each of 100 subsets of a given length (e.g., 5 years) drawn from the full record at gauge sitenum.

    • sitenum_bias.csv: Table of absolute percent difference in metric for a record length of n years (given as the row number) compared to the full record length at gauge sitenum.

    • sitenum_cv.csv: Table of coefficient of variation among the calculated percentiles for a record length of n years (given as the row number).

    • sitenum_mse.csv: Table of mean squared error for record length of n years (given as the row number) compared to the full record length at gauge sitenum.

    • sitenum_smse.csv: Table of scaled mean squared error for record length of n years (given as the row number) compared to the full record length at gauge sitenum.

  • WI_streamflow_stations.csv: Metadata for all streamflow stations in WI except long-term USGS monitoring sites. Columns geom_x and geom_y are longitude and latitude, respectively.

  • USGS_ref_13yr.csv: Metadata for all streamflow stations in WI with at least 13 years of data. These sites are used as reference sites for the flow percentile assignment.

  • wd_hydro_va_upstr_topology_ref.csv: WI stream basin topology. Relates stream reaches by HYDROID to those upstream/downstream.

  • quant_geo.csv: Table of streamflow monitoring in Wisconsin except long-term monitoring stations.

  • fmdb_flow-clean.csv: Table of streamflow instantaneous measurements from the fisheries management database at the Wisconsin Department of Natural Resources.

  • xref_date_ncm_r2_extend.csv: Output of reference gauge assignment for Wisconsin streamflow stations. The USGS site id for each reference gauge is in the column titled index_gauge, and the station id is given by its HYDROID.

  • quant_geo.shp: shapefile of all streamflow monitoring in Wisconsin except long-term monitoring stations.

  • USGS_xref_HYDROID.csv: USGS metadata with cross-reference between USGS site id and HYDROID.

  • HUC8: Directory containing shapefile of HUC8 basins in Wisconsin.

  • flow_percentiles: directory containing output of flow percentile assignment program. Files are:

    • sitenum_percentiles.csv: Table of measured flow and assigned flow percentiles (annual, August, April, as appropriate) for site sitenum with an assumed 15% uncertainty range reported as percentile_min, percentile_max where percentile is 10, 25, 50, 75, 90.

  • percentiles_metadata.csv: Table of summary information about calculated flow percentiles for each sporadic or short-term flow station.

  • fit_flowdur_indexgauge.csv: Table of flow-duration curve fit information for reference gauges. Fits were performed using a log function, y = a log(p - x) + b, where y is the streamflow in CFS, p is an offset percentile, x is the streamflow percentile, and a and b are fit constants.

  • flowdur_fit_results.csv: Table of fit information for all streamflow stations. Streamflow at each percentile is given in CFS as p10, p25, p50, p75, and p90. The error in each streamflow percentile inferred from measurements is given by p10_err, p25_err, p50_err, p75_err, p90_err. The sites are named by HYDROID and site_num. The number of measurements at each site is num_measurements. The fit parameters for the logarithmic fit function are fit_a, fit_b, and use_p. fit_R2, fit_median_err, and rel_residual assess the quality of the fit.

  • W23324_WD_HYDRO_VA_NC_FLOW_TEMP_SV: Shapefile of Natural Community Model output across the state of Wisconsin. REACHID and HYDROID label the station location. TRW_AREA is catchment area. NAT_COMM and TEMP_CLASS are categorizations of water bodies based on stream size and temperature. Each of the temperature statistics is given in degrees Celsius, and the streamflow (Q) statistics are given in CFS.

  • flowdur_targets_final_allinfo.csv: Table of flow-duration targets inferred from measured streamflow with Natural Community Model output for comparison. fracdiff and absdiff columns are the fractional difference and absolute difference between Natural Community Model and inferred streamflows. Natural Community Model streamflows are given as exceedance percentiles, e.g., Q10_ANNUAL, while inferred streamflow targets are given as streamflow percentiles, e.g., p90. frac_underestimate, frac_overestimate, and frac_noerror are the fraction of streamflow targets defined at each site that are underestimated, overestimated, or accurately estimated. performance_category and performance_category_number describe the general error type at each site.

  • spring_data.shp: Shapefile of spring locations and streamflow across Wisconsin.

  • whdplus_dana.csv: Landscape attribute information for all reaches with field measurements in Wisconsin.

  • Wisconsin Bedrock Depth: Shapefile of depth to bedrock across the state of Wisconsin.

  • random_forest_targets_by_site.csv: Table of inputs to random forest model. Each row is one site.

  • random_forest_targets_by_flow.csv: Table of inputs to random forest model. Each row is one streamflow target.

  • random_forest_NCM_error.joblib: A joblib file containing a trained random forest model to estimate error in the Natural Community Model as a function of landscape attributes.

  • whdplus_data_natcomm.csv: Table of landscape attributes for all reaches in Wisconsin included in the Natural Community Model.

  • NCM_uncertainty_table.csv: Table of updated annual Natural Community Model with estimated uncertainty for all reaches included in the Natural Community Model. Percentiles in this table are exceedance percentiles for consistency with the original Natural Community Model.

  • updated_NCM_august.csv: Table of updated August Natural Community Model with estimated uncertainty for all reaches in the NCM. Percentiles in this table are exceedance percentiles for consistency with the original Natural Community Model.

  • updated_NCM_april.csv: Table of updated April Natural Community Model with estimated uncertainty for all reaches in the NCM. Percentiles in this table are exceedance percentiles for consistency with the original Natural Community Model.
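
The logarithmic flow-duration form listed for fit_flowdur_indexgauge.csv, y = a log(p - x) + b, becomes linear in log(p - x) once the offset p is fixed, so a and b can be estimated by ordinary least squares. The sketch below illustrates that idea on synthetic numbers. The function names, the fixed offset of 101, the natural logarithm, and the example values are all assumptions made for illustration; the notebooks in this repository may choose p and fit the curve differently.

```python
import math


def fit_log_flow_duration(percentiles, flows, p_offset):
    """Least-squares fit of flow = a * log(p_offset - percentile) + b.

    p_offset is treated as fixed (an assumption of this sketch) and must
    exceed every percentile so the log argument stays positive.
    """
    if any(p_offset <= x for x in percentiles):
        raise ValueError("p_offset must be larger than every percentile")
    # Transform the predictor so the model is a straight line in u.
    u = [math.log(p_offset - x) for x in percentiles]
    n = len(u)
    u_mean = sum(u) / n
    y_mean = sum(flows) / n
    sxx = sum((ui - u_mean) ** 2 for ui in u)
    sxy = sum((ui - u_mean) * (yi - y_mean) for ui, yi in zip(u, flows))
    a = sxy / sxx
    b = y_mean - a * u_mean
    return a, b


def predict_flow(percentile, a, b, p_offset):
    """Evaluate the fitted curve at one streamflow percentile."""
    return a * math.log(p_offset - percentile) + b


# Synthetic example: generate flows from known constants, then recover them.
P_OFFSET = 101.0               # hypothetical offset percentile
TRUE_A, TRUE_B = -20.0, 100.0  # hypothetical fit constants
example_percentiles = [10, 25, 50, 75, 90]
example_flows = [
    predict_flow(x, TRUE_A, TRUE_B, P_OFFSET) for x in example_percentiles
]
fit_a, fit_b = fit_log_flow_duration(example_percentiles, example_flows, P_OFFSET)
```

Because log(p - x) decreases as the percentile x increases, a fitted curve in which flow rises with percentile has a negative a, as in the synthetic constants above.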

Data not included:

Code included in the Code directory in this repository:

  • data_length_gauges_analysis.ipynb: A jupyter notebook that explores the relationship between streamflow record length and confidence in percentile calculations.

  • Index_gauge_assignment_final.ipynb: A jupyter notebook that identifies the set of reference gauges used in this study and assigns a reference gauge to each streamflow station in quant_geo (see Data directory).

  • Flow_percentiles.ipynb: A jupyter notebook that assigns long-term flow percentiles to sporadic streamflow measurements in Wisconsin based on the flow percentile at reference gauges on the same day.

  • NCM_percentile_comparison.ipynb: A jupyter notebook that analyzes overall performance of the Natural Community Model in comparison to the estimated flow-duration targets across the state of Wisconsin.

  • random_forest_input_preparation.ipynb: A jupyter notebook that standardizes all information and combines landscape attributes with streamflow targets to produce an input table of information that can be used to train a random forest model.

  • random_forest_error_analysis.ipynb: A jupyter notebook that trains a random forest model to estimate uncertainty or error in the Natural Community Model and explores model performance and structure.

  • NCM_error_calculate.ipynb: A jupyter notebook that estimates error in the Natural Community Model using the trained random forest model based on user-defined inputs.

  • NCM_error_catalog_preparation.ipynb: A jupyter notebook that estimates NCM error on all NCM reaches using the random forest model and produces an updated version of the NCM that is more accurate and comes with an estimated uncertainty.

  • August_flow_percentiles.ipynb: A jupyter notebook that calculates flow-duration targets for August flows and updates NCM flows using a random forest model trained on August flow error.

  • April_flow_percentiles.ipynb: A jupyter notebook that calculates flow-duration targets for April flows and updates NCM flows using a random forest model trained on April flow error.
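
The index gauge idea behind Flow_percentiles.ipynb can be illustrated with a short sketch: a sporadic measurement is assigned the long-term flow percentile that its reference gauge recorded on the same day. The code below assumes the percentile is the empirical share of reference-record flows at or below that day's flow. The function names and the ten-day record are hypothetical, and the notebook's actual procedure (including its 15% uncertainty range) may differ.

```python
from bisect import bisect_right
from datetime import date


def empirical_percentile(sorted_flows, flow):
    """Percent of flows in the sorted reference record that are <= flow."""
    return 100.0 * bisect_right(sorted_flows, flow) / len(sorted_flows)


def assign_percentile(reference_daily, measurement_date):
    """Return the reference gauge's flow percentile on measurement_date.

    reference_daily maps a date to the daily flow (CFS) at the reference
    gauge. Returns None when the reference gauge has no value for that day.
    """
    if measurement_date not in reference_daily:
        return None
    sorted_flows = sorted(reference_daily.values())
    return empirical_percentile(sorted_flows, reference_daily[measurement_date])


# Hypothetical ten-day reference record (a real record would span years).
daily_values = [12, 15, 9, 30, 22, 18, 11, 25, 14, 40]
reference_daily = {
    date(2020, 8, day): float(flow)
    for day, flow in enumerate(daily_values, start=1)
}

# A sporadic measurement taken on 5 August inherits that day's percentile.
assigned = assign_percentile(reference_daily, date(2020, 8, 5))
missing = assign_percentile(reference_daily, date(2021, 1, 1))
```

In this toy record, seven of the ten reference flows are at or below the 5 August value of 22 CFS, so the measurement is assigned the 70th percentile.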

References:

Lapides, D. A. (In Review). "Using sporadic streamflow measurements to improve and evaluate a streamflow model in ungauged basins in Wisconsin."


How to Cite

Lapides, D. A. (2021). Improving a streamflow regression model for Wisconsin streams, HydroShare, https://doi.org/10.4211/hs.1d78d40efa2844cb9db2c19b67be464d

This resource is shared under the Creative Commons Attribution CC BY.

 http://creativecommons.org/licenses/by/4.0/
