Improving a streamflow regression model for Wisconsin streams

Resource type:  Composite Resource  
Storage:  The size of this resource is 2.4 GB  
Created:  Apr 21, 2021 at 3:48 p.m.  
Last updated:  Jul 06, 2021 at 10:39 p.m.


DOI:  10.4211/hs.1d78d40efa2844cb9db2c19b67be464d  
Content types:  Geographic Feature Content 
Sharing Status:  Published 

Abstract
Streamflows derived from hydrological models are widely used in decision-making processes across a broad array of natural resources applications. With increases in computational power and data availability, data-driven modeling methods are becoming more powerful and popular. While it is well recognized that a reasonable estimate of model uncertainty is important to support good decision-making, substantial challenges remain in quantifying uncertainty in hydrological models. One challenge is unequal data availability: while large amounts of data are available for well-monitored streams, the vast majority of streams globally are ungauged, with very limited or no streamflow monitoring. In this study, I evaluated the accuracy of the Natural Community Model (NCM), a mixed-effects model for streamflow (flow-duration curves) across the state of Wisconsin that is trained on continuously monitored streamflow stations. The NCM is used as the basis for scientific studies and management decisions in Wisconsin, but its uncertainty has not yet been quantified, and its performance has not been formally assessed except at continuously monitored streamflow stations. There are about 4,000 streamflow monitoring stations in Wisconsin, but about 3,500 of them have fewer than 5 sporadic streamflow measurements. I used an index gauge approach to estimate long-term streamflow percentiles (with uncertainty) from short-term or sporadic streamflow monitoring, and then used these percentiles to estimate a flow-duration curve (with uncertainty) for each short-term or sporadic streamflow station. These flow-duration targets formed the basis for an assessment of NCM accuracy in ungauged streams. I developed a random forest model for NCM error that provides a qualitative understanding of the sources of error in the NCM, as well as a quantitative way to correct the NCM using information from the sporadic and short-term streamflow stations that could not be included in the original NCM training set.
The updated NCM has significantly reduced error, and I defined a reasonable level of uncertainty to be used with the updated NCM in decision-making and research applications.
Content
README.md
Improving a regression model for streamflow in Wisconsin streams using sporadic flow measurements
Streamflow monitoring in Wisconsin is performed by the USGS, the Wisconsin Department of Natural Resources, the University of Wisconsin, the Wisconsin State Geological Survey, and other local sources. For streamflow data requests, contact XXX at the Wisconsin Department of Natural Resources. This data repository contains the code used to generate the results and models described by Lapides et al. (2021), along with metadata for the streamflow resources. Available data in the Data directory of this repository include:

USGS_metadata.csv: Metadata about USGS gage sites in Wisconsin, USA.

USGS_gauge_longterm_stat_test: directory containing output from the analysis of stream gauge record length. Types of output include:

sitenum_5yr.csv: Table of flow percentiles (10, 25, 50, 75, 90) calculated for each of 100 subsets of the full record at gauge sitenum, where each subset has a fixed record length (e.g., 5 years).

sitenum_bias.csv: Table of the absolute percent difference in each metric for a record length of n years (given as the row number) compared to the full record length at gauge sitenum.

sitenum_cv.csv: Table of the coefficient of variation of each calculated percentile among subsets with a record length of n years (given as the row number) at gauge sitenum.

sitenum_mse.csv: Table of mean squared error for record length of n years (given as the row number) compared to the full record length at gauge sitenum.

sitenum_smse.csv: Table of scaled mean squared error for record length of n years (given as the row number) compared to the full record length at gauge sitenum.


WI_streamflow_stations.csv: Metadata for all streamflow stations in WI except long-term USGS monitoring sites. Columns geom_x and geom_y are longitude and latitude, respectively.

USGS_ref_13yr.csv: Metadata for all streamflow stations in WI with at least 13 years of data. These sites are used as reference sites for the flow percentile assignment.

wd_hydro_va_upstr_topology_ref.csv: WI stream basin topology. Relates stream reaches by HYDROID to those upstream/downstream.

quant_geo.csv: Table of streamflow monitoring in Wisconsin, excluding long-term monitoring stations.

fmdb_flowclean.csv: Table of instantaneous streamflow measurements from the fisheries management database at the Wisconsin Department of Natural Resources.

xref_date_ncm_r2_extend.csv: Output of reference gauge assignment for Wisconsin streamflow stations. The USGS site id for each reference gauge is in the column titled index_gauge, and the station id is given by its HYDROID.

quant_geo.shp: Shapefile of all streamflow monitoring in Wisconsin, excluding long-term monitoring stations.

USGS_xref_HYDROID.csv: USGS metadata with a cross-reference between USGS site id and HYDROID.

HUC8: Directory containing shapefile of HUC8 basins in Wisconsin.

flow_percentiles: directory containing output of flow percentile assignment program. Files are:
sitenum_percentiles.csv: Table of measured flows and assigned flow percentiles (annual, August, or April, as appropriate) for site sitenum, with an assumed 15% uncertainty range reported as percentile_min and percentile_max, where percentile is 10, 25, 50, 75, or 90.

percentiles_metadata.csv: Table of summary information about calculated flow percentiles for each sporadic or short-term flow station.

fit_flowdur_indexgauge.csv: Table of flow-duration curve fit information for reference gauges. Fits were performed using a logarithmic function, $y = a\log(p - x) + b$, where y is the streamflow in CFS, p is an offset percentile, x is the streamflow percentile, and a and b are fit constants.

flowdur_fit_results.csv: Table of fit information for all streamflow stations. Streamflow at each percentile is given in CFS in columns p10, p25, p50, p75, and p90. The error in each streamflow percentile inferred from measurements is given by p10_err, p25_err, p50_err, p75_err, and p90_err. Sites are identified by HYDROID and site_num. The number of measurements at each site is given by num_measurements. The fit parameters for the logarithmic fit function are fit_a, fit_b, and use_p; fit_R2, fit_median_err, and rel_residual assess the quality of the fit.

W23324_WD_HYDRO_VA_NC_FLOW_TEMP_SV: Shapefile of Natural Community Model output across the state of Wisconsin. REACHID and HYDROID label the station location. TRW_AREA is catchment area. NAT_COMM and TEMP_CLASS are categorizations of water bodies based on stream size and temperature. Each of the temperature statistics is given in degrees Celsius, and the streamflow (Q) statistics are given in CFS.

flowdur_targets_final_allinfo.csv: Table of flow-duration targets inferred from measured streamflow, with Natural Community Model output for comparison. The fracdiff and absdiff columns are the fractional and absolute differences between Natural Community Model and inferred streamflows. Natural Community Model streamflows are given as exceedance percentiles (e.g., Q10_ANNUAL), while inferred streamflow targets are given as streamflow percentiles (e.g., p90). frac_underestimate, frac_overestimate, and frac_noerror are the fractions of the streamflow targets defined at each site that are underestimated, overestimated, or accurately estimated, respectively. The performance category and performance_category_number columns describe the general error type at each site.

spring_data.shp: Shapefile of spring locations and streamflow across Wisconsin.

whdplus_dana.csv: Landscape attribute information for all reaches with field measurements in Wisconsin.

Wisconsin Bedrock Depth: Shapefile of depth to bedrock across the state of Wisconsin.

random_forest_targets_by_site.csv: Table of inputs to random forest model. Each row is one site.

random_forest_targets_by_flow.csv: Table of inputs to random forest model. Each row is one streamflow target.

random_forest_NCM_error.joblib: A joblib file containing a trained random forest model to estimate error in the Natural Community Model as a function of landscape attributes.

whdplus_data_natcomm.csv: Table of landscape attributes for all reaches in Wisconsin included in the Natural Community Model.

NCM_uncertainty_table.csv: Table of the updated annual Natural Community Model with estimated uncertainty for all reaches included in the Natural Community Model. Percentiles in this table are exceedance percentiles for consistency with the original Natural Community Model.

updated_NCM_august.csv: Table of the updated August Natural Community Model with estimated uncertainty for all reaches in the NCM. Percentiles in this table are exceedance percentiles for consistency with the original Natural Community Model.

updated_NCM_april.csv: Table of the updated April Natural Community Model with estimated uncertainty for all reaches in the NCM. Percentiles in this table are exceedance percentiles for consistency with the original Natural Community Model.
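The logarithmic flow-duration fit described for fit_flowdur_indexgauge.csv and flowdur_fit_results.csv can be sketched as follows. This is a minimal illustration, not the code used in the notebooks: it assumes the offset percentile p (stored as use_p) is chosen from a list of candidate values by maximizing R², and that a and b come from an ordinary least-squares regression of streamflow on log(p - x). The function names linear_fit and fit_flow_duration are hypothetical.

```python
import math


def linear_fit(u, y):
    """Ordinary least squares for y = a*u + b; returns (a, b, r_squared)."""
    n = len(u)
    u_mean = sum(u) / n
    y_mean = sum(y) / n
    s_uu = sum((ui - u_mean) ** 2 for ui in u)
    s_uy = sum((ui - u_mean) * (yi - y_mean) for ui, yi in zip(u, y))
    a = s_uy / s_uu
    b = y_mean - a * u_mean
    ss_res = sum((yi - (a * ui + b)) ** 2 for ui, yi in zip(u, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def fit_flow_duration(x, y, p_candidates):
    """Fit y = a*log(p - x) + b to (percentile, flow) pairs.

    x: streamflow percentiles (e.g. 10, 25, 50, 75, 90)
    y: streamflow in CFS at those percentiles
    p_candidates: offset percentiles to try (assumption: p is chosen by best R^2)
    Returns (a, b, p, r_squared) for the best candidate, or None if no
    candidate satisfies p > max(x).
    """
    best = None
    for p in p_candidates:
        if p <= max(x):
            continue  # log(p - x) is undefined unless p exceeds every percentile
        u = [math.log(p - xi) for xi in x]
        a, b, r2 = linear_fit(u, y)
        if best is None or r2 > best[3]:
            best = (a, b, p, r2)
    return best


if __name__ == "__main__":
    x = [10, 25, 50, 75, 90]
    # synthetic flows generated from a = -20, b = 100, p = 100
    y = [-20.0 * math.log(100 - xi) + 100.0 for xi in x]
    print(fit_flow_duration(x, y, range(91, 201)))
```

Because log(p - x) decreases as x increases, the fitted a is negative whenever streamflow rises with the percentile x, as in the synthetic example.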
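The comparison stored in flowdur_targets_final_allinfo.csv pairs exceedance percentiles from the Natural Community Model with streamflow percentiles inferred from measurements. A minimal sketch of that bookkeeping is below. It assumes that streamflow percentile x corresponds to NCM exceedance percentile 100 - x (so p90 pairs with Q10_ANNUAL, following the column examples above), that NCM columns follow the pattern Q{k}_ANNUAL, that absdiff is the signed difference (NCM minus target, in CFS) and fracdiff is that difference divided by the target, and that a target counts as accurately estimated when the NCM flow falls within the target's error (e.g., p50_err). These definitions and the names ncm_column_for and classify_site are illustrative assumptions, not the repository's code.

```python
PERCENTILES = (10, 25, 50, 75, 90)


def ncm_column_for(pct, season="ANNUAL"):
    """Return the NCM column for streamflow percentile pct (90 -> 'Q10_ANNUAL').

    Assumption: a flow at streamflow percentile pct is exceeded (100 - pct)% of the time.
    """
    return f"Q{100 - pct}_{season}"


def classify_site(ncm_row, targets, errors, season="ANNUAL"):
    """Compare NCM flows with inferred flow-duration targets at one site.

    ncm_row: dict of NCM flows keyed by column name, e.g. {"Q10_ANNUAL": 40.0}.
    targets: dict of inferred flows (CFS) keyed like "p90".
    errors: dict of target errors (CFS) keyed like "p90" (stands in for p90_err).
    """
    rows = []
    for pct in PERCENTILES:
        key = f"p{pct}"
        if key not in targets:
            continue  # not every site has a target at every percentile
        diff = ncm_row[ncm_column_for(pct, season)] - targets[key]
        if abs(diff) <= errors[key]:
            label = "noerror"  # NCM falls within the target's error range
        elif diff < 0:
            label = "underestimate"
        else:
            label = "overestimate"
        rows.append({"percentile": key, "absdiff": diff,
                     "fracdiff": diff / targets[key], "label": label})
    if not rows:
        raise ValueError("site has no flow-duration targets")
    fractions = {
        f"frac_{name}": sum(r["label"] == name for r in rows) / len(rows)
        for name in ("underestimate", "overestimate", "noerror")
    }
    return rows, fractions


if __name__ == "__main__":
    ncm = {"Q90_ANNUAL": 5.0, "Q75_ANNUAL": 8.0, "Q50_ANNUAL": 12.0,
           "Q25_ANNUAL": 20.0, "Q10_ANNUAL": 40.0}
    targets = {"p10": 4.0, "p25": 8.0, "p50": 15.0, "p75": 20.0, "p90": 30.0}
    errors = {key: 0.5 for key in targets}
    rows, fractions = classify_site(ncm, targets, errors)
    print(fractions)
    # {'frac_underestimate': 0.2, 'frac_overestimate': 0.4, 'frac_noerror': 0.4}
```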
Data not included:
Active Wells: A shapefile of active high-capacity wells in the state of Wisconsin. These data are available upon request from DRWaterUseRegistration@wisconsin.gov (see https://dnr.wisconsin.gov/topic/WaterUse/data.html).
Code in the Code directory of this repository includes:

data_length_gauges_analysis.ipynb: A Jupyter notebook that explores the relationship between streamflow record length and confidence in calculated percentiles.

Index_gauge_assignment_final.ipynb: A Jupyter notebook that identifies the set of reference gauges used in this study and assigns a reference gauge to each streamflow station in quant_geo (see Data directory).

Flow_percentiles.ipynb: A Jupyter notebook that assigns long-term flow percentiles to sporadic streamflow measurements in Wisconsin based on the flow percentile at the reference gauge on the same day.

NCM_percentile_comparison.ipynb: A Jupyter notebook that analyzes the overall performance of the Natural Community Model relative to the estimated flow-duration targets across the state of Wisconsin.

random_forest_input_preparation.ipynb: A Jupyter notebook that standardizes the input data and combines landscape attributes with streamflow targets to produce an input table that can be used to train a random forest model.

random_forest_error_analysis.ipynb: A Jupyter notebook that trains a random forest model to estimate uncertainty or error in the Natural Community Model and explores the model's performance and structure.

NCM_error_calculate.ipynb: A Jupyter notebook that estimates error in the Natural Community Model using the trained random forest model based on user-defined inputs.

NCM_error_catalog_preparation.ipynb: A Jupyter notebook that estimates NCM error on all NCM reaches using the random forest model and produces an updated, more accurate version of the NCM with an estimated uncertainty.

August_flow_percentiles.ipynb: A Jupyter notebook that calculates flow-duration targets for August flows and updates NCM flows using a random forest model trained on August flow error.

April_flow_percentiles.ipynb: A Jupyter notebook that calculates flow-duration targets for April flows and updates NCM flows using a random forest model trained on April flow error.
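The record-length analysis in data_length_gauges_analysis.ipynb (with outputs such as sitenum_5yr.csv, sitenum_bias.csv, sitenum_cv.csv, and sitenum_mse.csv) can be approximated by resampling shorter windows of a long gauge record. This sketch assumes that each of the 100 subsets is a contiguous block of n years starting at a randomly chosen year, that percentiles are computed on daily flows, that the coefficient of variation is taken across subsets for each percentile, and that the bias value is the mean absolute percent difference from the full-record percentile. The input layout (a list of per-year lists of daily flows) and the function names are hypothetical, and the scaled MSE is omitted because its scaling is not described in this README.

```python
import random
import statistics

PERCENTILES = (10, 25, 50, 75, 90)


def flow_percentiles(flows):
    """Return the 10th, 25th, 50th, 75th, and 90th percentiles of a list of flows."""
    # quantiles(n=100) returns 99 cut points; cut point k-1 is the k-th percentile
    cuts = statistics.quantiles(flows, n=100, method="inclusive")
    return {k: cuts[k - 1] for k in PERCENTILES}


def record_length_stats(years, n_years, n_subsets=100, seed=0):
    """Compare percentiles from random n-year windows with the full record.

    years: list of per-year lists of daily flows (hypothetical input layout).
    Returns {percentile: {"cv": ..., "abs_pct_diff": ..., "mse": ...}}.
    """
    if not 1 <= n_years <= len(years):
        raise ValueError("n_years must be between 1 and the record length")
    rng = random.Random(seed)
    full = flow_percentiles([q for year in years for q in year])
    subsets = []
    for _ in range(n_subsets):
        # Assumption: each subset is a contiguous block of n_years years
        start = rng.randrange(len(years) - n_years + 1)
        window = years[start:start + n_years]
        subsets.append(flow_percentiles([q for year in window for q in year]))
    stats = {}
    for k in PERCENTILES:
        values = [s[k] for s in subsets]
        stats[k] = {
            # coefficient of variation of this percentile across the subsets
            "cv": statistics.pstdev(values) / statistics.fmean(values),
            # mean absolute percent difference from the full-record percentile
            "abs_pct_diff": statistics.fmean(
                [abs(v - full[k]) / full[k] * 100 for v in values]),
            # mean squared error relative to the full-record percentile
            "mse": statistics.fmean([(v - full[k]) ** 2 for v in values]),
        }
    return stats


if __name__ == "__main__":
    gen = random.Random(1)
    # 30 synthetic years of lognormally distributed daily flows
    years = [[gen.lognormvariate(3.0, 1.0) for _ in range(365)] for _ in range(30)]
    for k, s in record_length_stats(years, n_years=5).items():
        print(k, round(s["cv"], 3), round(s["abs_pct_diff"], 1))
```

Repeating the call for n_years = 1, 2, ... and stacking the results row by row gives tables laid out like sitenum_cv.csv and sitenum_bias.csv, where the row number is the record length.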
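The index gauge step in Flow_percentiles.ipynb assigns each sporadic measurement the long-term flow percentile observed at its reference gauge on the same day. The sketch below assumes that this percentile is the empirical non-exceedance percentile of the reference gauge's daily flow within its own record (the percentage of days with flow less than or equal to that day's flow). The notebook's exact percentile definition, the seasonal (August or April) subsetting, and the 15% uncertainty range are not reproduced, and the function name assign_percentiles and its input layout are hypothetical.

```python
import bisect
from datetime import date, timedelta


def assign_percentiles(measurements, ref_daily_flow):
    """Assign reference-gauge flow percentiles to sporadic measurements.

    measurements: list of (date, flow_cfs) pairs at a sporadic station.
    ref_daily_flow: dict mapping date -> daily flow (CFS) at the assigned
        reference (index) gauge, covering its long-term record.
    Returns a list of (date, flow_cfs, percentile) tuples.
    """
    ordered = sorted(ref_daily_flow.values())
    n = len(ordered)
    assigned = []
    for day, flow in measurements:
        ref_flow = ref_daily_flow.get(day)
        if ref_flow is None:
            continue  # reference gauge has no record on this day
        # empirical non-exceedance percentile: share of days with flow <= ref_flow
        pct = 100.0 * bisect.bisect_right(ordered, ref_flow) / n
        assigned.append((day, flow, pct))
    return assigned


if __name__ == "__main__":
    start = date(2000, 1, 1)
    # synthetic reference record: flows of 1, 2, ..., 100 CFS on consecutive days
    ref = {start + timedelta(days=i): float(i + 1) for i in range(100)}
    # the second visit falls outside the reference record and is skipped
    visits = [(start + timedelta(days=49), 3.2), (date(2010, 6, 1), 1.1)]
    print(assign_percentiles(visits, ref))
```

Pairing each assigned percentile with the measured flow gives the points from which a flow-duration curve can be estimated for each sporadic station.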
References:
Lapides, D. A. (In Review). "Using sporadic streamflow measurements to improve and evaluate a streamflow model in ungauged basins in Wisconsin."
This resource is shared under the Creative Commons Attribution CC BY.
http://creativecommons.org/licenses/by/4.0/