| Field | Value |
|---|---|
| Type | Resource |
| Storage | 334.4 MB |
| Created | Sep 01, 2022 at 3:50 a.m. |
| Last updated | Feb 06, 2024 at 12:25 p.m. |
| DOI | 10.4211/hs.63add4d5826a4b21a6546c571bdece10 |
| Sharing Status | Published |
| Views | 2086 |
| Downloads | 1156 |

Owners: This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Abstract
The extensive construction of dams exerts significant human perturbation on river systems and largely changes surface water hydrology. However, reservoir operation has long been simplified or ignored in large-scale hydrological and water resources simulation, partially due to the inaccessibility of operation manuals for most reservoirs. This dataset provides empirical operation rules for 450+ large reservoirs in the Conterminous United States (CONUS), derived from daily inflow and storage records using the machine learning-based generic data-driven operation model (GDROM; Chen et al., 2022, Advances in Water Resources). Among the reservoirs, those operated mainly for flood control take the largest share (43%) and are primarily located in the Eastern and Central United States; irrigation follows (23%), mostly in the Western United States. Hydropower reservoirs (17%) are primarily located in the Southeastern United States and the Pacific Northwest, while water supply (9%), recreation (5%), and navigation (3%) reservoirs are spread across the various CONUS regions. Most records span 15+ years, which is generally long enough to capture inter-annual operation patterns and long-term changes.
The dataset contains 1) the daily operation records from multiple data sources used for model training and validation, and 2) the derived operation rules, expressed as "if-then" rules, for each of the 450+ reservoirs. The raw data were processed for training the GDROM by a) computing "net inflow" to replace the observed inflow, accounting for storage changes due to precipitation, evaporation, seepage, and interaction with groundwater (discharge and recharge); b) detecting and removing the dates with missing data to form continuous time series; and c) correcting outliers (e.g., abnormal sudden storage changes). In addition, for each reservoir, the inflow, storage, and release are normalized by the maximum historical storage during the observation period, which enables comparing the extracted operation modules among reservoirs of various sizes. The normalization also reduces the time required for hyperparameter tuning, especially for the minimum impurity decrease, whose range of candidate values is considerably narrowed. The operation rules for each reservoir contain one or multiple representative operation modules and the hydroclimatic conditions under which the modules are applied. Both the modules and the module application conditions are derived from the decision tree; the data-driven model composed of the modules and module application conditions is provided as "if-then" statements.
Content
readme.md

If the readme.md is not well rendered here, please download the readme.html and open it with a web browser.
- 1. Overview of the dataset
- 2. Data collection and pre-process
- 2.1. Data sources
- 2.2. Data pre-process for inflow
- 2.3. Data pre-process for model training
- 3. Breakdown of the contents
- 3.1. Organization of the dataset
- 3.2. Description of files
- 4. How to implement the operation rule?
- 4.1. Brief workflow
- 4.2. General procedures
- 4.3. Demo script to convert the downloaded text files into a code block
1. Overview of the dataset
This dataset contains the machine learning-based empirical operation rules for each reservoir, in the form of "if-then" statements. The historical operation records used for training are also provided. For each reservoir, the operation rule is represented as one or multiple operation modules and a module application condition. Each operation module takes the daily inflow volume (acre-feet) and daily initial storage (acre-feet) as inputs and outputs the daily total release volume (acre-feet), as demonstrated in Zhao and Cai (2020), published in Advances in Water Resources. The module application condition takes the daily inflow volume, daily initial storage, day of year (DOY), and Palmer Drought Severity Index (PDSI) as inputs and outputs the operation module to apply, as shown in Chen et al. (2022), published in Advances in Water Resources.
2. Data collection and pre-process
2.1. Data sources
The daily operation records are retrieved from multiple sources, including the United States Bureau of Reclamation (USBR), the United States Army Corps of Engineers (USACE), and the ResOpsUS dataset established by Steyaert et al. (2022). Additionally, the state-wide PDSI data are retrieved from the National Oceanic and Atmospheric Administration (NOAA) website. Finally, we use the GRanD ID (Lehner et al., 2011) as the unique reservoir identifier, and for reservoirs not covered in the GRanD, we assign a unique ID number, as elaborated in the file descriptions below.
2.2. Data pre-process for inflow
Notably, we assume that all the acquired storage data are recorded at the end of each day; hence we offset the storage by one day to represent the initial storage. In addition, given that inflow data are missing for a large portion of reservoirs, we calculate the "net inflow" based on the water balance as the inflow volume to the reservoir. Specifically, the mass balance equation is:
$$S_{t+1} = S_t + I_t - R_t + (G_t - L_t)$$
where, $S_t$ is the initial storage at day $t$; $I_t$ is the total inflow to the reservoir during day $t$; $R_t$ is the water volume released during day $t$; $G_t$ represents the total water volume gained other than inflow, such as the precipitation and recharge from adjacent aquifers; $L_t$ denotes the total loss of water from the reservoir during day $t$, such as the loss through seepage and evaporation.
The GDROM intends to extract the dynamic operations from historical release, storage change, and inflow while respecting the mass balance. However, reservoirs are often connected to complicated flow routing around the inundated area, which makes it challenging to observe the total inflow volume accurately (Deng et al., 2015). In addition, measuring the water loss from the reservoir remains difficult, and the available data come with great uncertainty. By comparison, the observations of storage volume and release volume are more reliable and less uncertain, as they are calculated from the easily observed water elevation and the controlled discharge, respectively. As a result, inflow observations are unavailable for many reservoirs, and even for reservoirs with inflow observations, an unclosed mass balance is not uncommon. To lower the data quality barrier for training the GDROM, we follow the methods adopted by USBR (2021b) and compute the "net inflow" instead of using the observed inflow; specifically, we assume negligible error in the observed storage and release and calculate the "net inflow" from storage and release observations based on the mass balance. The "net inflow" accounts for the total gain and loss of water implicitly and approximates the net water volume entering the reservoir.
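Rearranging the mass balance gives the net inflow for day $t$ as $\tilde{I}_t = S_{t+1} - S_t + R_t$, with the gain and loss terms absorbed implicitly. The snippet below is a minimal illustrative sketch of this computation; it is not part of the dataset's scripts, and the vector names are assumptions.

```r
# Minimal sketch (not part of the dataset's scripts): net inflow from daily
# storage and release observations via the mass balance. "storage" is assumed
# to already represent the initial storage S_t of each day (see above).
compute_net_inflow <- function(storage, release) {
  n <- length(storage)
  # I_t ~ S_{t+1} - S_t + R_t, with gains and losses absorbed implicitly
  net_inflow <- storage[2:n] - storage[1:(n - 1)] + release[1:(n - 1)]
  c(net_inflow, NA)  # the last day has no S_{t+1}, so its net inflow is undefined
}
```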
2.3. Data pre-process for model training
2.3.1. Data segmentation
Since the observed operation (i.e., the daily release series) is assumed to follow a Markov process, the training samples must be continuous, although multiple continuous pieces are acceptable. Thus, we detect the dates with missing data and break the operation record into multiple continuous pieces at those points. Pieces without sufficient observations cannot capture the latent temporal dependencies of the release decisions, so only pieces with more than 100 continuous observations are retained for training. The segmented continuous time series are treated as independent samples during model training.
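As an illustration of this segmentation step (not the code used to produce the dataset; the data-frame layout and column names are assumptions), a record can be broken at missing dates and short pieces dropped as follows:

```r
# Illustrative sketch of the segmentation step. Column names ("date", "inflow",
# "storage", "release") are assumptions; the dataset may use different ones.
segment_record <- function(df, min_len = 100) {
  df <- df[complete.cases(df), ]                  # drop days with missing values
  gap <- c(FALSE, diff(as.Date(df$date)) > 1)     # TRUE where the previous day is missing
  piece_id <- cumsum(gap)                         # label each continuous piece
  pieces <- split(df, piece_id)
  Filter(function(p) nrow(p) > min_len, pieces)   # keep pieces with more than 100 observations
}
```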
2.3.2. Outliers handling
In addition, some operation records contain outliers with abnormally sudden storage changes, which appear as a large negative net inflow followed by a positive net inflow of similar absolute value on two consecutive days, and vice versa. We assume that these outliers are caused by measurement errors (or documentation typos). To detect and remove the outliers within a reservoir's record, the days with negative net inflows whose absolute values exceed the 3rd percentile of the entire net inflow series (in absolute value) are considered outlier candidates. If the net inflow on the day before or after an outlier candidate is positive with a similar absolute value, the adjacent day is identified as an outlier too. After identifying the outlier days, the storage values on those days are replaced with values linearly interpolated between the normal storage values on the days before and after the outliers. Note that days with large negative net inflow values are not necessarily outliers, given that some reservoirs may experience a sudden storage drop during a sedimentation survey period.
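A rough sketch of this screening is given below. It is illustrative only: the magnitude threshold and the "similar absolute value" tolerance are exposed as parameters because the exact settings used to build the dataset are not reproduced here.

```r
# Rough illustrative sketch of the outlier screening described above (not the
# exact procedure used to build the dataset). "threshold" is the magnitude
# cut-off for large negative net inflows and "tol" the relative tolerance for
# "similar absolute value"; both are assumptions exposed as parameters.
flag_storage_outliers <- function(net_inflow, threshold, tol = 0.1) {
  flagged <- logical(length(net_inflow))
  candidates <- which(net_inflow < 0 & abs(net_inflow) > threshold)
  for (i in candidates) {
    for (j in c(i - 1, i + 1)) {                  # check the day before and after
      if (j >= 1 && j <= length(net_inflow) &&
          !is.na(net_inflow[j]) && net_inflow[j] > 0 &&
          abs(net_inflow[j] + net_inflow[i]) < tol * abs(net_inflow[i])) {
        flagged[c(i, j)] <- TRUE                  # outlier pair found
      }
    }
  }
  # Storage on flagged days is then replaced by linear interpolation between the
  # nearest unflagged days, e.g. with stats::approx().
  flagged
}
```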
2.3.3. Data normalization
For each reservoir, the inflow, storage, and release are normalized by the maximum historical storage during the observation period, which scales the values so that the extracted operation modules can be compared among different reservoirs regardless of reservoir size. The normalization also reduces the time required for hyperparameter tuning, especially for the minimum impurity decrease, whose range of candidate values is considerably narrowed.
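As a small illustrative sketch (not the dataset's processing code; the column names are assumptions), the normalization is a simple division by the reservoir's maximum historical storage, which is also recorded in the "Maximum Storage" column of reservoir_metadata.csv:

```r
# Minimal sketch: scale inflow, storage, and release by the reservoir's maximum
# historical storage ("Maximum Storage" in reservoir_metadata.csv).
# The column names of the record are assumptions.
normalize_record <- function(df, max_storage) {
  df$inflow  <- df$inflow  / max_storage
  df$storage <- df$storage / max_storage
  df$release <- df$release / max_storage
  df
}
```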
3. Breakdown of the contents
3.1. Organization of the dataset
The folders and files are organized as below:
- readme.md
- readme.html
- reservoir_metadata.csv
- data_training/
  - reservoirID.csv
  - ...
- operation_rule/
  - modules/
    - reservoirID_moduleID.txt
    - ...
  - module_conditions/
    - reservoirID.txt
    - ...
- scripts/
  - Reading_GDROM_R.R
  - Reading_GDROM_Python.R
3.2. Description of files
The files and folders are described in detail:
`readme.md`

- The file documenting the metadata for the entire dataset.

`readme.html`

- The file that can be directly opened with a web browser in case the `readme.md` is not well rendered.
`reservoir_metadata.csv`

- The file recording the metadata for all reservoirs. Each row records a reservoir and each column specifies an attribute. The columns are described below.
  - `ID`: the unique identifier used for the reservoir. For most reservoirs, we use the ID used in the GRanD; for those not covered in the GRanD, we assign IDs starting from 10000.
  - `Dam name`: name of the dam.
  - `state`: the state where the dam is primarily located.
  - `longitude` and `latitude`: the coordinates of the dam.
  - `MAIN_USE`: the primary operation purpose of the dam, as defined in the GRanD.
  - `USE_IRRI`: the priority of irrigation operation. "Main" means that irrigation is the primary operation purpose; "Sec" means that irrigation is a secondary operation purpose; blank means that the reservoir is not operated for irrigation.
  - `USE_ELEC`: the priority of hydropower (electricity generation) operation.
  - `USE_SUPP`: the priority of water supply operation.
  - `USE_FCON`: the priority of flood control operation.
  - `USE_RECR`: the priority of recreation operation.
  - `USE_NAVI`: the priority of navigation operation.
  - `Maximum Storage`: the maximum storage (acre-feet) during the available time period, used to normalize the variables.
`data_training/`

- This folder contains the normalized historical records used for model training.
- The record of each reservoir is stored in a csv file, named as `reservoirID.csv`.
`operation_rule/`

- This folder contains the extracted empirical operation rule for each reservoir.
- Two sub-folders are created: `modules/` and `module_conditions/`, storing the representative operation modules and the module application conditions, respectively.
`operation_rule/modules/`

- This folder contains the extracted representative operation modules for each reservoir.
- For a single reservoir, there may exist one or more modules, each with a unique ID starting from 0.
- Each operation module is stored in a txt file, named as `reservoirID_moduleID.txt`.
- The operation module is converted to "if-then" statements, with inflow and storage as conditions and release as the consequence (a hypothetical example of the file format is given after the file descriptions below).
- The unit for inflow, storage, and release is acre-feet.
`operation_rule/module_conditions/`

- This folder contains the module application condition for each reservoir.
- For a single reservoir, there exists one file specifying its module application condition.
- The module application condition is stored in a txt file, named as `reservoirID.txt`.
- The module application condition is converted to "if-then" statements, with inflow, storage, DOY, and PDSI as conditions and the module ID as the consequence.
- The unit for inflow, storage, and release is acre-feet.
`scripts/`

- This folder contains the scripts used for processing the text-based operation rules before implementing them with your model.
  - `scripts/Reading_GDROM_R.R`: the script converting the operation rules in `.txt` files into R functions.
  - `scripts/Reading_GDROM_Python.R`: the script converting the operation rules in `.txt` files into Python functions.
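For orientation, the lines in these rule files follow a plain "if ... and ... then ..." pattern, which is what the conversion scripts in `scripts/` parse: module application condition lines end with "then module: <ID>", while module lines end with "then Release: <value>". The example below is hypothetical; the file names are shown as comments and all thresholds are made up.

```
# operation_rule/module_conditions/449.txt (hypothetical)
if Inflow <= 0.012 and DOY <= 152.5 and PDSI <= -1.95 then module: 0
if Inflow > 0.012 and Storage <= 0.631 then module: 2

# operation_rule/modules/449_0.txt (hypothetical)
if Inflow <= 0.004 and Storage <= 0.587 then Release: 0.003
if Inflow > 0.004 and Storage > 0.587 then Release: 0.011
```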
4. How to implement the operation rule?
4.1. Brief workflow
Each reservoir covered in this dataset is associated with a unique ID, which is documented in reservoir_metadata.csv. For a specific reservoir, its complete operation rule consists of one or several operation modules and a module application condition, which are stored in the folders operation_rule/modules/ and operation_rule/module_conditions/, respectively. Please download the corresponding modules and module application conditions, and convert the plain "if-then" statements into the programming language you are using.
Figure: the workflow of linking the GDROM-based operation rules with hydrological simulation models.
4.2. General procedures
Below we provide general steps to use the operation rules derived from the Generic Data-driven Reservoir Operation Model (GDROM) to add a reservoir component to a watershed hydrologic model.
- Step 0: Locate the reservoir(s) in the river network of your model. GDROM provides releases downstream of a reservoir based on upstream inflows. Hence, releases from an upstream reservoir would affect downstream reservoir operations and hydrology; ensure that the order of reservoirs along the stream network is defined correctly in your model.
- Step 1: Find the ID associated with each reservoir name in reservoir_metadata.csv.
- Step 2: Using the IDs, download the text files of "if-then" rules for all reservoir(s), including the module file(s) and the module application condition file in the operation_rule folder (a lookup sketch is given after this list).
- Step 3: Run a program to convert the downloaded text files into a code block that can be directly integrated with your hydrologic simulation model code. Sample R scripts are provided (Reading_GDROM_R.R and Reading_GDROM_Python.R, prepared by Anav Vora) to convert the text files to R or Python code, respectively. The sample code requires specifying the reservoir ID from Step 1 to proceed with the conversion.
- Conduct Steps 1-3 for all reservoirs included in your hydrologic model.
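As a minimal sketch of Steps 1 and 2 (the dam name "Example Dam" is hypothetical, and the exact column handling may differ in your environment), the lookup and file paths can be obtained as follows:

```r
# Illustrative sketch of Steps 1-2 (the dam name "Example Dam" is hypothetical).
meta <- read.csv("reservoir_metadata.csv", check.names = FALSE)
res_id <- meta$ID[meta[["Dam name"]] == "Example Dam"]

# Text files to retrieve for this reservoir:
condition_file <- file.path("operation_rule", "module_conditions", paste0(res_id, ".txt"))
module_files   <- Sys.glob(file.path("operation_rule", "modules", paste0(res_id, "_*.txt")))
```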
4.3. Demo script to convert the downloaded text files into a code block
We provide two demo R scripts to convert the text-based operation rules into R functions and Python functions, respectively. You can either copy the code block for use or download them from the scripts/
folder in this repository.
4.3.1. Converting to R code
```r
# Author: Anav Vora
# This script creates all GDROM-based R functions automatically for the entered reservoir ID

Reservoir_ID <- 449
require(stringr)

# We open the module application condition text file; each line of the text file forms a row
# (single column, multiple rows)
WhichModule <- read.delim(paste0(Reservoir_ID, ".txt"), header = FALSE)  # opens the text file corresponding to the reservoir ID

sink(paste0("Module_Detection_", Reservoir_ID, ".R"))  # begin creating an R function file based on the text file opened above
cat(paste0("Module_Detection_", Reservoir_ID, "<-function(Inflow,Storage,PDSI,DOY){"))  # function first line
cat("\n")  # new line
for (i in 1:dim(WhichModule)[1]) {
  TempLine_Cond <- gsub("and", "&&", WhichModule[i, 1])     # replace "and" with "&&" (combining if conditions in R uses &&)
  TempLine_Cond <- gsub("if ", "(", TempLine_Cond)          # replace "if" with "(" to build an R-style if statement later
  TempLine_Cond <- gsub(" then.*", "", TempLine_Cond)       # remove all text after the keyword "then"
  TempLine_Res <- str_extract(WhichModule[i, 1], "then.*")  # extract all text after the keyword "then"
  TempLine_Res <- gsub("then ", "", TempLine_Res)           # remove "then " from the extracted text
  TempLine_Res <- gsub(": ", "=", TempLine_Res)             # replace ":" with the assignment operator "="
  cat(paste0("if", TempLine_Cond, "){"))                    # R-style "if" statement beginning
  cat("\n")
  cat(TempLine_Res)                                         # module assignment for this condition
  cat("\n")
  cat("}")
  cat("\n")
}
cat("return(module)")  # the generated function returns the module to be used under the given conditions
cat("\n")
cat("}")
sink()

# We now wish to find out how many modules exist for the reservoir in consideration. Once the
# modules are determined, their corresponding files are opened iteratively and a function is
# written for each module. The line below first extracts all text after "then" for all rows in
# WhichModule (str_extract, stringr package), then uses gsub to delete the "then module: " text
# (we only want the module number), converts the character module numbers to numeric with
# as.numeric, and finally keeps only the unique values.
UniqueModules <- unique(as.numeric(gsub("then module: ", "", str_extract(WhichModule[, 1], "then.*"))))

for (j in 1:length(UniqueModules)) {
  TempModule <- read.delim(paste0(Reservoir_ID, "_", UniqueModules[j], ".txt"), header = FALSE)  # open a module text file
  sink(paste0("Module", Reservoir_ID, "_", UniqueModules[j], ".R"))  # begin creating an R function file for this module
  cat(paste0("Module", Reservoir_ID, "_", UniqueModules[j], "<-function(Inflow,Storage){"))  # function first line
  cat("\n")
  if (dim(TempModule)[1] == 1) {
    TempLine_Res <- str_extract(TempModule[1, 1], "then.*")  # extract all text after the keyword "then"
    TempLine_Res <- gsub("then ", "", TempLine_Res)          # remove "then " from the extracted text
    TempLine_Res <- gsub(": ", "=", TempLine_Res)            # replace ":" with the assignment operator
    cat(TempLine_Res)
    cat("\n")
  } else {
    for (i in 1:dim(TempModule)[1]) {
      TempLine_Cond <- gsub("and", "&&", TempModule[i, 1])     # replace "and" with "&&"
      TempLine_Cond <- gsub("if ", "(", TempLine_Cond)         # replace "if" with "("
      TempLine_Cond <- gsub(" then.*", "", TempLine_Cond)      # remove all text after the keyword "then"
      TempLine_Res <- str_extract(TempModule[i, 1], "then.*")  # extract all text after the keyword "then"
      TempLine_Res <- gsub("then ", "", TempLine_Res)          # remove "then " from the extracted text
      TempLine_Res <- gsub(": ", "=", TempLine_Res)            # replace ":" with the assignment operator
      cat(paste0("if", TempLine_Cond, "){"))                   # R-style "if" statement beginning
      cat("\n")
      cat(TempLine_Res)                                        # release assignment for this condition
      cat("\n")
      cat("}")
      cat("\n")
    }
  }
  cat("return(Release)")  # the generated function returns the release computed under the given conditions
  cat("\n")
  cat("}")
  sink()
}
```
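Once the converter has been run, the generated function files can be sourced and called inside a daily simulation loop. The sketch below is hypothetical: it assumes reservoir 449 with a module 0, and the input values are made up.

```r
# Hypothetical usage of the generated R functions for reservoir 449 (the numeric
# inputs are made up; in a real model they come from the simulation at each day).
source("Module_Detection_449.R")
source("Module449_0.R")

Inflow  <- 0.004
Storage <- 0.52
PDSI    <- -1.3
DOY     <- 175

module_id <- Module_Detection_449(Inflow, Storage, PDSI, DOY)  # select the active module
# In practice, dispatch on module_id to the matching module function; module 0 is shown here.
Release <- Module449_0(Inflow, Storage)
```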
4.3.2. Converting to Python code
```r
# Author: Anav Vora
# This script creates all GDROM-based Python functions automatically for the entered reservoir ID

Reservoir_ID <- 449
require(stringr)

# We open the module application condition text file; each line of the text file forms a row
# (single column, multiple rows)
WhichModule <- read.delim(paste0(Reservoir_ID, ".txt"), header = FALSE)  # opens the text file corresponding to the reservoir ID

sink(paste0("Module_Detection_", Reservoir_ID, ".py"))  # begin creating a Python function file based on the text file opened above
cat(paste0("def Module_Detection_", Reservoir_ID, "(Inflow,Storage,PDSI,DOY):"))  # function definition first line
cat("\n")  # new line
for (i in 1:dim(WhichModule)[1]) {
  TempLine_Cond <- gsub(" then.*", "", WhichModule[i, 1])   # remove all text after the keyword "then"
  TempLine_Res <- str_extract(WhichModule[i, 1], "then.*")  # extract all text after the keyword "then"
  TempLine_Res <- gsub("then ", "", TempLine_Res)           # remove "then " from the extracted text
  TempLine_Res <- gsub(": ", "=", TempLine_Res)             # replace ":" with the assignment operator "="
  cat(paste0("    ", TempLine_Cond, ":"))                   # Python-style "if" statement with the required indentation
  cat("\n")
  cat(paste0("        ", TempLine_Res))                     # module assignment for this condition
  cat("\n")
}
cat("    return module")  # the generated function returns the module to be used under the given conditions
cat("\n")
sink()

# We now wish to find out how many modules exist for the reservoir in consideration. Once the
# modules are determined, their corresponding files are opened iteratively and a function is
# written for each module (see the comments in the R converter above for details).
UniqueModules <- unique(as.numeric(gsub("then module: ", "", str_extract(WhichModule[, 1], "then.*"))))

for (j in 1:length(UniqueModules)) {
  TempModule <- read.delim(paste0(Reservoir_ID, "_", UniqueModules[j], ".txt"), header = FALSE)  # open a module text file
  sink(paste0("Module", Reservoir_ID, "_", UniqueModules[j], ".py"))  # begin creating a Python function file for this module
  cat(paste0("def Module", Reservoir_ID, "_", UniqueModules[j], "(Inflow,Storage):"))  # function definition line
  cat("\n")
  if (dim(TempModule)[1] == 1) {
    TempLine_Res <- str_extract(TempModule[1, 1], "then.*")  # extract all text after the keyword "then"
    TempLine_Res <- gsub("then ", "", TempLine_Res)          # remove "then " from the extracted text
    TempLine_Res <- gsub(": ", "=", TempLine_Res)            # replace ":" with the assignment operator
    cat(paste0("    ", TempLine_Res))
    cat("\n")
  } else {
    for (i in 1:dim(TempModule)[1]) {
      TempLine_Cond <- gsub(" then.*", "", TempModule[i, 1])   # remove all text after the keyword "then"
      TempLine_Res <- str_extract(TempModule[i, 1], "then.*")  # extract all text after the keyword "then"
      TempLine_Res <- gsub("then ", "", TempLine_Res)          # remove "then " from the extracted text
      TempLine_Res <- gsub(": ", "=", TempLine_Res)            # replace ":" with the assignment operator
      cat(paste0("    ", TempLine_Cond, ":"))                  # Python-style "if" statement with the required indentation
      cat("\n")
      cat(paste0("        ", TempLine_Res))                    # release assignment for this condition
      cat("\n")
    }
  }
  cat("    return Release")  # the generated function returns the release computed under the given conditions
  cat("\n")
  sink()
}
```
Related Publications
- Zhao, Q., & Cai, X. (2020). Deriving representative reservoir operation rules using a hidden Markov-decision tree model. Advances in Water Resources, 146, 103753. (Link)
- Chen, Y., Li, D., Zhao, Q., & Cai, X. (2022). Developing a generic data-driven reservoir operation model. Advances in Water Resources, 167, 104274. (Link)
- Li, D., Chen, Y., Zhao, Q., & Cai, X. (In Preparation)
How to Cite
This resource is shared under the Creative Commons Attribution CC BY license (http://creativecommons.org/licenses/by/4.0/).