Checking for non-preferred file/folder path names (may take a long time depending on the number of files/folders) ...

Development and Implementation of Database and Analyses for High Frequency Data


Authors:
Owners: This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type: Resource
Storage: The size of this resource is 712.9 MB
Created: Dec 02, 2018 at 3:27 a.m.
Last updated: Jun 11, 2019 at 5:21 p.m.
Citation: See how to cite this resource
Sharing Status: Public
Views: 2138
Downloads: 104
+1 Votes: Be the first one to 
 this.
Comments: No comments (yet)

Abstract

For environmental data measured by a variety of sensors and compiled from various sources, practitioners need tools that facilitate data access and data analysis. Data are often organized in formats that are incompatible with each other and that prevent full data integration. Furthermore, analyses of these data are hampered by the inadequate mechanisms for storage and organization. Ideally, data should be centrally housed and organized in an intuitive structure with established patterns for analyses. However, in reality, the data are often scattered in multiple files without uniform structure that must be transferred between users and called individually and manually for each analysis. This effort describes a process for compiling environmental data into a single, central database that can be accessed for analyses. We use the Logan River watershed and observed water level, discharge, specific conductance, and temperature as a test case. Of interest is analysis of flow partitioning. We formatted data files and organized them into a hierarchy, and we developed scripts that import the data to a database with structure designed for hydrologic time series data. Scripts access the populated database to determine baseflow separation, flow balance, and mass balance and visualize the results. The analyses were compiled into a package of scripts in Python, which can be modified and run by scientists and researchers to determine gains and losses in reaches of interest. To facilitate reproducibility, the database and associated scripts were shared to HydroShare as Jupyter Notebooks so that any user can access the data and perform the analyses, which facilitates standardization of these operations.

Subject Keywords

Coverage

Spatial

Coordinate System/Geographic Projection:
WGS 84 EPSG:4326
Coordinate Units:
Decimal degrees
Place/Area Name:
Logan River Watershed
North Latitude
41.9903°
East Longitude
-111.4388°
South Latitude
41.7356°
West Longitude
-111.7711°

Temporal

Start Date:
End Date:

Content

readme.txt

This Read Me file details the functions contained within each of the python scripts shared as a part of this HydroShare resource. The scripts can be used to create and populate a SQLite database, perform data Quality Control, and data transformation on the database data, and conduct a flow balance, mass balance, base flow seperatation analysis on flow data. 



Database Creation: This script contains functionality to create a SQLite ODM database and to write data from csv files to an ODM SQLite database. There are functions for two types of possible input data: 1. A csv with data in serial format (one value per line) with all of the columns of a DataValues table, and 2. A csv with data in cross-tabular format (multiple values per line) in which the script then assigns values for the DataValues table columns. The serial function reads a csv containing information on how each data csv for input is set up. For instance, which row does the data start on and which column contains the DateTime and which column contains the data values. The tab function reads a csv containing the filenames for each tab formatted csv with data. These input csv files are demonstrated by the serial_data_format.csv and the tab_data_format.csv. Examples of serial and tab data files are demonstrated by the serial_data.csv and the tab_data.csv files. 

Database Functions: This script contains a function (returnseries) that queries the database to return a data frame of results that includes all of the columns from the DataValues table. The inputs to the returnseries function are SiteID, VariableID, QualityControlLevelID, and DateRange. The combination of these parameters uniquely defines a series. Though they were not relevant to this case study, because ODM also uses Methods and Sources to define data series, those parameters could be added if needed.  This script also contains a function (writeseries) that writes newly created results to the DataValues table in the database. The input dataframe to the writeseries function must be formatted like the DataValues table in the database.  

Quality Control: Within this script are four functions, correcttowaterlevel, offset_data, linear_drift, and update_metadata, that can be used to perform quality control operations on the raw data and then write the data back to the database using the writeseries function. The correcttowaterlevel function uses a dataframe containing raw total pressure data and a dataframe containing barometric pressure data to correct the total pressure data to water pressure. The water pressure is then converted to water level using Eqn. 1. Any dataset can be offset by a given value using the offset_data function. Linear sensor drift that commonly occurs with conductivity sensors and pH sensors can be corrected for using the linear_drift function. The use of these functions individually or in combination may produce a dataframe with datavalues requiring different metadata than the orginal dataframe; the update_metadata function can be used to change the values of any of the metadata columns in a dataframe originating from the DataValues table in database.

Discharge Generation: The script for discharge generation contains a function for converting water surface elevation data to discharge using the rating curve information at each site. The returnseries function can be used to query water surface elevation from the database. The dataframe can then be fed to the appropriate discharge function to perform a transformation using the site-specific stage-discharge relationship to derive discharge. The resulting data can be back to the database using the writeseries function.

Flow Analysis Tools: This script contains functions that perform the flow balance (Eqn. 3) and mass balance (Eqn. 4,5) analyses and plot the results. For flow and mass balance, the inputs are dataframes returned by the returnseries function containing the values for the flow, and in the case of the mass balance the chosen conservative tracer, of the upper and lower sites of the reach of interest, tributaries along the reach of interest, and any diversions. The inclusion of any tributaries and diversions is optional. An additional dataframe containing data for the concentration of the conservative tracer in the groundwater is required as an input to the mass balance function. These functions return dataframes that can then be plotted using the flowblanceplot or massbalanceplot functions for visualization.

Baseflow Separation: This script contains a function that performs baseflow separation according to Equation 2. The inputs are the SiteID of the highest elevation aquatic site, the SiteID of the site of interest, VariableIDs and QualityControlLevelIDs for discharge and specific conductance, and the date range. Data are obtained from the database using the returnseries function. The function estimates SCRO as the minimum specific conductance of the highest elevation site and SCBF as the maximum specific conductance of the site of interest and calculates baseflow at the site of interest. The output is a data frame with a time series of baseflow. The script also includes a plotting function to visualize the baseflow and total discharge.

Example Script:  An example script was also created that provides a step-by-step example of how to use the returnseries, flowbalance, massbalance, flowbalanceplot, massbalanceplot, baseflow, and baseflowplot functions. The script uses the SQLite database with values from the Logan River to demonstrate the functions use. The example script is provided as a Python script and as Jupyter Notebook. 

How to Cite

Tennant, H., A. S. Jones (2019). Development and Implementation of Database and Analyses for High Frequency Data, HydroShare, http://www.hydroshare.org/resource/bf57045c30054383a6df9bb8cab381d3

This resource is shared under the Creative Commons Attribution CC BY.

http://creativecommons.org/licenses/by/4.0/
CC-BY

Comments

There are currently no comments

New Comment

required