Supporting data and tools for "An open source cyberinfrastructure for collecting, processing, storing and accessing high temporal resolution residential water use data"


Authors:
Owners:
Resource type: Composite Resource
Storage: The size of this resource is 20.8 MB
Created: Jan 25, 2021 at 4:16 p.m.
Last updated: Jul 19, 2021 at 12:57 a.m.
DOI: 10.4211/hs.aaa7246437144f2390411ef9f2f4ebd0
Sharing Status: Published

Abstract

The files provided here are the supporting data and code files for the analyses presented in "An open source cyberinfrastructure for collecting, processing, storing and accessing high temporal resolution residential water use data," an article in Environmental Modelling and Software (https://doi.org/10.1016/j.envsoft.2021.105137). The data included in this resource were collected using the CIWS-Node (https://github.com/UCHIC/CIWS-WM-Node) data logging device and processed using the Cyberinfrastructure for Intelligent Water Supply (CIWS) (https://github.com/UCHIC/CIWS-Server). CIWS is an open-source, modular, generalized architecture designed to automate the process from data collection to analysis and presentation of high temporal resolution residential water use data. The CIWS-Node is a low-cost device capable of collecting this type of data on magnetically driven water meters. The code included allows replication of the analyses presented in the journal paper, and the raw data included allow for extension of the analyses conducted. The journal paper presents the architecture design and a prototype implementation for CIWS that was built using existing open-source technologies, including smart meters, databases, and services. Two case studies were selected to test functionalities of CIWS, including push and pull data models within single-family and multi-unit residential contexts, respectively. CIWS was tested for scalability and performance within our design constraints and proved to be effective within both case studies. All CIWS elements and the case study data described are freely available for re-use.

Subject Keywords

Resource Level Coverage

Spatial

Coordinate System/Geographic Projection:
WGS 84 EPSG:4326
Coordinate Units:
Decimal degrees
North Latitude
41.7471°
East Longitude
-111.7883°
South Latitude
41.6953°
West Longitude
-111.8521°

Temporal

Start Date:
End Date:

Content

readme.md

Products included in this HydroShare resource:

  • Code corresponding to the Data Analytics Layer described within the related manuscript.
  • The anonymized quality controlled data collected and used for analyses described in the manuscript.
  • Python and R code used to create the figures and tables included in the System Test section of the manuscript.

Folders are organized as follows:

  1. DataAnalyticsLayer contains:

    • QC_data.csv: the quality controlled data collected for Case Study 1 described in the manuscript.
    • sites.csv: the characteristics of the site included in the study.
    • Training_Dataset.csv: manually labeled events at the property described in the manuscript, used to train the machine learning algorithm implemented.
    • InfluxDB_Loading.ipynb: A Jupyter Notebook with information required to set up InfluxDB, create an InfluxDB database with characteristics described in the manuscript, and upload QC_data.csv into this database.
    • settings.json: Information required to connect to the InfluxDB database created on the previous step.
    • da_functiuons.py: A simple application programming interface (API) for querying data from the InfluxDB database.
    • CIWS_Disaggregator.py: Code to filter, disaggregate, and classify the raw data into end uses of water.
    • data_analytics.ipynb: A Jupyter Notebook with code to analyze the raw data collected, using the tools described in the preceding items.
  2. SystemTest contains:

    • data_loader.log: log file from the Data Loading Service (DLS) testing.
    • transfer_manager.log: log file from the Data Transfer Manager (DTM) test between 6 and 96 dataloggers.
    • transfer_manager_480.log: log file from the DTM test with 480 dataloggers.
    • data_poster_nf.log: log files from the Data Posting Service (DPS) tests, where n represents the number of files for each test, as described in the manuscript.
    • DataPosting_test.R, TransferManager_test.R, and DataLoader_test.R: R files to analyze the tests of the DPS, DTM, and DLS, respectively.

All personally identifiable information was removed from the files published here to protect the identities of the study participants.

The R code provided in this resource was developed using R version 4.0.3. The following R packages are required for running the provided scripts:
  • lubridate - Version 1.7.8. Functions for working with dates/times.
  • tidyverse - Version 1.3.0. A collection of R packages designed for data science.
  • RColorBrewer - Version 1.1.2. Tool to manage colors with R.
The Python code in this resource was developed using Python 3.7.7. The following Python dependencies are required for running the provided scripts:
  • os - Functions for interacting with the operating system.
  • influx - Version 5.3.1. A client for interacting with InfluxDB.
  • pandas - Version 0.24.2. A fast, powerful, flexible and easy to use open source data analysis and manipulation tool.
  • matplotlib - Version 3.3.3. A comprehensive library for creating static, animated, and interactive visualizations in Python.
  • seaborn - Version 0.11.1. A Python data visualization library based on matplotlib.
  • numpy - Version 1.16.4. A package for scientific computing with Python.
  • sklearn - Version 0.21.2. Tools for predictive data analysis.
  • json - A lightweight data interchange format inspired by JavaScript object literal syntax (although it is not a strict subset of JavaScript).
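
Before running the notebooks, it can help to confirm that the active Python environment matches the versions listed above. The following is a minimal sketch, not part of the published resource: the helper names (installed_version, compare_versions) are hypothetical, the dictionary covers only a subset of the listed packages for illustration, and it assumes each package exposes a `__version__` attribute under the import name shown.

```python
import importlib

# Subset of the versions listed in this readme (illustrative, not exhaustive).
# Assumption: each package is importable under this name and sets __version__.
EXPECTED = {
    "pandas": "0.24.2",
    "matplotlib": "3.3.3",
    "seaborn": "0.11.1",
    "sklearn": "0.21.2",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare_versions(expected, lookup=installed_version):
    """Map each module name to a tuple (expected, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, found {found} ({status})")
```

A mismatch does not necessarily mean the scripts will fail; it only flags where the environment differs from the one used during development.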

Instructions for Reproducing Results

To reproduce the results included in the DataAnalyticsLayer folder:

  1. Download the entire folder to a machine on which you have created a Python environment that satisfies the above dependencies.
  2. Install InfluxDB on a machine using the instructions available at https://docs.influxdata.com/influxdb/v1.8/introduction/install/. We used InfluxDB version 1.8.4 for our testing.
  3. Run the InfluxDB_Loading.ipynb Jupyter Notebook to create an InfluxDB database and load the data from QC_data.csv. All the configuration parameters for connecting to InfluxDB are included in the settings.json file. The default parameters (hostname, port, username, password) for a new InfluxDB installation are used. If these values have been modified, update settings.json before running this notebook.
  4. Run the data_analytics.ipynb Jupyter Notebook to reproduce the results presented in the Data Analytics Layer section for case study 1.
Note: there are more detailed instructions within each Jupyter notebook script.
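
The loading step (step 3) can be pictured roughly as follows. This is a hedged sketch and not the code in InfluxDB_Loading.ipynb: the settings key names (host, port, username, password), the function names, and the database, measurement, and time_column arguments are all assumptions made for illustration. The fallback values are the defaults of the influxdb Python client, and the actual keys in settings.json may differ.

```python
import json

# Assumed key names; the values are the influxdb Python client's defaults.
DEFAULTS = {
    "host": "localhost",
    "port": 8086,
    "username": "root",
    "password": "root",
}


def load_settings(path, defaults=DEFAULTS):
    """Read a JSON settings file and fill in any missing connection keys."""
    with open(path) as handle:
        settings = json.load(handle)
    merged = dict(defaults)
    merged.update(settings)
    return merged


def load_csv_into_influx(csv_path, settings, database, measurement, time_column):
    """Create the database if needed and write the CSV rows as points.

    The database, measurement, and time_column values are hypothetical
    and must be supplied by the caller.
    """
    # Imported lazily so load_settings works without these packages.
    import pandas as pd
    from influxdb import DataFrameClient

    # DataFrameClient expects a DataFrame indexed by timestamps.
    frame = pd.read_csv(csv_path, parse_dates=[time_column], index_col=time_column)
    client = DataFrameClient(
        host=settings["host"],
        port=settings["port"],
        username=settings["username"],
        password=settings["password"],
    )
    client.create_database(database)  # InfluxDB 1.x ignores this if it exists
    client.write_points(frame, measurement, database=database, batch_size=5000)
```

Keeping the connection parameters in a separate file, as the resource does with settings.json, means only that file needs editing when the InfluxDB installation uses non-default values.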

To reproduce the results included in the SystemTest folder:

  1. Download the entire folder. Leave the files together in the folder to ensure the paths to the files remain correct.
  2. Execute any of the R scripts using R (https://cran.r-project.org/) or RStudio (https://rstudio.com/).
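
For readers who prefer to run the three R scripts non-interactively, one option is to call the Rscript command-line tool from Python with the SystemTest folder as the working directory. The sketch below is an illustration only: the helper names (build_commands, run_all) are hypothetical, it assumes Rscript is on the PATH, and it assumes the scripts locate the log files relative to the working directory.

```python
import subprocess
from pathlib import Path

# Script names as listed in this readme.
R_SCRIPTS = ["DataPosting_test.R", "TransferManager_test.R", "DataLoader_test.R"]


def build_commands(scripts=R_SCRIPTS, rscript="Rscript"):
    """Return one argument list per script, suitable for subprocess.run."""
    return [[rscript, name] for name in scripts]


def run_all(folder, scripts=R_SCRIPTS):
    """Run each script with the SystemTest folder as the working directory."""
    folder = Path(folder)
    for command in build_commands(scripts):
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run(command, cwd=folder, check=True)
```

For example, `run_all("SystemTest")` would execute the three scripts in turn, assuming the folder was downloaded intact.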

References

Related Resources

This resource cites: Attallah, N., J. S. Horsburgh, C. J. Bastidas Pacheco (2021). Tools for Evaluating, Developing, and Testing Water End Use Disaggregation Algorithms, HydroShare, http://www.hydroshare.org/resource/1521eba67f1d4571ac5fe2b8c5e01035
The content of this resource serves as the data for: Camilo J. Bastidas Pacheco, Joseph C. Brewer, Jeffery S. Horsburgh, Juan Caraballo, 2021. An open source cyberinfrastructure for collecting, processing, storing and accessing high temporal resolution residential water use data, Environmental Modelling and Software, https://doi.org/10.1016/j.envsoft.2021.105137.

Credits

Funding Agencies

This resource was created using funding from the following sources:
Agency Name: National Science Foundation
Award Title: Cyberinfrastructure for Intelligent Water Supply (CIWS): Shrinking Big Data for Sustainable Urban Water
Award Number: 1552444

Contributors

People or Organizations that contributed technically, materially, financially, or provided general support for the creation of the resource's content but are not considered authors.

Name: Joseph Brewer
Organization: Utah State University

How to Cite

Bastidas Pacheco, C. J., J. S. Horsburgh, J. Caraballo, N. Attallah (2021). Supporting data and tools for "An open source cyberinfrastructure for collecting, processing, storing and accessing high temporal resolution residential water use data", HydroShare, https://doi.org/10.4211/hs.aaa7246437144f2390411ef9f2f4ebd0

This resource is shared under the Creative Commons Attribution CC BY.

 http://creativecommons.org/licenses/by/4.0/
