Supporting data and tools for "An open source cyberinfrastructure for collecting, processing, storing and accessing high temporal resolution residential water use data"
|This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (email@example.com) for information on this resource.
|The size of this resource is 13.7 MB
|Jan 25, 2021 at 4:16 p.m.
|Apr 17, 2023 at 3:33 p.m.
The files provided here are the supporting data and code files for the analyses presented in "An open source cyberinfrastructure for collecting, processing, storing and accessing high temporal resolution residential water use data," an article in Environmental Modelling and Software (https://doi.org/10.1016/j.envsoft.2021.105137). The data included in this resource were collected using the CIWS-Node (https://github.com/UCHIC/CIWS-WM-Node) data logging device and processed using the Cyberinfrastructure for Intelligent Water Supply (CIWS) (https://github.com/UCHIC/CIWS-Server). CIWS is an open-source, modular, generalized architecture designed to automate the process from data collection to analysis and presentation of high temporal resolution residential water use data. The CIWS-Node is a low-cost device capable of collecting this type of data on magnetically driven water meters. The code included allows replication of the analyses presented in the journal paper, and the raw data included allow those analyses to be extended.
The journal paper presents the architecture design and a prototype implementation of CIWS that was built using existing open-source technologies, including smart meters, databases, and services. Two case studies were selected to test the functionality of CIWS, covering push and pull data models within single family and multi-unit residential contexts, respectively. CIWS was tested for scalability and performance within our design constraints and proved to be effective in both case studies. All CIWS elements and the case study data described are freely available for re-use.
Products included in this HydroShare resource:
- Code corresponding to the Data Analytics Layer described within the related manuscript.
- The anonymized quality controlled data collected and used for analyses described in the manuscript.
- Python and R code used to create the figures and tables included in the System Test section of the manuscript.
Folders are organized as follows:
- QC_data.csv: the quality controlled data collected for Case Study 1 described in the manuscript.
- sites.csv: the characteristics of the site included in the study.
- Training_Dataset.csv: manually labeled events at the property described in the manuscript, used to train the machine learning algorithm implemented.
- InfluxDB_Loading.ipynb: A Jupyter Notebook with information required to set up InfluxDB, create an InfluxDB database with characteristics described in the manuscript, and upload QC_data.csv into this database.
- settings.json: Information required to connect to the InfluxDB database created in the previous step.
- da_functiuons.py: A simple application programming interface (API) for querying data from the InfluxDB database.
- CIWS_Disaggregator.py: Code to filter, disaggregate, and classify the raw data into end uses of water.
- data_analytics.ipynb: A Jupyter Notebook with code to analyze the raw data collected, using the query API and disaggregation tools described in the two preceding items.
- data_loader.log: log file from the Data Loading Service (DLS) testing.
- transfer_manager.log: log file from the Data Transfer Manager (DTM) test between 6 and 96 dataloggers.
- transfer_manager_480.log: log file from the DTM test with 480 dataloggers.
- data_poster_nf.log: log file from the Data Posting Service (DPS) test, where n represents the number of files for each test, as described in the manuscript.
- DataPosting_test.R, TransferManager_test.R, and DataLoader_test.R: R files to analyze the test from the DPS, DTM and DLS respectively.
All personally identifiable information was removed from the files published here to protect the identities of the study participants.
The R code provided in this resource was developed using R version 4.0.3. The following R packages are required for running the provided scripts:
- lubridate - Version 1.7.8. Functions for working with dates/times.
- tidyverse - Version 1.3.0. A collection of R packages designed for data science.
- RColorBrewer - Version 1.1.2. Tool for managing colors in R.
The Python code in this resource was developed using Python 3.7.7. The following Python dependencies are required for running the provided scripts:
- os - Functions for interacting with the operating system.
- influxdb - Version 5.3.1. A client for interacting with InfluxDB.
- pandas - Version 0.24.2. A fast, powerful, flexible and easy to use open source data analysis and manipulation tool.
- matplotlib - Version 3.3.3. A comprehensive library for creating static, animated, and interactive visualizations in Python.
- seaborn - Version 0.11.1. A Python data visualization library based on matplotlib.
- numpy - Version 1.16.4. A package for scientific computing with Python.
- sklearn - Version 0.21.2. Tools for predictive data analysis.
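Before running the notebooks, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch, not part of the published resource: the helper names `installed_version` and `find_mismatches` are hypothetical, only a few of the listed packages are shown, and the check assumes each package exposes a `__version__` attribute under its importable module name.

```python
import importlib

# A few of the versions listed above, keyed by importable module name.
# Extend this dictionary with the remaining packages as needed.
EXPECTED_VERSIONS = {
    "pandas": "0.24.2",
    "matplotlib": "3.3.3",
    "seaborn": "0.11.1",
    "sklearn": "0.21.2",
}


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(expected, lookup=installed_version):
    """Map module name -> (expected, found) for every version that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(EXPECTED_VERSIONS)` returns an empty dictionary when every listed module is installed at the expected version; a missing module is reported with `None` as the found version.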
Instructions for Reproducing Results
To reproduce the results included in the DataAnalyticsLayer folder:
- Download the entire folder to a machine on which you have created a Python environment that satisfies the above dependencies.
- Install InfluxDB on a machine using the instructions available at https://docs.influxdata.com/influxdb/v1.8/introduction/install/. We used InfluxDB version 1.8.4 for our testing.
- Run the InfluxDB_Loading.ipynb Jupyter Notebook to create an InfluxDB database and load the data from QC_data.csv. All the configuration parameters for connecting to InfluxDB are included in the settings.json file. The default parameters (hostname, port, username, password) for a new InfluxDB installation are used; if these values have been modified, update settings.json before running the notebook.
- Run the data_analytics.ipynb Jupyter Notebook to reproduce the results presented in the Data Analytics Layer section for case study 1.
Note: more detailed instructions are provided within each Jupyter Notebook.
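The connection step described above can be sketched as follows. This is an illustration under stated assumptions, not the code shipped in the notebooks: the key names (`host`, `port`, `username`, `password`, `database`) are hypothetical and may differ from those in settings.json, and the fallback values are the defaults of the `influxdb` Python client rather than values confirmed by this resource.

```python
import json

# Assumed fallbacks: the influxdb Python client's own defaults.
# The key names are hypothetical; check settings.json for the real ones.
DEFAULTS = {
    "host": "localhost",
    "port": 8086,
    "username": "root",
    "password": "root",
}


def merge_settings(settings):
    """Fill in default connection values for any key that is missing."""
    merged = dict(DEFAULTS)
    merged.update(settings)
    return merged


def load_settings(path):
    """Read a JSON settings file and apply the defaults above."""
    with open(path) as handle:
        return merge_settings(json.load(handle))


def connect(settings):
    """Open an InfluxDB 1.x client from a merged settings dictionary."""
    # Imported lazily so the settings helpers work without the package.
    from influxdb import InfluxDBClient

    return InfluxDBClient(
        host=settings["host"],
        port=settings["port"],
        username=settings["username"],
        password=settings["password"],
        database=settings.get("database"),
    )
```

With this layout, a modified installation only requires editing the settings file; for example, `connect(load_settings("settings.json"))` would pick up a changed port without touching the notebook code.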
To reproduce the results included in the SystemTest folder:
|This resource is referenced by
|Camilo J. Bastidas Pacheco, Joseph C. Brewer, Jeffery S. Horsburgh, Juan Caraballo, 2021. An open source cyberinfrastructure for collecting, processing, storing and accessing high temporal resolution residential water use data, Environmental Modelling and Software, https://doi.org/10.1016/j.envsoft.2021.105137.
|The content of this resource references
|Attallah, N., C. J. Bastidas Pacheco (2023). Supporting data and tools for "An Open-source, Semi-supervised Water End Use Disaggregation and Classification Tool", HydroShare, https://doi.org/10.4211/hs.3143b3b1bdff48e0aaebcb4aedf02feb
This resource was created using funding from the following sources:
|National Science Foundation
|Cyberinfrastructure for Intelligent Water Supply (CIWS): Shrinking Big Data for Sustainable Urban Water
People or Organizations that contributed technically, materially, financially, or provided general support for the creation of the resource's content but are not considered authors.
|Utah State University
How to Cite
This resource is shared under the Creative Commons Attribution CC BY license. http://creativecommons.org/licenses/by/4.0/