Supporting data and tools for "Impact of data temporal resolution on quantifying residential end uses of water"


Authors: C. J. Bastidas Pacheco, J. S. Horsburgh, A. S. Beckwith Jr.
Owners: This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type: Resource
Storage: The size of this resource is 15.5 MB
Created: Apr 19, 2022 at 5:10 p.m.
Last updated: Aug 08, 2022 at 4:47 p.m.
DOI: 10.4211/hs.6625bdbde41c45c2b906f32be7ea70f0
Citation: See how to cite this resource
Sharing Status: Published
Views: 644
Downloads: 54

Abstract

The files provided here are the supporting data and code for the analyses presented in "Impact of data temporal resolution on quantifying residential end uses of water", an article submitted to the journal Water (https://www.mdpi.com/journal/water). The article assessed how the temporal resolution at which water use data are collected affects our ability to identify water end use events, calculate features of individual events, and classify events by end use. We also explored the data management implications of collecting this type of data, as well as methods and tools for analyzing and extracting information from it. The data were collected in the cities of Logan and Providence, Utah, USA, in 2022 and are included in this resource. The code and data allow replication of the analyses presented in the article, and the raw data allow those analyses to be extended.

Subject Keywords

Coverage

Spatial

Coordinate System/Geographic Projection: WGS 84 EPSG:4326
Coordinate Units: Decimal degrees
North Latitude: 41.7531°
East Longitude: -111.7471°
South Latitude: 41.6864°
West Longitude: -111.8728°

Content

readme.md

Products included in this HydroShare resource:

  • Code to reproduce all analyses described within the related manuscript.
  • The anonymized pulse data used for analyses described in the manuscript.

Files are organized as follows:

  1. FileSize_TempAg contains:

    • A folder named pulsedata with daily CSV files with pulse data for sites 1 and 2. These files are named s_mmdd.csv, where s is the site number (001 or 002) and mmdd is the month and day. All data were collected in 2022.
    • A folder named tmp_agg with daily CSV files of temporally aggregated data for sites 1 and 2. These files are named opn_2022-mm-dd_df_st_tr_ts.csv, where n is option 1 or 2 (as defined in the article), mm-dd is the month and day, st is the site number (001 or 002), and t is the temporal aggregation of the data (1 s, 4 s, 5 s, or 10 s).
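
The naming convention for the daily pulse files can be checked programmatically. The following is a minimal Python sketch, not part of the R scripts in this resource; the helper name `parse_pulse_filename` and the example file name are hypothetical, and the pattern assumes the site number is always three digits.

```python
import re
from datetime import date

# Hypothetical helper (not from the resource's R code).
# Assumes names of the form s_mmdd.csv with a three-digit site number.
_PULSE_NAME = re.compile(r"^(?P<site>\d{3})_(?P<month>\d{2})(?P<day>\d{2})\.csv$")


def parse_pulse_filename(name, year=2022):
    """Return (site, date) for a daily pulse file such as '001_0315.csv'."""
    match = _PULSE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    # The year is not encoded in the name; the readme states all data are from 2022.
    return match["site"], date(year, int(match["month"]), int(match["day"]))


site, day = parse_pulse_filename("001_0315.csv")  # example name, for illustration only
print(site, day.isoformat())
```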

  2. LogEvents contains:

    • EventLog_001.csv: Original CSV file with events logged by Site 1 residents. The file has the following information: datetime: date and time when the event was identified by the user, userlabel: exact label given by the residents, counterval: event number, label: end use.
    • EventLog_002.csv: Original CSV file with events logged by Site 2 residents. The file has the following information: datetime: date and time when the event was identified by the user, userlabel: exact label given by the residents, counterval: event number, label: end use.
    • UserLabelledEvents.csv: Processed event files including events labelled by users at both sites that were found in the pulse data. The file has the following information:
      • datetime: event start date and time, YYYY-MM-DD HH:MM:SS in MST
      • id: an event counter
      • duration: event duration, in seconds
      • volume: event volume, in liters
      • average_fr_LPM: average flow rate, in liters per minute (LPM)
      • median_fr_LPM: median flow rate, LPM
      • maximum_fr_LPM: maximum flow rate, LPM
      • mode_fr_LPM: mode flow rate, LPM
      • mode_freq_perc: percentage of values that are equal to the mode
      • iqr_fr_LPM: interquartile range, LPM
      • sd_fr_LPM: standard deviation of the flow rate, LPM
      • range_fr_LPM: flow rate range (maximum minus minimum), LPM
      • ValuesCount: number of values recorded in the event
      • label_datetime: date and time when the event was labelled by the resident
      • userlabel: label assigned by the residents
      • counterval: an event counter
      • label: end use
      • site: site where the event was labelled
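
To make the feature definitions above concrete, the following Python sketch computes several of them from a short series of flow rates. It is an illustration under stated assumptions, not the calculation in Functions.R: the function name `event_features` is hypothetical, flow rates are assumed to be sampled at a fixed interval, duration is assumed to be the number of values times that interval, and the standard deviation is the sample standard deviation.

```python
import statistics


def event_features(flow_lpm, interval_s):
    """Summarize one event from flow rates (LPM) sampled every interval_s seconds.

    Assumption: each value represents interval_s seconds of flow.
    """
    if not flow_lpm:
        raise ValueError("an event needs at least one flow value")
    n = len(flow_lpm)
    duration_s = n * interval_s
    # LPM * seconds / 60 gives liters.
    volume_l = sum(flow_lpm) * interval_s / 60.0
    mode = statistics.mode(flow_lpm)
    return {
        "duration": duration_s,
        "volume": volume_l,
        "average_fr_LPM": volume_l / (duration_s / 60.0),
        "median_fr_LPM": statistics.median(flow_lpm),
        "maximum_fr_LPM": max(flow_lpm),
        "mode_fr_LPM": mode,
        "mode_freq_perc": 100.0 * flow_lpm.count(mode) / n,
        # Sample standard deviation; defined as 0 for a single value (an assumption).
        "sd_fr_LPM": statistics.stdev(flow_lpm) if n > 1 else 0.0,
        "range_fr_LPM": max(flow_lpm) - min(flow_lpm),
        "ValuesCount": n,
    }


features = event_features([6.0, 6.0, 12.0], interval_s=4)
print(features["duration"], features["median_fr_LPM"], features["range_fr_LPM"])
```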

  3. MeterReading_Logs contains:

    • MeterReadingsLog_001.xlsx: Manual meter readings conducted at site 1. The file has the following information: MR: a counter of meter readings, DateTime: date and time when the meter was read, Reading: actual meter reading, Volume: volume since last reading
    • MeterReadingsLog_002.xlsx: Manual meter readings conducted at site 2. The file has the following information: MR: a counter of meter readings, DateTime: date and time when the meter was read, Reading: actual meter reading, Volume: volume since last reading

  4. PulseData_Processed contains:

    • site_001_AllData.csv: Original CSV file with all the pulse data for site 1 used in the article. The file has the following information: datetime: exact date and time when a pulse was logged by the Pulse Datalogger, and pulse_spacing: time since the last pulse, in milliseconds.
    • site_002_AllData.csv: Original CSV file with all the pulse data for site 2 used in the article. The file has the following information: datetime: exact date and time when a pulse was logged by the Pulse Datalogger, and pulse_spacing: time since the last pulse, in milliseconds.
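
As an illustration of temporal aggregation, the Python sketch below counts pulses in fixed-width time bins, given pulse timestamps like those in the datetime column. It is a generic approach and is not claimed to match either aggregation option defined in the article; the function names are hypothetical, bins are assumed to be aligned to midnight, and only bins that contain at least one pulse are returned.

```python
from collections import Counter
from datetime import datetime, timedelta


def floor_to_interval(ts, interval_s):
    """Round a timestamp down to the start of its interval_s-second bin."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (ts - midnight).total_seconds()
    return midnight + timedelta(seconds=int(elapsed // interval_s) * interval_s)


def aggregate_pulses(pulse_times, interval_s):
    """Return {bin start: pulse count}, sorted by time, for non-empty bins only."""
    counts = Counter(floor_to_interval(ts, interval_s) for ts in pulse_times)
    return dict(sorted(counts.items()))


# Made-up timestamps, for illustration only.
pulses = [
    datetime(2022, 3, 1, 8, 0, 0, 500000),
    datetime(2022, 3, 1, 8, 0, 3),
    datetime(2022, 3, 1, 8, 0, 4, 100000),
]
for start, count in aggregate_pulses(pulses, interval_s=4).items():
    print(start.isoformat(), count)
```

Converting pulse counts to volume would additionally require the meter's volume per pulse, which is not stated in this readme.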

  5. RawMagnetometerData contains:

    • RawData_Magnetometer.csv: Raw magnetic field data. The magnetic field is sampled at 155 Hz. The magnetic field is expressed as an unsigned number that varies from 0 to 65,535 in the assigned range (± 4 gauss).
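
The unsigned values can be converted to gauss once a mapping is chosen. The Python sketch below assumes a linear mapping in which 0 corresponds to -4 gauss and 65,535 to +4 gauss; the readme does not state the mapping explicitly, so this is an assumption, and the function names are hypothetical.

```python
RAW_MAX = 65535           # largest unsigned 16-bit value
FULL_SCALE_GAUSS = 4.0    # assigned range is +/- 4 gauss
SAMPLE_RATE_HZ = 155.0    # sampling rate stated in the readme


def raw_to_gauss(raw):
    """Map a raw reading to gauss, assuming a linear scale from -4 to +4 gauss."""
    if not 0 <= raw <= RAW_MAX:
        raise ValueError(f"raw value out of range: {raw}")
    return (raw / RAW_MAX) * 2 * FULL_SCALE_GAUSS - FULL_SCALE_GAUSS


def sample_time_s(index):
    """Seconds elapsed since the first sample, for a zero-based sample index."""
    return index / SAMPLE_RATE_HZ


print(raw_to_gauss(0), raw_to_gauss(RAW_MAX), sample_time_s(155))
```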

  6. RawPulseData contains:

    • The original CSV files recorded by the Pulse Datalogger at both sites.
    • The files are named s_mmdd.csv, where s is the site number (001 or 002) and mmdd is the month and day. All data were collected in 2022. The files contain a 3-line header with 1) Date: exact time when logging started, 2) Site: site where the data were logged, and 3) ID: a datalogger ID. The files have only one value: time since the last pulse (in milliseconds). The first value is the time since logging started (indicated in the first line of the header).
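
Because each raw file stores only the time since the previous pulse, absolute pulse times follow from a running sum starting at the logging start time. The Python sketch below illustrates this; it is not the resource's R code, the function names are hypothetical, and the header text in the example is invented, since the exact header layout is not shown in this readme. The start time is passed in explicitly rather than parsed from the header.

```python
from datetime import datetime, timedelta
from itertools import accumulate


def read_spacings(lines, header_lines=3):
    """Skip the header and return the pulse spacings (ms) as floats."""
    return [float(line) for line in list(lines)[header_lines:] if line.strip()]


def pulse_times(start, spacings_ms):
    """Turn 'time since last pulse' values into absolute pulse timestamps."""
    # The first spacing is measured from the logging start time.
    return [start + timedelta(milliseconds=total) for total in accumulate(spacings_ms)]


# Invented file content, for illustration only.
example = ["Date: (logging start time)", "Site: 001", "ID: (datalogger id)", "1500", "250", ""]
spacings = read_spacings(example)
for ts in pulse_times(datetime(2022, 3, 1, 0, 0, 0), spacings):
    print(ts.isoformat())
```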

All personally identifiable information was removed from the files published here to protect the identities of the study participants.


The R code provided in this resource was developed using: R version 4.1.2 (2021-11-01). Platform: x86_64-apple-darwin17.0 (64-bit). Running under: macOS Monterey 12.0.1


The following R packages are required for running the provided scripts:

  • lubridate - Version 1.8.0. Functions for working with dates and times.
  • tidyverse - Version 1.3.1. A collection of R packages designed for data science.
  • readxl - Version 1.3.1. Makes it easy to get data out of Excel and into R.
  • scales - Version 1.1.1. Graphical scales map data to aesthetics and provide methods for automatically determining breaks and labels for axes and legends.
  • cowplot - Version 1.1.1. Provides features that help with creating publication-quality figures, such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots or mix plots with images.
  • ggh4x - Version 0.2.1. Provides utility functions that don’t entirely fit within the ‘grammar of graphics’ concept (they can be a bit hacky) but can nonetheless be useful in tweaking your ggplots.

Instructions for Reproducing Results

To reproduce the results:

  1. Download the entire folder. Leave the files together in the folder to ensure the paths to the files remain correct.
  2. Execute FileSizing.R, DataAnalysis.R, DataVerification_QC.R, or RawData_Magnetometer.R using R (https://cran.r-project.org/) or RStudio (https://rstudio.com/). The script Functions.R contains functions used by the other scripts and does not produce any output.

Related Resources

The content of this resource references the GitHub repository for the pulse datalogger used to collect the data in this study: https://github.com/UCHIC/CIWS-Pulse-Logger
This resource is referenced by Bastidas Pacheco, C.J., Horsburgh, J.S., Beckwith, A.J. (2022). Impact of temporal resolution on data for quantifying residential end uses of water. Submitted for publication in the Water journal.

Credits

Funding Agencies

This resource was created using funding from the following sources:
  • Agency Name: National Science Foundation. Award Title: Cyberinfrastructure for Intelligent Water Supply (CIWS): Shrinking Big Data for Sustainable Urban Water. Award Number: 1552444
  • Agency Name: Utah Water Research Laboratory

How to Cite

Bastidas Pacheco, C. J., J. S. Horsburgh, A. S. Beckwith Jr. (2022). Supporting data and tools for "Impact of data temporal resolution on quantifying residential end uses of water", HydroShare, https://doi.org/10.4211/hs.6625bdbde41c45c2b906f32be7ea70f0

This resource is shared under the Creative Commons Attribution CC BY.

http://creativecommons.org/licenses/by/4.0/