Advancing Open and Reproducible Water Data Science by Integrating Data Analytics with an Online Data Repository

Authors:
Owners:		This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type:	Resource
Storage:	The size of this resource is 50.9 MB
Created:	Oct 28, 2021 at 5:40 p.m. (UTC)
Last updated:	Jan 09, 2024 at 11:57 p.m. (UTC)

Sharing Status:	Public
Views:	2475
Downloads:	23

Abstract

Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon or mapped to an efficient structure for analysis; water scientists may lack training in the methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analysis will help transform scientific inquiry. Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing the time spent choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), loading of data into performant structures that are keyed to specific scientific data types and integrate with existing visualization, analysis, and data science capabilities available in Python, and writing of analysis results back to HydroShare for sharing and eventual publication. Because the packages are installed and maintained within CUAHSI’s HydroShare-linked JupyterHub server, scientists are spared much of the technical burden of creating a computational environment for executing analyses. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed from the Python Package Index using the pip utility within a Python environment on any computer running Microsoft Windows, Apple macOS, or Linux.
They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available on GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
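
The retrieve, analyze, and share workflow described above might look roughly like the following. This is a minimal sketch, not code from the presentation: the function names (fetch_daily_discharge, monthly_means, upload_result) are hypothetical, the choice of daily discharge (USGS parameter code 00060) is only an illustration, and the package calls shown are assumptions about the current dataretrieval and hsclient releases that should be checked against their documentation. Both packages are imported lazily so that the small analysis helper runs with the standard library alone.

```python
"""Sketch of a retrieve -> analyze -> share workflow (hypothetical helper names)."""
from statistics import mean


def fetch_daily_discharge(site, start, end):
    """Fetch daily discharge values for one USGS site from NWIS.

    Assumes `pip install dataretrieval`; requires network access.
    Dates are strings such as "2021-01-01".
    """
    import dataretrieval.nwis as nwis  # lazy import: only needed for retrieval

    # get_dv is assumed to return a (pandas DataFrame, metadata) pair.
    df, _metadata = nwis.get_dv(
        sites=site, parameterCd="00060", start=start, end=end
    )
    return df


def monthly_means(daily):
    """Average (ISO date string, value) pairs by calendar month.

    A stand-in for whatever analysis a real workflow would perform.
    """
    buckets = {}
    for date, value in daily:
        # "YYYY-MM-DD"[:7] is the "YYYY-MM" month key.
        buckets.setdefault(date[:7], []).append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}


def upload_result(resource_id, path):
    """Upload a local results file to an existing HydroShare resource.

    Assumes `pip install hsclient` and an account with edit permission
    on the resource; sign_in() prompts for credentials interactively.
    """
    from hsclient import HydroShare  # lazy import: only needed for upload

    hs = HydroShare()
    hs.sign_in()
    resource = hs.resource(resource_id)
    resource.file_upload(path)
```

In a notebook, one would call fetch_daily_discharge with a USGS site number and a date range, reduce the returned table to date and value pairs for monthly_means (or use pandas directly), write the result to a file, and pass that file to upload_result along with the identifier of a HydroShare resource the user can edit.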

This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/

Credits

Funding Agencies

This resource was created using funding from the following sources:

Agency Name	Award Title	Award Number
National Science Foundation	Collaborative Research: Elements: Advancing Data Science and Analytics for Water (DSAW)	1931297
National Science Foundation	Collaborative Research: SI2-SSI: Cyberinfrastructure for Advancing Hydrologic Knowledge through Collaborative Integration of Data Science, Modeling and Analysis	1664061

How to Cite

Horsburgh, J. S. (2024). Advancing Open and Reproducible Water Data Science by Integrating Data Analytics with an Online Data Repository, HydroShare, http://www.hydroshare.org/resource/45d3427e794543cfbee129c604d7e865

This resource is shared under the Creative Commons Attribution CC BY.

http://creativecommons.org/licenses/by/4.0/
