Using Python Packages and HydroShare to Advance Open Data Science and Analytics for Water


Authors:
Owners: This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type: Resource
Storage: The size of this resource is 30.9 MB
Created: May 26, 2023 at 8:56 p.m.
Last updated: Sep 28, 2023 at 5:38 p.m.
Citation: See how to cite this resource
Sharing Status: Public

Abstract

Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), the latter through a Python equivalent of the USGS’s R dataRetrieval package; loading of data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python; and writing of analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.

This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.

Content

readme.md

Workshop Materials

This HydroShare resource contains the materials for a workshop delivered at the 2023 CUAHSI Biennial Colloquium in Lake Tahoe, California, June 12-14, 2023.

Workshop Title: Using Python Packages and HydroShare to Advance Open Data Science and Analytics for Water

Files included in this resource:

  • Workshop_Slides_6-6-2023.pptx: These are the PowerPoint slides presented during the workshop session.
  • USGS_dataretrieval_Example_1.ipynb: A Jupyter notebook with example code for retrieving daily streamflow data for a USGS streamflow gage.
  • USGS_dataretrieval_Example_2.ipynb: A Jupyter notebook with example code for retrieving 15-minute realtime discharge data for a USGS streamflow gage. Additional code examples show how to retrieve multiple sites at once and how to deal with timestamps for the data.
  • USGS_Dataretrieval_Example_3.ipynb: A Jupyter notebook with example code for using additional functions from the USGS dataretrieval package.
  • HydroShare_SnowtoFlow_Example.ipynb: A Jupyter notebook that demonstrates an analysis that retrieves streamflow data from USGS NWIS and snow water equivalent data from the NRCS SNOTEL system, and then develops a regression model to predict peak streamflow from peak snow water equivalent. This notebook was developed for sharing in HydroShare.
  • HydroShare_hsclient_Example_1.ipynb: A Jupyter notebook that demonstrates how to create a new HydroShare resource, create and edit the metadata for the resource, and then add content files.
  • HydroShare_hsclient_Example_2.ipynb: A Jupyter notebook that demonstrates how to retrieve data content from HydroShare using hsclient and then automatically load data into a performant data structure for visualization and analysis.
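
As a rough illustration of the kind of analysis described for HydroShare_SnowtoFlow_Example.ipynb, the sketch below fits a straight line that predicts peak streamflow from peak snow water equivalent. It is not taken from the notebook: the choice of ordinary least squares, the function name fit_line, and the input values are all assumptions made for illustration. The values are synthetic placeholders that lie exactly on a line, not USGS or SNOTEL observations. The code uses only the Python standard library.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical helper for illustration; the workshop notebook may fit
    its regression model differently.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic placeholder inputs (NOT observations): one value per water year.
# In a real analysis these would be annual peaks derived from SNOTEL and NWIS data.
peak_swe = [10.0, 20.0, 30.0, 40.0]             # peak snow water equivalent
peak_flow = [2.0 * s + 1.0 for s in peak_swe]   # peak streamflow, exactly 2 * SWE + 1

slope, intercept = fit_line(peak_swe, peak_flow)

# Predict peak streamflow for a new (also made-up) peak SWE value.
predicted_peak_flow = slope * 25.0 + intercept
print(slope, intercept, predicted_peak_flow)
```

With real data, the two input lists would hold one annual peak per water year, and the fitted slope and intercept would be checked against held-out years before being used for prediction.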

Running the Notebooks in this Resource

To run the Jupyter notebooks in this resource, you first need a HydroShare user account and membership in the CUAHSI Cloud Computing Group in HydroShare. Once you have a HydroShare account, open the group's page in HydroShare and request to join. Once you are a member of the group, you will be able to access the CUAHSI JupyterHub server.

Once you are a member of the CUAHSI Cloud Computing Group and are signed into HydroShare, do the following to run the Jupyter Notebooks in this resource:

  1. Click on the "Open With" button at the top of this resource.
  2. Select "CUAHSI Jupyter Hub".
  3. Agree to the terms of use and click the "Sign in with HydroShare" button.
  4. You will then authorize the CUAHSI JupyterHub server to use your HydroShare login.
  5. When presented with the Server Options page, select the "Python - v3.8" option and then click the "Start" button. It will take a few moments for your JupyterHub server to start.
  6. Once your JupyterHub server is running, you will see the Jupyter notebooks in the left panel of the window. Double-click on one of the Jupyter notebooks to open it, and then run its cells.

Additional Example Notebooks

See the Related Resources metadata below for links to HydroShare resources containing additional example Jupyter notebooks demonstrating how to use the USGS dataretrieval Python package and the hsclient Python client package.
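
For orientation before opening those resources, here is a minimal sketch of how the two packages are typically called. It assumes the third-party packages dataretrieval and hsclient are installed and that you have network access and a HydroShare account. The function names fetch_daily_discharge and share_file_in_new_resource are hypothetical wrappers written for this sketch, not part of either package. The imports sit inside the functions, so defining them contacts no service.

```python
# USGS parameter code for discharge (streamflow), in cubic feet per second.
DISCHARGE_PARAMETER_CODE = "00060"


def fetch_daily_discharge(site_no, start, end):
    """Return a pandas DataFrame of daily discharge values for one USGS site.

    Hypothetical wrapper around dataretrieval's nwis.get_dv; requires
    network access. Dates are strings such as "2022-10-01".
    """
    from dataretrieval import nwis  # third-party: pip install dataretrieval

    # get_dv returns a (DataFrame, metadata) tuple.
    df, _metadata = nwis.get_dv(
        sites=site_no,
        parameterCd=DISCHARGE_PARAMETER_CODE,
        start=start,
        end=end,
    )
    return df


def share_file_in_new_resource(title, abstract, keywords, file_path):
    """Create a HydroShare resource, set basic metadata, and upload one file.

    Hypothetical wrapper around hsclient; prompts for HydroShare
    credentials when called. Returns the new resource's identifier.
    """
    from hsclient import HydroShare  # third-party: pip install hsclient

    hs = HydroShare()
    hs.sign_in()                      # interactive username/password prompt
    resource = hs.create()            # new, empty resource
    resource.metadata.title = title
    resource.metadata.abstract = abstract
    resource.metadata.subjects = list(keywords)
    resource.save()                   # write the metadata edits to HydroShare
    resource.file_upload(file_path)   # add a content file
    return resource.resource_id
```

A call such as fetch_daily_discharge(site_no, "2022-10-01", "2023-09-30") would request one water year of daily values for the gage whose USGS site number you pass in. The returned DataFrame could then be written to a file and passed to share_file_in_new_resource.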

Related Resources

The content of this resource references Horsburgh, J. S., S. S. Black (2021). HydroShare Python Client Library (hsclient) Usage Examples, HydroShare, http://www.hydroshare.org/resource/7561aa12fd824ebb8edbee05af19b910
The content of this resource references Horsburgh, J. S., A. S. Jones, S. S. Black, T. O. Hodson (2022). USGS dataretrieval Python Package Usage Examples, HydroShare, http://www.hydroshare.org/resource/c97c32ecf59b4dff90ef013030c54264

Credits

Funding Agencies

This resource was created using funding from the following sources:
Agency Name: National Science Foundation
Award Title: Collaborative Research: Elements: Advancing Data Science and Analytics for Water (DSAW)
Award Number: 1931297

How to Cite

Horsburgh, J. S., A. S. Jones, A. M. Castronova, S. Black (2023). Using Python Packages and HydroShare to Advance Open Data Science and Analytics for Water, HydroShare, http://www.hydroshare.org/resource/4f4acbab5a8c4c55aa06c52a62a1d1fb

This resource is shared under the Creative Commons Attribution (CC BY) license.

http://creativecommons.org/licenses/by/4.0/
