Waterhackweek2019 Data Access and Time-series Statistics Cyberseminar

Resource type: Composite Resource
Storage: The size of this resource is 3.7 MB
Created: Mar 20, 2019 at 11:55 p.m.
Last updated: Apr 01, 2019 at 6:39 a.m.
DOI: 10.4211/hs.9985b3cb38c94cee872b28f6dcdef739
Sharing Status: Published


Data about water are found in many formats, distributed by many different sources and depicting different spatial representations such as points, polygons and grids. How do we find and explore the data we need for our specific research or application? This seminar will present common challenges and strategies for finding and accessing relevant datasets, focusing on time series data from sites commonly represented as fixed geographical points. This type of data may come from automated monitoring stations such as river gauges and weather stations, from repeated in-person field observations and samples, or from model output and processed data products. We will present and explore useful data catalogs, including the CUAHSI HIS catalog accessible via HydroClient, CUAHSI HydroShare, the EarthCube Data Discovery Studio, Google Dataset Search, and agency-specific catalogs. We will also discuss programmatic data access approaches and tools in Python, particularly the ulmo data access package, touching on the role of community standards for data formats and data access protocols.

Once we have accessed the datasets we are interested in, the next steps are typically exploratory, focusing on visualization and statistical summaries. This seminar will illustrate useful approaches and Python libraries for processing and exploring time series data, with an emphasis on the distinctive needs posed by temporal data. Core Python packages used include Pandas, GeoPandas, Matplotlib and the geospatial visualization tools introduced at the last seminar. The approaches presented can be applied to other data types that can be summarized as single time series, such as averages over a watershed or data extracts from a single cell in a gridded dataset, the topic of the next seminar.
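As a rough illustration of the exploratory statistical summaries mentioned above, here is a minimal Pandas sketch. It is not taken from the seminar notebooks: the series name `stage_m` and all values are made up, standing in for readings from a single gauge.

```python
import pandas as pd

# Hypothetical 6-hourly gauge readings over three days (made-up values).
times = pd.date_range("2019-02-01", periods=12, freq=pd.Timedelta(hours=6))
stage_m = pd.Series(
    [1.0, 1.2, 1.4, 1.2, 2.0, 2.4, 2.2, 1.8, 1.5, 1.3, 1.1, 1.3],
    index=times,
    name="stage_m",
)

# Aggregate the irregularly "fast" record to daily summary statistics.
daily = stage_m.resample("D").agg(["mean", "min", "max"])

# Smooth short-term variability with a 4-sample (24-hour) moving average.
rolling = stage_m.rolling(window=4).mean()

print(daily)
print(rolling.dropna().head())
```

Indexing the series by timestamp is what makes `resample` and time-aware plotting available; the same pattern should carry over to a real series once it has been loaded with a datetime index.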

The cyberseminar recording is available on YouTube at https://youtu.be/uQXuS1AB2M0



Data access and time-series statistics cyberseminar

Repo for "Data access and time-series statistics" WaterHackWeek cyberseminar. This seminar took place on February 7, 2019. The seminar series, including abstracts and links to seminar recordings, is available at https://www.cuahsi.org/education/cyberseminars/waterhackweek-cyberseminar-series/

Conda environment and Jupyter

First, make sure that conda is installed, via either miniconda or anaconda. See the miniconda installation instructions below.

To install the conda environment used for running the Jupyter notebooks in this seminar, change to the directory containing the environment.yml file, which you can download from this GitHub repository or obtain via git clone. Then, at the terminal, run:

    conda env create -f environment.yml

An environment called whwtimeseries will be created. Note that this environment doesn't include Jupyter. It assumes you are running JupyterLab (or Jupyter Notebook) from a different conda environment where JupyterLab is installed.
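Once the environment has been created and activated, one quick way to confirm that it is usable is to check that the key packages can be found. The sketch below is only an illustration: the package list is an assumption based on the packages named in the seminar abstract, not the actual contents of environment.yml, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed package list (from the seminar abstract, not from environment.yml).
EXPECTED = ["pandas", "geopandas", "matplotlib", "ulmo"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(EXPECTED)
if missing:
    print("Not found in this environment:", ", ".join(missing))
else:
    print("All expected packages were found.")
```

Run it with the Python interpreter of the whwtimeseries environment; running it from a different environment reports on that environment instead.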

Install miniconda and set up jupyter lab

Steps taken from https://geohackweek.github.io/preliminary/01-conda-tutorial/ (instructions for macOS and Windows are also available there).

On Linux:

```bash
# Install miniconda
url=https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
wget $url -O miniconda.sh
bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
conda update conda --yes

# Create a conda environment with jupyterlab
conda create -n jupyterlab -c conda-forge python=3.6 jupyterlab nb_conda_kernels

# Starting jupyter lab
source activate jupyterlab

# Then run jupyter lab
jupyter lab
```

How to Cite

Mayorga, E., Y. Cheng (2019). Waterhackweek2019 Data Access and Time-series Statistics Cyberseminar, HydroShare, https://doi.org/10.4211/hs.9985b3cb38c94cee872b28f6dcdef739

This resource is shared under the Creative Commons Attribution (CC BY) license.
