Developing Standardized Testing Datasets for Benchmarking Automated QC Algorithm Performance


Authors:
Owners: This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type: Resource
Storage: The size of this resource is 1.4 GB
Created: Oct 30, 2024 at 2:27 a.m. (UTC)
Last updated: Dec 11, 2025 at 11:32 p.m. (UTC)
Citation: See how to cite this resource
Content types: CSV Content 
Sharing Status: Public
Views: 1185
Downloads: 67

Abstract

Diagnose Aquatic Sensor Data

## Overview
This project is designed to diagnose and flag events in aquatic sensor data based on various conditions and thresholds. It processes raw data from aquatic sites and applies thresholds and logical conditions to identify different types of anomalies. The primary focus is to flag events that may indicate sensor anomalies, environmental conditions (e.g., frozen water), or technician site visits.

### Key Features
1. Event Detection: Detects and flags various event types, such as MNT (maintenance), LWT (low water table), SLM (sensor logger malfunction), PF (power failure), and VIN (visual inspection).
2. Data Quality Control: Uses thresholds to validate sensor readings, ensuring an accurate representation of water conditions.
3. Automated Labelling: Automatically labels events using a set of predefined indicators for anomaly detection.
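As a rough illustration of threshold-based flagging, the sketch below labels each reading that falls outside a plausible range. The function name `flag_out_of_range`, the label `"OUT_OF_RANGE"`, and the threshold values are hypothetical; they are not taken from the project's script, which applies its own event types and conditions.

```python
from typing import List, Optional, Sequence


def flag_out_of_range(
    values: Sequence[Optional[float]],
    lower: float,
    upper: float,
    label: str = "OUT_OF_RANGE",  # hypothetical label, not one of the project's event codes
) -> List[Optional[str]]:
    """Return one label per reading: `label` if outside [lower, upper], else None.

    Missing readings (None) are left unlabelled in this sketch.
    """
    flags: List[Optional[str]] = []
    for value in values:
        if value is not None and not (lower <= value <= upper):
            flags.append(label)
        else:
            flags.append(None)
    return flags


# Made-up water temperature readings in degrees Celsius, for illustration only.
readings = [4.1, 4.0, -9.5, 4.2, None, 35.7]
flags = flag_out_of_range(readings, lower=-0.5, upper=30.0)
```

Keeping one label per reading, aligned with the input, makes it straightforward to attach the flags to the original time series as an extra column.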

Subject Keywords

Coverage

Spatial

Coordinate System/Geographic Projection:
WGS 84 EPSG:4326
Coordinate Units:
Decimal degrees
Place/Area Name:
Logan River Observatory
Longitude
-111.7957°
Latitude
41.7390°

Temporal

Start Date:
End Date:

Content

README.md

Summary

  • The main Python script automates all key steps for anomaly detection, categorization, and labeling.
  • A YAML configuration file (Initial_Settings.yaml) allows users to customize the analysis without modifying the script.
  • The pipeline generates labeled datasets, visual plots, and categorized anomaly event reports for quality-controlled water sensor data.
  • It generates standardized benchmark testing datasets for evaluating existing QC methods and tools for water-related data.
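One way to turn per-sample labels into a list of anomaly events is to collapse consecutive identical labels into runs. The sketch below assumes labels are stored as one entry per sample, with `None` for unflagged samples; the function name `group_events` and the tuple layout are hypothetical, and the project's script may build its event reports differently.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Tuple


def group_events(labels: Sequence[Optional[str]]) -> List[Tuple[str, int, int]]:
    """Collapse runs of identical labels into (label, start_index, end_index)."""
    events: List[Tuple[str, int, int]] = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label is not None:
            # end_index is inclusive
            events.append((label, index, index + length - 1))
        index += length
    return events


# Illustrative label sequence using two of the event codes named above.
labels = [None, "MNT", "MNT", None, None, "PF", "PF", "PF", None]
events = group_events(labels)
```

Here `events` holds one entry for the maintenance run (samples 1 to 2) and one for the power-failure run (samples 5 to 7).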

Setup Instructions

1. Clone the Repository

    git clone https://github.com/YourUsername/AutomatedAnomalyDetectionLabeling.git
    cd AutomatedAnomalyDetectionLabeling

2. Install Required Python Packages

    pip install -r requirements.txt

3. Run the Anomaly Detection Script

The script name contains spaces, quotes, and an ampersand, so wrap it in double quotes:

    python "AnomalyDetection for 'T&SpCond'_Ver.2.7_Feb.23.py"

4. Configuration (Optional)

You can modify the Initial_Settings.yaml file to change:

  • site: The monitoring site name
  • variable: The variable of interest (e.g., SpCond, Temperature)
  • start_year, end_year: Time range for analysis
  • Any other parameters or plot options

Once updated, rerun the script.
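Before rerunning, it can help to sanity-check the edited settings. The sketch below assumes the YAML file has already been parsed into a plain mapping and that the years are stored as integers; the function name `check_settings` and the example values are hypothetical, and only the four keys named above are checked.

```python
from typing import Any, Dict, List


def check_settings(settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed settings mapping (empty if none)."""
    problems: List[str] = []
    for key in ("site", "variable", "start_year", "end_year"):
        if key not in settings:
            problems.append(f"missing key: {key}")
    start, end = settings.get("start_year"), settings.get("end_year")
    if isinstance(start, int) and isinstance(end, int) and start > end:
        problems.append("start_year is after end_year")
    return problems


# Hypothetical values; in practice the mapping would come from parsing
# Initial_Settings.yaml (for example with PyYAML's yaml.safe_load).
settings = {"site": "ExampleSite", "variable": "SpCond", "start_year": 2021, "end_year": 2019}
problems = check_settings(settings)
```

Returning a list of messages rather than raising on the first problem lets all issues be reported in one pass.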

Repository Structure

    AutomatedAnomalyDetectionLabeling/
    │
    ├── scripts/
    │   ├── AnomalyDetection for 'T&SpCond'_Ver.2.7_Feb.23.py
    │   └── Initial_Settings.yaml
    │
    ├── InputDatasets/
    │   ├── FieldNote_data/
    │   ├── Processed_data/
    │   └── Raw_data/
    │
    ├── Results/
    │
    ├── Plots/
    │
    ├── requirements.txt
    │
    └── README.md

  • scripts: contains the script and the configuration file needed to run the pipeline.
  • InputDatasets: stores the retrieved input files and essential outputs (e.g., field notes, corrected and raw data). This directory is created automatically in your local repository when the script runs.
  • Results: contains the .CSV files generated by the analysis, including the labeled datasets, the list of anomaly events, and complementary files. This directory is created automatically in your local repository when the script runs.
  • Plots: contains plots for visualizing the results. This directory is created automatically in your local repository when the script runs.
  • Note: The .idea/, .vscode/, and docs/ folders are not required to run the project. They contain editor settings and documentation for the GitHub Pages site.
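The automatic creation of the output directories can be pictured with a minimal sketch like the one below. It uses the three directory names from the structure above, but the function name `ensure_output_dirs` is hypothetical and the project's script may create its folders in a different way.

```python
from pathlib import Path
from typing import List

# Directory names taken from the repository structure described above.
OUTPUT_DIRS = ("InputDatasets", "Results", "Plots")


def ensure_output_dirs(root: Path) -> List[Path]:
    """Create the output directories under `root` if they do not exist yet."""
    created: List[Path] = []
    for name in OUTPUT_DIRS:
        path = root / name
        # exist_ok=True makes repeated runs safe when the folder is already there.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_output_dirs(Path("."))` from the repository root would create any of the three folders that are missing and leave existing ones untouched.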

HydroShare Repository

Explore the full workflow and products (e.g., benchmark dataset, plots) for the Logan River Observatory sites.

🔗 Logan River Observatory – HydroShare Repository

Documentation and webpage

🔗 GitHub Page

References

How to Cite

Kahrizi, E., J. S. Horsburgh (2025). Developing Standardized Testing Datasets for Benchmarking Automated QC Algorithm Performance, HydroShare, http://www.hydroshare.org/resource/61a71043bc5240bea4baf3ec18872e9d

This resource is shared under the Creative Commons Attribution CC BY.

http://creativecommons.org/licenses/by/4.0/
CC-BY
