Data and analysis pipeline for "Identifying Research Gaps and Directions from Published Literature: A Bibliometric and Thematic Synthesis of Utah Lake and Great Salt Lake Research"


Authors:
Owners: This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type: Resource
Storage: The size of this resource is 78.1 MB
Created: May 26, 2026 at 4:56 p.m. (UTC)
Last updated: May 26, 2026 at 5:24 p.m. (UTC)
Content types: CSV Content 
Sharing Status: Public
Views: 99
Downloads: 2

Abstract

Companion data deposit for Williams (2026, Limnology and Oceanography, in review). Contains the deduplicated bibliometric corpus (1,470 records from Web of Science, Scopus, and ProQuest; abstract text omitted per publisher terms), the curated per-cluster outputs that back every numeric finding in the manuscript (S1-S8 supplementary tables, network metrics, LDA topic-modeling results, AI-assisted thematic annotations), the per-cluster evidence bundle, VOSviewer-format network exports, and a snapshot of the R analysis pipeline. See the top-level README.md and the per-subdirectory READMEs for column-level metadata and for the "Not runnable as written" notes on the code snapshot.

Coverage

Spatial

Coordinate System/Geographic Projection: WGS 84 EPSG:4326
Coordinate Units: Decimal degrees
Place/Area Name: Great Salt Lake watershed
North Latitude: 42.0000°
East Longitude: -110.5000°
South Latitude: 39.9000°
West Longitude: -113.5000°

Content

README.md

Data and analysis pipeline for Williams (2026, in review)

Companion data deposit for:

Williams, G.P. (2026). Identifying Research Gaps and Directions from Published Literature: A Bibliometric and Thematic Synthesis of Utah Lake and Great Salt Lake Research. Limnology and Oceanography. [DOI on acceptance]

What's in this deposit

  • Curated data tables (~25 CSVs) corresponding to the manuscript's supplementary references S1-S8 plus core pipeline outputs.
  • Deduplicated full corpus in both .csv and .rds, with the abstract column omitted (see "A note on abstract text" below).
  • VOSviewer-format network exports (combined search-set, normalized and raw-field variants).
  • Per-cluster evidence bundle (ZIP, 19 clusters, 38 CSVs), abstract-stripped.
  • Snapshot of the R analysis pipeline (~20 scripts) used to produce the numeric findings in the manuscript. NOT runnable as written; see code/README.md.
  • Per-subdirectory README files with column-level metadata for every CSV.

How to cite

Williams, G.P. (2026). Data and analysis pipeline for "Identifying Research Gaps and Directions from Published Literature: A Bibliometric and Thematic Synthesis of Utah Lake and Great Salt Lake Research." HydroShare. [DOI on acceptance]

Also cite the published manuscript when using these data.

License

  • Data: CC BY 4.0
  • Code: MIT

Quick file map

data/
  01_search_and_corpus/          Search outputs + full deduplicated corpus
  02_descriptive/                Corpus descriptive stats
  03_networks/                   Network metrics (Table 3)
  04_topic_modeling/             LDA outputs (Table 10)
  05_clusters_and_annotations/   Cluster index + AI-assisted annotations + per-cluster evidence bundle
  06_authors_and_sources/        Author + source ranking tables (S5-S8)
  07_temporal_trends/            Year x cluster x topic-share trends
  08_normalization_maps/         Keyword / institution normalization rules
  09_vosviewer_exports/          VOSviewer-format network exports
code/                            R-pipeline snapshot (see code/README.md)
file_inventory.csv               Reader-facing inventory: path, bytes, column count, SHA-256, short notes

Every subdirectory has its own README with per-file metadata.

A note on abstract text

Several analysis steps in the pipeline (topic modeling, keyword co-occurrence on abstract terms, LLM-assisted cluster annotation) rely on the full abstract text of each record. Abstracts are copyrighted publisher content and CANNOT be redistributed via this deposit. Every file in this deposit — including the corpus (.csv and .rds), the VOSviewer exports, and the per-cluster evidence bundle — ships with the abstract columns (AB, AB_raw, and abstract-derived variants) omitted.

To re-run the abstract-dependent analysis steps, download the matching records from Web of Science, Scopus, or ProQuest using your own institutional subscriptions and the search strings in the manuscript's Methods §2.1, then merge abstracts back in by DOI / UT.
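
The merge step described above could look roughly like the following Python sketch (the deposit's own pipeline is in R; this is only an illustration). It assumes the abstract-stripped corpus and your own database exports have been loaded as lists of dictionaries, and that DOI and accession number live in columns named DI and UT. Those two column names are an assumption based on common Web of Science field tags, so check the per-subdirectory READMEs for the actual headers. The function names are hypothetical and not part of the deposit.

```python
def normalize_doi(value):
    """Lower-case a DOI and strip common URL/prefix forms; '' if empty."""
    doi = (value or "").strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_abstracts(corpus_rows, export_rows,
                    doi_col="DI", ut_col="UT", ab_col="AB"):
    """Attach abstracts from your own exports to abstract-stripped rows.

    Column names are assumptions; adjust them to the real headers.
    Matches on normalized DOI first, then falls back to UT.
    Returns (merged_rows, unmatched_count).
    """
    by_doi, by_ut = {}, {}
    for row in export_rows:
        abstract = (row.get(ab_col) or "").strip()
        if not abstract:
            continue  # nothing to merge from this export record
        doi = normalize_doi(row.get(doi_col))
        ut = (row.get(ut_col) or "").strip()
        if doi:
            by_doi.setdefault(doi, abstract)  # keep the first abstract seen
        if ut:
            by_ut.setdefault(ut, abstract)

    merged, unmatched = [], 0
    for row in corpus_rows:
        doi = normalize_doi(row.get(doi_col))
        ut = (row.get(ut_col) or "").strip()
        abstract = by_doi.get(doi) if doi else None
        if abstract is None and ut:
            abstract = by_ut.get(ut)
        if abstract is None:
            unmatched += 1
        merged.append({**row, ab_col: abstract or ""})
    return merged, unmatched
```

Reporting the unmatched count is a deliberate choice: records without a DOI or UT match stay in the corpus with an empty abstract, so you can see how much of the corpus the abstract-dependent steps would actually cover before re-running them.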

The PRISMA counts in data/01_search_and_corpus/ document the corpus build at the metadata level, sufficient for replicating the analysis steps that don't need abstract text (descriptive statistics, network construction, author / source rankings, Bradford-law analysis).

Where the raw inputs live

The pipeline reads from Data/CleanData/ in the original project repo (deduplicated WoS / Scopus / ProQuest exports). The raw search exports themselves are NOT included here because of publisher redistribution terms.

Verification

  • file_inventory.csv — one row per deposited file with path, bytes, n_columns (CSVs only), sha256, and a short notes field. Use the SHA-256 to verify a downloaded file has not been corrupted.
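
A minimal sketch of that check in Python, using only the standard library. It assumes the inventory header uses exactly the field names listed above (path, bytes, sha256), that paths are relative to the deposit root, and that digests are stored as hex strings; the function names are hypothetical and not part of the deposit.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_inventory(inventory_csv, root="."):
    """Compare each file under root with its row in the inventory.

    Assumes columns named 'path', 'bytes', and 'sha256'.
    Returns a list of (path, problem) tuples; an empty list means every
    listed file was found and matched.
    """
    problems = []
    with open(inventory_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            target = Path(root) / row["path"]
            if not target.is_file():
                problems.append((row["path"], "missing"))
            elif target.stat().st_size != int(row["bytes"]):
                problems.append((row["path"], "size mismatch"))
            elif sha256_of_file(target) != row["sha256"].strip().lower():
                problems.append((row["path"], "sha256 mismatch"))
    return problems
```

For example, `verify_inventory("file_inventory.csv", root=".")` run from the unpacked deposit root would return an empty list if every listed file is present and intact. The size check runs before hashing so that truncated downloads are reported without reading the whole file.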

Contact

G.P. Williams (Utah State University). ORCID: 0000-0002-2781-0738.

Credits

Funding Agencies

This resource was created using funding from the following sources:
Agency Name: Wasatch Front Water Quality Council
Award Title: N/A
Award Number: N/A

How to Cite

Williams, G. (2026). Data and analysis pipeline for "Identifying Research Gaps and Directions from Published Literature: A Bibliometric and Thematic Synthesis of Utah Lake and Great Salt Lake Research," HydroShare, http://www.hydroshare.org/resource/bc06bc5b0064422583e64c993828149a

This resource is shared under the Creative Commons Attribution license (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/
