Data and analysis pipeline for "Identifying Research Gaps and Directions from Published Literature: A Bibliometric and Thematic Synthesis of Utah Lake and Great Salt Lake Research"
| Field | Value |
|---|---|
| Owners: | This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource. |
| Type: | Resource |
| Storage: | The size of this resource is 78.1 MB |
| Created: | May 26, 2026 at 4:56 p.m. (UTC) |
| Last updated: | May 26, 2026 at 5:24 p.m. (UTC) |
| Content types: | CSV Content |
| Sharing Status: | Public |
| Views: | 64 |
| Downloads: | 0 |
Abstract
Companion data deposit for Williams (2026, Limnology and Oceanography, in review). Contains the deduplicated bibliometric corpus (1,470 records from Web of Science, Scopus, and ProQuest; abstract text omitted per publisher terms), the curated per-cluster outputs that back every numeric finding in the manuscript (S1-S8 supplementary tables, network metrics, LDA topic-modeling results, AI-assisted thematic annotations), per-cluster evidence bundle, VOSviewer-format network exports, and a snapshot of the R analysis pipeline. See the top-level README.md and per-subdirectory READMEs for column-level metadata and the "Not runnable as written" notes on the code snapshot.
Content
README.md
Data and analysis pipeline for Williams (2026, in review)
Companion data deposit for:
Williams, G.P. (2026). Identifying Research Gaps and Directions from Published Literature: A Bibliometric and Thematic Synthesis of Utah Lake and Great Salt Lake Research. Limnology and Oceanography. [DOI on acceptance]
What's in this deposit
- Curated data tables (~25 CSVs) corresponding to the manuscript's supplementary references S1-S8 plus core pipeline outputs.
- Deduplicated full corpus in both .csv and .rds, with the abstract column omitted (see "A note on abstract text" below).
- VOSviewer-format network exports (combined search-set, normalized and raw-field variants).
- Per-cluster evidence bundle (ZIP, 19 clusters, 38 CSVs), abstract-stripped.
- Snapshot of the R analysis pipeline (~20 scripts) used to produce the numeric findings in the manuscript. NOT runnable as written; see code/README.md.
- Per-subdirectory README files with column-level metadata for every CSV.
How to cite
Williams, G.P. (2026). Data and analysis pipeline for "Identifying Research Gaps and Directions from Published Literature: A Bibliometric and Thematic Synthesis of Utah Lake and Great Salt Lake Research." HydroShare. [DOI on acceptance]
Also cite the published manuscript when using these data.
License
- Data: CC BY 4.0
- Code: MIT
Quick file map
data/
  01_search_and_corpus/         Search outputs + full deduplicated corpus
  02_descriptive/               Corpus descriptive stats
  03_networks/                  Network metrics (Table 3)
  04_topic_modeling/            LDA outputs (Table 10)
  05_clusters_and_annotations/  Cluster index + AI-assisted annotations + per-cluster evidence bundle
  06_authors_and_sources/       Author + source ranking tables (S5-S8)
  07_temporal_trends/           Year x cluster x topic-share trends
  08_normalization_maps/        Keyword / institution normalization rules
  09_vosviewer_exports/         VOSviewer-format network exports
code/                           R-pipeline snapshot (see code/README.md)
file_inventory.csv              Reader-facing inventory: path, bytes, column count, SHA-256, short notes
Every subdirectory has its own README with per-file metadata.
A note on abstract text
Several analysis steps in the pipeline (topic modeling, keyword
co-occurrence on abstract terms, LLM-assisted cluster annotation)
rely on the full abstract text of each record. Abstracts are
copyrighted publisher content and CANNOT be redistributed via this
deposit. Every file in this deposit — including the corpus
(.csv and .rds), the VOSviewer exports, and the per-cluster
evidence bundle — ships with the abstract columns (AB, AB_raw,
and abstract-derived variants) omitted.
To re-run the abstract-dependent analysis steps, download the matching records from Web of Science, Scopus, or ProQuest using your own institutional subscriptions and the search strings in the manuscript's Methods §2.1, then merge abstracts back in by DOI / UT.
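The merge step can be sketched roughly as follows. This is a minimal illustration in Python, not part of the deposited R pipeline. The AB and UT column names come from this README; the DOI column name DI and the function names are assumptions made for the example, so adjust them to match the column-level metadata in the per-subdirectory READMEs.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common URL prefixes so variants compare equal."""
    doi = (doi or "").strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_abstracts(corpus_rows, export_rows,
                    doi_col="DI", ut_col="UT", abstract_col="AB"):
    """Return copies of corpus_rows with abstract_col filled from export_rows.

    Rows are dicts (for example from csv.DictReader). Matching uses the
    normalized DOI first and falls back to UT. "DI" is an assumed column
    name; rows with no match get an empty abstract.
    """
    by_doi, by_ut = {}, {}
    for row in export_rows:
        abstract = (row.get(abstract_col) or "").strip()
        if not abstract:
            continue
        doi = normalize_doi(row.get(doi_col))
        ut = (row.get(ut_col) or "").strip()
        if doi:
            by_doi.setdefault(doi, abstract)  # keep the first abstract seen
        if ut:
            by_ut.setdefault(ut, abstract)

    merged = []
    for row in corpus_rows:
        doi = normalize_doi(row.get(doi_col))
        ut = (row.get(ut_col) or "").strip()
        abstract = by_doi.get(doi) if doi else None
        if abstract is None and ut:
            abstract = by_ut.get(ut)
        merged.append({**row, abstract_col: abstract or ""})
    return merged
```

Both arguments can be built with csv.DictReader from the deposited corpus CSV and from your own database export. Records that match on neither key keep an empty abstract, so counting empty values afterwards gives a quick check on how complete the merge was.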
The PRISMA counts in data/01_search_and_corpus/ document the
corpus build at the metadata level, sufficient for replicating the
analysis steps that don't need abstract text (descriptive
statistics, network construction, author / source rankings,
Bradford-law analysis).
Where the raw inputs live
The pipeline reads from Data/CleanData/ in the original project
repo (deduplicated WoS / Scopus / ProQuest exports). The raw search
exports themselves are NOT included here because of publisher
redistribution terms.
Verification
file_inventory.csv: one row per deposited file with path, bytes, n_columns (CSVs only), sha256, and a short notes field. Use the SHA-256 to verify that a downloaded file has not been corrupted.
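One way to run that check over the whole deposit is sketched below in Python, using only the standard library. It assumes that the path column holds paths relative to the deposit root and that sha256 holds hexadecimal digests; neither detail is confirmed here, and the function names are illustrative.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_deposit(root, inventory_name="file_inventory.csv"):
    """Compare every file listed in the inventory against its recorded hash.

    Assumes inventory paths are relative to root. Returns a list of
    (path, problem) tuples; an empty list means every file matched.
    """
    root = Path(root)
    problems = []
    with open(root / inventory_name, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            target = root / row["path"]
            if not target.is_file():
                problems.append((row["path"], "missing"))
            elif sha256_of(target) != row["sha256"].strip().lower():
                problems.append((row["path"], "checksum mismatch"))
    return problems
```

Calling verify_deposit on the directory where the deposit was unpacked returns an empty list when every listed file is present and matches its recorded digest.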
Contact
G.P. Williams (Utah State University). ORCID: 0000-0002-2781-0738.
Credits
Funding Agencies
This resource was created using funding from the following sources:
| Agency Name | Award Title | Award Number |
|---|---|---|
| Wasatch Front Water Quality Council | N/A | N/A |
How to Cite
This resource is shared under the Creative Commons Attribution CC BY.
http://creativecommons.org/licenses/by/4.0/