Peters - GEODEEPDIVE: AUTOMATING THE LOCATION AND EXTRACTION OF DATA AND INFORMATION FROM DIGITAL PUBLICATIONS

Authors:
Owners:		This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type:	Resource
Storage:	The size of this resource is 20.3 MB
Created:	Dec 06, 2018 at 6:40 p.m. (UTC)
Last updated:	Jul 08, 2026 at 1:13 a.m. (UTC)

Sharing Status:	Public
Views:	3680
Downloads:	58

Abstract

PETERS, Shanan E. (1), ROSS, Ian (2), CZAPLEWSKI, John (3), and LIVNY, Miron (2)
(1) Department of Geoscience, University of Wisconsin–Madison, 1215 W. Dayton St, Madison, WI 53706
(2) Computer Sciences, University of Wisconsin–Madison, Madison, WI 53706
(3) Department of Geoscience, University of Wisconsin–Madison, 1215 W. Dayton St, Madison, WI 53706

Modern scientific databases simplify access to data and information, but a large body of knowledge remains within the published literature and is therefore difficult to access and leverage at scale in scientific workflows. Recent advances in machine reading and learning approaches to converting unstructured text, tables, and figures into structured knowledge bases are promising, but these software tools cannot be deployed for scientific research purposes without access to new and old publications and computing resources. Automation of such approaches is also necessary in order to keep pace with the ever-growing scientific literature. GeoDeepDive bridges the gap between scientists needing to locate and extract information from large numbers of publications and the millions of documents that are distributed by many publishers every year. As of August 2018, GeoDeepDive (GDD) had ingested over 7.4 million full-text documents from multiple commercial, professional society, and open-access publishers. In accordance with GDD-negotiated publisher agreements, original documents and citation metadata are stored locally and prepared for common data mining activities by running software tools that parse and annotate their contents linguistically (natural language processing) and visually (optical character recognition). Vocabularies of terms in domain-specific databases can be labeled throughout the full text of documents, with results exposed to users via an API. New vocabularies and versions of parsing and annotation tools can be deployed rapidly across all original documents using the distributed computing capacities provided by HTCondor. Downloading, storing, and pre-processing original PDF content from distributed publishers and making these data products available to user applications provides new mechanisms for discovering and using information in publications, augmenting existing databases with new information, and reducing time-to-science.
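
The vocabulary labeling mentioned in the abstract can be illustrated with a minimal sketch. This is not GeoDeepDive's implementation or API; the function name `label_terms`, the sample sentence, and the sample vocabulary are all hypothetical, and the matching rule (case-insensitive, whole-word, longest term first) is an assumption chosen for illustration.

```python
import re
from typing import List, Tuple


def label_terms(text: str, vocabulary: List[str]) -> List[Tuple[str, int, int]]:
    """Return (term, start, end) for each vocabulary term found in text.

    Hypothetical helper, not part of any GeoDeepDive tool. Matching is
    case-insensitive and restricted to whole words.
    """
    if not vocabulary:
        return []
    # Try longer terms first so a multi-word term wins over its sub-terms.
    ordered = sorted(set(vocabulary), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Map each lowercased term back to the spelling used in the vocabulary.
    canonical = {term.lower(): term for term in ordered}
    return [
        (canonical[match.group(0).lower()], match.start(), match.end())
        for match in pattern.finditer(text)
    ]


# Illustrative input only; the sentence and vocabulary are made up.
sample_text = "The Morrison Formation contains sandstone and Sandstone lenses."
sample_vocabulary = ["Morrison Formation", "sandstone", "Formation"]
for term, start, end in label_terms(sample_text, sample_vocabulary):
    print(term, start, end)
```

Each match carries character offsets, so labels of this kind could be stored alongside a document and served by a separate service. Terms that begin or end with punctuation would need a different boundary rule than `\b`.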

Related Resources

This resource belongs to the following collections:

Title	Owners	Sharing Status	My Permission
GSA 2018 Pardee: Earth as a Big Data Puzzle: Advancing Information Frontiers in Geoscience	Leslie Hsu	Public & Shareable	Open Access

How to Cite

Peters, S. E. (2026). Peters - GEODEEPDIVE: AUTOMATING THE LOCATION AND EXTRACTION OF DATA AND INFORMATION FROM DIGITAL PUBLICATIONS, HydroShare, http://www.hydroshare.org/resource/5b4038a534b74404864ceff2ea933147

This resource is shared under the Creative Commons Attribution CC BY.

http://creativecommons.org/licenses/by/4.0/
