Jess Joan Goddard

SimpleLab | Chief Science Officer

Resources
U.S. Community Water Systems Service Boundaries, v1.0.0
Created: May 2, 2022, 5:51 p.m.

ABSTRACT:

This is a layer of water service boundaries for 44,919 community water systems that deliver tap water to 306.88 million people in the US. This amounts to 97.22% of the population reportedly served by active community water systems and 90.85% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called the Tiered, Explicit, Match, and Model approach, or TEMM for short. The name reflects how the nationwide data layer was developed. TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When a water system and TIGER place match one-to-one, we label this Tier 2a. When multiple water systems match to the same TIGER place, we label this Tier 2b. Tier 2b reflects overlapping boundaries for multiple systems. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2a or Tier 2b), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius around the provided water system centroid and to model a circular water system boundary (Tier 3).
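The tier hierarchy described above can be sketched as a small decision function. This is only an illustration and not the project's pipeline code: the function name, the input structures, and the choice to count place matches only among systems without an explicit boundary are all assumptions.

```python
from collections import Counter

def assign_tiers(system_ids, explicit_ids, place_match, centroid_ids):
    """Label each system with a tier, following the hierarchy in the abstract.

    system_ids   -- iterable of water system identifiers (hypothetical input)
    explicit_ids -- set of systems with a state-provided boundary
    place_match  -- dict mapping system id -> matched TIGER place id
    centroid_ids -- set of systems that have a usable centroid
    """
    # Assumption: only systems lacking an explicit boundary compete for a place.
    place_counts = Counter(
        place for sid, place in place_match.items() if sid not in explicit_ids
    )
    tiers = {}
    for sid in system_ids:
        if sid in explicit_ids:
            tiers[sid] = "Tier 1"          # explicit boundary wins
        elif sid in place_match:
            shared = place_counts[place_match[sid]] > 1
            tiers[sid] = "Tier 2b" if shared else "Tier 2a"
        elif sid in centroid_ids:
            tiers[sid] = "Tier 3"          # modeled circle around centroid
        else:
            tiers[sid] = None              # no geometry can be assigned
    return tiers

tiers = assign_tiers(
    ["A", "B", "C", "D", "E", "F"],
    explicit_ids={"A"},
    place_match={"B": "p1", "C": "p2", "D": "p2"},
    centroid_ids={"E"},
)
```

In this toy input, systems C and D share place `p2` and so fall into Tier 2b, while F has neither a boundary, a match, nor a centroid and stays unassigned.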

This data has several limitations, and the layer should be used with them in mind. First, assigning one Census Place TIGER polygon to multiple systems inaccurately attributes the same area to each of those systems; we hope to resolve Tier 2b systems into Tier 2a or Tier 3 in a future iteration. Second, the matching algorithms that assign Census Place boundaries require additional validation and iteration. Third, Tier 3 boundaries have modeled radii drawn around the lat/long centroid of a water system facility, but the underlying lat/long centroids are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when geometry quality is a county or state centroid, but we did not exclude these data from the layer. Fourth, missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but they rely on the accuracy of state data collection and maintenance.

All data, methods, documentation, and contributions are open-source and available here: https://github.com/SimpleLab-Inc/wsb.

U.S. Community Water Systems Service Boundaries, v2.0.0
Created: July 1, 2022, 8:01 p.m.

ABSTRACT:

This is a layer of water service boundaries for 44,786 community water systems that deliver tap water to 307.1 million people in the US. This amounts to 97% of the population reportedly served by active community water systems and 91% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called the Tiered, Explicit, Match, and Model approach, or TEMM for short. The name reflects how the nationwide data layer was developed. TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When a water system and TIGER place match one-to-one, we label this Tier 2a. When multiple water systems match to the same TIGER place, we label this Tier 2b. In v1.0.0, Tier 2b reflected overlapping boundaries for multiple systems. In v2.0.0, Tier 2b is removed through a "best match" algorithm that assigns one water system to one TIGER place. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2a), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius around the provided water system centroid and to model a circular water system boundary (Tier 3).
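As a rough illustration of what a one-system-per-place "best match" could look like, the sketch below keeps, for each TIGER place, the candidate system whose population served is closest to the place population. The scoring rule, the function name, and the input shapes are assumptions made for this example; the abstract does not specify the actual criteria used in v2.0.0.

```python
def best_match(candidates, place_population):
    """Pick one water system per TIGER place.

    candidates       -- list of (system_id, place_id, population_served)
    place_population -- dict mapping place_id -> place population
    Returns a dict mapping place_id -> chosen system_id.
    """
    best = {}
    for system_id, place_id, pop_served in candidates:
        # Hypothetical score: absolute gap between the two population figures.
        gap = abs(pop_served - place_population[place_id])
        # Keep the smallest gap; on ties the earlier candidate is kept.
        if place_id not in best or gap < best[place_id][0]:
            best[place_id] = (gap, system_id)
    return {place: sid for place, (_, sid) in best.items()}

candidates = [("S1", "p1", 9500), ("S2", "p1", 300), ("S3", "p2", 1200)]
chosen = best_match(candidates, {"p1": 10000, "p2": 1000})

# Systems that lose the match would fall through to the modeled tier.
unmatched = {sid for sid, _, _ in candidates} - set(chosen.values())
```

Here S1 is kept for place `p1` and S2 is left unmatched, which mirrors how a system that is not the best match would need a modeled boundary instead.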

This data has several limitations, and the layer should be used with them in mind. The assignment of a Census Place TIGER polygon to the "best match" water system in v2.0.0 requires further validation. Many systems were assigned to Tier 3 as a result. Tier 3 boundaries have modeled radii drawn around the lat/long centroid of a water system facility, but the underlying lat/long centroids are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when geometry quality is a county or state centroid, but we did not exclude these data from the layer. We plan to improve geometry quality for modeled systems in future iterations. Missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but they rely on the accuracy of state data collection and maintenance.

All data, methods, documentation, and contributions are open-source and available here: https://github.com/SimpleLab-Inc/wsb.

U.S. Community Water Systems Service Boundaries, v2.4.0
Created: Sept. 2, 2022, 6:25 a.m.

ABSTRACT:

This is a layer of water service boundaries for 46,014 community water systems that deliver tap water to 307.7 million people in the US. This amounts to 97% of the population reportedly served by active community water systems and 93% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called the Tiered, Explicit, Match, and Model approach, or TEMM for short. The name reflects how the nationwide data layer was developed. TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When multiple water systems match to the same TIGER boundary, we employ a "best match" algorithm that assigns one water system to one TIGER place based on features like population served and other locational information about the water system. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius around the provided water system centroid and to model a circular water system boundary (Tier 3). Water system centroids are taken from the ECHO database; however, where a system centroid is labeled as a county or state centroid, we take several steps to assign a better centroid (using sources like UCMR or TIGER). A summary of the systems and population assigned to different tiers is as follows:

Population coverage rates per Tier, for systems with population reported:
- Tier 1: 45.6% population covered (140,302,401 people)
- Tier 2: 39.98% population covered (123,028,626 people)
- Tier 3: 14.42% population covered (44,372,326 people)

Active community water systems coverage rates per Tier:
- Tier 1: 35.61% of systems (17,600 systems)
- Tier 2: 22.49% of systems (11,117 systems)
- Tier 3: 35% of systems (17,297 systems)
- No Tier/Geometry: 6.9% of systems (3,410 systems)
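The system shares in this list can be re-derived from the counts alone. The short check below uses only the four counts given above; the variable names are illustrative.

```python
# System counts per tier as listed for v2.4.0.
counts = {
    "Tier 1": 17600,
    "Tier 2": 11117,
    "Tier 3": 17297,
    "No Tier/Geometry": 3410,
}

total = sum(counts.values())                   # all active systems counted
covered = total - counts["No Tier/Geometry"]   # systems with a boundary

# Share of each category, in percent, rounded to two decimals.
shares = {tier: round(100 * n / total, 2) for tier, n in counts.items()}
covered_share = round(100 * covered / total, 1)
```

The counts sum to 49,424 systems, of which 46,014 have a boundary in one of the three tiers, or about 93.1% of the systems counted in the list.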

This data has several limitations, and the layer should be used with them in mind. The assignment of a Census Place TIGER polygon to the "best match" water system, first introduced in v2.0.0, requires further validation. Tier 3 boundaries have modeled radii drawn around the lat/long centroid of a water system facility, but the underlying lat/long centroids are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when geometry quality is a county or state centroid, but we did not exclude these data from the layer. Since v2.0.0, we have reduced the share of Tier 3 geometries placed at state or county centroids from 50% to 30% of Tier 3 boundaries. Missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but they rely on the accuracy of state data collection and maintenance.
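One way to act on the geometry-quality caveat is to separate Tier 3 records whose centroid is only a county or state centroid before analysis. The sketch below works on plain dictionaries; the field names (`pwsid`, `tier`, `geometry_quality`) and the label strings are assumptions for illustration and should be checked against the actual columns and values in the layer.

```python
# Hypothetical labels; check the values actually present in the column.
LOW_FIDELITY = {"county centroid", "state centroid"}

def split_tier3_by_quality(records):
    """Return (trusted, flagged) lists of system ids for Tier 3 records."""
    trusted, flagged = [], []
    for rec in records:
        if rec.get("tier") != "Tier 3":
            continue  # only modeled boundaries depend on centroid quality
        quality = (rec.get("geometry_quality") or "").strip().lower()
        target = flagged if quality in LOW_FIDELITY else trusted
        target.append(rec["pwsid"])
    return trusted, flagged

records = [
    {"pwsid": "X1", "tier": "Tier 3", "geometry_quality": "County Centroid"},
    {"pwsid": "X2", "tier": "Tier 3", "geometry_quality": "Address Matching"},
    {"pwsid": "X3", "tier": "Tier 1", "geometry_quality": "State Centroid"},
    {"pwsid": "X4", "tier": "Tier 3", "geometry_quality": None},
]
trusted, flagged = split_tier3_by_quality(records)
```

Flagged systems stay in the data, consistent with the layer's choice not to exclude them, but can be reported separately or down-weighted.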

U.S. Community Water Systems Service Boundaries, v3.0.0
Created: Nov. 1, 2022, 12:53 a.m.

ABSTRACT:

This is a layer of water service boundaries for 45,973 community water systems that deliver tap water to 307.7 million people in the US. This amounts to 97% of the population reportedly served by active community water systems and 93% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called the Tiered, Explicit, Match, and Model approach, or TEMM for short. The name reflects how the nationwide data layer was developed. TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When multiple water systems match to the same TIGER boundary, we employ a "best match" algorithm that assigns one water system to one TIGER place based on features like population served and other locational information about the water system. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius around the provided water system centroid and to model a circular water system boundary (Tier 3). Water system centroids are taken from the ECHO database; however, where a system centroid is labeled as a county or state centroid, we take several steps to assign a better centroid (using sources like UCMR or TIGER). A summary of the systems and population assigned to different tiers is as follows:

Population coverage rates per Tier, for systems with population reported:
- Tier 1: 49.3% population covered (155,869,771 people)
- Tier 2: 35.13% population covered (111,074,087 people)
- Tier 3: 12.9% population covered (40,771,645 people)

Active community water systems coverage rates per Tier:
- Tier 1: 35.7% of systems (17,645 systems)
- Tier 2: 22.42% of systems (11,079 systems)
- Tier 3: 34.9% of systems (17,249 systems)
- No Tier/Geometry: 6.98% of systems (3,451 systems)

This data has several limitations, and the layer should be used with them in mind. The assignment of a Census Place TIGER polygon to the "best match" water system, first introduced in v2.0.0, requires further validation. Tier 3 boundaries have modeled radii drawn around the lat/long centroid of a water system facility, but the underlying lat/long centroids are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when geometry quality is a county or state centroid, but we did not exclude these data from the layer. Since v2.0.0, we have reduced the share of Tier 3 geometries placed at state or county centroids from 50% to 30% of Tier 3 boundaries. Missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but they rely on the accuracy of state data collection and maintenance.
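To make the Tier 3 geometry concrete, the sketch below draws a circular ring of vertices around a lat/long centroid for a given radius. It is a minimal approximation that assumes a spherical Earth and a small radius relative to the Earth's size; the function names and the 64-vertex default are illustrative, and the statistical model that estimates the radius is not shown.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def circle_boundary(lat, lon, radius_m, n_points=64):
    """Return a closed ring of (lat, lon) vertices approximating a circle."""
    # Convert the radius to degrees of latitude and of longitude at this latitude.
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = dlat / math.cos(math.radians(lat))
    ring = []
    for i in range(n_points):
        theta = 2 * math.pi * i / n_points
        ring.append((lat + dlat * math.sin(theta), lon + dlon * math.cos(theta)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

ring = circle_boundary(40.0, -105.0, 1000.0, n_points=32)
max_error_m = max(abs(haversine_m(40.0, -105.0, la, lo) - 1000.0) for la, lo in ring)
```

Because the circle is anchored entirely on the centroid, any error in the centroid shifts the whole boundary, which is why centroid quality matters so much for this tier.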

Changelog:
# 3.0.0 (2022-10-31)
* Added manually contributed systems from the Internet of Water's GitHub: https://github.com/cgs-earth/ref_pws/raw/main/02_output/contributed_pws.gpkg
* Refactored to use geopackage instead of geojson through most of the pipeline
* Added `geometry_source_detail` column to include, where possible, notes provided by the data sources themselves about how the geometry was sourced
