Data Sources#

Watershed Workflow stores a library of sources, which provide functionality to access data as if it were local. Given appropriate bounds (spatial and/or temporal), the sources typically use REST APIs or other web-based services to locate, download, unzip, and file datasets, which are then stored indefinitely for future use. These datasets are stored in a local data store whose location is specified in the Package Configuration file.

The following sections lay out the sources subpackage, which is simply a way of getting and working with default sources, and the broad classes of sources frequently used in workflows.

watershed_workflow.sources.manager_shapefile.ManagerShapefile

A simple class for reading shapefiles.

watershed_workflow.sources.manager_raster.ManagerRaster

A simple class for reading rasters.

watershed_workflow.sources.manager_wbd.ManagerWBD

Leverages pygeohydro to download WBD data.

watershed_workflow.sources.manager_nhd.ManagerNHD

Leverages pynhd to download NHD data and its supporting shapes.

watershed_workflow.sources.manager_3dep.Manager3DEP

3D Elevation Program (3DEP) data manager.

watershed_workflow.sources.manager_nrcs.ManagerNRCS

The Natural Resources Conservation Service's SSURGO Database [NRCS] contains a huge amount of information about soil texture, parameters, and structure, provided as shape files containing soil type delineations with map-unit-keys (MUKEYs).

watershed_workflow.sources.manager_glhymps.ManagerGLHYMPS

The [GLHYMPS] global hydrogeology map provides global values of a two-layer (unconsolidated, consolidated) structure.

watershed_workflow.sources.manager_soilgrids_2017.ManagerSoilGrids2017

SoilGrids 250m (2017) datasets.

watershed_workflow.sources.manager_pelletier_dtb.ManagerPelletierDTB

The [PelletierDTB] global soil regolith sediment map provides global values of depth to bedrock at a 1km spatial resolution.

watershed_workflow.sources.manager_nlcd.ManagerNLCD

National Land Cover Database manager for single-year snapshots.

watershed_workflow.sources.manager_daymet.ManagerDaymet

Daymet meteorological datasets.

watershed_workflow.sources.manager_aorc.ManagerAORC

AORC dataset.

watershed_workflow.sources.manager_modis_appeears.ManagerMODISAppEEARS

MODIS data through the AppEEARS data portal.

Source List#

Most users will access sources through the dictionaries of types of sources created here. In particular, getDefaultSources() will be the standard starting point.

This module provides a dictionary of sources, broken out by data type, and a dictionary of default sources.

These dictionaries are provided as module-local (singleton) variables.

  • huc_sources : A dictionary of sources that provide USGS HUC boundaries.

  • hydrography_sources : A dictionary of sources that provide river reaches by HUC.

  • dem_sources : A dictionary of available digital elevation models.

  • soil_sources : A dictionary of available sources for soil properties.

  • land_cover_sources : A dictionary of available land cover datasets.


watershed_workflow.sources.getDefaultSources() Dict[str, Any][source]#

Provides a default set of data sources.

Returns a dictionary with default sources for each type.

watershed_workflow.sources.getSources(args) Dict[str, Any][source]#

Parses the command line argument struct from argparse and provides an updated set of data sources.

Parameters:

args (struct) – A python struct generated from an argparse.ArgumentParser object with source options set by watershed_workflow.ui.*_source_options

Returns:

sources – Dictionary of defaults for each of “HUC”, “hydrography”, “DEM”, “soil type”, and “land cover”.

Return type:

dict

watershed_workflow.sources.logSources(sources: Dict[str, Any]) None[source]#

Pretty print source dictionary to log.

Watershed boundaries and hydrography#

Watershed boundary datasets and hydrography datasets together form the geographic structure of a watershed. Watershed boundary datasets are typically formed through analysis of elevation datasets, collecting within the same watershed all parts of the land surface which drain to a common river outlet. Watersheds are hierarchical, ranging in scale from small primary watersheds which drain into first order streams to full river basins which drain into an ocean. In the United States, the USGS formally calculates hydrologic units and identifies them using Hydrologic Unit Codes, or HUCs, which respect this hierarchy. HUC 2 regions (e.g. the Upper Colorado River or the Tennessee River Basin) are the largest in areal extent, while HUC 12s, or sub-watersheds, are the smallest, representing on the order of 100 square kilometers. Watershed Workflow uses HUCs as an organizing unit for working with data, primarily because most datasets in the US are organized by the HUC, but also because they form physically useful domains for simulation.
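The nesting described above is visible in the codes themselves: each level adds two digits, so the enclosing units of a HUC are its even-length prefixes. The sketch below is a standalone illustration of that convention, not part of Watershed Workflow; the function name `huc_parents` and the 12-digit code in the usage note are hypothetical.

```python
from typing import List


def huc_parents(huc: str) -> List[str]:
    """Return the codes of all hydrologic units that enclose `huc`, largest first.

    Assumes the standard convention that HUC codes have an even number of
    digits between 2 (region) and 12 (sub-watershed).
    """
    if not huc.isdigit() or len(huc) % 2 != 0 or not 2 <= len(huc) <= 12:
        raise ValueError(f"not a valid HUC code: {huc!r}")
    # Every even-length proper prefix names an enclosing unit.
    return [huc[:n] for n in range(2, len(huc), 2)]
```

For an illustrative code such as `"060102020103"`, this yields the enclosing HUC 2, 4, 6, 8, and 10 codes in order, while a HUC 2 code has no parents and returns an empty list.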

Hydrography datasets provide surveys of river networks, which form the drainage network of watersheds and are where most of the fast-time scale dynamics occur. Some hydrologic models (for instance river routing models, dam operations management models, and many flood models) directly use the river network as their simulation domain, while others (for instance the class of integrated, distributed models described here) can use the river network to refine meshes near the rivers and therefore improve resolution where fast dynamics are occurring. Watershed boundary and hydrography datasets are typically available as GIS shapefiles, where each watershed boundary or reach is represented as a shape.

Currently two ways of getting watershed boundaries are supported – USGS HUC delineations and user-provided shape files. Watershed boundaries read from shapefiles can use the shapefile manager.

class watershed_workflow.sources.manager_shapefile.ManagerShapefile(filename: str, url: str | None = None, id_name: str | None = None)[source]#

A simple class for reading shapefiles.

Parameters:
  • filename (str) – Path to the shapefile.

  • id_name (str, optional) – Name of the ID field in the shapefile.

class watershed_workflow.sources.manager_wbd.ManagerWBD(protocol_name: str = 'WBD')[source]#

Leverages pygeohydro to download WBD data.

getAll(level: int) GeoDataFrame[source]#

Download all HUCs at a given level.

The reaches used to construct rivers are read either from shapefiles, as above, or from NHD datasets, which include NHD Medium Resolution, NHD Medium Resolution v2.1 (preferred), and NHD High Res.

class watershed_workflow.sources.manager_nhd.ManagerNHD(dataset_name: str, layer: str | None = None, catchments: bool | None = True, fewer_columns: bool | None = True)[source]#

Leverages pynhd to download NHD data and its supporting shapes.

getCatchments(df: GeoDataFrame) GeoDataFrame[source]#

Add catchment data to flowline data.

Parameters:

df (gpd.GeoDataFrame) – GeoDataFrame with flowline data and ID column.

Returns:

GeoDataFrame with catchment data merged in.

Return type:

gpd.GeoDataFrame

Digital Elevation Models#

For any distributed, integrated hydrologic model, elevation datasets are critical. These set the local spatial gradients that drive flow in Richards and overland flow equations, and are necessary to form a mesh, whether structured or unstructured, for simulation. Elevation datasets are typically stored as raster images.

Workflows can then query the raster for interpolated elevations at points on a mesh, river, or other locations of interest. Internally, affine coordinate system transformations are hidden; the coordinate system of the requested points is mapped to that of the raster and interpolated. By default, piecewise bilinear interpolation is used to ensure that extremely high-resolution queries do not look stairstepped; this improves mesh quality in meshes near the resolution of the underlying elevation dataset.
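To make the interpolation step concrete, here is a minimal pure-Python sketch of bilinear interpolation on a small grid. It is not the package's implementation: it works directly in fractional pixel-index coordinates and skips the affine and coordinate-system mapping mentioned above, and the function name `bilinear` is an assumption for illustration.

```python
from typing import Sequence


def bilinear(grid: Sequence[Sequence[float]], x: float, y: float) -> float:
    """Interpolate `grid` at fractional column `x` and row `y`.

    Coordinates are pixel indices (0 is the first pixel center); the grid
    must be at least 2 x 2.
    """
    nrows, ncols = len(grid), len(grid[0])
    if nrows < 2 or ncols < 2:
        raise ValueError("grid must be at least 2 x 2")
    if not (0 <= x <= ncols - 1 and 0 <= y <= nrows - 1):
        raise ValueError("point lies outside the raster")
    # Upper-left pixel of the surrounding 2 x 2 cell, clamped so that
    # x0 + 1 and y0 + 1 stay inside the grid on the far edges.
    x0 = min(int(x), ncols - 2)
    y0 = min(int(y), nrows - 2)
    tx, ty = x - x0, y - y0
    # Interpolate along x on the two bounding rows, then along y between them.
    top = (1 - tx) * grid[y0][x0] + tx * grid[y0][x0 + 1]
    bottom = (1 - tx) * grid[y0 + 1][x0] + tx * grid[y0 + 1][x0 + 1]
    return (1 - ty) * top + ty * bottom
```

Querying the center of a cell returns the mean of its four corner elevations, and querying a pixel center returns that pixel's value, so the interpolated surface is continuous rather than stairstepped.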

class watershed_workflow.sources.manager_raster.ManagerRaster(filename: str, url: str | None = None, native_resolution: float | None = None, native_crs: CRS | None = None, bands: Iterable[str] | int | None = None)[source]#

A simple class for reading rasters.

class watershed_workflow.sources.manager_3dep.Manager3DEP(resolution: int)[source]#

3D Elevation Program (3DEP) data manager.

Provides access to USGS 3DEP elevation and derived products through the py3dep library. Supports multiple resolution options and various topographic layers including DEM, slope, aspect, and hillshade products.

Land Cover#

Land cover datasets set everything from impervious surfaces to plant function and therefore evapotranspiration, and are used in some integrated hydrologic models for a wide range of processes. Land cover is used to define a collection of indices on which mesh sets are generated and then used to generate and affect processes and process parameters. Additionally, leaf area index (LAI) is used frequently in determining potential evapotranspiration.

class watershed_workflow.sources.manager_nlcd.ManagerNLCD(location='L48', year=None)[source]#

National Land Cover Database manager for single-year snapshots.

Supports variables: cover, impervious, canopy, descriptor. Each manager instance represents a single year of NLCD data.

Parameters:
  • location (str, optional) – Location code (‘L48’, ‘AK’, ‘HI’, ‘PR’). Default ‘L48’.

  • year (int, optional) – NLCD data year. If None, uses most recent available for location.


class watershed_workflow.sources.manager_modis_appeears.ManagerMODISAppEEARS(login_token: str | None = None)[source]#

MODIS data through the AppEEARS data portal.

Note this portal requires authentication – please enter a username and password in your .watershed_workflowrc file. For now, as this is not the highest security data portal or workflow package, we expect you to store this password in plaintext. Maybe we can improve this? If it bothers you, please ask how you can contribute (the developers of this package are not security experts!)

To enter the username and password, register for a login in the AppEEARS data portal at:

Currently the variables supported here include LAI and estimated ET.

All data returned includes a time variable, which is in units of days past Jan 1, 2000, 0:00:00.
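Converting that time variable to calendar dates needs only the standard library. The sketch below assumes the values are plain (possibly fractional) day counts from the epoch stated above; the names `MODIS_EPOCH` and `days_to_datetime` are hypothetical helpers, not part of the manager.

```python
from datetime import datetime, timedelta

# Epoch stated in the text: Jan 1, 2000, 0:00:00.
MODIS_EPOCH = datetime(2000, 1, 1, 0, 0, 0)


def days_to_datetime(days: float) -> datetime:
    """Convert days past the epoch into a naive datetime."""
    return MODIS_EPOCH + timedelta(days=days)
```

Because 2000 is a leap year, a value of 366 corresponds to Jan 1, 2001.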

Note this is implemented based on the API documentation here:

class Request(request: Request, task_id: str = '', filenames: Dict[str, str] | None = None, urls: Dict[str, str] | None = None)[source]#

MODIS AppEEARS-specific request that includes Task information.

isReady(request: Request) bool[source]#

Check if MODIS data request is ready for download.

Overrides base class to check AppEEARS processing status and bundle availability.

Parameters:

request (ManagerDataset.Request) – MODIS request object with AppEEARS task information.

Returns:

True if data is ready for download, False otherwise.

Return type:

bool
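Since AppEEARS processes requests asynchronously, a caller typically has to poll until the data is ready. The following is a generic polling sketch under stated assumptions: `wait_until_ready` is a hypothetical helper, and the readiness check is passed in as a zero-argument callable rather than tied to any particular manager.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     interval: float = 30.0,
                     timeout: float = 3600.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `is_ready` every `interval` seconds until it returns True.

    Returns True once ready, or False if `timeout` seconds of waiting
    elapse first. `sleep` is injectable so the loop can be tested quickly.
    """
    waited = 0.0
    while True:
        if is_ready():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Assuming a manager and a previously submitted request already exist, the check could be supplied as `lambda: manager.isReady(request)`.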

Soil structure and properties#

Soil structure and hydrologic properties (i.e. porosity, permeability, water retention curves) are often derived from texture parameterizations. Similarly, depth to bedrock and other subsurface data can be essential in these types of simulations. Often these are mapped into the simulation mesh.

class watershed_workflow.sources.manager_nrcs.ManagerNRCS(force_download: bool = False)[source]#

The Natural Resources Conservation Service’s SSURGO Database [NRCS] contains a huge amount of information about soil texture, parameters, and structure, provided as shape files containing soil type delineations with map-unit-keys (MUKEYs). These are re-broadcast onto a raster (much like gSSURGO, which is unfortunately not readable by open tools) and used to index soil parameterizations for simulation.

Data is accessed via two web APIs – the first for spatial (shapefiles) survey information, the second for properties.

TODO: Functionality for mapping from MUKEY to soil parameters.

[NRCS] (1,2)

Soil Survey Staff, Natural Resources Conservation Service, United States Department of Agriculture. Web Soil Survey. Available online at https://websoilsurvey.nrcs.usda.gov/. Accessed [month/day/year].

class watershed_workflow.sources.manager_glhymps.ManagerGLHYMPS(filename=None)[source]#

The [GLHYMPS] global hydrogeology map provides global values of a two-layer (unconsolidated, consolidated) structure.

Note

GLHYMPS does not have an API, and is a large (~4GB) download. Download the file from the below citation DOI and unzip the file into:

<data_directory>/soil_structure/GLHYMPS/

which should yield GLHYMPS.shp (amongst other files).
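Because this dataset must be placed by hand, it can help to verify the layout before running a workflow. The sketch below checks only the path described in the note; `glhymps_shapefile` is a hypothetical helper, and `data_directory` is assumed to be whatever data store location your configuration specifies.

```python
from pathlib import Path
from typing import Union


def glhymps_shapefile(data_directory: Union[str, Path]) -> Path:
    """Return the expected GLHYMPS.shp path, raising if it is not in place."""
    path = Path(data_directory) / "soil_structure" / "GLHYMPS" / "GLHYMPS.shp"
    if not path.is_file():
        raise FileNotFoundError(
            f"GLHYMPS.shp not found at {path}; download and unzip it manually")
    return path
```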

[GLHYMPS] (1,2)

Huscroft, J.; Gleeson, T.; Hartmann, J.; Börker, J., 2018, “Compiling and mapping global permeability of the unconsolidated and consolidated Earth: GLobal HYdrogeology MaPS 2.0 (GLHYMPS 2.0). [Supporting Data]”, https://doi.org/10.5683/SP2/TTJNIU, Scholars Portal Dataverse, V1

class watershed_workflow.sources.manager_pelletier_dtb.ManagerPelletierDTB(filename=None)[source]#

The [PelletierDTB] global soil regolith sediment map provides global values of depth to bedrock at a 1km spatial resolution.

Note

Pelletier DTB is served through ORNL’s DAAC, does not have an API, and is a large (~1GB) download. Download the file from the below citation DOI and unzip the file into:

<data_directory>/soil_structure/PelletierDTB/

which should yield a set of tif files,

Global_Soil_Regolith_Sediment_1304/data/*.tif

[PelletierDTB] (1,2)

Pelletier, J.D., P.D. Broxton, P. Hazenberg, X. Zeng, P.A. Troch, G. Niu, Z.C. Williams, M.A. Brunke, and D. Gochis. 2016. Global 1-km Gridded Thickness of Soil, Regolith, and Sedimentary Deposit Layers. ORNL DAAC, Oak Ridge, Tennessee, USA. http://dx.doi.org/10.3334/ORNLDAAC/1304

class watershed_workflow.sources.manager_soilgrids_2017.ManagerSoilGrids2017(variant: str | None = None)[source]#

SoilGrids 250m (2017) datasets.

SoilGrids 2017 maintains, to date, the only complete characterization of all soil properties needed for a hydrologic model. The resolution is decent, and the accuracy is ok, but most importantly it is complete.

[hengl2014soilgrids]

Hengl, Tomislav, et al. “SoilGrids1km—global soil information based on automated mapping.” PloS one 9.8 (2014): e105992.

[hengl2017soilgrids]

Hengl, Tomislav, et al. “SoilGrids250m: Global gridded soil information based on machine learning.” PLoS one 12.2 (2017): e0169748.

See the above link for a complete listing of potential variable names; included here are a subset used by this code. That said, any 2017 filename can be used with this source manager.

Variables available with layer information:

  • BLDFIE_layer_1 through BLDFIE_layer_7: Bulk density of fine earth [kg m^-3]

  • CLYPPT_layer_1 through CLYPPT_layer_7: Percent clay [%]

  • SLTPPT_layer_1 through SLTPPT_layer_7: Percent silt [%]

  • SNDPPT_layer_1 through SNDPPT_layer_7: Percent sand [%]

  • WWP_layer_1 through WWP_layer_7: Soil water capacity % at wilting point [%]

  • BDTICM: Absolute depth to continuous, unfractured bedrock [cm]
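The layered names above follow a regular pattern, so the full list can be generated rather than typed out. This is a small standalone sketch of that pattern only; `soilgrids_variables` and `LAYERED_PREFIXES` are hypothetical names, and how the manager itself accepts variable names is not shown here.

```python
from typing import List, Sequence

# Prefixes of the variables listed above that come in seven layers.
LAYERED_PREFIXES = ("BLDFIE", "CLYPPT", "SLTPPT", "SNDPPT", "WWP")


def soilgrids_variables(prefixes: Sequence[str] = LAYERED_PREFIXES,
                        n_layers: int = 7) -> List[str]:
    """Enumerate the layered variable names, followed by the unlayered BDTICM."""
    names = [f"{prefix}_layer_{i}"
             for prefix in prefixes
             for i in range(1, n_layers + 1)]
    names.append("BDTICM")
    return names
```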

Meteorology#

Meteorological data is used for forcing hydrologic models. Note that we keep DayMet here, but it is currently deprecated and unusable due to the NASA DAAC THREDDS API being down indefinitely. Use AORC instead.

class watershed_workflow.sources.manager_aorc.ManagerAORC[source]#

AORC dataset.

Explore the Analysis Of Record for Calibration (AORC) version 1.1 data

https://registry.opendata.aws/noaa-nws-aorc/

Using Xarray, Dask and hvPlot to explore the AORC version 1.1 data. We read from a cloud-optimized Zarr dataset that is part of the NOAA Open Data Dissemination (NODD) program and we use a Dask cluster to parallelize the computation and reading of data chunks.

AORC variables available to use:

  • APCP_surface

  • DLWRF_surface

  • DSWRF_surface

  • PRES_surface

  • SPFH_2maboveground

  • TMP_2maboveground

  • UGRD_10maboveground

  • VGRD_10maboveground

There are eight variables representing the meteorological conditions

  • Total Precipitation (APCP_surface): Hourly total precipitation (kg m-2 or mm)

  • Air Temperature (TMP_2maboveground): Temperature (at 2 m above-ground-level (AGL)) (K)

  • Specific Humidity (SPFH_2maboveground): Specific humidity (at 2 m AGL) (g g-1)

  • Downward Long-Wave Radiation Flux (DLWRF_surface): Downward longwave (infrared) radiation flux (at the surface) (W m-2)

  • Downward Short-Wave Radiation Flux (DSWRF_surface): Downward shortwave (solar) radiation flux (at the surface) (W m-2)

  • Pressure (PRES_surface): Air pressure (at the surface) (Pa)

  • U-Component of Wind (UGRD_10maboveground): U (west-east) - components of the wind (at 10 m AGL) (m s-1)

  • V-Component of Wind (VGRD_10maboveground): V (south-north) - components of the wind (at 10 m AGL) (m s-1)
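Several quantities commonly needed by models follow directly from the variables and units listed above. The helpers below are a minimal sketch operating on plain floats; they are hypothetical and not part of ManagerAORC.

```python
import math
from typing import Iterable


def wind_speed(u: float, v: float) -> float:
    """Wind speed (m s-1) from the U (west-east) and V (south-north) components."""
    return math.hypot(u, v)


def kelvin_to_celsius(temperature_k: float) -> float:
    """Convert a 2 m air temperature from K to degrees Celsius."""
    return temperature_k - 273.15


def total_precip_mm(hourly_apcp: Iterable[float]) -> float:
    """Sum hourly accumulated precipitation; 1 kg m-2 of liquid water is 1 mm."""
    return sum(hourly_apcp)
```

Passing the 24 hourly APCP values of one day to `total_precip_mm` gives that day's total in mm.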

Precipitation and Temperature

The gridded AORC precipitation dataset contains one-hour Accumulated Surface Precipitation (APCP) ending at the “top” of each hour, in liquid water-equivalent units (kg m-2 to the nearest 0.1 kg m-2), while the gridded AORC temperature dataset is comprised of instantaneous, 2 m above-ground-level (AGL) temperatures at the top of each hour (in Kelvin, to the nearest 0.1).

Specific Humidity, Pressure, Downward Radiation, Wind

The development process for the six additional dataset components of the CONUS AORC [i.e., specific humidity at 2 m above ground (kg kg-1); downward longwave and shortwave radiation fluxes at the surface (W m-2); terrain-level pressure (Pa); and west-east and south-north wind components at 10 m above ground (m s-1)] has two distinct periods, based on datasets and methodology applied: 1979–2015 and 2016–present.

class Request(request: Request, filename: str = '')[source]#

AORC-specific request that includes filename for cached data.

class watershed_workflow.sources.manager_daymet.ManagerDaymet[source]#

Daymet meteorological datasets.

Daymet is a historic, spatially interpolated product which ingests a large number of point sources of meteorological data, aggregates them to daily time series, and spatially interpolates them onto a 1km gridded product that covers all of North America from 1980 to present [Daymet].

Variable names and descriptions

name          units             description
prcp          \(mm / day\)      Total daily precipitation
tmin, tmax    \(^\circ C\)      Min/max daily air temperature
srad          \(W / m^2\)       Incoming solar radiation - per DAYLIT time!
vp            \(Pa\)            Vapor pressure
swe           \(kg / m^2\)      Snow water equivalent
dayl          \(s / day\)       Duration of sunlight
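Because srad is averaged over the daylit period only, it must be combined with dayl before it can be used as a 24-hour forcing. The sketch below shows that arithmetic on plain floats; the function names are hypothetical and this is not code from ManagerDaymet.

```python
SECONDS_PER_DAY = 86400.0


def srad_daily_mean(srad: float, dayl: float) -> float:
    """24-hour mean incoming solar radiation (W m-2).

    srad is the mean flux over the daylit period (W m-2) and dayl is the
    length of that period (s day-1).
    """
    return srad * dayl / SECONDS_PER_DAY


def srad_daily_total_mj(srad: float, dayl: float) -> float:
    """Total daily incoming solar energy (MJ m-2 day-1)."""
    return srad * dayl / 1.0e6
```

For example, 400 W m-2 over a 12-hour (43200 s) day corresponds to a 24-hour mean of 200 W m-2.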

class Request(manager: ManagerDataset, is_ready: bool, geometry: Polygon, start: datetime, end: datetime, variables: List, out_crs: CRS | None = None, resampling: str | None = None, bounds: list = None, start_year: int = None, end_year: int = None)[source]#

DayMet-specific request that adds download information.