Data Sources
Watershed Workflow stores a library of sources, which provide functionality to access data as if it were local. Given appropriate bounds (spatial and/or temporal), the sources typically use REST APIs or other web-based services to locate, download, unzip, and file datasets, which are then stored indefinitely for future use. These datasets are stored in a local data store whose location is specified in the package configuration file.
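The download-once-then-reuse behavior described above can be pictured with a small sketch. This is not Watershed Workflow's implementation: `cached_fetch`, the `fetch` callback, and the directory layout are hypothetical names chosen for illustration, and the "download" is simulated so the example runs offline.

```python
import os
import tempfile

def cached_fetch(data_dir, relative_path, fetch, force_download=False):
    """Return a local path, calling fetch() only when the file is absent.

    Hypothetical helper: `fetch` stands in for any download-and-unzip step.
    """
    path = os.path.join(data_dir, relative_path)
    if force_download and os.path.exists(path):
        os.remove(path)  # discard the stale copy before fetching again
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fid:
            fid.write(fetch())
    return path

# Simulated use: the second call is served from the local data store.
calls = []

def fake_download():
    calls.append(1)
    return b"dataset bytes"

data_dir = tempfile.mkdtemp()
first = cached_fetch(data_dir, os.path.join("dem", "tile.tif"), fake_download)
second = cached_fetch(data_dir, os.path.join("dem", "tile.tif"), fake_download)
```

Passing `force_download=True` in this sketch removes the stored copy first, mirroring the `force_download` flag that the managers below expose.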
The following sections lay out the source list, which is simply a way of getting and working with default sources, and the broad classes of sources frequently used in workflows.
Source List
This module provides a dictionary of sources, broken out by data type, and a dictionary of default sources.
These dictionaries are provided as module-local (singleton) variables.
huc_sources : A dictionary of sources that provide USGS HUC boundaries.
hydrography_sources : A dictionary of sources that provide river reaches by HUC.
dem_sources : A dictionary of available digital elevation models.
soil_sources : A dictionary of available sources for soil properties.
land_cover_sources : A dictionary of available land cover datasets.
- watershed_workflow.source_list.get_default_sources()[source]
Provides a default set of data sources.
Returns a dictionary with default sources for each type.
- watershed_workflow.source_list.get_sources(args)[source]
Parses the command line argument struct from argparse and provides an updated set of data sources.
- Parameters:
args (struct) – A python struct generated from an argparse.ArgumentParser object with source options set by watershed_workflow.ui.*_source_options
- Returns:
sources – Dictionary of defaults for each of “HUC”, “hydrography”, “DEM”, “soil type”, and “land cover”.
- Return type:
dict
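The idea of starting from defaults and replacing only what the user overrides can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `pick_sources` and the attribute names `dem_source` and `land_cover_source` are hypothetical, and the dictionary values are placeholder strings rather than real manager objects. Only the dictionary keys are taken from the description above.

```python
import argparse

def pick_sources(defaults, args):
    """Copy the defaults, replacing any data type the user overrode.

    Hypothetical helper; the attribute names read from `args` are illustrative.
    """
    sources = dict(defaults)  # never mutate the shared defaults
    overrides = {
        "DEM": getattr(args, "dem_source", None),
        "land cover": getattr(args, "land_cover_source", None),
    }
    for key, value in overrides.items():
        if value is not None:
            sources[key] = value
    return sources

# Placeholder strings stand in for source manager objects.
defaults = {
    "HUC": "default HUC source",
    "hydrography": "default hydrography source",
    "DEM": "default DEM source",
    "soil type": "default soil source",
    "land cover": "default land cover source",
}
args = argparse.Namespace(dem_source="user DEM source", land_cover_source=None)
sources = pick_sources(defaults, args)
```

Copying the defaults before updating keeps the module-level dictionary unchanged, which matters when the defaults are shared singleton variables.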
Implementing a new data source for an existing type of data should follow the API for existing implementations. This makes it easy to use it with the existing high level API. See the Sources API for how managers are used within the API.
Watershed boundaries and hydrography
Watershed boundary datasets and hydrography datasets together form the geographic structure of a watershed. Watershed boundary datasets are typically formed through analysis of elevation datasets, collecting within the same watershed all parts of the land surface which drain to a common river outlet. Watersheds are hierarchical, ranging in scale from small primary watersheds which drain into first order streams to full river basins which drain into an ocean. In the United States, the USGS formally calculates hydrologic units and identifies them using Hydrologic Unit Codes, or HUCs, which respect this hierarchy. HUC 2 regions (e.g. the Upper Colorado River or the Tennessee River Basin) are the largest in areal extent, while HUC 12s, or sub-watersheds, are the smallest, representing on the order of 100 square kilometers. Watershed Workflow uses HUCs as an organizing unit for working with data, primarily because most datasets in the US are organized by the HUC, but also because they form physically useful domains for simulation.
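Because each level of the hierarchy appends two digits to its parent's code, the enclosing unit of any HUC is a prefix of its code. The sketch below illustrates this; `normalize_huc` and `parent_huc` are hypothetical helper names, not functions provided by Watershed Workflow.

```python
def normalize_huc(huc):
    """Return a HUC as an even-length digit string, restoring a leading zero."""
    code = str(huc)
    if not code.isdigit():
        raise ValueError(f"HUC must contain only digits: {huc!r}")
    if len(code) % 2 == 1:
        code = "0" + code  # e.g. the integer 6010208 lost its leading zero
    return code

def parent_huc(huc, level):
    """Return the enclosing hydrologic unit whose code has `level` digits."""
    code = normalize_huc(huc)
    if level % 2 or not 2 <= level <= len(code):
        raise ValueError(f"level {level} is not a valid parent level of {code}")
    return code[:level]
```

For example, `parent_huc("060102080102", 8)` gives `"06010208"`, and `parent_huc(6010208, 2)` gives `"06"` once the dropped leading zero is restored.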
Hydrography datasets provide surveys of river networks, which form the drainage network of watersheds and are where most of the fast-timescale dynamics occur. Some hydrologic models (for instance river routing models, dam operations management models, and many flood models) directly use the river network as their simulation domain, while others (for instance the class of integrated, distributed models described here) can use the river network to refine meshes near the rivers and therefore improve resolution where fast dynamics are occurring. Watershed boundary and hydrography datasets are typically available as GIS shapefiles, where each watershed boundary or reach is represented as a shape.
Currently two ways of getting watershed boundaries are supported – USGS HUC delineations and user-provided shape files. Watershed boundaries read from shapefiles can use the Generic shapefiles manager.
- class watershed_workflow.sources.manager_nhd._FileManagerNHD(name: str, file_level: int, lowest_level: int, name_manager)[source]
Manager for interacting with USGS National Hydrography Datasets.
Note that this includes NHD, NHDPlus, and WBD – this class should not be used directly but instead use one of the derived classes:
manager_nhd.FileManagerNHD (Hi Res)
manager_nhd.FileManagerNHDPlus (Hi Res)
manager_nhd.FileManagerWBD
Watershed Workflow leverages the Watershed Boundary Dataset (WBD) and the National Hydrography Dataset (NHD), USGS and EPA datasets available at multiple resolutions to represent United States watersheds, including Alaska [NHD]. Also used is the NHD Plus dataset, an augmented dataset built on watershed boundaries and elevation products. By default, the 1:100,000 High Resolution datasets are used. Data is discovered through The National Map’s [TNM] REST API, which allows querying for data files organized by HUC and resolution via HTTP POST requests, providing direct-download URLs. Files are downloaded on first request, unzipped, and stored in the data library for future use. Currently, files are indexed by 2-digit (WBD), 4-digit (NHD Plus HR) and 8-digit (NHD) HUCs.
- get_huc(huc, force_download=False, exclude_hu_types=None)[source]
Get the specified HUC in its native CRS.
- Parameters:
huc (int or str) – The USGS Hydrologic Unit Code
force_download (bool, optional) – If true, delete any file and redownload.
exclude_hu_types (list[str], optional) – List of HUtypes to exclude. Likely this is None or [‘W’,] to exclude water HUCs for e.g. a bay, great lake, or ocean.
- Returns:
profile (dict) – The fiona shapefile profile (see Fiona documentation).
hu (dict) – Fiona shape object representing the hydrologic unit.
Note this finds and downloads files as needed.
- get_hucs(huc, level, force_download=False, exclude_hu_types=None)[source]
Get all sub-catchments of a given HUC level within a given HUC.
- Parameters:
huc (int or str) – The USGS Hydrologic Unit Code
level (int) – Level of requested sub-catchments. Must be greater than or equal to the level of the input huc.
force_download (bool) – Download or re-download the file if true.
exclude_hu_types (list[str]) – List of HUtypes to exclude. Likely this is None or [‘W’,] to exclude water HUCs for e.g. a bay, great lake, or ocean.
- Returns:
profile (dict) – The fiona shapefile profile (see Fiona documentation).
hus (list(dict)) – List of fiona shape objects representing the hydrologic units.
Note this finds and downloads files as needed.
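The relationship between the containing HUC and the requested level can be illustrated with a prefix filter. This is a sketch only: `select_sub_hucs` is a hypothetical function operating on plain code strings, whereas get_hucs() returns fiona shape objects, and the codes below are used purely as example strings.

```python
def select_sub_hucs(candidates, huc, level):
    """Return the codes of `level` digits that lie within `huc`.

    The level must be at least the level (number of digits) of `huc`.
    """
    if level < len(huc):
        raise ValueError("level must be >= the level of the input huc")
    return sorted(c for c in candidates if len(c) == level and c.startswith(huc))

# Example code strings only; not a statement about real hydrologic units.
candidates = ["060102080101", "060102080102", "060102070101", "06010208"]
sub_hucs = select_sub_hucs(candidates, "06010208", 12)
```

Requesting the same level as the input (here, level 8) returns just the input unit itself in this sketch.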
- get_hydro(huc, bounds=None, bounds_crs=None, in_network=True, properties=None, include_catchments=False, force_download=False)[source]
Get all reaches within a given HUC and/or coordinate bounds.
- Parameters:
huc (int or str) – The USGS Hydrologic Unit Code
bounds ([xmin, ymin, xmax, ymax], optional) – Coordinate bounds to filter reaches returned. If this is provided, bounds_crs must also be provided.
bounds_crs (CRS, optional) – CRS of the above bounds.
in_network (bool, optional) – If True (default), remove reaches that are not “in” the NHD network
properties (list(str) or bool, optional) –
A list of property aliases to be added to reaches. See alias names in Table 16 (NHDPlusFlowlineVAA) or Table 17 (NHDPlusEROMMA) of the NHDPlus User Guide. This is only supported for NHDPlus. Commonly used properties include:
’TotalDrainageAreaKmSq’ : total drainage area
’CatchmentAreaKmSq’ : differential catchment contributing area
’HydrologicSequence’ : VAA sequence information
’DownstreamMainPathHydroSeq’ : VAA sequence information
’UpstreamMainPathHydroSeq’ : VAA sequence information
’catchment’ : catchment polygon geometry
If a bool is provided and the value is True, a standard default set of VAA and EROMMA attributes is added as properties.
include_catchments (bool, optional) – If True, adds catchment polygons for each reach in the river tree from ‘NHDPlusCatchment’ layer
force_download (bool) – Download or re-download the file if true.
- Returns:
profile (dict) – The fiona shapefile profile (see Fiona documentation).
reaches (list(dict)) – List of fiona shape objects representing the stream reaches.
Note this finds and downloads files as needed.
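Filtering reaches by coordinate bounds, as the `bounds` argument does, can be approximated by comparing bounding boxes. The sketch below assumes the reaches are GeoJSON-like dictionaries with LineString geometry and that the bounds are already in the same CRS as the reaches; `filter_reaches` and its helpers are hypothetical names, and a bounding-box test is coarser than a true geometric intersection.

```python
def bounding_box(coords):
    """Return (xmin, ymin, xmax, ymax) of a sequence of (x, y) points."""
    xs = [p[0] for p in coords]
    ys = [p[1] for p in coords]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def filter_reaches(reaches, bounds):
    """Keep reaches whose bounding box overlaps `bounds`."""
    return [r for r in reaches
            if boxes_overlap(bounding_box(r["geometry"]["coordinates"]), bounds)]

reaches = [
    {"id": "a", "geometry": {"type": "LineString", "coordinates": [(0, 0), (1, 1)]}},
    {"id": "b", "geometry": {"type": "LineString", "coordinates": [(5, 5), (6, 6)]}},
]
kept = filter_reaches(reaches, [0.5, 0.5, 2.0, 2.0])
```

If the bounds were given in a different CRS, they would first need to be transformed, which is why `bounds_crs` must accompany `bounds`.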
Digital Elevation Models
For any distributed, integrated hydrologic model, elevation datasets are critical. These set the local spatial gradients that drive flow in Richards and overland flow equations, and are necessary to form a mesh, whether structured or unstructured, for simulation. Elevation datasets are typically stored as raster images.
Workflows can then query the raster for interpolated elevations at points on a mesh, river, or other locations of interest. Affine coordinate system transformations are handled internally: the requested points are mapped into the coordinate system of the raster and the raster is interpolated there. By default, piecewise bilinear interpolation is used to ensure that extremely high-resolution queries do not look stairstepped; this improves mesh quality in meshes near the resolution of the underlying elevation dataset.
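A minimal sketch of bilinear interpolation on a raster follows. It assumes the query point is already in the raster's coordinate system and that the raster is axis-aligned, described by the outer corner of its upper-left pixel and its pixel sizes. The function name and argument layout are hypothetical and are not the package's interface.

```python
import math

def bilinear_sample(raster, x0, y0, dx, dy, x, y):
    """Interpolate `raster` (a list of rows) at map coordinate (x, y).

    (x0, y0) is the outer corner of the upper-left pixel; dx and dy are the
    pixel sizes, with dy negative for a north-up raster.
    """
    nrows, ncols = len(raster), len(raster[0])
    # Fractional pixel-center indices; the 0.5 shifts from corners to centers.
    col = (x - x0) / dx - 0.5
    row = (y - y0) / dy - 0.5
    # Clamp so queries in the outer half-pixel reuse the edge values.
    col = min(max(col, 0.0), ncols - 1.0)
    row = min(max(row, 0.0), nrows - 1.0)
    c0, r0 = int(math.floor(col)), int(math.floor(row))
    c1, r1 = min(c0 + 1, ncols - 1), min(r0 + 1, nrows - 1)
    fc, fr = col - c0, row - r0
    top = (1 - fc) * raster[r0][c0] + fc * raster[r0][c1]
    bottom = (1 - fc) * raster[r1][c0] + fc * raster[r1][c1]
    return (1 - fr) * top + fr * bottom

# A 3x3 raster whose value equals its column index, with 1-unit pixels.
raster = [[0.0, 1.0, 2.0]] * 3
value = bilinear_sample(raster, x0=0.0, y0=3.0, dx=1.0, dy=-1.0, x=1.0, y=1.5)
```

Here the query point lies halfway between the centers of columns 0 and 1, so the interpolated value is 0.5 rather than snapping to either pixel, which is the stairstep effect that bilinear interpolation avoids.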
- class watershed_workflow.sources.manager_ned.FileManagerNED(resolution='1/3 arc-second', file_format='GeoTIFF')[source]
Watershed Workflow leverages the USGS’s National Elevation Dataset (NED), a precursor to and currently part of the USGS’s 3D Elevation Program (3DEP) [NED]. It is available seamlessly at a variety of resolutions ranging from 2 arc-seconds to 1/3 arc-second (~60m and 10m, respectively) in the conterminous United States and at comparable resolution through most of Alaska. Like the NHD data, these datasets are available through The National Map’s [TNM] REST API, and are provided in 1-degree tiles. Watershed Workflow manages querying for URLs, downloading these tiles on demand, and forming the mosaic of images through underlying capability in rasterio to provide a single raster across the watershed requested. Higher resolution products, including LiDAR products across the conterminous US and IfSAR products across Alaska, are becoming available, but these are not currently supported by Watershed Workflow.
- Parameters:
resolution (str, optional) – Resolution of the desired product. One of: * “1/3 arc-second” (default) * “1 arc-second”
file_format (str, optional) – Desired file format. The signature above defaults to “GeoTIFF”; “IMG” is the universally available format.
- get_raster(shape, crs, force_download=False)[source]
Download and read a DEM for this shape, clipping to the shape.
- Parameters:
shape (fiona or shapely shape) – Shape to provide bounds of the raster.
crs (CRS) – CRS of the shape.
force_download (bool) – Download or re-download the file if true.
- Returns:
profile (rasterio profile) – Profile of the raster.
raster (np.ndarray) – Array containing the elevation data.
Note that the raster provided is in its native CRS (which is in the
rasterio profile), not the shape’s CRS.
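Since the elevation data is provided in 1-degree tiles, a request must first determine which tiles cover the requested bounds. The sketch below shows that bookkeeping under the assumption that the bounds are in geographic longitude and latitude; `covering_tiles` is a hypothetical helper, and it identifies each tile by its lower-left integer corner without assuming anything about how the service actually names its files.

```python
import math

def covering_tiles(bounds):
    """List (lon, lat) lower-left corners of the 1-degree tiles touching bounds."""
    xmin, ymin, xmax, ymax = bounds
    lon0, lat0 = math.floor(xmin), math.floor(ymin)
    # Always include at least one tile, even for a degenerate (point) box.
    lon1 = max(math.ceil(xmax), lon0 + 1)
    lat1 = max(math.ceil(ymax), lat0 + 1)
    return [(lon, lat) for lat in range(lat0, lat1) for lon in range(lon0, lon1)]

tiles = covering_tiles([-83.5, 35.2, -82.9, 36.1])
```

A watershed straddling a degree line in both directions, as in this example, needs four tiles, which are then merged into a single raster.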
Land Cover
Land cover datasets set everything from impervious surfaces to plant function and therefore evapotranspiration, and are used in some integrated hydrologic models for a wide range of processes. Land cover is used to define a collection of indices on which mesh sets are generated and then used to generate and affect processes and process parameters. Additionally, leaf area index (LAI) is used frequently in determining potential evapotranspiration.
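Turning per-cell land cover indices into mesh sets amounts to grouping cell IDs by index. The following sketch assumes the land cover raster has already been sampled onto the mesh, giving one index per cell; `group_cells_by_land_cover` is a hypothetical name and the index values are arbitrary examples.

```python
from collections import defaultdict

def group_cells_by_land_cover(cell_land_cover):
    """Map each land cover index to the list of cell IDs carrying it."""
    sets = defaultdict(list)
    for cell_id, index in enumerate(cell_land_cover):
        sets[index].append(cell_id)
    return dict(sets)

# One land cover index per mesh cell (example values only).
cell_land_cover = [41, 41, 11, 81, 41, 11]
mesh_sets = group_cells_by_land_cover(cell_land_cover)
```

Each resulting set can then be assigned its own process parameters, for example a distinct set of vegetation properties per land cover type.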
- class watershed_workflow.sources.manager_nlcd.FileManagerNLCD(layer='Land_Cover', year=None, location='L48', version='20210604')[source]
National Land Cover Database provides a raster for indexed land cover types [NLCD].
Note
NLCD does not provide an API for subsetting the data, so the first time this is used, it WILL result in a long download time as it grabs the big file. After that it will be much faster as the file is already local.
TODO: Labels and colors for these indices should get moved here, but currently reside in watershed_workflow.colors.
- Parameters:
layer (str, optional) – Layer of interest. Default is “Land_Cover”; other layers, such as imperviousness, may also be available.
year (int, optional) – Year of dataset. Defaults to the most current available at the location.
location (str, optional) – Location code. Default is “L48” (lower 48); other valid codes include “AK” (Alaska), “HI” (Hawaii), and “PR” (Puerto Rico).
- get_raster(shply, crs, force_download=False)[source]
Download and read a land cover raster for this shape, clipping to the shape.
- Parameters:
shply (fiona or shapely shape) – Shape to provide bounds of the raster.
crs (CRS) – CRS of the shape.
force_download (bool, optional) – Download or re-download the file if true.
- Returns:
profile (rasterio profile) – Profile of the raster.
raster (np.ndarray) – Array containing the land cover data.
Note that the raster provided is in NLCD native CRS (which is in the
rasterio profile), not the shape’s CRS.
- class watershed_workflow.sources.manager_modis_appeears.FileManagerMODISAppEEARS(login_token=None, remove_leap_day=True)[source]
MODIS data through the AppEEARS data portal.
Note this portal requires authentication – please enter a username and password in your .watershed_workflowrc file. For now, as this is not the highest security data portal or workflow package, we expect you to store this password in plaintext. Maybe we can improve this? If it bothers you, please ask how you can contribute (the developers of this package are not security experts!)
To obtain a username and password, register for a login in the AppEEARS data portal at:
Currently the variables supported here include LAI and estimated ET.
All data returned includes a time variable, which is in units of [days past Jan 1, 2000, 0:00:00].
Note this is implemented based on the API documentation here:
- get_data(polygon_or_bounds=None, crs=None, start=None, end=None, variables=None, force_download=False, task=None, filenames=None)[source]
Get dataset corresponding to MODIS data from the AppEEARS data portal.
Note that AppEEARS requires the construction of a request, and then prepares the data for you. As a result, the raster may be ready immediately (if you have downloaded it previously, or if preparation is quick) or may not be.
- Parameters:
polygon_or_bounds (fiona or shapely shape, or [xmin, ymin, xmax, ymax]) – Collect a file that covers this shape or bounds.
crs (CRS object) – Coordinate system of the above polygon_or_bounds
start (str or datetime.date object, optional) – Date for the beginning of the data, in YYYY-MM-DD. Valid is >= 2002-07-01.
end (str or datetime.date object, optional) – Date for the end of the data, in YYYY-MM-DD. Valid is <= 2020-12-30.
variables (str or list, optional) – Variable to download, currently one of {LAI, LULC}. Default is both LAI and LULC.
force_download (bool, optional) – Force a new file to be downloaded. Default is False.
task ((str, str) tuple of task_id, filename) – If a request has already been created, use this task to access the data rather than creating a new request. Default means to create a new request.
filenames (list of str, optional) – If a list of filenames is provided, use these rather than creating a new request.
- Returns:
dict ({ variable : (profile, times, data) }) – Returns a dictionary of (variable, data) pairs. For each variable, profile is a dictionary of standard raster profile information, times is an array of datetime objects of length NTIMES, and data is an array of shape (NTIMES, NX, NY) storing the actual values.
OR
task ((task_id, filename)) – If the data is not yet ready after the wait time, returns a task tuple for use in a future call to get_data().
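Because get_data() returns either a dictionary of data or a task tuple, calling code needs to branch on the result. The sketch below shows one way to do so using stand-in values; `unpack_result` is a hypothetical helper, and the task ID, filename, and data contents are made up for illustration.

```python
def unpack_result(result):
    """Classify the two documented return shapes of get_data().

    Returns ("data", dict) when the data is ready, or
    ("task", (task_id, filename)) when the request is still being prepared.
    """
    if isinstance(result, dict):
        return "data", result
    task_id, filename = result
    return "task", (task_id, filename)

# Stand-in values; a real call would return these from get_data().
pending = ("example-task-id", "example_request.nc")
ready = {"LAI": ("profile", ["times"], ["data"])}

kind_pending, task = unpack_result(pending)
kind_ready, data = unpack_result(ready)
```

When the result is a task, the tuple can be passed back through the `task` argument of a later get_data() call rather than creating a new request.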
Soil structure and properties
Soil structure and hydrologic properties (i.e. porosity, permeability, water retention curves) are often derived from texture parameterizations. Similarly, depth to bedrock and other subsurface data can be essential in these types of simulations. Often these are mapped into the simulation mesh.
- class watershed_workflow.sources.manager_nrcs.FileManagerNRCS[source]
The Natural Resources Conservation Service’s SSURGO Database [NRCS] contains a huge amount of information about soil texture, parameters, and structure, and is provided as shapefiles containing soil type delineations with map-unit-keys (MUKEYs). These are re-broadcast onto a raster (much like gSSURGO, which is unfortunately not readable by open tools) and used to index soil parameterizations for simulation.
Data is accessed via two web APIs – the first for spatial (shapefiles) survey information, the second for properties.
TODO: Functionality for mapping from MUKEY to soil parameters.
[NRCS] Soil Survey Staff, Natural Resources Conservation Service, United States Department of Agriculture. Web Soil Survey. Available online at https://websoilsurvey.nrcs.usda.gov/. Accessed [month/day/year].
- get_shapes(bounds, bounds_crs, force_download=False)[source]
Downloads and reads soil shapefiles.
This accepts only a bounding box.
- Parameters:
bounds ([xmin, ymin, xmax, ymax]) – Bounding box to filter shapes.
bounds_crs (CRS) – Coordinate system of the bounding box.
force_download (bool) – Download or re-download the file if true.
- Returns:
profile (dict) – Fiona profile of the shapefile.
shapes (list) – List of fiona shapes that match the index or bounds.
- get_shapes_and_properties(shapes, crs, force_download=False, split_download=False)[source]
Downloads and reads soil shapefiles, and aggregates SSURGO data onto MUKEYs.
Accepts either a bounding box, shape, or list of shapes.
- Parameters:
shapes (shply, list(shply), or [xmin, ymin, xmax, ymax]) – Shapes on which to run the query.
crs (CRS) – Coordinate system of the bounding box.
force_download (bool) – Download or re-download the file if true.
- Returns:
profile (dict) – Fiona profile of the shapefile.
shapes (list) – List of fiona shapes that match the index or bounds.
properties (pandas dataframe) – Dataframe of data by mukey = shape[‘id’]
- class watershed_workflow.sources.manager_glhymps.FileManagerGLHYMPS(filename=None)[source]
The [GLHYMPS] global hydrogeology map provides global values of a two-layer (unconsolidated, consolidated) structure.
Note
GLHYMPS does not have an API, and is a large (~4GB) download. Download the file from the below citation DOI and unzip the file into:
<data_directory>/soil_structure/GLHYMPS/
which should yield GLHYMPS.shp (amongst other files).
[GLHYMPS] Huscroft, J.; Gleeson, T.; Hartmann, J.; Börker, J., 2018, “Compiling and mapping global permeability of the unconsolidated and consolidated Earth: GLobal HYdrogeology MaPS 2.0 (GLHYMPS 2.0). [Supporting Data]”, https://doi.org/10.5683/SP2/TTJNIU, Scholars Portal Dataverse, V1
- get_shapes(bounds, crs, force_download=None)[source]
Read the shapes in bounds provided by shape object.
- Parameters:
bounds (bounds tuple [x_min, y_min, x_max, y_max]) – bounds in which to find GLHYMPS shapes.
crs (CRS) – CRS of the bounds
- Returns:
profile (dict) – Fiona profile of the shapefile.
shapes (list) – List of fiona shapes that match the bounds.
- get_shapes_and_properties(bounds, crs, **kwargs)[source]
Read shapes and process properties.
- Parameters:
bounds (bounds tuple [x_min, y_min, x_max, y_max]) – bounds in which to find GLHYMPS shapes.
crs (CRS) – CRS of the bounds.
min_porosity (optional, double in [0,1]) – Some GLHYMPS formations have zero porosity, and this breaks most codes. This allows the user to set the minimum valid porosity. Defaults to 0.01 (1%).
max_permeability (optional, double > 0) – Some GLHYMPS formations (fractured bedrock?) have very high permeability, and this results in very slow runs. This allows the user to set a maximum valid permeability [m^2]. Defaults to inf.
- Returns:
profile (dict) – Fiona profile of the shapefile.
shapes (list) – List of fiona shapes that match the index or bounds.
properties (pandas dataframe) – Dataframe including geologic properties.
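The effect of `min_porosity` and `max_permeability` can be sketched as a simple clamp applied to each formation. This is an illustration only: it uses plain dictionaries instead of a pandas dataframe, and the function name and the keys `porosity` and `permeability` are hypothetical rather than the column names the manager returns.

```python
import math

def clip_properties(rows, min_porosity=0.01, max_permeability=math.inf):
    """Return copies of `rows` with porosity raised to at least min_porosity
    and permeability [m^2] lowered to at most max_permeability."""
    clipped = []
    for row in rows:
        new = dict(row)  # leave the input rows untouched
        new["porosity"] = max(row["porosity"], min_porosity)
        new["permeability"] = min(row["permeability"], max_permeability)
        clipped.append(new)
    return clipped

# Example formations with made-up values.
rows = [
    {"id": 1, "porosity": 0.0, "permeability": 1e-9},
    {"id": 2, "porosity": 0.2, "permeability": 1e-13},
]
clipped = clip_properties(rows, min_porosity=0.01, max_permeability=1e-10)
```

With the default `max_permeability` of infinity, only the porosity floor has any effect.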
- class watershed_workflow.sources.manager_soilgrids_2017.FileManagerSoilGrids2017(variant=None)[source]
SoilGrids 250m (2017) datasets.
SoilGrids 2017 maintains, to date, the only complete characterization of all soil properties needed for a hydrologic model. The resolution is decent, and the accuracy is ok, but most importantly it is complete.
[SoilGrids2017]
[hengl2014soilgrids] Hengl, Tomislav, et al. “SoilGrids1km—global soil information based on automated mapping.” PloS one 9.8 (2014): e105992.
[hengl2017soilgrids] Hengl, Tomislav, et al. “SoilGrids250m: Global gridded soil information based on machine learning.” PLoS one 12.2 (2017): e0169748.
See the above link for a complete listing of potential variable names; included here are a subset used by this code. That said, any 2017 filename can be used with this source manager.
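| name | units | description |
|--------|-----------------|----------------------------------------------------|
| BDTICM | \(cm\) | Absolute depth to continuous, unfractured bedrock |
| BLDFIE | \(kg\,m^{-3}\) | Bulk density of fine earth |
| CLYPPT | \(\%\) | Percent clay |
| SLTPPT | \(\%\) | Percent silt |
| SNDPPT | \(\%\) | Percent sand |
| WWP | \(\%\) | Soil water capacity % at wilting point |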
- get_raster(shply, crs, variable, layer=None, force_download=False)[source]
Download and read a raster for this shape, clipping to the shape.
- Parameters:
shply (fiona or shapely shape or bounds) – Shape to provide bounds of the raster.
crs (CRS) – CRS of the shape.
variable (str) – The SoilGrids variable, see class-level documentation for choices.
layer (int, optional) – Soil layer, from 0 (top) to 7 (bottom). Only valid for vertically distributed quantities.
force_download (bool, optional) – Download or re-download the file if true.
- Returns:
profile (rasterio profile) – Profile of the raster.
raster (np.ndarray) – Array containing the values of the requested variable.
Note that the raster provided is in SoilGrids native CRS
(which is in the rasterio profile), not the shape’s CRS.
Meteorology
Meteorological data is used for forcing hydrologic models.
- class watershed_workflow.sources.manager_daymet.FileManagerDaymet[source]
Daymet meteorological datasets.
Daymet is a historic, spatially interpolated product which ingests a large number of point sources of meteorological data, aggregates them to daily time series, and spatially interpolates them onto a 1km gridded product that covers all of North America from 1980 to present [Daymet].
Variable names and descriptions
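| name | units | description |
|------------|------------------|-------------------------------------------------------|
| prcp | \(mm / day\) | Total daily precipitation |
| tmin, tmax | \(^\circ C\) | Min/max daily air temperature |
| srad | \(W / m^2\) | Incoming solar radiation, averaged over DAYLIT time only |
| vp | \(Pa\) | Vapor pressure |
| swe | \(kg / m^2\) | Snow water equivalent |
| dayl | \(s / day\) | Duration of sunlight |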
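Because srad is reported per daylit time rather than per full day, a 24-hour mean flux requires weighting it by the day length dayl. The conversion is a single multiplication and division, sketched here with a hypothetical helper name and made-up input values.

```python
SECONDS_PER_DAY = 86400.0

def daily_mean_srad(srad_daylit, dayl_seconds):
    """Convert daylit-period mean radiation [W/m^2] to a 24-hour mean [W/m^2].

    srad_daylit: Daymet `srad`, the mean flux over the daylit period.
    dayl_seconds: Daymet `dayl`, the daylit duration in seconds per day.
    """
    return srad_daylit * dayl_seconds / SECONDS_PER_DAY

# A 12-hour day with 400 W/m^2 during daylight averages 200 W/m^2 over 24 hours.
mean_flux = daily_mean_srad(400.0, 43200.0)
```

Skipping this step would overstate the energy input to a model that expects a daily mean, by a factor of two in this example.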
- get_data(polygon_or_bounds, crs, start=None, end=None, variables=None, force_download=False, buffer=0.01)[source]
Gets file for a single year and single variable.
- Parameters:
polygon_or_bounds (fiona or shapely shape, or [xmin, ymin, xmax, ymax]) – Collect a file that covers this shape or bounds.
crs (CRS object) – Coordinate system of the above polygon_or_bounds
start (str or datetime.date object, optional) – Date for the beginning of the data, in YYYY-MM-DD. Valid is >= 1980-01-01.
end (str or datetime.date object, optional) – Date for the end of the data, in YYYY-MM-DD. Valid is < the current month (Daymet updates monthly.)
variables (str or list, optional) – Name the variables to download, see class-level documentation for choices. Default is [prcp,tmin,tmax,vp,srad].
force_download (bool) – Download or re-download the file if true.
buffer (float) – Buffer the bounds by this amount, in degrees. The default is 0.01.
- Returns:
Dataset object containing the met data.
- Return type:
datasets.Dataset
Generic Files
We also provide readers for user-provided rasters and shapefiles for generic use.
- class watershed_workflow.sources.manager_raster.FileManagerRaster(filename: str)[source]
A simple class for reading rasters.
- Parameters:
filename (str) – Path to the raster file.
- get_raster(shape, crs, band=1)[source]
Read a raster for this shape, clipping to the shape.
- Parameters:
shape (fiona or shapely shape) – Shape to provide bounds of the raster.
crs (CRS) – CRS of the shape.
band (int, optional) – Default is 1, the first band (1-indexed).
- Returns:
profile (rasterio profile) – Profile of the raster.
raster (np.ndarray) – Array containing the raster data.
Note that the raster provided is in its native CRS (which is in the
rasterio profile), not the shape’s CRS.
- class watershed_workflow.sources.manager_shape.FileManagerShape(filename: str)[source]
A simple class for reading shapefiles.
- Parameters:
filename (str) – Path to the shapefile.
- get_shape(*args, **kwargs)[source]
Read the file and filter to get shapes, then ensures there is only one match.
- Parameters:
See the parameters of get_shapes().
- Returns:
profile (dict) – Fiona profile of the shapefile.
shapes (list(dict)) – List of fiona shapes that match the index or bounds.
- get_shapes(index_or_bounds=-1, crs=None)[source]
Read the file and filter to get shapes.
This accepts either an index, which is the integer index of the desired shape in the file, or a bounding box.
- Parameters:
index_or_bounds (int or [xmin, ymin, xmax, ymax]) – Index of the requested shape in filename, or bounding box to filter shapes, or defaults to -1 to get them all.
crs (crs-type) – Coordinate system of the bounding box (or None if index).
- Returns:
profile (dict) – Fiona profile of the shapefile.
shapes (list(dict)) – List of fiona shapes that match the index or bounds.
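The index-or-bounds convention can be sketched as a small dispatch: an integer selects one shape (or all of them when it is -1), while a four-element sequence filters by bounding box. The sketch assumes GeoJSON-like Polygon dictionaries already in the same CRS as the bounds; `select_shapes` is a hypothetical stand-in, not the manager's implementation.

```python
def select_shapes(shapes, index_or_bounds=-1):
    """Select shapes by integer index, by bounding box, or all when -1."""
    if isinstance(index_or_bounds, int):
        if index_or_bounds == -1:
            return list(shapes)
        return [shapes[index_or_bounds]]
    xmin, ymin, xmax, ymax = index_or_bounds
    selected = []
    for shape in shapes:
        ring = shape["geometry"]["coordinates"][0]  # exterior ring of a Polygon
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        # Keep the shape if its bounding box overlaps the requested bounds.
        if min(xs) <= xmax and max(xs) >= xmin and min(ys) <= ymax and max(ys) >= ymin:
            selected.append(shape)
    return selected

def square(x, y):
    """Build a unit-square Polygon with its lower-left corner at (x, y)."""
    ring = [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1), (x, y)]
    return {"geometry": {"type": "Polygon", "coordinates": [ring]}}

shapes = [square(0, 0), square(10, 10)]
everything = select_shapes(shapes)
second = select_shapes(shapes, 1)
nearby = select_shapes(shapes, [0.5, 0.5, 2.0, 2.0])
```

A get_shape()-style wrapper would then check that exactly one shape was selected before returning it.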