Data Structures and Shape Manipulation

Several custom data structures are used to manipulate geometry into a consistent mesh. Two are the most important. The SplitHUCs object defines a set of polygons that partition the full domain into sub-catchments (e.g. HUC 12s in a HUC8 domain, or differential contributing areas to a series of gages). The RiverTree object defines a tree data structure built on a child-parent relationship, in which the children of a reach are all reaches that flow into that reach.

SplitHUCs

A module for working with multi-polys, a MultiLine that together forms a Polygon

class watershed_workflow.split_hucs.HandledCollection(*args)[source]#

A collection of objects and handles for those objects.

Semantics of this are a bit odd – it is somewhat like a list and somewhat like a dict.

append(value)[source]#: Adds an object, returning a handle to that object

extend(values)[source]#: Add many objects, returning a list of handles.

handles()[source]#: Generator for handles

keys()[source]#: Generator for handles

pop(key)[source]#: Removes a handle and its object.
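The list-like and dict-like behaviour described above can be illustrated with a small stand-in. This is a minimal sketch, not the library's implementation: the class name TinyHandledCollection, the use of integer handles, and the fact that pop returns the removed object are all assumptions made for illustration.

```python
import itertools


class TinyHandledCollection:
    """Hypothetical stand-in: stores objects and hands back integer handles."""

    def __init__(self):
        self._store = {}                  # handle -> object
        self._next = itertools.count()    # source of fresh, never-reused handles

    def append(self, value):
        """Add one object (list-like) and return its handle (dict-like key)."""
        handle = next(self._next)
        self._store[handle] = value
        return handle

    def extend(self, values):
        """Add many objects, returning their handles in order."""
        return [self.append(v) for v in values]

    def __getitem__(self, handle):
        return self._store[handle]

    def handles(self):
        """Generate the handles currently in use."""
        yield from self._store

    def pop(self, handle):
        """Remove a handle and its object (returning the object is an assumption)."""
        return self._store.pop(handle)


coll = TinyHandledCollection()
h_a, h_b, h_c = coll.extend(["a", "b", "c"])
coll.pop(h_b)                             # removing "b" does not renumber "c"
remaining = list(coll.handles())
```

Unlike a list index, a handle stays valid when other entries are removed, which is what lets several views refer to the same stored object.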

class watershed_workflow.split_hucs.SplitHUCs(df: GeoDataFrame, abs_tol: float = 1, rel_tol: float = 1e-05, exterior_outlet: Point | None = None)[source]#

Class for dealing with the multiple interacting views of HUCs

Parameters:

shapes (list[Polygon]) – The shapes to be split, one per subcatchment to be delineated.
abs_tol (float) – Distance used in defining small holes at intersections to ignore.
rel_tol (float) – Relative to the shapes area, a tolerance for defining small holes on intersections.
exterior_outlet (np.array((2,))) – Location of the outlet of the entire domain.
polygon_outlets (np.array((len(shapes), 2))) – Location of the outlets of each polygon.

The resulting class instance includes the following views into data:

linestrings (HandledCollection[LineString]) – Unique list of all linestrings, a HandledCollection of LineStrings.
boundaries (HandledCollection[int]) – A HandledCollection of handles into linestrings, identifying those linestrings on the outer boundary of the collection.
intersections (HandledCollection[int]) – A HandledCollection of handles into linestrings, identifying those linestrings on the shared, inner boundaries.
gons (list[HandledCollection[int], HandledCollection[int]]) – One per polygon provided, a pair of HandledCollections identifying the handles into intersections and boundaries that make up that polygon.
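How these views relate can be sketched with plain Python containers. The example below is an illustration under stated assumptions, not the class's internal layout: it uses dicts and lists in place of HandledCollection, coordinate lists in place of LineStrings, and an assumed (boundary handles, intersection handles) ordering within each pair.

```python
# Two unit squares, A on the left and B on the right, sharing the edge x = 1.
# Coordinates are plain tuples; the library stores shapely LineStrings instead.
linestrings = {
    0: [(1, 0), (0, 0), (0, 1), (1, 1)],   # outer boundary of A
    1: [(1, 1), (2, 1), (2, 0), (1, 0)],   # outer boundary of B
    2: [(1, 0), (1, 1)],                   # edge shared by A and B
}
boundaries = [0, 1]       # handles of linestrings on the outer boundary
intersections = [2]       # handles of linestrings on shared, inner boundaries

# One (boundary handles, intersection handles) pair per polygon.
# The ordering within the pair is an assumption of this sketch.
gons = [([0], [2]), ([1], [2])]


def pieces_of(i):
    """Return the linestrings that together outline polygon i."""
    bnd, inter = gons[i]
    return [linestrings[h] for h in bnd + inter]


shared_in_a = pieces_of(0)[-1]
shared_in_b = pieces_of(1)[-1]
```

Because the shared edge is stored once and referenced by handle from both polygons, any change to it is seen by both neighbours, which keeps the partition consistent.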

computePolygon(i: int) → Polygon[source]#: Construct polygon i and return a copy.

deepcopy()[source]#: Return a deep copy

explore(column: str = 'ID', m: Any | None = None, marker: str | None = None, name: str = 'watersheds', **kwargs)[source]#: Open a map!

property exterior#: Construct boundary polygon and return a copy.

plot(*args, **kwargs)[source]#: Plot as polygons (boundaries only).

plotAsLinestrings(*args, **kwargs)[source]#: Plot not as polygons, but individual linestrings.

polygons() → Iterable[Polygon][source]#: Iterate over the polygons.

spines()[source]#: Iterate over spines.

update()[source]#: Recomputes all polygons

watershed_workflow.split_hucs.findBiggest(list_of_shapes: Iterable[Polygon]) → Polygon[source]#: Finds the biggest (by area) polygon.

watershed_workflow.split_hucs.intersectAndSplit(list_of_shapes: Sequence[Polygon]) → Tuple[List[LineString], List[List[None | LineString]]][source]#

Given a list of shapes which share boundaries (i.e. they partition some space), return a compilation of their linestrings.

Parameters:

list_of_shapes (Sequence[shapely.geometry.Polygon]) – The polygons to intersect and split, of length N.

Returns:

uniques (list[None | shapely.geometry.LineString | shapely.geometry.MultiLineString]) – An N-length-list of the entities describing the exterior boundary.
intersections (list[list[None | shapely.geometry.LineString | shapely.geometry.MultiLineString]]) – An NxN list of lists of the entities describing the interior boundary.
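The shape of the two return values can be illustrated without shapely. The following is a simplified sketch, not the library's algorithm: it assumes polygons are given as coordinate rings whose shared edges match exactly, and it fills only the upper triangle of the N x N table, which is an assumption about layout. The helper names edges_of and split_edges are hypothetical.

```python
def edges_of(ring):
    """Return the undirected edges of a closed ring given as a list of points."""
    closed = ring + [ring[0]]
    return {frozenset(pair) for pair in zip(closed[:-1], closed[1:])}


def split_edges(rings):
    """Split N rings into N exterior edge sets and an N x N table of shared edges."""
    n = len(rings)
    edge_sets = [edges_of(r) for r in rings]
    intersections = [[None] * n for _ in range(n)]
    uniques = []
    for i in range(n):
        shared_with_any = set()
        for j in range(n):
            if i == j:
                continue
            shared = edge_sets[i] & edge_sets[j]
            shared_with_any |= shared
            if i < j and shared:
                intersections[i][j] = shared   # upper triangle only (assumption)
        exterior = edge_sets[i] - shared_with_any
        uniques.append(exterior or None)       # None when nothing is exterior
    return uniques, intersections


square_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_b = [(1, 0), (2, 0), (2, 1), (1, 1)]
uniques, intersections = split_edges([square_a, square_b])
```

Here each square keeps three exterior edges in uniques, and the single shared edge appears once in intersections[0][1].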

watershed_workflow.split_hucs.partition(list_of_shapes: Sequence[Polygon], abs_tol: float = 1, rel_tol: float = 1e-05) → List[Polygon][source]#: Given a list of shapes which mostly share boundaries, make sure they partition the space. Often HUC boundaries have minor overlaps and underlaps – here we try to account for wiggles.

watershed_workflow.split_hucs.removeHoles(polygons: Iterable[Polygon], abs_tol: float = 1, rel_tol: float = 1e-05, remove_all_interior: bool = True) → Tuple[List[Polygon], List[Polygon]][source]#

Removes interior small holes between the boundaries of polygons.

Note this assumes the polygons are mostly disjoint.

watershed_workflow.split_hucs.simplify(hucs: SplitHUCs, tol=0.1) → None[source]#: Simplify, IN PLACE, all linestrings in the polygon representation.

RiverTree

Module for working with tree data structures, built on watershed_workflow.tinytree

Note that this class knows how to work with the following properties, which are expected to be spelled this way if they exist. Only index and geometry MUST exist.

index: Must be the index of the DataFrame
geometry (shapely.LineString): the river reach line
catchment (shapely.Polygon): the local contributing area to this reach
area (double): area [m^2] of catchment, the local contributing area
hydroseq (int): See documentation for NHDPlus
dnhydroseq (int): See documentation for NHDPlus

class watershed_workflow.river_tree.River(index: int | str, df: GeoDataFrame, children: List[River] | None = None)[source]#

A tree structure whose node data is stored in a pandas DataFrame, accessed by an index.

ListType#: alias of _MySortedList

accumulate(to_accumulate: str, to_save: str | None = None, op: ~typing.Callable = <built-in function sum>)[source]#

Accumulates a property across the river tree.

Parameters:

to_accumulate (str) – Name of the property to accumulate from child nodes.
to_save (str, optional) – Name of the property to store the accumulated result in. If None, the result is not saved to the node.
op (Callable, optional) – Operation to use for accumulation. Defaults to sum.

Returns:

The accumulated value for this node and all its children.

Return type:

Any
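The accumulation described above can be sketched as a post-order traversal. This is a minimal illustration, not the River implementation: the Node class and its props dict are hypothetical, and passing op a list containing the node's own value followed by its children's totals is an assumption about how op is applied.

```python
class Node:
    """Hypothetical tree node holding a dict of properties and child nodes."""

    def __init__(self, props, children=()):
        self.props = dict(props)
        self.children = list(children)


def accumulate(node, to_accumulate, to_save=None, op=sum):
    """Combine a property over a node and everything upstream of it."""
    # Children are upstream reaches, so they are accumulated first.
    child_totals = [accumulate(c, to_accumulate, to_save, op) for c in node.children]
    total = op([node.props[to_accumulate]] + child_totals)
    if to_save is not None:
        node.props[to_save] = total       # store the running total on the node
    return total


headwater_1 = Node({"area": 2.0})
headwater_2 = Node({"area": 3.0})
outlet = Node({"area": 1.5}, [headwater_1, headwater_2])
total_area = accumulate(outlet, "area", to_save="total_area")
```

With to_save set, every node ends up holding the total for its own subtree, so the outlet carries the sum over the whole network.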

addChild(child_or_index: River | str | int) → River[source]#: Append a child (upstream) reach to this reach.

property angle: float#: Returns the angle, in radians, from node.parent.linestring to node.linestring, in a clockwise sense.

appendCoordinate(xy: Tuple[float, float]) → None[source]#: Appends a coordinate at the end (downstream) of the linestring.

assignOrder() → None[source]#

Working from leaves to trunk, assign stream order property.

This method assigns stream order values to all reaches in the river network following the Strahler stream ordering system. Orders are calculated from leaf nodes (order 1) toward the trunk, where confluences of streams of equal order increment the order by 1.
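The Strahler rule can be written as a short recursion. The sketch below uses a hypothetical Reach class rather than River and is meant only to show the rule: a leaf has order 1, and the order increases by one only where two or more tributaries of the same highest order meet.

```python
class Reach:
    """Hypothetical reach node; children are the upstream tributaries."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.order = None


def assign_order(reach):
    """Assign Strahler order from the leaves toward the trunk; return the order."""
    if not reach.children:
        reach.order = 1                   # headwater reach
        return 1
    child_orders = [assign_order(c) for c in reach.children]
    top = max(child_orders)
    # Increment only when at least two tributaries share the highest order.
    reach.order = top + 1 if child_orders.count(top) >= 2 else top
    return reach.order


c = Reach("c", [Reach("a"), Reach("b")])  # two order-1 streams meet: order 2
e = Reach("e", [c, Reach("d")])           # order 2 joined by order 1: stays 2
f = Reach("f", [Reach("g"), Reach("h")])  # order 2
root = Reach("root", [e, f])              # two order-2 streams meet: order 3
assign_order(root)
```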

classmethod constructRiversByDataFrame(df) → List[River][source]#

Create a list of rivers from a dataframe that includes a ‘parent’ column.

Parameters: df (gpd.GeoDataFrame) – GeoDataFrame containing reach linestrings with parent-child relationships. Must contain PARENT and ID columns as defined in standard_names.
Returns: List of River objects, each representing a river network tree.
Return type: list[River]
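Building trees from a parent column can be sketched with plain dictionaries. This is an illustration only: the column names "ID" and "parent", the use of None to mark a root, and the nested-dict output are assumptions of the sketch, not the names defined in standard_names or the River class.

```python
def build_trees(rows, id_key="ID", parent_key="parent"):
    """Group rows into trees using each row's parent reference."""
    children = {row[id_key]: [] for row in rows}
    roots = []
    for row in rows:
        parent = row[parent_key]
        if parent is None:
            roots.append(row[id_key])     # no parent: this reach is an outlet
        else:
            children[parent].append(row[id_key])

    def subtree(index):
        return {"index": index, "children": [subtree(c) for c in children[index]]}

    return [subtree(r) for r in roots]


rows = [
    {"ID": 1, "parent": None},   # outlet of the first river
    {"ID": 2, "parent": 1},
    {"ID": 3, "parent": 1},
    {"ID": 4, "parent": 3},
    {"ID": 5, "parent": None},   # a second, single-reach river
]
trees = build_trees(rows)
```

One tree is produced per row without a parent, which matches the idea that the method returns a list of rivers rather than a single river.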

classmethod constructRiversByGeometry(df, tol: float = 1e-07) → List[River][source]#

Forms a list of River trees from a list of reaches by looking for close endpoints of those reaches.

Parameters:

df (gpd.GeoDataFrame) – GeoDataFrame containing reach linestrings. Must have a ‘geometry’ column with LineString geometries.
tol (float, optional) – Geometric tolerance for matching reach endpoints to beginpoints. Defaults to _tol (1e-7).

Returns:

List of River objects, each representing a river network tree.

Return type:

list[River]

Note

This expects that endpoints of a reach coincide with beginpoints of their downstream reach, and does not work for cases where the junction is at a midpoint of a reach.
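The endpoint matching, and the limitation in the note, can be illustrated with coordinate lists. This is a brute-force sketch, not the library's search: it assumes each reach is ordered upstream to downstream, and the function name find_downstream is hypothetical.

```python
import math


def find_downstream(reaches, tol=1e-7):
    """Map each reach to the reach whose first point matches its last point."""
    downstream = {}
    for i, coords in enumerate(reaches):
        end = coords[-1]                  # downstream end of reach i
        downstream[i] = None
        for j, other in enumerate(reaches):
            if i != j and math.dist(end, other[0]) < tol:
                downstream[i] = j
                break
    return downstream


reaches = [
    [(0, 2), (1, 1)],      # 0: tributary ending at the junction (1, 1)
    [(2, 2), (1, 1)],      # 1: tributary ending at the same junction
    [(1, 1), (1, 0)],      # 2: trunk starting at the junction
    [(5, 5), (1, 0.5)],    # 3: ends at a midpoint of reach 2, so it is not matched
]
downstream = find_downstream(reaches)
```

Reach 3 touches reach 2 only at a midpoint, so no parent is found for it and it would become the root of its own tree.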

classmethod constructRiversByHydroseq(df) → List[River][source]#

Given a list of linestrings, create a list of rivers using the HydroSeq maps provided in NHDPlus datasets.

Parameters: df (gpd.GeoDataFrame) – GeoDataFrame containing reach linestrings with NHDPlus attributes. Must contain columns for HYDROSEQ and DOWNSTREAM_HYDROSEQ as defined in watershed_workflow.sources.standard_names.
Returns: List of River objects, each representing a river network tree.
Return type: list[River]

copy(df: GeoDataFrame) → River[source]#: Shallow copy using a provided DataFrame

copySubtree() → River[source]#: Returns a deep copy rooted at self.

deepcopy() → River[source]#: Creates a deep copy of self

explore(column='ID', m=None, marker=None, name=None, **kwargs)[source]#

Open an interactive map using Folium.

Parameters:

column (str, optional) – Column name to use for coloring/styling the rivers. Defaults to the ID column.
m (folium.Map, optional) – Existing Folium map to add rivers to. If None, creates a new map.
marker (bool, optional) – Whether to add coordinate markers for each vertex. Defaults to None (no markers).
name (str, optional) – Name for the layer in the map. If None, attempts to use NAME or ID property.
**kwargs – Keyword arguments passed to geopandas.GeoDataFrame.explore(). See that function for available parameters like color, cmap, tooltip, popup, etc.

Returns:

Interactive map with the river network displayed.

Return type:

folium.Map

extendCoordinates(xys: List[Tuple[float, float]]) → None[source]#: Appends multiple coordinates at the end (downstream) of the linestring.

findNode(lambd: Callable) → River | None[source]#: Find a node, returning the first whose lambda application is true, or None

getNode(index: int | str) → River | None[source]#: return node for a given index

insertCoordinate(i: int, xy: Tuple[float, float]) → int[source]#

If it doesn’t already exist, inserts a new coordinate before the ith coordinate.

Returns the index of the new (or preexisting) coordinate.

insertCoordinateByArclen(s: float) → int[source]#

Inserts a new coordinate at a given arclen, returning the index of that coordinate.

Parameters: s (float) – Arc length distance from the downstream end of the reach at which to insert the new coordinate. Must be between 0 and the total reach length.
Returns: Index of the newly inserted coordinate in the linestring.
Return type: int

Note

Arc length is measured from the downstream end of the reach.
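Measuring from the downstream end means walking the coordinate list backwards. The sketch below works on a plain list of (x, y) tuples ordered upstream to downstream. It is an illustration under those assumptions, not the method itself, and it uses exact float comparison where a real implementation would likely use a tolerance.

```python
import math


def insert_by_arclen(coords, s):
    """Insert a point at arc length s from the downstream (last) end.

    Returns the index of the new, or already existing, coordinate.
    """
    remaining = s
    for i in range(len(coords) - 1, 0, -1):
        here, upstream = coords[i], coords[i - 1]
        if remaining == 0:
            return i                      # lands exactly on existing coordinate i
        seg = math.dist(here, upstream)
        if remaining < seg:
            t = remaining / seg           # fraction of the way toward upstream
            point = (here[0] + t * (upstream[0] - here[0]),
                     here[1] + t * (upstream[1] - here[1]))
            coords.insert(i, point)
            return i
        remaining -= seg
    if remaining == 0:
        return 0                          # s equals the total reach length
    raise ValueError("s must be between 0 and the total reach length")


coords = [(0, 0), (4, 0), (10, 0)]        # total length 10
index = insert_by_arclen(coords, 3)       # 3 units up from the downstream end
```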

isConsistent(tol: float = 1e-07) → bool[source]#: Validity checking of the tree.

isContinuous(tol: float = 1e-07) → bool[source]#

Checks geometric continuity of the river.

Confirms that all upstream children’s downstream coordinate coincides with self’s upstream coordinate.

isHydroseqConsistent() → bool[source]#: Confirms that hydrosequence is valid.

isLocallyContinuous(tol: float = 1e-07) → bool[source]#: Is this node continuous with its parent and children?

isLocallyMonotonic() → bool[source]#: Checks for monotonically decreasing elevation as we march downstream in this reach.

property linestring: LineString#: Returns the linestring geometry.

makeContinuous(tol: float = 1e-07) → None[source]#: Small gaps can appear between the linestrings of a river tree if the river is constructed using hydroseq and the Snap option is not used. Here we make them consistent.

merge(merge_reach: bool = True) → None[source]#

Merges this node with its parent.

Parameters: merge_reach (bool, optional) – Whether to merge the linestring geometries. If True, combines the linestrings. If False, only merges properties. Defaults to True.

moveCoordinate(i: int, xy: Tuple[float, float] | Tuple[float, float, float]) → None[source]#: Moves the ith coordinate of self.linestring to a new location.

pathToRoot() → Generator[source]#: A generator for the nodes on the path to root, including this.

plot(*args, **kwargs)[source]#

Plot the rivers.

Parameters:

*args – Positional arguments passed to watershed_workflow.plot.linestringsWithCoords.
**kwargs – Keyword arguments passed to watershed_workflow.plot.linestringsWithCoords. See that function for available parameters.

Returns:

The plotting result from linestringsWithCoords.

Return type:

matplotlib figure or axes

popCoordinate(i: int) → Tuple[float, float][source]#: Removes the ith coordinate and returns its value.

prependCoordinates(xys: List[Tuple[float, float]]) → None[source]#: Prepends multiple coordinates at the beginning (upstream) of the linestring.

prune() → None[source]#

Removes this node and all below it, merging properties.

This method removes the entire subtree rooted at this node, but first merges all properties (like catchment areas) up to the parent node.

Raises: ValueError – If called on a node with no parent (cannot prune the root).

resetDataFrame(force=False) → None[source]#

Resets the data frame for the river rooted at self, and reindexes the tree to a simple integer-based, preOrdered indexing.

This restricts the (shared) DataFrame to a subset of rows that are all in the river rooted at self.

split(i: int) → Tuple[River, River][source]#

Split the reach at the ith coordinate of the linestring.

Note that this does not split the catchment!

self becomes the downstream node, and is modified in-place to preserve the full tree if the trunk is the one being split.

Returns upstream_node, downstream_node.
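The geometric part of a split can be sketched on a coordinate list. This illustration assumes coordinates are ordered upstream to downstream and that the split index must be an interior coordinate. Both are assumptions of the sketch, and the tree bookkeeping that River.split also performs is not shown.

```python
def split_coords(coords, i):
    """Split an upstream-to-downstream coordinate list at index i.

    Both halves keep coordinate i, so the two reaches stay connected.
    """
    if not 0 < i < len(coords) - 1:
        raise ValueError("i must be an interior coordinate")
    upstream = coords[: i + 1]
    downstream = coords[i:]
    return upstream, downstream


reach = [(0, 3), (0, 2), (0, 1), (0, 0)]
upstream, downstream = split_coords(reach, 1)
```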

splitAtArclen(s: float) → Tuple[River, River][source]#

Inserts a coordinate at arclen s, then splits at that coordinate.

Parameters: s (float) – Arc length distance from the downstream end of the reach at which to split the reach. Must be between 0 and the total reach length.
Returns: Tuple of (upstream_node, downstream_node) after splitting.
Return type: Tuple[River, River]

to_crs(crs: CRS) → None[source]#: Warp the coordinate system.

to_dataframe() → GeoDataFrame[source]#: Represent as GeoDataFrame, useful for pickling.

to_file(filename: str, **kwargs) → None[source]#

Save the network for this river only to a geopandas file.

Note this file can be reloaded via:

$> watershed_workflow.river_tree.River.constructRiversByDataFrame(gpd.read_file(filename))

to_mls() → MultiLineString[source]#: Represent this as a shapely.geometry.MultiLineString

watershed_workflow.river_tree.accumulateCatchments(rivers: List[River], outlets: GeoDataFrame, reach_ID_column: str = 'reach_ID') → GeoDataFrame[source]#

Given a dataframe of outlets, compute contributing areas for each one.

Parameters:

rivers (list[River]) – Rivers that potentially contain the outlet reaches
outlets (gpd.GeoDataFrame) –
GeoDataFrame containing at least the following columns
- river_index : index into rivers indicating which river the outlet is on
- reach_index : index of the reach holding the outlet
- location_on_reach : indicator (0,1) of where on the reach is the outlet
Likely this is satisfied by calling determineOutletToReachMap()
reach_ID_column (str, optional) – Name of the column containing the reach ID. Defaults to ‘reach_ID’.

Returns:

An updated outlets GeoDataFrame including additionally:

catchment : polygon geometry of the contributing area to the outlet

Return type:

geopandas.GeoDataFrame

watershed_workflow.river_tree.accumulateIncrementalCatchments(rivers: List[River], outlets: GeoDataFrame) → GeoDataFrame[source]#

Given a list of outlet_indices, form the incremental contributing areas.

Parameters:

rivers (list[River]) – Rivers that potentially contain the outlet reaches
outlets (gpd.GeoDataFrame) –
GeoDataFrame containing at least the following columns
- river_index : index into rivers indicating which river the outlet is on
- reach_index : index of the reach holding the outlet
- location_on_reach : indicator (0,1) of where on the reach is the outlet

Returns:

An updated outlets GeoDataFrame including additionally:

incremental_catchment : polygon geometry of the contributing area to the outlet

Return type:

geopandas.GeoDataFrame

watershed_workflow.river_tree.combineSiblings(n1: River, n2: River, new_ls: LineString | None = None, ds: float | None = None) → River[source]#

Combines two sibling nodes, merging catchments and metadata.

Parameters:

n1 (River) – First sibling node to combine.
n2 (River) – Second sibling node to combine.
new_ls (shapely.geometry.LineString, optional) – Linestring geometry for the combined reach. If None, the geometry is computed by interpolating discrete nodes every ds meters.
ds (float, optional) – Distance between interpolated points when computing new geometry. Required if new_ls is None.

Returns:

The combined river node (n1 is modified and returned).

Return type:

River

Note

The resulting reach is either provided (by new_ls) or is computed by interpolating discrete nodes every ds.

watershed_workflow.river_tree.createRivers(reaches: GeoDataFrame, method: Literal['geometry', 'hydroseq', 'native'] = 'geometry', tol: float = 1e-07) → List[River][source]#

Constructs River objects from a list of reaches.

Parameters:

reaches (gpd.GeoDataFrame) – The reaches to turn into rivers.
method (str, optional) –
Provide the method for constructing rivers. Valid are:
- ’geometry’ looks at coincident coordinates
- ’hydroseq’ Valid only for NHDPlus data, this uses the NHDPlus VAA tables Hydrologic Sequence. If using this method, get_reaches() must have been called with both ‘hydroseq’ and ‘dnhydroseq’ properties requested (or properties=True).
- ’native’ Reads a natively dumped list of rivers.
tol (float, optional) – Defines what close is in the case of method == ‘geometry’. Defaults to _tol.

watershed_workflow.river_tree.determineOutletToReachMap(rivers: List[River], outlets: GeoDataFrame, reach_ID_column: str = 'reach_ID', measure_tol: float = 15) → GeoDataFrame[source]#

Given a list of rivers and a set of gages, find the reach in rivers and mark where on the reach to put the effective gage.

Parameters:

rivers (list[River]) – Rivers that potentially contain the outlet reaches
outlets (gpd.GeoDataFrame) –
GeoDataFrame containing at least the following columns
- reach_ID_column : ID of the reach on which the outlet lives
- measure : (if algorithm == ‘measure tol’) the % up the reach from downstream of the true location of the outlet
reach_ID_column (str) – Name of the column containing the reach ID.

Returns:

An updated outlets GeoDataFrame including additionally:

river_index : index into rivers indicating which river the outlet is on
reach_index : index of the reach holding the outlet
location_on_reach : an indicator function, 0 if the outlet should be approximated on the downstream end of the reach, 1 if it is on the upstream end.
true_geometry : the old geometry
geometry : the new location of the outlet

Return type:

geopandas.GeoDataFrame

watershed_workflow.river_tree.filterDivergences(rivers: List[River]) → List[River][source]#

Removes both diversions and braids.

Braids are divergences that return to the river network, and so look like branches of a river tree whose upstream entity is in the river (in another branch).

Diversions are divergences that do not return to the stream network, and so their upstream entity is in another river.

watershed_workflow.river_tree.filterDiversions(rivers: List[River]) → List[River][source]#: Filters diversions, but not braids.

watershed_workflow.river_tree.filterSmallRivers(rivers: List[River], count: int) → List[River][source]#: Remove any rivers with fewer than count reaches.

watershed_workflow.river_tree.getNode(rivers, index) → River | None[source]#

Finds the node, by index, in a list of rivers.

Parameters:

rivers (list[River]) – List of River objects to search through.
index (int or str) – Index of the node to find.

Returns:

The River node with the specified index, or None if not found.

Return type:

River or None

watershed_workflow.river_tree.isClose(river1: River, river2: River, tol: float) → bool[source]#

Equivalence of rivers.

Parameters:

river1 (River) – First river to compare.
river2 (River) – Second river to compare.
tol (float) – Tolerance for geometric comparison.

Returns:

True if the rivers are equivalent within the given tolerance.

Return type:

bool

watershed_workflow.river_tree.mergeShortReaches(river: River, tol: float | None) → None[source]#

Remove inner branches that are short, combining branchpoints as needed.

This function merges the “short” linestring into the child linestring if it is a junction tributary with one child or into the parent linestring otherwise.

Parameters:

river (River) – The river network to process.
tol (float, optional) – Length threshold below which reaches will be merged. If None, the tolerance is taken from the reach property TARGET_SEGMENT_LENGTH.

watershed_workflow.river_tree.pruneByArea(river: River, area: float, prop: str = 'drainage_area_sqkm') → int[source]#

Removes, IN PLACE, reaches whose total contributing area is less than area km^2.

Parameters:

river (River) – The river network to prune.
area (float) – Area threshold in km^2. Reaches with contributing area below this value will be removed.
prop (str, optional) – Name of the property containing drainage area values. Defaults to DRAINAGE_AREA from standard_names.

Returns:

Number of reaches that were pruned.

Return type:

int

Note

This requires NHDPlus data to have been used and the drainage area property to have been set.
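In-place pruning by contributing area can be sketched on a small tree. The Reach class below is hypothetical, and its area attribute stands in for the total drainage area property. Whether the library counts and removes reaches in exactly this way is an assumption.

```python
class Reach:
    """Hypothetical reach with a total contributing area and upstream children."""

    def __init__(self, name, area, children=()):
        self.name = name
        self.area = area                  # total contributing area, km^2
        self.children = list(children)


def count_nodes(reach):
    """Count a reach and everything upstream of it."""
    return 1 + sum(count_nodes(c) for c in reach.children)


def prune_by_area(reach, area):
    """Drop, in place, upstream subtrees whose contributing area is below area."""
    pruned = 0
    kept = []
    for child in reach.children:
        if child.area < area:
            pruned += count_nodes(child)  # the whole subtree goes with it
        else:
            pruned += prune_by_area(child, area)
            kept.append(child)
    reach.children = kept
    return pruned


a = Reach("a", 30.0, [Reach("a1", 4.0), Reach("a2", 20.0)])
b = Reach("b", 8.0, [Reach("b1", 3.0)])
root = Reach("root", 50.0, [a, b])
n_pruned = prune_by_area(root, 10.0)
```

Since total contributing area only grows downstream, removing a reach below the threshold also removes everything upstream of it.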

watershed_workflow.river_tree.pruneByLineStringLength(river: River, prune_tol: float | None = None) → int[source]#

Removes any leaf linestrings that are shorter than prune_tol.

Parameters:

river (River) – The river network to prune.
prune_tol (float, optional) – Length threshold below which leaf reaches will be removed. If None, uses the TARGET_SEGMENT_LENGTH property from each leaf.

Returns:

Number of reaches that were pruned.

Return type:

int

watershed_workflow.river_tree.pruneRiversByArea(rivers: List[River], area: float, prop: str = 'drainage_area_sqkm') → List[River][source]#: Both prunes reaches and filters rivers whose contributing area is less than area.

watershed_workflow.river_tree.removeBraids(rivers: List[River]) → None[source]#: Remove braids, but not diversions.

watershed_workflow.river_tree.simplify(rivers: List[River], tol: float) → None[source]#: Simplify, IN PLACE, all reaches.

Hydrography

Functions for manipulating combinations of River and SplitHUCs objects

watershed_workflow.hydrography.cutAndSnapCrossings(hucs: SplitHUCs, rivers: List[River], tol: float) → None[source]#

Aligns river and HUC objects.

- Where a reach crosses an external boundary, cut it in two and keep only the internal portion.
- Where a reach crosses an internal boundary, either snap the internal boundary to the reach endpoint or cut the reach in two.

In the end, we ensure that any place where a river crosses a HUC boundary is a discrete point at both a reach endpoint and a HUC boundary segment. This ensures that those points will never move and will always be coincident.
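The idea of making a crossing a shared, discrete point can be sketched with plain segment geometry. This is a simplified illustration, not the library's routine: it handles only the first crossing found, ignores tolerances and snapping, and the function names are hypothetical.

```python
def segment_crossing(p, q, a, b):
    """Return the point where segment p-q crosses segment a-b, or None."""
    rx, ry = q[0] - p[0], q[1] - p[1]
    sx, sy = b[0] - a[0], b[1] - a[1]
    denom = rx * sy - ry * sx
    if denom == 0:
        return None                       # parallel or collinear: not handled here
    t = ((a[0] - p[0]) * sy - (a[1] - p[1]) * sx) / denom
    u = ((a[0] - p[0]) * ry - (a[1] - p[1]) * rx) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p[0] + t * rx, p[1] + t * ry)
    return None


def cut_at_boundary(reach, boundary):
    """Cut a reach at its first boundary crossing and add the point to both."""
    for i in range(len(reach) - 1):
        for j in range(len(boundary) - 1):
            x = segment_crossing(reach[i], reach[i + 1], boundary[j], boundary[j + 1])
            if x is not None:
                upstream = reach[: i + 1] + [x]
                downstream = [x] + reach[i + 1:]
                new_boundary = boundary[: j + 1] + [x] + boundary[j + 1:]
                return upstream, downstream, new_boundary
    return None


reach = [(0, 1), (4, 1)]                  # flows from west to east
boundary = [(2, 0), (2, 3)]               # HUC boundary running north to south
upstream, downstream, new_boundary = cut_at_boundary(reach, boundary)
```

After the cut, the crossing point is an endpoint of both reach pieces and a vertex of the boundary, so later operations that preserve endpoints keep the two geometries coincident.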

watershed_workflow.hydrography.findOutletsByCrossings(hucs: SplitHUCs, river: River, tol: float = 10, debug_plot: bool = False) → None[source]#

For each HUC, find all outlets using a river network’s crossing points.

Parameters:

hucs (SplitHUCs) – Split HUCs object to find outlets for.
river (River) – River network to use for finding crossing points.
tol (float, optional) – Tolerance in map units for clustering crossings, by default 10.
debug_plot (bool, optional) – Whether to create debug plots showing outlets, by default False.

watershed_workflow.hydrography.findOutletsByElevation(hucs: SplitHUCs, elev_raster: Dataset) → None[source]#

Find outlets by the minimum elevation on the boundary.

Parameters:

hucs (SplitHUCs) – Split HUCs object to find outlets for.
elev_raster (xarray.Dataset) – Elevation raster dataset for determining minimum elevations.
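Choosing an outlet as the lowest point on a boundary can be sketched as a simple minimum search. This illustration replaces the xarray raster with a hypothetical elevation callable and samples only the boundary vertices. How the library samples the raster is not shown here.

```python
def outlet_by_min_elevation(boundary, elevation):
    """Return the boundary coordinate with the lowest sampled elevation."""
    return min(boundary, key=lambda xy: elevation(*xy))


def elevation(x, y):
    """Hypothetical terrain: a plane sloping down toward the southeast."""
    return 100.0 - 2.0 * x + 3.0 * y


boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]
outlet = outlet_by_min_elevation(boundary, elevation)
```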

watershed_workflow.hydrography.findOutletsByHydroseq(hucs: SplitHUCs, river: River, tol: float = 0.0) → None[source]#

Find outlets using the HydroSequence VAA of NHDPlus.

Finds the minimum hydroseq reach in each HUC, and intersects that with the boundary to find the outlet.

Parameters:

hucs (SplitHUCs) – Split HUCs object to find outlets for.
river (River) – River network with HydroSequence properties.
tol (float, optional) – Tolerance for buffering reaches, by default 0.0.

watershed_workflow.hydrography.snapHUCsJunctions(hucs: SplitHUCs, rivers: List[River], tol: float) → None[source]#

Snaps the junctions of HUC linestrings to endpoints of rivers.

Modifies HUCs geometry.

Parameters:

hucs (SplitHUCs) – Split HUCs object to modify.
rivers (List[River]) – List of river networks to snap to.
tol (float) – Snapping tolerance in map units.

watershed_workflow.hydrography.snapReachEndpoints(hucs: SplitHUCs, river: River, tol: float) → None[source]#

Snap river endpoints to HUC linestrings and insert that point into the boundary.

Note this is O(n^2), and could be made more efficient. Modifies reach geometry.

Parameters:

hucs (SplitHUCs) – Split HUCs object containing boundary linestrings.
river (River) – River network to snap endpoints for.
tol (float) – Snapping tolerance in map units.

watershed_workflow.hydrography.snapWaterbodies(waterbodies: List[BaseGeometry], hucs: SplitHUCs, rivers: List[River], tol: float) → None[source]#

Snap waterbodies to HUCs and river linestrings.

Attempts to make waterbodies that intersect or nearly intersect hucs intersect discretely, in that they share common point(s).

Parameters:

waterbodies (List[shapely.geometry.base.BaseGeometry]) – List of waterbody geometries to snap.
hucs (SplitHUCs) – Split HUCs object containing boundary linestrings.
rivers (List[River]) – List of river networks to snap to.
tol (float) – Snapping tolerance in map units.