PhD Pilot Blog

What the Water Doesn’t Tell Us: Building Machine Learning Systems for Nordic Inland Water Quality Monitoring

Nordic inland waters are among the most extensively monitored freshwater systems on Earth. Across Norway, Sweden, Finland, and Denmark, thousands of stations have been sampled for chemical and biological water quality parameters over multiple decades, producing one of the richest long-term observational records in environmental science. Yet a quiet crisis is unfolding in these lakes. Eutrophication is intensifying in agricultural lowlands while progressive browning, driven by rising dissolved organic matter from peat-rich boreal catchments, is altering light regimes and carbon cycles across sub-arctic systems.

The EU Water Framework Directive (WFD) demands ecological status assessments for tens of thousands of water bodies, most of which have never been directly sampled. Models are needed. The question is whether the data we have is structured well enough to build them.

The Gap Between What We Measure and What We Know

Monitoring stations exist where someone decided to place them, typically near water treatment infrastructure, downstream of known pollution sources, or at easily accessible shorelines. The resulting spatial distribution reflects administrative priorities rather than ecological sampling design.

In practice, this means that large fractions of Nordic lakes remain unobserved, and even the lakes that are monitored are often only checked on a fixed schedule, such as once a year. This can miss important short-term events. For example, a lake might go through a full algae bloom, sometimes involving toxic cyanobacteria, between two visits, and that event would never be recorded.

Satellite data can help fill in these gaps. Satellites such as Sentinel-1, -2, and -3, Landsat 7/8/9, and MODIS observe the Earth frequently, sometimes almost daily, with image detail as fine as 10 metres. Over time, this creates a large archive of images waiting to be translated into water quality information. However, turning satellite images into reliable water quality estimates is not always straightforward. The problem is especially difficult in Nordic lakes, where conditions often make the signals hard to interpret.

Figure 1. Water bodies across the Nordic River Basin Districts failing to achieve Good Ecological Status under the EU Water Framework Directive (2nd River Basin Management Plan cycle). Blue indicates low failure rates; red indicates high failure rates. Source: European Environment Agency, WFD Ecological Status Dashboard. © EEA, © 2026 Mapbox, © OpenStreetMap.

Why Satellites Struggle Over Nordic Lakes

Satellites are a powerful tool for observing water quality, but in Nordic regions they face some important limitations. For about six to eight months of the year, many lakes in Finland, northern Norway, and northern Sweden are covered by snow or ice. During this time, satellite sensors cannot “see” the water itself. Instead, they capture signals from snow or ice on the surface, which tells us nothing about what is happening in the lake underneath. This means that for much of the year, satellite-based water monitoring is not usable, leaving only a short summer season when reliable observations are possible.

Even during summer, another problem appears. Many of the computer models used to estimate water quality from satellite images were originally developed for clearer coastal waters or more nutrient-rich temperate lakes. Nordic lakes are often different: many are dark in colour because they contain high levels of coloured dissolved organic matter (CDOM).

When these standard models are applied to such lakes, several issues arise:

Atmospheric correction methods that work well over oceans or clear lakes often produce negative blue-band reflectance values over dark, carbon-rich boreal lakes, a physically impossible result that can quietly corrupt any later analysis using that band.
Chlorophyll retrievals built on blue-to-green band ratios become numerically unstable.
Turbidity estimates can confuse light absorption from dissolved carbon with scattering from suspended particles, causing dark, clear lakes to be incorrectly classified as bright and murky.

None of these are obscure edge cases. They are predictable consequences of applying southern-latitude calibrations to northern European inland waters, and they require explicit documentation if the resulting data is to be used responsibly.
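The first two failure modes can at least be caught before they reach a retrieval algorithm. The following is a minimal sketch, assuming per-pixel blue and green surface reflectances are already available as plain floats. The function name `check_pixel`, the `PixelCheck` record, and the `min_reflectance` threshold are hypothetical and chosen for illustration; they are not taken from any specific processing chain.

```python
from typing import NamedTuple, Optional


class PixelCheck(NamedTuple):
    valid: bool                        # True if the pixel is usable for a band ratio
    blue_green_ratio: Optional[float]  # None when the pixel is rejected
    reason: str                        # Human-readable quality flag


def check_pixel(blue: float, green: float,
                min_reflectance: float = 1e-4) -> PixelCheck:
    """Flag physically impossible or numerically fragile reflectance pairs.

    min_reflectance is an illustrative threshold (an assumption), not a
    published value; it would need tuning per sensor and correction method.
    """
    # Negative reflectance is physically impossible and usually points to
    # over-correction of the atmosphere over a dark target.
    if blue < 0 or green < 0:
        return PixelCheck(False, None, "negative reflectance")
    # A ratio of two near-zero numbers is dominated by noise, so reject it
    # instead of passing an unstable value downstream.
    if blue < min_reflectance or green < min_reflectance:
        return PixelCheck(False, None, "reflectance too small for a stable ratio")
    return PixelCheck(True, blue / green, "ok")


# Example: a dark boreal lake pixel with an over-corrected blue band.
print(check_pixel(-0.002, 0.010))
# Example: a pixel with usable signal in both bands.
print(check_pixel(0.020, 0.040))
```

Recording the rejection reason alongside the value, instead of silently dropping or clipping the pixel, keeps the documentation of these failures attached to the data itself.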

The Hidden Problem With “Accurate” Water Quality Models

Even if satellite measurements were perfect, there is another major issue with how water quality machine learning models are evaluated. Most studies randomly split data into training and test sets. On paper, this looks like a fair way to measure accuracy, but in reality, it can give misleadingly good results.

Water quality measurements from stations within the same drainage basin share a large component of variance driven by common catchment inputs, shared meteorological forcing, and hydrological connectivity. This means that when a model is trained on random samples, it has effectively already seen conditions very close to the test stations. Instead of learning how to predict water quality in completely new places, the model is mostly learning patterns from areas it already knows.

As a result, many published performance metrics in state-of-the-art (SOTA) literature are systematically inflated relative to what those models achieve in genuine deployment. Correct evaluation requires spatially held-out units: whole catchments or drainage basins are excluded entirely from training, so that the model is tested on the same problem it will face in deployment, namely predicting water quality in places it has never seen.
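A spatially held-out split can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, assuming each sample is a dictionary carrying a `catchment_id` field; that field name and the function `catchment_holdout_split` are hypothetical. Libraries such as scikit-learn offer comparable group-aware splitters (for example `GroupShuffleSplit`), which would be a reasonable choice in practice.

```python
import random


def catchment_holdout_split(samples, test_fraction=0.2, seed=0):
    """Split samples so that no catchment appears in both train and test.

    samples: list of dicts, each with a 'catchment_id' key (assumed field name).
    test_fraction: approximate share of catchments (not samples) held out.
    """
    # Work at the level of catchments, not individual samples.
    catchments = sorted({s["catchment_id"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(catchments)

    # Hold out at least one whole catchment.
    n_test = max(1, round(len(catchments) * test_fraction))
    test_ids = set(catchments[:n_test])

    train = [s for s in samples if s["catchment_id"] not in test_ids]
    test = [s for s in samples if s["catchment_id"] in test_ids]
    return train, test


# Toy data: 5 catchments with 4 stations each (values are made up).
samples = [
    {"catchment_id": f"C{c}", "station": f"C{c}-S{s}", "chl_a": 1.0 * c + 0.1 * s}
    for c in range(5)
    for s in range(4)
]
train, test = catchment_holdout_split(samples, test_fraction=0.2, seed=42)

# No catchment is shared between the two sets.
shared = {s["catchment_id"] for s in train} & {s["catchment_id"] for s in test}
print(len(train), len(test), shared)
```

With a random per-sample split, stations from the same catchment would land on both sides and the test score would mostly reflect within-catchment similarity. Splitting by catchment removes that shortcut, at the cost of a smaller and noisier test set.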

What Progress Requires

To make real progress in Nordic water quality monitoring, researchers need better shared datasets and testing standards. This means combining long-term in-situ observations with multi-sensor satellite reflectance, regional atmospheric reanalysis, and static catchment attributes. It also means clear quality checks, reproducible spatial and temporal evaluation splits, and SOTA baselines. The open questions such infrastructure would unlock are concrete.

Can network-aware models detect early warning signals of eutrophication by propagating information along hydrological connectivity graphs before bloom thresholds are crossed?
Do the atmospheric dynamics captured at kilometre-scale regional reanalysis resolution carry predictive information absent from coarser global products?
Can models trained on southern Scandinavian lowlands generalize to sub-arctic Finnish or Norwegian catchments?

These questions have answers that matter for water management across the Nordic region. Getting to them requires building the measurement infrastructure first.

This blog post reflects ongoing research on multi-source water quality benchmarking for Nordic inland waters. A full methodological account will appear in due course.

30.5.2026
