Overview and Background
This project will develop machine learning and statistical methods for real-time forecasting in water catchments, using data fusion with uncertainty quantification. It will use and develop advanced AI models, building on and extending work such as Allen et al. (2025) and relevant foundation models, to fuse in situ sensor data and satellite data (optical and/or radar) for hydrology and surface water quality in river catchments. The project will also assess how well the developed methods generalise across multiple river catchments.
Data volumes can be very large: for example, in situ sensors may record approximately every 15 minutes at multiple sites, and satellite imagery may have 10 m spatial resolution. To obtain fast (real-time or near real-time), reliable and computationally efficient (and therefore more environmentally friendly) models, this project specifically targets GPU programming for scalable analytics. It will also consider the advantages of cloud GPUs and related platforms, for example EDITO, to support transferability and impact.
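To give a sense of scale, the sampling rates above can be turned into rough counts. The short Python sketch below does this arithmetic; the number of sensors and the catchment area are hypothetical placeholders chosen for illustration, not figures from the project.

```python
# Back-of-envelope data volume estimate.
# N_SENSORS and AREA_KM2 are hypothetical placeholders, not project values.

MINUTES_PER_DAY = 24 * 60


def sensor_readings_per_year(interval_minutes=15, n_sensors=1, days=365):
    """Readings per year for sensors sampling at a fixed interval."""
    return (MINUTES_PER_DAY // interval_minutes) * days * n_sensors


def pixels_per_scene(area_km2, resolution_m=10):
    """Pixels needed to cover an area once at a given ground resolution."""
    pixels_per_km = 1000 // resolution_m
    return area_km2 * pixels_per_km ** 2


N_SENSORS = 20    # hypothetical sensor count
AREA_KM2 = 1000   # hypothetical catchment area

print("sensor readings per year:", sensor_readings_per_year(n_sensors=N_SENSORS))
print("pixels per satellite scene:", pixels_per_scene(AREA_KM2))
```

Even under these modest assumptions a single satellite scene contains far more values than a year of sensor readings, which illustrates why the two sources have very different spatial and temporal support.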
This project is a collaboration with domain experts at the University of Stirling and Scottish Water, and will link to NERC projects such as SenseH20 and MOT4Rivers and to the Forth-ERA digital observatory of the Forth catchment.
Methodology and Objectives
The AI-based models will be developed using the latest generation of deep learning approaches (such as transformer models and physics-informed losses). As outlined in the background, the project will leverage existing model frameworks (such as Aardvark) and foundation models that can be specialised, and will construct domain-specific pipelines as appropriate. The fusion methodologies will be tested on a number of downstream tasks, most notably forecasting and predicting water quantity and quality at locations far from the sensors (assessed by cross-validation). Although the frameworks will be tested on the dataset in question, the expectation is that the models can be transferred to other localities; testing the portability of these approaches will form part of the later stages of the project.
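One way to assess prediction away from sensor locations is leave-one-sensor-out cross-validation: each sensor is held out in turn, predicted from the remaining sensors, and the errors are pooled. The sketch below illustrates the idea in Python. It is not the project's method: the predictor is a simple inverse-distance-weighting stand-in for whatever fusion model is eventually developed, and the function names and the (x, y, value) sensor format are assumptions made for this example.

```python
import math


def idw_predict(target_xy, sensors, power=2.0):
    """Inverse-distance-weighted prediction (a placeholder predictor).

    sensors is a list of (x, y, value) tuples.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in sensors:
        d = math.hypot(target_xy[0] - x, target_xy[1] - y)
        if d == 0.0:
            return value  # target coincides with a sensor
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def leave_one_sensor_out_rmse(sensors, predict=idw_predict):
    """Hold out each sensor in turn and return the pooled RMSE."""
    squared_errors = []
    for i, (x, y, value) in enumerate(sensors):
        training = sensors[:i] + sensors[i + 1:]
        prediction = predict((x, y), training)
        squared_errors.append((prediction - value) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical readings at three sites along a line.
example_sensors = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0)]
print("leave-one-sensor-out RMSE:", leave_one_sensor_out_rmse(example_sensors))
```

Because the predictor is passed in as an argument, the same loop could score a deep learning model and a statistical baseline on identical held-out sites.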
Teaser Project 1:
The first teaser project will develop an initial methodology for data fusion by taking an off-the-shelf deep learning approach and applying it to the dataset in question to explore its performance. This will be compared with standard statistical approaches (e.g. kriging and hierarchical Bayesian spatiotemporal models) to understand the relative advantages and disadvantages of the deep learning approach, in both prediction accuracy and computational time. Based on these initial findings and the limitations identified, we will consider changes to the architecture, including software optimisations (such as GPU programming), or the development of a different approach altogether. This work would extend naturally into a full PhD project should the student decide to pursue it.
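As an illustration of the kind of statistical baseline mentioned above, the following Python sketch implements ordinary kriging with an exponential covariance and times a single prediction. It rests on stated assumptions: the covariance parameters are fixed placeholders rather than fitted to data, the sensor values are invented, and the dense linear solve is written out by hand only to keep the example free of dependencies.

```python
import math
import time


def exp_cov(h, sill=1.0, length_scale=1.0):
    """Exponential covariance as a function of separation distance h."""
    return sill * math.exp(-h / length_scale)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(target_xy, sensors, length_scale=1.0):
    """Ordinary kriging prediction from a list of (x, y, value) sensors."""
    n = len(sensors)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Covariance matrix bordered by the unbiasedness constraint (weights sum to 1).
    a = [[exp_cov(dist(sensors[i], sensors[j]), length_scale=length_scale)
          for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [exp_cov(dist(target_xy, s), length_scale=length_scale) for s in sensors] + [1.0]
    weights = solve_linear(a, b)[:n]  # drop the Lagrange multiplier
    return sum(w * s[2] for w, s in zip(weights, sensors))


# Hypothetical sensor values at the corners of a unit square.
sensors = [(0.0, 0.0, 1.2), (1.0, 0.0, 1.9), (0.0, 1.0, 0.7), (1.0, 1.0, 1.5)]
start = time.perf_counter()
estimate = ordinary_kriging((0.5, 0.5), sensors)
elapsed = time.perf_counter() - start
print(f"estimate={estimate:.3f}, wall time={elapsed:.6f}s")
```

The dense solve scales cubically with the number of observations, which is one reason computational time is compared alongside accuracy.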
Teaser Project 2:
The second teaser project is strongly rooted in uncertainty quantification, i.e. understanding how certain we should be about the model’s predictions. AI-based approaches, while regularly delivering high accuracy, often lack the strong probabilistic frameworks needed for good uncertainty quantification. To address this, we will employ a mixture of approaches, ranging from Monte Carlo and variational inference methods, which leverage the fast inference time of AI models, to emulation-based approaches. Given the data fusion pipelines we will be developing, this will require understanding both the uncertainty induced by the model and the uncertainty in the observations themselves. This is computationally intensive, so part of this teaser project will be to understand this cost and reduce it, both through computational means (e.g. GPU coding and HPC) and through statistical techniques that use computational resources most efficiently (and limit their environmental impact).
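The distinction between model uncertainty and observation uncertainty can be illustrated with a small Monte Carlo experiment. In the Python sketch below, a handful of hypothetical model variants (standing in for posterior samples or ensemble members) are each run many times on a noisy input, and the total predictive variance is split using the law of total variance. The toy linear model, the slope values and the noise level are all assumptions made for illustration and do not represent the project's models.

```python
import random
import statistics


def toy_model(x, slope):
    """Hypothetical stand-in for a trained model: a linear response."""
    return slope * x


def monte_carlo_uncertainty(x_obs, obs_sd, slopes, n_draws=2000, seed=0):
    """Propagate input noise through each model variant and split the variance.

    observation_var: average variance within a variant (from input noise).
    model_var: variance of the variant means (disagreement between variants).
    """
    rng = random.Random(seed)
    group_means, group_vars, all_preds = [], [], []
    for slope in slopes:
        preds = [toy_model(x_obs + rng.gauss(0.0, obs_sd), slope)
                 for _ in range(n_draws)]
        group_means.append(statistics.fmean(preds))
        group_vars.append(statistics.pvariance(preds))
        all_preds.extend(preds)
    all_preds.sort()
    return {
        "mean": statistics.fmean(all_preds),
        "observation_var": statistics.fmean(group_vars),
        "model_var": statistics.pvariance(group_means),
        "total_var": statistics.pvariance(all_preds),
        # Empirical 95% interval from the pooled samples.
        "lower": all_preds[int(0.025 * len(all_preds))],
        "upper": all_preds[int(0.975 * len(all_preds)) - 1],
    }


result = monte_carlo_uncertainty(x_obs=2.0, obs_sd=0.1, slopes=[0.8, 1.0, 1.2])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With equal numbers of draws per variant, the two components sum exactly to the total variance. The cost grows with the number of variants multiplied by the number of draws, so fast inference is what makes this kind of sampling practical.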
References & Further Reading
Allen, A., Markou, S., Tebbutt, W., Requeima, J., Bruinsma, W. P., Andersson, T. R., … & Turner, R. E. (2025). End-to-end data-driven weather prediction. Nature, 641(8065), 1172-1179. 10.1038/s41586-025-08897-0
Andersson, T. R., et al. (2021). Seasonal Arctic sea ice forecasting with probabilistic deep learning. Nature Communications, 12, 5124. 10.1038/s41467-021-25257-4
Colombo, P., Miller, C., Yang, X., O’Donnell, R., & Maranzano, P. (2025). Warped multifidelity Gaussian processes for data fusion of skewed environmental data. Journal of the Royal Statistical Society Series C: Applied Statistics, 74(3), 844-865. 10.1093/jrsssc/qlaf003
Wilkie, C. J., Miller, C. A., Scott, E. M., O’Donnell, R. A., Hunter, P. D., Spyrakos, E., & Tyler, A. N. (2019). Nonparametric statistical downscaling for the fusion of data of different spatiotemporal support. Environmetrics, 30(3), e2549. 10.1002/env.2549
