A do-it-yourself water quality sensor network to elucidate contaminant signatures and improve land management advice

Article

Present Address:

James E. Dare 1,2,3✉ Emailjd227@students.waikato.ac.nz

Deniz Özkundakci 1

Richard W. McDowell 4,5

1 School of Science The University of Waikato Hamilton New Zealand

2 Bay of Plenty Regional Council Tauranga New Zealand

3 AquaWatch Solutions Auckland New Zealand

4 Faculty of Agriculture and Life Sciences Lincoln University Lincoln New Zealand

5 Lincoln Science Centre AgResearch Lincoln New Zealand

James E. Dare*^1,2,3, Deniz Özkundakci¹, Richard W. McDowell^4,5

¹ School of Science, The University of Waikato, Hamilton, New Zealand.

² Bay of Plenty Regional Council, Tauranga, New Zealand.

³ AquaWatch Solutions, Auckland, New Zealand.

⁴ Faculty of Agriculture and Life Sciences, Lincoln University, Lincoln, New Zealand.

⁵ AgResearch, Lincoln Science Centre, Lincoln, New Zealand.

^* Corresponding author: James Dare, jd227@students.waikato.ac.nz

Abstract

Quantifying contaminant loads to estuaries is essential for setting effective limits on resource use and safeguarding ecological values. Traditional monitoring programmes often rely on infrequent sampling, which can substantially underestimate loads and obscure key transport processes. While commercial high-frequency sampling stations are prohibitively expensive, open-source, do-it-yourself technologies now offer affordable alternatives for continuous monitoring. Here, we deployed ten low-cost, in situ monitoring stations equipped with research-grade sensors across an intensively farmed catchment draining to a sensitive estuarine environment in the Bay of Plenty, New Zealand. Using artificial neural network models trained on concurrent grab samples, we converted sensor measurements into reliable 15-min estimates of contaminant concentrations. High-frequency load calculations revealed nitrogen, phosphorus, and sediment exports to be 6–87% greater than estimates derived from traditional monthly sampling. Moreover, time-series outputs uncovered distinct sub-catchment contaminant mobilisation and transport dynamics that would otherwise remain undetected. These findings demonstrate that open-source, high-frequency monitoring can substantially improve contaminant load quantification, provide new insights into catchment processes, and inform the development of land management and policy strategies that reflect the unique spatio-temporal patterns of contaminant export.

Introduction

Agricultural intensification and urbanisation has led to widespread degradation of freshwater quality globally, with increased contaminant losses and declining water quality in many rivers, lakes, and estuaries^1–4. Effective management of these systems requires accurate monitoring of water quality, which is essential for understanding long-term changes, estimating contaminant loads, and identifying the drivers of ecological decline in affected catchments⁵. In many regions, local authorities use a combination of policy tools, regulatory processes, and on-the-ground action to manage freshwater resources. However, traditional water quality monitoring programmes with monthly or greater sampling frequency, often fail to capture rapid hydrochemical processes and changes that occur on shorter time scales^6,7. Kirchner⁶ likens this to ‘trying to understand a symphony by listening to only one note every minute or two’. However, this symphony occurs across an entire catchment, where discrete moments in space and time are known to contribute disproportionately relative to background conditions. While datasets from monthly sampling are suitable for some analytical purposes such as trend analysis⁸ or the calculation of crude contaminant loads, the lack of temporal resolution limits the ability to identify key processes where contaminant mobilisation is most acute, resulting in an incomplete understanding of water quality dynamics⁶.

Recent advances in the concept of 'hot-spots and hot-moments' (HSHM) have highlighted the importance of temporal and spatial variability in biogeochemical processes and contaminant mobilisation^9,10. HSHM refers to specific locations and times periods within a catchment where biogeochemical reactions and contaminant fluxes are disproportionately high. Understanding these phenomena is critical for improving water quality management, as areas with elevated biogeochemical process rates (e.g., wetlands, or riparian zones) can be protected, enhanced, or created to limit the extent of problems such as eutrophication¹⁰. Resource managers also need to understand HSHM as sources of contaminants to be able to priortise the location of management interventions’, as well as understanding the relative importance of ‘hot-moments’¹⁰.

Traditional monthly water quality monitoring does not adequately characterise ‘hot-moments’ which can lead to underestimation of contaminant loads due to the disproportionate contribution of short-lived storm events¹¹. For example, Abell et al.¹² showed that annual load estimates of total nitrogen (TN) and total phosphorus (TP) in two New Zealand streams were 19% and 40% less when estimated via monthly samples versus samples supplemented with targeted storm monitoring. Furthermore, McDowell et al.¹³ found that monthly sampling resulted in a ~ 30% error in the estimation of TP and nitrate-nitrogen loads compared to sampling every 30 minutes.

Knowledge of HSHM is particularly important for implementing environmental policy in New Zealand (the focus areas of this study) within the context of New Zealand’s National Policy Statement for Freshwater Management¹⁴. This policy mandates regional authorities to set contaminant limits to prevent further water quality decline, and to achieve outcomes desired by their local community. However, underestimation of contaminant loads from traditional monthly sampling could lead environmental managers to make incorrect conclusions about the severity or location of contaminant issues, leading to weak or ineffective management actions that will never achieve the intended objective¹³. By enhancing monitoring efforts to capture HSHM, environmental managers could more accurately quantify contaminant loads and develop the most appropriate management strategy to match contaminant mobilisation nuances observed in discrete areas of the catchment¹⁵.

The expansion of in-situ water quality sensors provides an opportunity to capture HSHM at discrete sites within a monitoring network. However, commercially available monitoring stations are expensive which means that regional authorities are often limited to a small number of sites in large river network catchments^16,17. An emerging solution comes via the ‘open-source movement’, which aims to make equipment and software available for modification with a comparably low-cost barrier to entry¹⁸. This has led to the development of ‘do-it-yourself’ (DIY) environmental monitoring options that are able to control an array of research grade sensors for a fraction of the price of commercial monitoring stations. One such option is the EnviroDIY Mayfly board, which has been designed for low-cost, solar-powered, wireless applications in environmental science¹⁶. Users can programme the board themselves using a range of different sensor-specific code templates written within the Arduino interactive development environment (IDE) and shared via an active GitHub community, providing similar functionality and reliability to commercially available systems.

Environmental practitioners often use in-situ sensor data to develop a relationship with hydrochemical or physical parameters measured via lab-samples, in a process referred to as a surrogate relationship¹⁹. Surrogate relationships provide a practical, cost-effective method to understand high-frequency, hydrochemical dynamics with more accuracy than can be achieved using discrete water quality samples alone^19,20. Surrogate relationships have evolved from stepwise regression models to more computationally intensive machine learning models such as random forest or artificial neural networks (ANN)^21,22. The use of ANN methods have been a popular choice for water quality studies due to their robustness, ability to resolve nonlinear problems, absence of statistical assumptions, and suitability to smaller datasets compared with other machine learning techniques^21,23,24. Examples of ANN applications in the field of water quality research include: the prediction of water quality index scores from hydrochemical measurements in South African river basins²⁵; estimation of microcystin concentrations in the Binhu river network of China²⁶; prediction of TN and TP concentrations in US lakes from in-situ pH, conductivity, and turbidity measurements²⁷; the prediction of turbidity in the Nalón river basin (Spain) from in-situ sensor measurements²⁸; and detection of spills and discharges into the Potomac River Basin in Virginia, USA²⁹.

The main objectives of this study are twofold: firstly, to use DIY water quality stations, in combination with ANN-based surrogate models to enhance the information obtained from a traditional water quality monitoring network, so that contaminant HSHM and more accurate contaminant loads are elucidated; and secondly, to interpret, translate, and provide recommendations as to how observed HSHM information can be applied to improve resource management. It is hypothesised that the enhanced network will provide a pragmatic way to elucidate contaminant HSHM and increase the accuracy of contaminant load estimation over existing methods.

Results

Routine monitoring overview

The number of routine water quality samples collected at each site between November 2021 and June 2024 ranged from 24 to 29. This did not include targeted storm samples or opportunistic samples and therefore represents ‘traditional monitoring’ initiated by the regional authority, Bay of Plenty Regional Council (BOPRC). Summary statistics for TN, TP, and total suspended solids (TSS) for each network site are available in the supplementary information (Table S1).

Median TN concentrations ranged from 0.99 mg L^− 1 (Pokopoko at Allport Rd) to 2.37 mg L^− 1 (Wharere at Maniatutu Rd). ‘Puanene at SH2’ and ‘Mangatoetoe at end of Black Rd’ were the most variable sites for TN (stdev ~ 0.60 mg L^− 1compared to mean stdev of 0.32 mg L^− 1). Median TP ranged from 0.091 mg L^− 1 (Pokopoko at Allport Rd) to 0.133 mg L^− 1 (Puanene at SH2; Pongakawa trib at Rotoehu Rd). ‘Puanene at SH2’ had a maximum TP concentration that was 51 times higher than the median, compared to other sites where the maximum value averaged 4.4 times higher. Median TSS ranged from 3.8 mg L^− 1 (Wharere at Maniatutu Rd) to 55.9 mg L^− 1 (Pokopoko at Allport Rd). Four sites had maximum TSS values that were over 30 times their median value (Pokopoko at Allport Rd; Oeutehuehue at Farm Box Culvert; Pokopoko at Old Coach Rd; Pongakawa Tributary at Old Coach Road). Three of these sites were located within the Pokopoko sub-catchment, while ‘Pongakawa Tributary at Old Coach Road’ was located in the Pongakawa sub-catchment.

Storm event monitoring

An example of targeted storm event monitoring at ‘Mangatoetoe at end of Black Rd’ is shown in Fig. 1. Discharge peaked at 2.02 m³ s^− 1 at 20:30 on June 14 2024, reflected by a drop in specific conductivity from a pre-event baseline of approximately 130 µS cm^− 1 to a minimum of 60 µS cm^− 1. Turbidity increased with discharge to approximately 400 NTU, with isolated peaks as high as 1250 NTU, before returning to pre-event levels as the discharge receded. Water temperature was less affected by the storm event, and more controlled by diurnal patterns, increasing to a peak of 14.5°C in the afternoon before reducing into the evening.

An ISCO autosampler was deployed throughout this event to characterise contaminant concentrations. The first sample was collected at 04:00 on June 14 2024, continuing until 24 samples had been collected at 14:30 on June 15 2024. The results for TN are displayed in the bottom panel of Fig. 1 and show a pre-event concentration of 1.64 mg L^− 1, increasing to a peak of 3.21 mg L^− 1 which occurred slightly before the discharge peak, reducing to a post-event concentration of 1.93 mg L^− 1.

The dashed line in the bottom panel represents modelled TN concentrations, where discharge, conductivity, and turbidity were used as predictor variables. The model was able to accurately represent measured TN concentrations throughout this event, with the exception of the two highest values that occurred just before the discharge peak, and one value that occurred just before 06:00 on June 15.

Fig. 1

Data collected during a storm event that occurred on 14 June 2024 at the Mangatoetoe at Black Rd site. Panels show modelled discharge (top panel), in-situ conductivity (second panel), turbidity (third panel), and water temperature (fourth panel), above discrete lab-measured TN concentrations (bottom panel). The dashed line on the bottom panel shows the predicted TN record for this event derived from the surrogate relationship.

Surrogate relationships

The overall performance of ANN models was good, with 53% of models across all parameters having a Nash-Sutcliffe efficiency (NSE) value > 0.8, and 76% of models having a NSE > 0.7. TSS was the best performing contaminant model across all sites (median NSE = 0.90), followed by TP (median NSE = 0.80) and TN (median NSE = 0.75) (Table 1).

Model performance was generally poorer at sites that were influenced by groundwater, such as Wharere at Maniatutu Rd (mean NSE = -0.34 across all parameters), and Pongakawa at Old Coach Rd (mean NSE = 0.77). All contaminant models for Wharere at Maniatutu Rd, and Pongakawa Tributary at Old Coach Rd, were removed from further analysis due to poor diagnostic performance (NSE < 0.5).

Table 1

Model performance for each site and parameter across the study period (November 2021 - June 2024). The coefficient of determination (R²), Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS), and root mean squared error (RMSE) values are shown as indicators of model performance. ‘Storm Events’ shows the number of individual storm events that were included in the model dataset at each site, and ‘Storm Sample Composition’ provides the percentage of storm event samples relative to the entire dataset for that parameter.
Site	Parameter	Storm Events	Storm Sample Composition	R²	NSE	PBIAS	RMSE
Mangatoetoe at end of Black Rd	TN	2	55%	0.74	0.72	-2.7	0.04
	TP	2	55%	0.81	0.79	0.5	0.10
	TSS	1	44%	0.92	0.92	2.0	0.18
Oeutehuehue at Farm Box Culvert	TN	2	62%	0.88	0.83	-7.9	0.09
	TP	2	62%	0.81	0.80	3.5	0.17
	TSS	2	67%	0.95	0.94	1.5	0.14
Pokopoko at Allport Rd	TN	2	54%	0.80	0.79	7.5	0.06
	TP	2	54%	0.87	0.85	-5.2	0.11
	TSS	2	58%	0.87	0.85	-1.5	0.17
Pokopoko at Old Coach Rd	TN	2	59%	0.86	0.83	3.0	0.04
	TP	2	59%	0.83	0.82	0.2	0.18
	TSS	2	67%	0.98	0.98	-0.1	0.12
Pongakawa at Old Coach Road	TN	3	56%	0.68	0.66	2.1	0.02
	TP	3	60%	0.71	0.71	-0.3	0.06
	TSS	1	44%	0.97	0.94	-3.7	0.10
Pongakawa Tributary at Old Coach Rd	TN²	2	42%	0.55	0.45	2.4	0.06
	TP²	2	42%	0.91	0.82	4.1	0.07
	TSS²	1	45%	0.84	0.84	-6.3	0.15
Pongakawa Tributary at Rotoehu Rd	TN	2	56%	0.66	0.65	-1.1	0.03
	TP	2	56%	0.76	0.74	0.8	0.94
	TSS	2	63%	0.87	0.79	1.6	0.10
Puanene at SH2	TN	2	56%	0.84	0.83	-0.7	0.07
	TP	2	56%	0.71	0.59	-2.0	0.23
	TSS	1	43%	0.95	0.87	-7.8	0.22
Wharere at Maniatutu Rd	TN¹	2	46%	0.36	-0.23	-0.7	0.03
	TP¹	2	46%	0.01	-0.98	6.7	0.22
	TSS¹	2	50%	0.20	0.19	-2.7	0.28
Wharere at Old Coach Rd	TN³	2	54%	0.79	0.78	-0.9	0.04
	TP	2	54%	0.93	0.90	4.1	0.09
	TSS	2	59%	0.97	0.96	2.7	0.10

¹ Model was deemed unacceptable because NSE < 0.5.

² Model was deemed unacceptable based on visual inspection.

³ Model was deemed acceptable but is considered to be of lower quality after visual inspection, despite acceptable model performance indicators.

Figure 2 provides examples of modelled TN for three of the ten network sites. The shaded bands represent upper and lower 90% prediction intervals, and the red points represent observed TN concentrations from laboratory samples. All three models provide a good fit between predicted and observed data, with NSE values of 0.83, 0.65, and 0.78 (top to bottom). However, the bottom model was considered lower quality than the others due to data steps and reduced certainty (wider prediction intervals), despite having a NSE value of 0.78. These examples show the importance of visually inspecting model output in addition to statistical output.

Fig. 2

Examples of modelled TN concentrations for three of the ten network sites. The black lines represent predicted TN, the grey shaded bands represent the upper and lower 90% prediction intervals, and the red points represent discrete laboratory sample results.

Total load export

The summed mean annual load export from the most downstream site on each river branch (henceforth ‘export sites’) was 550 t TN yr^− 1, 56 t TP yr^− 1, and 36,300 t TSS yr^− 1.

Pongakawa River at Old Coach Rd had the highest mean TN and TP load export, contributing 58% (319 t yr^− 1) of the total TN load, and 49% (27.8 t yr^− 1) of the total TP load exported (Fig. 3). Pokopoko at Old Coach Rd contributed 25% of the TN (136 t yr^− 1) and 34% of the TP (19.1 t yr^− 1) load. TN and TP export at Pokopoko at Old Coach Rd was 4% and 10% greater than the combined loads from upstream sites (Pokopoko at Allport Rd, Oeutehuehue at Farm Box Culvert) for TN and TP, respectively. The combined load export from other sites within the catchment made up approximately 17% of the total load export for both TN and TP.

Sediment load export was dominated by the Pokopoko sub-catchment, with Pokopoko at Old Coach Rd exporting 62% (22,570 t yr^− 1) of the total annual network load. The combined export from the two upstream branches of the Pokopoko Stream equated to 23% of the total network load, implying that 39% of the annual network load is generated in the small area between these sites. Pongakawa at Old Coach Rd site contributed 28% (10,313 t yr^− 1), and all other river branches contributed a combined annual load ~ 10%.

Fig. 3

Total annual load of TN, TP, TSS at the ten monitoring sites in Waihi Estuary catchment for data collected between November 2021 and June 2024. Areas in white represent poor quality contaminant models that were removed from the analysis or unmonitored sections of the lower catchment.

Contaminant yields

Total nitrogen yields were highest in the Pongakawa sub-catchment (32.0 kg ha^− 1 yr^− 1), followed by the Mangatoetoe (25.9 kg ha^− 1 yr^− 1) and the upper western branch of the Pokopoko River (18.4 kg ha^− 1 yr^− 1) (Fig. 4). The Mangatoetoe expressed the largest TP yield (2.9 kg ha^− 1 yr^− 1), followed by the Pongakawa River (2.8 kg ha^− 1 yr^− 1), and the western branch of the Pokopoko Stream (2.4 kg ha^− 1 yr^− 1). Sediment yields were largest for the lower Pokopoko Stream (2,385 kg ha^− 1 yr^− 1) and could be traced upstream to the Pokopoko at Allport Rd Site (1,911 kg ha^− 1 yr^− 1). Other significant contributors of sediment were the Mangatoetoe (1,284 kg ha^− 1 yr^− 1) and Pongakawa sub-catchments (1,034 kg ha^− 1 yr^− 1).

The Pongakawa Tributary at Rotoehu Rd site had the lowest yield across all three contaminants.

Fig. 4

Yields of TN, TP, TSS at the ten monitoring sites in Waihi Estuary catchment for data collected between November 2021 and June 2024. Areas in white represent poor quality contaminant models that were removed from the analysis or unmonitored sections of the lower catchment.

Baseflow load export

The percentage of TN load exported during baseflow conditions varied from 76% to 97% (Fig. 5). The most baseflow dominant sites were Pongakawa at Old Coach Rd (97%), and the two upper Pokopoko catchment sites (Oeutehuehue at Farm Box Culvert = 92%; Pokopoko at Allport Rd = 89%). The least baseflow dominant sites were in the lower catchment (Mangatoetoe at end of Black Rd = 76%; Puanene at SH2 = 79%; Pokopoko at Old Coach Rd = 82%).

Baseflow TP export was more variable between sub-catchments than TN, ranging from 49% to 96%, and more controlled by non-baseflow events. Sites with the largest baseflow TP delivery were located in the Pongakawa or upper Pokopoko sub-catchments. Similar to TN, catchments closer to the estuary were more controlled by event-flow delivery (e.g., Mangatoetoe at end of Black Rd = 49%; Puanene at SH2 = 58%).

TSS was the most variable contaminant across sites for baseflow export (40%-95%), and the most event-driven. Two sites in the Pongakawa sub-catchment (Pongakawa at Old Coach Rd; Pongakawa Trib at Rotoehu Rd), delivered the greatest percentage of TSS load during baseflow conditions, with 95% and 86%, respectively. Conversely, Mangatoetoe at end of Black Rd and Puanene at SH2 delivered the highest percentage of TSS load during storm events, with baseflow delivery percentages of 40% and 47%, respectively.

Fig. 5

The average percentage of TN, TP, TSS load that was delivered during baseflow conditions, throughout the study period (November 2021 - June 2024). Areas in white represent poor quality contaminant models that were removed from the analysis or unmonitored sections of the lower catchment.

Comparison of load estimation methods

Figure 6 shows a comparison of load estimates calculated from ANN models and discrete monthly samples (henceforth the ‘traditional’ method). TN load estimates were approximately equal between the two methods, however TP and TSS estimates diverged, particularly for sites with larger estimated loads. ANN model estimates were on average 5% (-24.3% to 36.5%), 29% (-30% to 101%), and 61% (-6% to 315%) larger than traditional estimates, for TN, TP, and TSS, respectively. Export sites had total combined loads for TN, TP and TSS that were 6%, 32%, and 85% larger for ANN model estimates than the traditional method.

The width of confidence intervals (CI’s) was contaminant dependent, with the narrowest CI’s calculated across both methods for TN (average width = 56% of estimate), and the widest CI’s calculated for TSS (average width = 284% of estimate). Confidence intervals were on average 2% (-66% to 56%), and 74% (-61% to 482%) wider for ANN models compared to the traditional method for TN and TP, and 8% (-65% to 104%) narrower for TSS.

Fig. 6

Comparison of mean annual loads for each site calculated from ANN models (y axis) and traditional monthly sampling (x axis), for TN (left), TP (middle), and TSS (right). Circles represent the most downstream sites in each sub-catchment (i.e., contributing sites). Horizontal and vertical error bars represent 90% CI’s, the diagonal dashed line shows a 1:1 ratio, and the black line shows a model of best fit.

Discussion

The current study is among the first to use DIY sensor technology and surrogate modelling across multiple sites within a catchment, providing cost-effective, spatio-temporal insights into contaminant mobilisation. Our results show that integrating traditional monitoring with targeted storm-event sampling and low-cost sensors yields critical understanding of catchment function, particularly the timing and magnitude of nutrient delivery during high flows.

Surrogate water quality models have been widely applied in other contexts, with performance comparable to our findings. For example, Harrison et al.³⁰ pooled data across multiple River Thames sites using random forest models to predict TN and TP, achieving NSE values of 0.85 and 0.74 (compared with averages of 0.76 and 0.78 in this study). Shyu et al.³¹ developed surrogate models for wastewater treatment constituents, reporting R² values of 0.46–0.97 (0.84–0.97 in this study), while Holmberg et al.³² used ANN models in boreal streams to predict TN and TP with R² values of 0.83 and 0.78 (0.76 and 0.82 in this study).

Our findings also highlight the disproportionate timing and location of contaminant delivery. For instance, 50% of TN, TP, and TSS loads occurred during just 36%, 19%, and 2% of monitored time, respectively, reflecting strong spatio-temporal variability in supply and transport. This pattern supports the ‘hot spot–hot moment’ (HSHM) concept¹⁰, which describes uneven biogeochemical processing and contaminant movement in time and space. At the sub-catchment scale, Pokopoko at Old Coach Rd contributed 62% of TSS loads but only 28% of flow, while a single six-day storm cluster accounted for 5% of annual TN, 14% of TP, and 32% of TSS loads across all sites. Such concentrated export events are difficult to capture with monthly sampling, reinforcing the value of high-frequency monitoring.

Comparison of ANN-derived load estimates with those based on monthly sampling revealed that ANN loads were 6%, 32%, and 85% greater for TN, TP, and TSS, respectively. These differences reflect the ability of high-frequency models to capture in-stream changes that occur between discrete samples. Similar discrepancies have been reported elsewhere, with many studies concluding that monthly sampling underestimates true loads and increases uncertainty^12,33–35. For example, McDowell et al.¹³ showed that 48% of annual TN and 66% of annual TP loads in New Zealand occur during high flows—events that are often missed in monthly programmes. By capturing these dynamics, the method developed here improves understanding of the magnitude and source of contaminant loads to Waihi Estuary, providing a stronger foundation for setting achievable management targets.

Finally, our findings can inform catchment-scale modelling by enhancing the simulation of spatio-temporal nutrient fluxes. For example, ANN-derived contaminant signatures can be interpreted in relation to established biogeochemical concepts such as the four ‘control points’ described by Bernhardt et al. ⁹ (permanent, activated, export, and transport). Yulianti et al. ³⁶ used a similar approach to improve SWAT + model performance in Lake Ōkaro, calibrating streamflow sources and nitrogen transport dynamics using water and nitrogen isotopes. In our study, ANN outputs suggest that TN reaches the mainstem river in the Pongakawa sub-catchment primarily via low-threshold export (vertical leaching) and efficient shallow aquifer transport, resulting in persistently elevated baseflow concentrations. In contrast, the Pokopoko sub-catchment exhibited higher-threshold export, particularly for TP and TSS, where contaminants accumulated and were episodically delivered to the river during overland flow events.

Translating results into management actions

Our results show that ANN models can identify contaminant hot spots and hot moments (HSHM) and resolve transport pathways at the catchment scale, while simultaneously capturing storm-event dynamics that drive load estimates. To inform management, these insights must be translated into actions that align with catchment-wide goals—in this case, achieving sustainable contaminant load targets for the Waihi Estuary. This process involves both prioritising and characterising model outputs.

Prioritisation is essential, as land managers must allocate limited resources where they will have the greatest impact. Sub-catchments with yields exceeding load targets per unit area are prime candidates for intervention, but gross contaminant export must also be considered to ensure meaningful reductions at the estuary scale. For example, Pongakawa and Pokopoko are high-priority areas for TN, TP, and TSS management due to both high yields and high total export. In contrast, Mangatoetoe, Puanene, and Wharere exceed yield thresholds but contribute relatively little gross export; improvements here would have limited influence on overall estuary loads and are therefore better suited for long-term rather than immediate action.

Characterisation of contaminant pathways ensures that management actions are tailored to the dominant transport processes within each sub-catchment. In Pongakawa and Pokopoko, elevated baseflow TN export indicates vertical leaching of NO₃ to groundwater as the key pathway. This is typical of intensively farmed New Zealand landscapes with porous volcanic or alluvial soils³⁷. Here, soils act as export control points, accumulating nitrate until rainfall triggers leaching, while aquifers act as transport control points, conveying nitrate to streams. Because these pathways are largely controlled by inherent soil and geological properties, direct in-stream management is limited. The most effective strategies are those that reduce nitrogen accumulation and water movement at the source³⁸. Recommended practices include limiting stocking rates, aligning fertiliser use with crop uptake and hydrology^38,40, using organic nitrogen sources³⁹, managing land-use sensitivities³⁹, restricting tillage^38,40, and modifying land surfaces to reduce permeability or slow overland flow³⁹. Importantly, hydrological lag times mean that improvements may take years to be reflected in-stream; McDowell et al. ⁴¹ estimate a median lag of 4.5 years for NO₃ in New Zealand, with detectable improvements often requiring up to a decade post-intervention.

Sub-catchment TP dynamics fall into two categories: baseflow-driven (e.g., Pongakawa, upper Pokopoko) and storm-event-driven (e.g., lower Pokopoko, Mangatoetoe, Puanene, Wharere). Elevated baseflow TP is largely attributable to the dissolution of P-rich volcanic geology⁴², which is difficult to manage without chemical interventions such as alum dosing⁴³. Consequently, management is better directed toward storm-event-driven sub-catchments, where TP is predominantly particulate or sediment-bound. In these areas, contaminants are generally mobilised when rainfall generates sufficient overland flow, creating higher thresholds for export compared to nitrate in baseflow-dominated areas. Such dynamics present an opportunity for interceptive, co-management strategies that retain water and allow sediment to settle before reaching the river. Effective measures include retention bunds⁴⁴, constructed wetlands⁴⁵, protection of natural floodplains⁴⁶, and riparian buffer strips^45,47. At the same time, the most cost-effective way to address anthropogenic phosphorus is at its source—through fertiliser management⁴⁸, identification of critical source areas⁴⁹, and restriction of stock access⁴⁹. Additional gains can be made by stabilising erosion-prone gully heads through prevention measures (e.g., topsoil reinforcement, vegetation barriers) or control methods (e.g., runoff diversion, gully reshaping, check dams, revegetation) ⁵⁰.

Limitations and improvements

Development and knowledge requirements

A key limitation of this approach is the need for users to build and program each sensor station, which requires more technical expertise than most off-the-shelf (OTS) systems. However, these skills can be readily acquired through online resources designed to support community groups with little prior experience (Table S2). The main advantage of this method remains its cost: equipment costs are up to ten times lower than OTS systems, allowing wider spatial coverage across a catchment. In our study, however, additional requirements such as field technicians, autosamplers, and storm-event laboratory analyses meant that total costs were approximately 2.8 times higher than those of monthly sampling (Tables S3–S5). These expenses were primarily driven by event sample processing (64%), followed by sensor station equipment (30%), maintenance labour (4%), and autosampler deployment labour (2%). Despite higher overall costs, the added investment is minor when weighed against the risks of poorly informed land management actions or generic policies based on limited contaminant data.

Model performance

Model performance was generally acceptable, though sites influenced by groundwater performed worse than mixed-flow sites. This limitation likely reflects the difficulty of detecting pulses of soluble contaminants from groundwater using the available sensors. Future improvements could involve deploying additional probes capable of capturing subtle groundwater signals, such as pH or redox sensors.

Paepae et al. ⁵¹ propose accuracy targets for surrogate models, with R² values of 0.95–1.00 considered desirable, 0.90–0.95 acceptable, 0.85–0.90 tolerable, and 0.80–0.85 poor. By these standards, TN models in our study ranged from below poor to tolerable, TP from poor to acceptable, and TSS from poor to desirable. However, these thresholds were developed for commercial nitrate sensors, and Paepae et al. note that accuracy requirements depend on both environmental context and intended application. Given that our study used indirect sensor combinations to estimate concentrations, we argue that the spatio-temporal and hydrochemical insights gained outweigh the modest reduction in accuracy—particularly in applied contexts such as the Waihi Estuary, where outputs improve policy advice beyond what is possible with traditional monitoring.

Another challenge was the development of hydrological rating curves at soft-bottomed sites, which we addressed by developing flow relationships with nearby hydrological stations. This introduced variation in discharge estimation methods across the network. Future studies could reduce this limitation by employing hydrodynamic river models to standardise discharge estimates and improve comparability across sites.

Autosampling

A major strength of this study was the establishment of ten monitoring sites across the catchment, which improved spatial resolution of contaminant signatures. However, equipment availability limited the number of storm events sampled per site. Future work could focus on fewer, strategically selected locations to increase storm-event coverage and enhance model performance. In particular, emphasis should be placed on the Pongakawa River during major storm events, as this sub-catchment only responded to very high rainfall accumulation.

Conclusions

This study demonstrates that traditional water quality monitoring can be substantially enhanced through the use of cost-effective DIY sensor stations combined with ANN models. High-frequency surrogate records captured hydrochemically significant storm events missed by monthly sampling, improving load estimates—particularly for TP and TSS.

Integrating model outputs within the HSHM framework provided several key benefits for water quality management. First, it identified sub-catchments and event timings where export and transport control points combined to create disproportionate contaminant risks. Second, it enabled a triage-based approach to management, directing immediate interventions to high-export sub-catchments while reserving lower-priority areas for long-term planning. Third, it supported efficient investment by showing that the majority of contaminant exports originated from a small number of sub-catchments, highlighting the value of targeted mitigation. Finally, the framework demonstrated the need for adaptive management, as shifting contaminant signatures under changing climatic and land-use conditions demand flexible prioritisation.

By coupling ANN modelling with the HSHM framework, this study presents a cost-effective and practical method for disentangling complex contaminant transport dynamics. The approach not only provides more accurate load estimates for sensitive estuarine environments such as Waihi Estuary but also generates actionable insights that can inform spatially explicit, tailored land management strategies. While this requires greater resources than monthly sampling, the improvements in accuracy, certainty, and management relevance strongly justify the investment, and the framework is readily transferable to other agricultural catchments worldwide.

Methods

Study area

This study was carried out in the Waihi Estuary catchment, near the township of Pukehina, in the Bay of Plenty region of New Zealand (Fig. 7). The Waihi Estuary catchment is predominantly agricultural and covers approximately 34,256 hectares at the north-western edge of the Taupo Volcanic Zone⁵². Soil types are predominantly well-drained pumice soils overlying ignimbrite in the upper and mid catchment, transitioning to poorly drained gley soils overlying swamp deposits near the estuary^53,54. Major land uses include, dairy farming (35%), exotic forestry (20%, mainly Pinus radiata), sheep and beef farming (14%), and orchards (7%, mainly kiwi fruit). The catchment has undergone extensive hydraulic modification, particularly in lowland areas, where wetlands fringing Waihi Estuary have been drained, and riverine inputs have been channalised, to make way for agricultural intensification.

The catchment drains into Waihi Estuary, a shallow, eutrophic, intertidal estuary that is currently experiencing nuisance blooms of macroalgae due to excess nutrient and fine sediment loading from the contributing catchment⁵⁵. A catchment modelling study carried out in 2014 estimated that TN, TP, and total suspended solids (TSS) loads are currently 524%, 124%, and 232% of their natural state⁵⁵. Furthermore, it has been estimated that TN and TSS loads need to be reduced by approximately 40% and 65%, respectively to achieve a state that supports biodiversity, ecological functioning, and provides for cultural and community values⁵⁶. Estimated TP load reductions are more complex; work by Atkinson and Smith⁵⁷ implies that an N:P ratio of 30:1 is an ideal standard for assessing nutrient limitation in estuaries, which translates to a target of 18.2 t yr^− 1, or an approximate TP reduction of 30% from the 2014 load⁵⁵. However, natural state’ modelling implies that the natural N:P ratio was historically around 6.4:1 due to natural, volcanic, sources of P within the catchment, which means that the ideal ratio of 30:1 may be unrealistic⁵⁵. Total phosphorus remains a focus in this study given that opportunities to manage anthropogenic sources are still abundant and important for managers to understand.

Fig. 7

The Waihi Estuary Catchment in the Bay of Plenty region of New Zealand. The ten study sites are labelled and represented as black points. Each of the five monitored sub-catchments are coloured according to the legend, with the unmonitored, lower catchment left as the default shading.

Traditional water quality monitoring investigation

Bay of Plenty Regional Council (BOPRC), the regional authority that is responsible for managing water resources in the Bay of Plenty, established a routine water quality monitoring network in the Waihi Estuary catchment in 2021. This traditional monitoring network is part of BOPRC’s ‘focus catchment programme’, which aims to identify high-risk areas within a deteriorating catchment by increasing the spatial resolution of water quality monitoring sites. Monthly water quality samples were collected by trained professionals at 10 representative sites throughout the catchment (Fig. 7), and analysed for a suite of constituents, including TN, TP, and TSS. This dataset was used as the foundation for the current study.

Continuous monitoring

For the current study, each of the 10 sites in the BOPRC investigation was supplemented with a ‘Mayfly’ sensor station to better characterise riverine conditions between sampling dates. Sensor stations were built using an EnviroDIY Mayfly data logger board, which is an arduino-based microcontroller that has been developed for low-cost environmental research⁵⁸. Each Mayfly board was housed within a 2.4 litre Pelican™ box and attached to a 1.5m pole on the riverbank. Sensor stations were equipped with a 3.7V 3800mAh LiPo battery pack and a 6V 3.5W Voltaic solar panel providing an indefinite power source.

Two factory calibrated water quality sensors were attached to each sensor station: a HYDROS-21 conductivity, temperature, and depth sensor (CTD) from METER Group, and a Y11-A nephelometer with wiper from Yosemite Technologies. This sensor combination is recommended for detecting changes in dissolved contaminants (e.g., nitrate-nitrite-N [NNN], dissolved reactive P [DRP]) as well as particulate contaminants (e.g., TN, TP, TSS)¹⁶, while being price competitive, reliable, and compatible with the Mayfly system. CTD and turbidity data for each site were captured at 15-minute intervals and stored on a removable SD memory card.

Sensors were maintained via monthly site visits which involved removal of debris, thorough cleaning, and repairing of any damaged wipers. Data was downloaded at six monthly intervals and uploaded to a water management database. This database was designed for hydrological and water quality datasets and contains numerous in-built tools that enable data QA/QC processes. The four continuous data records from each monitoring site were assessed by a qualified environmental data specialist following upload to the database system. The data specialist was tasked with removing erroneous spikes, cross checking abnormalities with field notes, comparing data records to discrete field validation measurements, and adjusting datasets where appropriate (e.g., where sensor drift had occurred).

Storm event sampling

Stormflow sampling was used to capture contaminant dynamics during high-flow periods, which are known to contribute disproportionately to annual nutrient and sediment loads. Stormflow samples were collected using one of two available Teledyne ISCO 6715 autosamplers which were deployed opportunistically in response to incoming weather events. The intake hose of each autosampler was positioned next to the sensor station at each site to ensure that samples were representative of the conditions recorded by the sensors. Autosamplers were programmed to collect samples at a set time interval, commencing in pre-event conditions and, ideally, finishing when the hydrograph returned to baseflow. This generally equated to one sample every 90 minutes throughout an event.

A total of 21 storm events were captured across all 10 monitoring sites, with each site being sampled during at least two weather events. Storm event sampling lasted an average of 35 hours and represented rainfall accumulations of 27mm to 116mm.

Laboratory analysis

All hydrochemical analyses for the routine monitoring programme, as well as additional storm event samples, were carried out by the BOPRC water quality laboratory using IANZ accredited methodology. TN and TP were measured by the persulphate method APHA 4500-P J⁵⁹ using a Flow Injection Analyser. TSS was measured using a standard glass fibre filter, according to APHA 2540 D⁶⁰.

Discharge measurements

High-frequency discharge measurements were required at each site to allow for calculation of instantaneous loads. Puanene at SH2 met this requirement as it was a permanent hydrological monitoring site maintained by BOPRC, however discharge records at the other nine sites required development. This process involved manually measuring stream discharge on at least five dates over the latter half of 2022, using either a SonTek FlowTracker2 Acoustic Doppler Velocimeter, a SonTek-RS5 ADCP, or a SonTek M9. These data were combined with sensor water level readings and analysed by qualified BOPRC hydrologists to develop water level-discharge rating curves for each site. Four of the nine rating curves were unreliable due to inconsistent cross-sectional areas. This was addressed at three of the four sites by developing linear relationships between the discharge measurements and continuous discharge measurements from Puanene at SH2. Discharge at Pongakawa at Old Coach Rd was estimated through a multi parameter model that incorporated discharge from a permanent hydrology site in a neighbouring catchment (Waitahanui at Otamarakau Valley Rd, approximately 1 km away from the study catchment), water level from a nearby lake (Lake Rotoiti, approximately 7.5km away from the study catchment), and calendar year. Discharge records were summarised to 15-minute intervals to match the frequency of data from each Mayfly station. Details for each flow relationship are shown in Table 2.

Table 2

Discharge relationships developed for each of the ten monitoring sites.
Site	Discharge Model Type	Predictor Variables	Performance
Mangatoetoe at end of Black Rd	Linear relationship	● Stream discharge at Puanene at SH2	● P < 0.01 ● R² = 0.84
Oeutehuehue at Farm Box Culvert	Rating curve	● Water level	● P < 0.05 ● R² = 0.73
Pokopoko at Allport Rd	Rating curve	● Water level	● P < 0.001 ● R² = 0.99
Pokopoko at Old Coach Rd	Linear relationship	● Stream discharge at Puanene at SH2	● P < 0.001 ● R² = 0.93
Pongakawa at Old Coach Road	Artificial Neural Network	● Stream discharge at Waitahanui at Otamarakau Valley Rd ● Lake level at Rotoma at Whangaroa ● Calendar year	● RMSE = 0.023 ● R² = 0.86 ● NSE = 0.84 ● PBIAS = -0.2
Pongakawa Tributary at Old Coach Rd	Linear relationship	● Stream discharge at Puanene at SH2	● P < 0.01 ● R² = 0.81
Pongakawa Tributary at Rotoehu Rd	Rating curve	● Water level	● P < 0.05 ● R² = 0.96
Puanene at SH2	Rated hydrological site	● Water level	● P < 0.001 ● R² = 0.99
Wharere at Maniatutu Rd	Rating curve	● Water level	● P < 0.001 ● R² = 0.95
Wharere at Old Coach Rd	Rating curve	● Water level	● P < 0.05 ● R² = 0.79

Data analysis

TN, TP, and TSS concentrations from BOPRC’s routine sampling programme, and the targeted storm event monitoring programme, were extracted for each site and combined with mean-hourly or mean half-hourly sensor values. Missing sensor values were imputed by linear interpolation for gaps smaller than four hours using the ‘na.approx’ function in the ‘Zoo’ package⁶¹, and laboratory measurements that were below the 5th or beyond the 95th percentile were treated as outliers and removed from the dataset. Final datasets ranged from 64 to 101 observations, depending on the site and parameter.

Datasets for each site-parameter combination were used to develop real-time, surrogate, contaminant data records using feed-forward back-propagation ANN machine learning models²⁷. These models used sensor output (conductivity, temperature, depth, turbidity) as predictor variables, and either TN, TP, or TSS concentration as the response variable.

The general modelling procedure for each site-contaminant combination was as follows. Datasets were visually inspected using the ‘ggplot2’ package⁶² to determine the most appropriate sensor parameters to use, and statistical transformations to apply. They were then split into training and test datasets with a percentage ratio of 70:30, ensuring that an equal proportion of ‘event’ samples were present within each dataset. ANN models were developed using sensor measurements and either TN, TP, or TSS, using the ‘brulee’ package⁶³ of the ‘tidymodels’ ecosystem⁶⁴, within the R Computing Environment⁶⁵. Ten-fold cross-validation was applied to the training dataset, and a grid search was used to tune the following hyperparameters: number of epochs, number of hidden units, penalty value, and learning rate.

Finalised models were applied to the isolated test dataset, and four performance metrics were calculated: the coefficient of determination (R²), Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS), and root mean squared error (RMSE). Fifteen-minute resolution, surrogate models of TN, TP, and TSS concentrations were computed by applying each site-specific ANN model to the continuous sensor output. Upper and lower 90% prediction intervals were calculated for each modelled datapoint using split conformal inference methods within the ‘probably’ package⁶⁶. Model performance was assessed by interpreting performance metrics and visually inspecting the predicted vs observed time series data. Models that had a NSE < 0.5 were deemed not fit for further analysis.

Flow partitioning

Modelled discharge for each site was separated into baseflow and event-flow using a Lyne Hollick recursive baseflow filter⁶⁷ within the ‘grwat’ package⁶⁸. This baseflow separation method is widely accepted to be simple, robust and repeatable, and was the preferred method for the New Zealand baseflow index characterisation project and is well suited for streams where a substantial proportion of flow originates from groundwater⁶⁹. A filter parameter of 0.99 was selected for all sites by comparing the saturated hydraulic connectivity of volcanic soils in Alavani and Tomer⁷⁰ with optimal filter parameter values in Li et al.⁷¹.

Contaminant load calculation

Contaminant loads for TN, TP, and TSS were calculated for each site according to Eq. 1, derived from the numeric integration method in Meals et al.¹¹, where c_i is the i^th modelled concentration value, q_i is the corresponding modelled flow, and t_i is the time interval between measurements (fifteen minutes). Sensor data gaps greater than four hours were removed from the dataset to avoid bias.

$\:Mean\:Annual\:Load\:=\left(\frac{\:{\sum\:}_{i=1}^{n}{c}_{i}{q}_{i}{t}_{i}}{{\sum\:}_{i=1}^{n}{t}_{i}}\right)\cdot\:\:seconds\:per\:year$

[Equation 1]

The same equation was also applied to upper and lower 90% prediction interval datasets. Mean annual baseflow loads were calculated in the same way with the exception of using scaled flow output from the recursive baseflow filter.

Comparison with traditional load estimation

Mean annual load estimates calculated from ANN models were compared to estimates calculated from monthly samples for each site-contaminant combination. Datasets for the traditional method required removal of autosampler and opportunistic samples, leaving 24–29 discrete monthly samples at each site.

Symmetric 90th percentile confidence intervals were calculated for the traditional method using Eq. 2⁸, where

$\:\mu\:$

is the true population mean,

$\:{t}_{\left(\frac{a}{2},n-1\right)}$

and

$\:{t}_{\left(1-\frac{a}{2},n-1\right)}$

are critical t values with

$\:a=0.95$

$\:n$

= sample size,

$\:\underset{\_}{x}$

= mean load estimate, and

$\:s$

= the standard deviation of

$\:\underset{\_}{x}$

$\:{{\underset{\_}{x}+{t}_{\left(\frac{a}{2},n-1\right)}}_{}\cdot\:\sqrt{\frac{{s}^{2}}{n}}\le\:\mu\:\le\:\underset{\_}{x}+{t}_{\left(1-\frac{a}{2},n-1\right)}\cdot\:\sqrt{\frac{{s}^{2}}{n}}}_{}$

[Equation 2]

Spatial analysis

Spatial analysis was carried out using the ‘sf’ package⁷² within the R Statistical Computing Environment. Contributing sub-catchments for each monitoring site were delineated by combining upstream watersheds from the River Environment Classification (REC) digital river network⁷³. This information was used to calculate the annual areal load for each sub-catchment (kg ha^− 1 yr^− 1).

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Electronic Supplementary Material

Below is the link to the electronic supplementary material

Supplementary Material 1

References

Gluckman, P. New Zealand’s Fresh Waters: Values, State, Trends and Human Impacts. 120 (2017).

Larned, S. T., Snelder, T., Unwin, M. J. & McBride, G. B. Water quality in New Zealand rivers: current state and trends. N. Z. J. Mar. Freshwat. Res. 50, 389–417. 10.1080/00288330.2016.1150309 (2016).

Weeks, E. S. et al. Conservation Science Statement. The demise of New Zealand’s freshwater flora and fauna: a forgotten treasure. Pac. Conserv. Biol. 22, 110–115. 10.1071/PC15038 (2016).

Zahoor, I. & Mushtaq, A. Water pollution from agricultural activities: A critical global review. Int. J. Chem. Biochem. Sci. 23, 164–176 (2023).

Roygard, J., McArthur, K. & Clark, M. Diffuse contributions dominate over point sources of soluble nutrients in two sub-catchments of the Manawatu River, New Zealand. N. Z. J. Mar. Freshwat. Res. 46, 219–241. 10.1080/00288330.2011.632425 (2012).

Kirchner, J. W., Feng, X., Neal, C. & Robson, A. J. The fine structure of water-quality dynamics: the (high-frequency) wave of the future. Hydrol. Process. 18, 1353–1359. https://doi.org/10.1002/hyp.5537 (2004).

Lloyd, C. E. M., Freer, J. E., Johnes, P. J. & Collins, A. L. Using hysteresis analysis of high-resolution water quality monitoring data, including uncertainty, to infer controls on nutrient and sediment transfer in catchments. Sci. Total Environ. 543, 388–404. 10.1016/j.scitotenv.2015.11.028 (2016).

Helsel, D. R., Hirsch, R. M., Ryberg, K. R., Archfield, S. A. & Gilroy, E. J. Statistical Methods in Water Resources. Techniques and Methods https://pubs.usgs.gov/publication/tm4A3 (2020). 10.3133/tm4A3 doi: 10.3133/tm4A3.

Bernhardt, E. S. et al. Control Points in Ecosystems: Moving Beyond the Hot Spot Hot Moment Concept. Ecosystems 20, 665–682. 10.1007/s10021-016-0103-y (2017).

10.

McClain, M. E. et al. Biogeochemical Hot Spots and Hot Moments at the Interface of Terrestrial and Aquatic Ecosystems. Ecosystems 6, 301–312 (2003). 10.1007/s10021-003-0161-9

11.

Meals, D. W., Richards, R. P. & Dressing, S. A. Pollutant load estimation for water quality monitoring projects. (2013).

12.

Abell, J. M., Hamilton, D. P. & Rutherford, J. C. Quantifying temporal and spatial variations in sediment, nitrogen and phosphorus transport in stream inflows to a large eutrophic lake. Environ. Sci. : Processes Impacts. 15, 1137–1152. 10.1039/C3EM00083D (2013).

13.

McDowell, R. W. et al. High flows contributed a large part of annual contaminant yields in New Zealand’s rivers. Commun. Earth Environ. 6, 335. 10.1038/s43247-025-02238-9 (2025).

14.

MfE. National Policy Statement for Freshwater Management 2020. (2022).

15.

Rissmann, C. et al. A hydrochemically guided landscape classification system for modelling spatial variation in multiple water quality indices: Process-attribute mapping. Science total environment (2019).

16.

Horsburgh, J. S. et al. Low-Cost, Open-Source, and Low-Power: But What to Do With the Data? Front. Earth Sci. 7 10.3389/feart.2019.00067 (2019).

17.

Westerhoff, R. et al. Towards implementation of robust monitoring technologies alongside freshwater improvement policy in Aotearoa New Zealand. Environ. Sci. Policy. 132, 1–12. 10.1016/j.envsci.2022.01.020 (2022).

18.

Chan, K. et al. Low-cost electronic sensors for environmental research: Pitfalls and opportunities. Progress Phys. Geography: Earth Environ. 0309133320956567 10.1177/0309133320956567 (2020).

19.

Castrillo, M. & García, Á. L. Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods. Water Res. 172, 115490. 10.1016/j.watres.2020.115490 (2020).

20.

Wang, G. et al. Artificial neural networks for water quality soft-sensing in wastewater treatment: a review. Artif. Intell. Rev. 55, 565–587. 10.1007/s10462-021-10038-8 (2022).

21.

Chen, Y., Song, L., Liu, Y., Yang, L. & Li, D. A Review of the Artificial Neural Network Models for Water Quality Prediction. Appl. Sci. 10, 5776. 10.3390/app10175776 (2020).

22.

Paepae, T., Bokoro, P. N. & Kyamakya, K. From Fully Physical to Virtual Sensing for Water Quality Assessment: A Comprehensive Review of the Relevant State-of-the-Art. Sensors 21, 6971. 10.3390/s21216971 (2021).

23.

Ibrahim, A. et al. Water quality modelling using principal component analysis and artificial neural network. Mar. Pollut. Bull. 187, 114493. 10.1016/j.marpolbul.2022.114493 (2023).

24.

Tiyasha, Tung, T. M. & Yaseen, Z. M. A survey on river water quality modelling using artificial intelligence models: 2000–2020. J. Hydrol. 585, 124670. 10.1016/j.jhydrol.2020.124670 (2020).

25.

Banda, T. D. & Kumarasamy, M. Artificial Neural Network (ANN)-Based Water Quality Index (WQI) for Assessing Spatiotemporal Trends in Surface Water Quality—A Case Study of South African River Basins. Water 16, 1485. 10.3390/w16111485 (2024).

26.

He, X., Wang, H., Zhuang, W., Liang, D. & Ao, Y. Risk prediction of microcystins based on water quality surrogates: A case study in a eutrophicated urban river network. Environ. Pollut. 275, 116651. 10.1016/j.envpol.2021.116651 (2021).

27.

Sinshaw, T. A., Surbeck, C. Q., Yasarer, H. & Najjar, Y. Artificial Neural Network for Prediction of Total Nitrogen and Phosphorus in US Lakes. J. Environ. Eng. 145, 04019032. 10.1061/(ASCE)EE.1943-7870.0001528 (2019).

28.

Iglesias, C. et al. Turbidity Prediction in a River Basin by Using Artificial Neural Networks: A Case Study in Northern Spain. Water Resour. Manage. 28, 319–331. 10.1007/s11269-013-0487-9 (2014).

29.

Shi, B., Wang, P., Jiang, J. & Liu, R. Applying high-frequency surrogate measurements and a wavelet-ANN model to provide early warnings of rapid surface water quality anomalies. Sci. Total Environ. 610–611, 1390–1399. 10.1016/j.scitotenv.2017.08.232 (2018).

30.

Harrison, J. W., Lucius, M. A., Farrell, J. L., Eichler, L. W. & Relyea, R. A. Prediction of stream nitrogen and phosphorus concentrations from high-frequency sensors using Random Forests Regression. Sci. Total Environ. 763, 143005. 10.1016/j.scitotenv.2020.143005 (2021).

31.

Shyu, H. Y., Castro, C. J., Bair, R. A., Lu, Q. & Yeh, D. H. Development of a Soft Sensor Using Machine Learning Algorithms for Predicting the Water Quality of an Onsite Wastewater Treatment System. ACS Environ. Au. 3, 308–318. 10.1021/acsenvironau.2c00072 (2023).

32.

Holmberg, M., Forsius, M., Starr, M. & Huttunen, M. An application of artificial neural networks to carbon, nitrogen and phosphorus concentrations in three boreal streams and impacts of climate change. Ecol. Model. 195, 51–60. 10.1016/j.ecolmodel.2005.11.009 (2006).

33.

Cassidy, R. & Jordan, P. Limitations of instantaneous water quality sampling in surface-water catchments: Comparison with near-continuous phosphorus time-series data. J. Hydrol. 405, 182–193. 10.1016/j.jhydrol.2011.05.020 (2011).

34.

Johnes, P. J. Uncertainties in annual riverine phosphorus load estimation: Impact of load estimation methodology, sampling frequency, baseflow index and catchment population density. J. Hydrol. 332, 241–258. 10.1016/j.jhydrol.2006.07.006 (2007).

35.

Piniewski, M., Marcinkowski, P., Koskiaho, J. & Tattari, S. The effect of sampling frequency and strategy on water quality modelling driven by high-frequency monitoring data in a boreal catchment. J. Hydrol. 579, 124186. 10.1016/j.jhydrol.2019.124186 (2019).

36.

Yulianti, M. & Ridwansyah, I. Özkundakci,Denizand Simulating water balance and nutrient losses in a pastoral catchment using the SWAT + model: added value of isotope data. Hydrological Sciences Journal 70, 127–143 (2025). 10.1080/02626667.2024.2423047

37.

Howard-Williams, C., Davies-Colley, R., Rutherford, K. & Wilcock, R. Diffuse pollution and freshwater degradation: New Zealand perspectivesin 16 (OECD, 2010).

38.

Power, J. F., Wiese, R. & Flowerday, D. Managing farming systems for nitrate control: A research review from management systems evaluation areas. J. Environ. Qual. 30, 1866–1880 (2001).

39.

Sahoo, P. K., Kim, K. & Powell, M. A. Managing Groundwater Nitrate Contamination from Livestock Farms: Implication for Nitrate Management Guidelines. Curr. Pollution Rep. 2, 178–187. 10.1007/s40726-016-0033-5 (2016).

40.

Rekha, P. N., Kanwar, R. S., Nayak, A. K., Hoang, C. K. & Pederson, C. H. Nitrate leaching to shallow groundwater systems from agricultural fields with different management practices. J. Environ. Monit. 13, 2550–2558. 10.1039/C1EM10120J (2011).

41.

McDowell, R. W., Simpson, Z. P., Ausseil, A. G., Etheridge, Z. & Law, R. The implications of lag times between nitrate leaching losses and riverine loads for water quality policy. Sci. Rep. 11, 16450. 10.1038/s41598-021-95302-1 (2021).

42.

Morgenstern, U. et al. Using groundwater age and hydrochemistry to understand sources and dynamics of nutrient contamination through the catchment into Lake Rotorua, New Zealand. Hydrol. Earth Syst. Sci. 19, 803–822. 10.5194/hess-19-803-2015 (2015).

43.

Burns, N., McIntosh, J. & Scholes, P. Managing the lakes of the Rotorua District, New Zealand. Lake Reserv. Manag. 25, 284–296. https://doi.org/10.1080/07438140903083815 (2009).

44.

Levine, B. et al. The ability of detainment bunds to decrease surface runoff leaving pastoral catchments: Investigating a novel approach to agricultural stormwater management. Agric. Water Manage. 243, 106423. 10.1016/j.agwat.2020.106423 (2021).

45.

Tanner, C. C., Tomer, M. D. & Goeller, B. C. A framework for applying interceptive mitigations for diffuse agricultural pollution. New. Z. J. Agricultural Res. 0, 1–22. 10.1080/00288233.2023.2252773 (2023).

46.

Acreman, M. & Holden, J. How Wetlands Affect Floods. Wetlands 33, 773–786. 10.1007/s13157-013-0473-2 (2013).

47.

Dosskey, M. G. et al. The Role of Riparian Vegetation in Protecting and Improving Chemical Water Quality in Streams1. JAWRA J. Am. Water Resour. Association. 46, 261–277. https://doi.org/10.1111/j.1752-1688.2010.00419.x (2010).

48.

Macintosh, K. A. et al. Managing Diffuse Phosphorus at the Source versus at the Sink. Environ. Sci. Technol. 52, 11995–12009. 10.1021/acs.est.8b01143 (2018).

49.

McDowell, R. W. & Srinivasan, M. S. Identifying critical source areas for water quality: 2. Validating the approach for phosphorus and sediment losses in grazed headwater catchments. J. Hydrol. 379, 68–80. 10.1016/j.jhydrol.2009.09.045 (2009).

50.

Frankl, A., Nyssen, J., Vanmaercke, M. & Poesen, J. Gully prevention and control: Techniques, failures and effectiveness. Earth. Surf. Proc. Land. 46, 220–238. 10.1002/esp.5033 (2021).

51.

Paepae, T., Bokoro, P. N. & Kyamakya, K. A Virtual Sensing Concept for Nitrogen and Phosphorus Monitoring Using Machine Learning Techniques. Sensors 22, 7338. 10.3390/s22197338 (2022).

52.

Bannister, S. et al. Imaging sub-caldera structure with local seismicity, Okataina Volcanic Centre, Taupo Volcanic Zone, using double-difference seismic tomography. J. Volcanol. Geoth. Res. 431, 107653. 10.1016/j.jvolgeores.2022.107653 (2022).

53.

GNS Science. Geology 2.0.0. (2025). https://data.gns.cri.nz/geology

54.

Landcare Research. S-MAP Online. (2021). https://smap.landcareresearch.co.nz/

55.

Chakravarthy, K., Monetti, F. & Tuckey, B. Waihi Estuary - Numerical Modelling to Support Healthy Environments. 84 (2021). https://atlas.boprc.govt.nz/api/v1/edms/document/A4115261/content

56.

Carter, R. et al. Nutrient, Sediment and E. Coli Load Reductions to Meet Desired Target Attribute State Outcomes within Freshwater Management Units in the Bay of Plenty. (2025). https://atlas.boprc.govt.nz/api/v1/edms/document/BOPRC-856456781-112793/content

57.

Atkinson, M. J. & Smith, S. V. C:N:P ratios of benthic marine plants. Limnol. Oceanogr. 28, 568–574. 10.4319/lo.1983.28.3.0568 (1983).

58.

Hicks, S. D., Aufdenkampe, A. K., Montgomery, D. S., Damiano, S. G. & Brooks, H. P. A New Arduino Datalogger Board for Simple, Low Cost Environmental Monitoring and the EnviroDIY Web Community. AGU Fall Meeting Abstracts 23, H23G-1658 (2015).

59.

4500-P phosphorus. in. in Standard methods for the examination of water and wastewater. Vol. 24 (eds Lipps, W. C., Baxter, T. E. & Braun-Howland, E.) (APHA, 2023). 10.2105/SMWW.2882.093

60.

2540 solids. in. in Standard methods for the examination of water and wastewater. Vol. 24 (eds Lipps, W. C., Baxter, T. E. & Braun-Howland, E.) (APHA, 2023). 10.2105/SMWW.2882.030

61.

Zeileis, A. & Grothendieck, G. Zoo:S3Infrastructure for regular and irregular time series. J. Stat. Softw. 14 10.18637/jss.v014.i06 (2005).

62.

Wickham, H. Ggplot2: Elegant Graphics for Data Analysis (Springer, 2016). Springer International Publishing: Imprint10.1007/978-3-319-24277-4 doi: 10.1007/978-3-319-24277-4

63.

Kuhn, M. & Falbel, D. Brulee: High-Level Modeling Functions with ‘Torch’. (2025). https://github.com/tidymodels/brulee

64.

Kuhn, M. & Wickham, H. Tidymodels: a collection of packages for modeling and machine learning using tidyverse principles. (2020).

65.

R Core Team. R: A language and environment for statistical computing (R Foundation for Statistical Computing, 2024).

66.

Khun, M., Vaughan, D. & Ruiz, E. probably: Tools for Post-Processing Class Probability Estimates. (2023).

67.

Lyne, V. & Hollick, M. Stochastic timevariable rainfall-runoff modelling. in Proceedings of the Hydrology and Water Resources Symposium vol. 79/10 89–92Perth, Australia., (1979).

68.

Samsonov, T. River Hydrograph Separation and Analysis. (2023).

69.

Singh, S. K., Pahlow, M., Booker, D. J., Shankar, U. & Chamorro, A. Towards baseflow index characterisation at national scale in New Zealand. J. Hydrol. 568, 646–657. 10.1016/j.jhydrol.2018.11.025 (2019).

70.

Alavi, G. & Tomer, M. D. Estimation of soil hydraulic parameters to simulate water flux in volcanic soils. New Z. J. Forestry Science 31, (1999).

71.

Li, L., Maier, H. R., Lambert, M. F., Simmons, C. T. & Partington, D. Framework for assessing and improving the performance of recursive digital filters for baseflow estimation with application to the Lyne and Hollick filter. Environ. Model. Softw. 41, 163–175. 10.1016/j.envsoft.2012.11.009 (2013).

72.

Pebesma, E. Simple Features for R: Standardized Support for Spatial Vector Data. (2018).

73.

Snelder, T. H. & Biggs, B. J. F. Multiscale River Environment Classification for Water Resources Management. JAWRA J. Am. Water Resour. Association. 38, 1225–1239. 10.1111/j.1752-1688.2002.tb04344.x (2002).

Acknowledgement

This work would not have been possible without assistance and support from the EIS and Science teams at Bay of Plenty Regional Council. Special thanks to: Malcolm Burningham, Rob Donald, Natalie Fenwick, Thomas Grant, Max Mackay, Claire McCorkindale, Angie Perks, and Rigo Sanchez Medina. Thank you also to Troy Baisden was integral in the initial phases of this study.

Funding

This research received in-kind funding from Bay of Plenty Regional Council.

Author Contribution

J.E.D. established the project, carried out the fieldwork, analysed the data, and wrote the manuscript. D.Ö. and R.W.M supervised the project and co-wrote the manuscript.

Competing interests

The authors declare no competing interests

Additional information

Supplementary Information is available for this paper.

Table 1.xlsx

Table 2.xlsx

Yes