“Denoising of Groundwater Level Data Using Wavelet Transform, SSA, and VMD: A Python-Based Approach”

K V Sumith¹

¹Assistant Professor, Department of Civil Engineering, Sir M. Visvesvaraya Institute of Technology, Bangalore, India

Email: Sumith4121995@gmail.com


Abstract

This research aims to improve the quality of groundwater level data by using signal decomposition techniques for denoising. Groundwater level measurements can be affected by environmental noise, sensor errors, and other disturbances, which may obscure the true hydrological signal and reduce the accuracy of subsequent analyses. This study compares three widely used denoising methods, namely the Wavelet Transform, Singular Spectrum Analysis (SSA), and Variational Mode Decomposition (VMD), implemented in Python. The goal is to evaluate how effectively each method removes noise while preserving the important variations in groundwater level. Performance was evaluated on both training and testing data using the coefficient of determination (R²), Nash-Sutcliffe Efficiency (NSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The Wavelet Transform performed best, with R² values of 0.982 for training and 0.954 for testing and identical NSE scores (0.982 and 0.954). VMD also performed well, with R² values of 0.949 (training) and 0.835 (testing) and matching NSE values. SSA was the least effective, with an R² of 0.906 for training but only 0.590 for testing, indicating weaker generalization to unseen data. These results indicate that, of the three methods, the Wavelet Transform separates noise from the underlying hydrological signal most effectively, yielding cleaner groundwater datasets. Denoised data can improve hydrological modeling, forecasting, and groundwater management, and better data quality supports more precise water resource planning, sustainable use, and reduction of the risks associated with groundwater level changes, making these techniques useful tools for environmental scientists and water resource managers.

1. Introduction

Groundwater is crucial for supporting ecosystems, agriculture, and household water needs, especially in water-scarce areas. Accurate groundwater level forecasting has gained importance with increasing demand and climate variability. Rainfall significantly influences groundwater levels, particularly during monsoon and drought periods (Jan et al., 2007; Van Gaalen et al., 2013). Understanding how rainfall interacts with subsurface hydrology is vital for sustainable groundwater management and disaster prevention, including landslides and flooding (Hong & Wan, 2011; Wei et al., 2019). Traditional groundwater prediction methods rely heavily on statistical correlations and regression models. Although useful, these methods often fail to capture the nonlinear and complex nature of hydrogeological systems (Nayak et al., 2006; Daliakopoulos et al., 2005). Artificial neural networks (ANNs) marked an advancement in modeling groundwater systems, offering better accuracy by learning patterns from historical data (Tsanis et al., 2008; Chang et al., 2016). Genetic programming approaches have also highlighted the importance of rainfall data and station proximity in improving prediction accuracy (Sadat-Noori et al., 2020). Additionally, hybrid models combining regression and neural networks have improved forecasting by addressing seasonal changes and complex groundwater interactions (Sun, 2013; Hsieh et al., 2019).

Recent advances in machine learning (ML) have further improved groundwater level forecasting. Support vector machines (SVM), random forests, and hybrid KNN-RF models have performed strongly in various hydrogeological settings (Kombo et al., 2020; Hsieh et al., 2019). Studies in drought-prone and urban areas have examined the effectiveness of ML models in capturing groundwater variability amid changing rainfall patterns (Pham et al., 2022; Yadav et al., 2020). Deep learning methods, including long short-term memory (LSTM) networks and recurrent neural networks (RNN), have been used to model sequential dependencies in groundwater time series with improved accuracy (Bowes et al., 2019; Chen et al., 2023). Moreover, combining deep learning with regional climate model outputs shows promise for predicting long-term groundwater changes under future climate scenarios (Jang et al., 2015; Kochhar et al., 2022). However, purely data-driven models can struggle with the noisy, nonlinear signals typical of environmental data. To improve robustness, signal decomposition techniques have been integrated into prediction frameworks. Wavelet transforms decompose groundwater signals into multiple frequency components, aiding feature extraction for machine learning models (Adamowski & Chan, 2011; Suryanarayana et al., 2014). Singular spectrum analysis (SSA) extracts dominant patterns from groundwater time series for hybrid prediction models (Cui et al., 2024; Patle et al., 2015). Empirical mode decomposition (EMD) and its variants isolate intrinsic modes of variability, improving accuracy (Hong, 2017; Seo et al., 2018). Variational mode decomposition (VMD), a more recent technique, adaptively separates signals into band-limited modes. When combined with ML techniques such as extreme learning machines (ELM), LSTM, and ensemble frameworks, VMD has substantially improved prediction accuracy and stability, especially for nonstationary series (Nazari et al., 2025; Guo et al., 2023).
Recent research also explores hybrid methods that combine multiple decomposition techniques, such as CEEMDAN and VMD, to capture complementary temporal features and further improve forecasting (Katipoğlu, 2024; Ladouali et al., 2024).

Despite the promising results from hybrid models integrating signal decomposition and machine learning, there remains a significant need to explore the comparative effectiveness of different decomposition methods on groundwater level prediction. For example, studies have evaluated wavelet, SSA, and empirical mode decomposition (EMD) approaches, but the potential of variational mode decomposition (VMD) as a standalone or combined method requires further investigation (Seo et al., 2018; Seidu et al., 2022). Furthermore, hybrid models employing multiple decomposition techniques sequentially have shown improved predictive capabilities, suggesting that leveraging complementary features can enhance model robustness (Ladouali et al., 2024; Katipoğlu, 2024).

Recent research highlights the benefits of combining signal decomposition with advanced deep learning architectures, such as LSTM networks, to effectively capture temporal dependencies and nonlinear dynamics in groundwater and runoff data (Nazari et al., 2025; Zhang et al., 2023). Ensemble learning frameworks have also shown promise in reducing prediction uncertainties and enhancing the generalizability of groundwater level forecasts, especially in complex hydrological systems influenced by climate variability and human activities (Yadav et al., 2020; Parsaie et al., 2024). Additionally, quantitative studies on the lag effect of rainfall on groundwater response emphasize the importance of considering time delays and seasonal variability within predictive models to improve accuracy (Wang et al., 2024; Natarajan et al., 2022). Coastal and snow-dominated regions pose unique challenges due to additional factors like tides and snowmelt, requiring specialized modeling approaches that incorporate both physical and data-driven techniques (Shiri et al., 2022; Gezici et al., 2024). For example, integrated models that account for tidal influence and rainfall have improved predictions in coastal aquifers, while snowmelt contributions have been successfully integrated using VMD-based hybrid approaches in snow-fed catchments (Shiri et al., 2022; Gezici et al., 2024). To enhance groundwater forecasting at various scales and under diverse climatic conditions, recent studies advocate for combining regional climate modeling, data assimilation techniques, and hybrid machine learning models (Jang et al., 2015; Gonzalez & Arsanjani, 2021). The integration of multiple data sources, including satellite observations, hydrometeorological inputs, and ground-based measurements, is essential for improving prediction reliability (Sun et al., 2022; Gonzalez & Arsanjani, 2021). 
Furthermore, model adaptability through continuous learning, retraining, and uncertainty quantification remains a key focus for future research (Sapitang et al., 2021; Kajewska-Szkudlarek et al., 2022). Building on these insights, this study develops and compares methods for reducing noise in groundwater level time series, with rainfall as the primary hydroclimatic input. Wavelet transform, singular spectrum analysis (SSA), and variational mode decomposition (VMD) are implemented in Python to isolate meaningful signal components from noisy observations. The study addresses existing gaps by systematically evaluating each decomposition technique in terms of signal clarity, robustness, and adaptability. It also uses Python's flexible computational libraries to tune decomposition parameters and to build scalable, reproducible denoising workflows. Ultimately, the study aims to improve the reliability of groundwater level datasets, supporting better-informed decisions for sustainable water resource planning and reducing the uncertainty caused by noisy measurements.

2. Study Area

The current study focuses on Bagalkote District in northern Karnataka, India, located at approximately latitude 16.3472°N and longitude 75.6222°E. Bagalkote lies in a predominantly semi-arid region marked by variable rainfall, frequent droughts, and heavy reliance on groundwater for agriculture and domestic use. The district is largely rural and agricultural, with sugarcane, cotton, and pulses among the common crops. It faces challenges such as falling groundwater levels, over-extraction, and limited recharge due to seasonal rainfall fluctuations. A key reason for choosing this district is the lack of long-term, continuous groundwater level (GWL) data. Gaps and inconsistencies in the dataset pose both problems and opportunities for research, emphasizing the need for data-efficient modeling approaches and for predictive or interpolative tools that address water resource data gaps. In addition, the combination of groundwater stress and data limitations makes the area a suitable site for testing methods under real-world conditions, where monitoring infrastructure may be sparse or inconsistent. Bagalkote therefore serves as a suitable location for research aimed at improving groundwater data management, modeling, and strategies in data-limited regions.

3. Methodology

The study uses groundwater level and rainfall data for Bagalkote district from May 2013 to December 2024. To address missing, inconsistent, and noisy records, signal decomposition techniques, namely Wavelet Transform, Singular Spectrum Analysis, and Variational Mode Decomposition, are applied to pre-process and reconstruct the datasets. These techniques extract meaningful patterns and reduce noise, supporting the prediction of groundwater levels with rainfall as a key input.

3.1. Data Collection and Pre-processing

This study uses monthly rainfall and groundwater level (GWL) data for Bagalkote district from May 2013 to December 2024. Rainfall data were obtained from the India Meteorological Department, Pune (https://www.imdpune.gov.in/), and GWL data from the India Water Resources Information System (India-WRIS, https://indiawris.gov.in/wris/#/). Because some GWL records are missing or inconsistent, three signal decomposition techniques, namely Wavelet Transform, Singular Spectrum Analysis (SSA), and Variational Mode Decomposition (VMD), are applied to pre-process and reconstruct the datasets, as described below. These methods extract key temporal patterns and reduce noise, supporting accurate prediction of groundwater levels from rainfall inputs. All data processing and modeling are implemented in Python.

3.1.1. Min-Max Normalization

Min-max normalization is a statistical technique for scaling data to a fixed range, typically 0 to 1. Each original value is transformed by subtracting the minimum value and dividing by the range (max − min) of the data, so that all features have equal influence on the model regardless of their original scales (Sidle et al., 2021):

$X' = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}}$ (1)

Where:

$X$ is the original value,

$X_{\text{min}}$ is the minimum value in the dataset,

$X_{\text{max}}$ is the maximum value in the dataset, and

$X'$ is the normalized value.
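Eq. (1) can be sketched in Python as follows. This is a minimal illustration assuming NumPy; the helper names `min_max_normalize` and `min_max_inverse` are hypothetical, and the study does not state whether the minimum and maximum were taken from the full record or from the training period only. Passing the training-period values explicitly, as the optional arguments allow, avoids leaking information from the test period into the scaling.

```python
import numpy as np


def min_max_normalize(x, x_min=None, x_max=None):
    """Scale values to [0, 1] with Eq. (1) (hypothetical helper).

    If x_min/x_max are not supplied they are taken from x itself; passing
    training-period values lets the test period be scaled consistently.
    """
    x = np.asarray(x, dtype=float)
    x_min = float(np.nanmin(x)) if x_min is None else x_min
    x_max = float(np.nanmax(x)) if x_max is None else x_max
    return (x - x_min) / (x_max - x_min), x_min, x_max


def min_max_inverse(x_scaled, x_min, x_max):
    """Map normalized values back to the original units (e.g. metres)."""
    return np.asarray(x_scaled, dtype=float) * (x_max - x_min) + x_min


# Hypothetical groundwater levels (m below ground), not data from the study.
gwl = np.array([5.0, 7.5, 10.0])
scaled, lo, hi = min_max_normalize(gwl)
restored = min_max_inverse(scaled, lo, hi)

# Later values scaled with the earlier min/max may fall outside [0, 1].
later_scaled, _, _ = min_max_normalize([6.0, 12.5], lo, hi)
```

The inverse mapping is useful when error metrics need to be reported in the original units (e.g., meters) rather than on the normalized scale.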

3.2. Wavelet Transform, Singular Spectrum Analysis (SSA), and Variational Mode Decomposition (VMD)

Groundwater level (GWL) prediction is performed using rainfall data as the primary input, with three signal decomposition techniques (Wavelet Transform, SSA, and VMD) used to enhance the quality of the GWL time series. These methods help the model capture complex patterns and trends, supporting more accurate GWL forecasting under data-sparse conditions.

3.2.1. Wavelet Transforms

Wavelet Transform is a powerful signal processing technique that decomposes a time series into components at multiple scales, capturing both time and frequency information. It separates low-frequency trends (approximations) from high-frequency fluctuations (details), enabling effective noise reduction and feature extraction in non-stationary data such as groundwater levels.

Let the GWL time series be denoted by

$Y(t) = [y_1, y_2, \dots, y_n]$ (2)

The GWL signal is decomposed into multiple frequency components using the Discrete Wavelet Transform (DWT), a time-frequency analysis tool:

$Y(t) \xrightarrow{\text{DWT}} \left\{A_L(t), D_1(t), D_2(t), \dots, D_L(t)\right\}$ (3)

Where:

$A_L(t)$ = approximation at level L (low-frequency trends)

$D_i(t)$ = detail component at level i (high-frequency noise and variations)

Mathematically, the decomposition can be expressed as:

$Y(t) = A_L(t) + \sum_{i=1}^{L} D_i(t)$ (4)

High-frequency noise is suppressed by setting the finest-scale detail components to zero, with detail levels indexed from the finest (i = 1) to the coarsest (i = L):

$D_1(t) = 0, \quad D_2(t) = 0$

This removes fine-scale noise while retaining the significant structure of the signal.

The denoised signal is then reconstructed with the inverse discrete wavelet transform (IDWT), using the modified detail components:

$\hat{Y}(t) = \text{IDWT}\left(A_L(t), D_1(t), D_2(t), \dots, D_L(t)\right)$ (5)

Because the decomposition is additive (Eq. 4), removing $D_1(t)$ and $D_2(t)$ is equivalent to

$\hat{Y}(t) = A_L(t) + \sum_{i=3}^{L} D_i(t)$ (6)

The reconstructed series $\hat{Y}(t)$ is a cleaned version of the original GWL signal.
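The zero-and-reconstruct procedure can be illustrated with a short, dependency-free NumPy sketch. It follows the standard convention in which D_1 is the finest (highest-frequency) detail level, and it assumes the Haar wavelet and a decomposition level of 3, because the wavelet family and level used in the study are not stated; in practice PyWavelets (`pywt.wavedec` and `pywt.waverec`) is the usual choice. The helper name `haar_denoise` and its parameters are hypothetical.

```python
import numpy as np


def haar_denoise(y, level=3, n_zero=2):
    """Multilevel Haar DWT denoising (hypothetical helper).

    Decomposes y into A_L and D_1..D_L (D_1 = finest, highest-frequency
    scale), sets the n_zero finest detail levels to zero and applies the
    inverse transform.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    block = 2 ** level
    # Pad with the last value so the length is divisible by 2**level.
    padded_len = int(np.ceil(n / block)) * block
    a = np.pad(y, (0, padded_len - n), mode="edge")

    details = []  # details[0] is D_1 (finest), details[-1] is D_L (coarsest)
    for _ in range(level):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        a = (even + odd) / np.sqrt(2.0)              # approximation for next level

    for i in range(min(n_zero, level)):
        details[i] = np.zeros_like(details[i])  # suppress high-frequency noise

    # Inverse transform, from the coarsest level back to the finest.
    for d in reversed(details):
        rec = np.empty(2 * a.size)
        rec[0::2] = (a + d) / np.sqrt(2.0)
        rec[1::2] = (a - d) / np.sqrt(2.0)
        a = rec
    return a[:n]  # drop the padding


# Synthetic example (not the study's data): a slow oscillation plus noise.
rng = np.random.default_rng(0)
t = np.arange(256)
clean = np.sin(2 * np.pi * t / 64)
noisy = clean + rng.normal(0.0, 0.5, t.size)
denoised = haar_denoise(noisy, level=3, n_zero=2)
```

With the Haar wavelet, zeroing D_1 and D_2 amounts to replacing each block of four consecutive values by its mean, which suggests that the wavelet and level need to be chosen with the sampling interval and the seasonal cycle of the GWL record in mind.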

The rainfall time series X(t) is then used as the independent variable to model and predict the denoised GWL $\hat{Y}(t)$:

$\hat{Y}(t) \approx f\left(X(t)\right)$ (7)

where f represents the trained prediction model mapping rainfall to groundwater level.

3.2.2. Singular Spectrum Analysis (SSA)

Singular Spectrum Analysis (SSA) is a data-driven technique used for time series decomposition that separates a series into trend, periodic, and noise components. It involves embedding the data into a trajectory matrix, performing singular value decomposition (SVD), and reconstructing the series from dominant components, making it useful for denoising and extracting meaningful patterns in groundwater level data.

Let the normalized groundwater level time series be

$Y = [y_1, y_2, \dots, y_n]$ (8)

and the rainfall time series be

$X = [x_1, x_2, \dots, x_n]$ (9)

where n is the number of observations.

A window length L is chosen, and the series y is embedded into a trajectory matrix $\mathbf{H}$ of size L × K, where K = n − L + 1 (denoted $\mathbf{H}$ here to distinguish it from the rainfall series X):

$\mathbf{H} = \begin{bmatrix} y_1 & y_2 & \cdots & y_K \\ y_2 & y_3 & \cdots & y_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ y_L & y_{L+1} & \cdots & y_n \end{bmatrix}$

This step transforms the 1D time series into a 2D (Hankel) matrix for analysis.

The trajectory matrix is decomposed using singular value decomposition (SVD):

$\mathbf{H} = U S V^{T}$ (10)

Where:

U: matrix of left singular vectors (columns $U_i$)

S: diagonal matrix of singular values $s_i$, in decreasing order

$V^{T}$: transpose of the matrix of right singular vectors (columns $V_i$)

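Dominant components $i \in \{i_1, i_2, \dots\}$ are selected based on their singular values, for example the first two components:

$\{i_1, i_2\} = \{0, 1\}$

(using Python's zero-based indexing, i.e., the components with the largest and second-largest singular values).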

The denoised groundwater level $\hat{Y}(t)$ is reconstructed using only the selected components:

$\hat{Y}(t) = \sum_{i \in \text{selected}} \text{Hankelize}\left(s_i U_i V_i^{T}\right)$ (11)

"Hankelization" means converting each matrix back into a 1D time series by averaging over its anti-diagonals.

Using rainfall X(t) as input and the denoised GWL $\hat{Y}(t)$ as target, a predictive model is trained:

$\hat{Y}(t) \approx f\left(X(t)\right)$ (12)

where f represents the trained prediction model mapping rainfall to groundwater level.
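The SSA steps above (embedding, SVD, component selection, and Hankelization) can be combined into a short NumPy sketch. The helper name `ssa_denoise`, the synthetic series, and the window length of 24 are illustrative assumptions, not the settings used in the study. Because Hankelization is linear, averaging the anti-diagonals of the summed rank-one matrices gives the same result as Hankelizing each component separately and summing, as in Eq. (11).

```python
import numpy as np


def ssa_denoise(y, window, components=(0, 1)):
    """Basic SSA reconstruction (hypothetical helper).

    window     : embedding window length L, with 1 < L < n
    components : zero-based indices of the eigentriples to keep
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    k = n - window + 1
    # L x K trajectory (Hankel) matrix: column j holds y[j : j + L].
    traj = np.column_stack([y[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)

    # Sum of the selected rank-one matrices s_i * U_i * V_i^T.
    approx = sum(s[i] * np.outer(u[:, i], vt[i]) for i in components)

    # Hankelization: entry (r, c) corresponds to time index r + c, so the
    # series is recovered by averaging over each anti-diagonal.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for r in range(window):
        recon[r:r + k] += approx[r]
        counts[r:r + k] += 1
    return recon / counts


# Synthetic example (not the study's data): a 12-step cycle plus noise.
rng = np.random.default_rng(1)
t = np.arange(144)
clean = np.sin(2 * np.pi * t / 12)
noisy = clean + rng.normal(0.0, 0.3, t.size)
denoised = ssa_denoise(noisy, window=24, components=(0, 1))
```

A pure oscillation typically occupies a pair of eigentriples, which is one reason the first two components are a common starting choice; keeping every component reproduces the input exactly.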

3.2.3. Variational Mode Decomposition (VMD)

Variational Mode Decomposition (VMD) is an adaptive signal processing technique that decomposes a time series into a set of band-limited intrinsic mode functions (IMFs). It separates the original groundwater level signal into components of varying frequencies, enabling the removal of noise and the extraction of dominant trends. The denoised signal is then used for modeling and predicting groundwater levels based on rainfall inputs.

Let the normalized groundwater level time series be

$Y = [y_1, y_2, \dots, y_n]$ (13)

and the rainfall time series be

$X = [x_1, x_2, \dots, x_n]$ (14)

where n is the number of observations.

VMD decomposes the signal y(t) into K intrinsic mode functions (IMFs), each representing oscillatory components in a different frequency band:

$y(t) = \sum_{k=1}^{K} u_k(t)$ (15)

Where:

$u_k(t)$ is the kth mode (IMF) of the original signal

K is the number of modes (user-defined, e.g., 4–6)

The goal is to minimize the sum of the bandwidths of all modes, subject to the constraint that the modes sum to the original signal.

Mathematically, the VMD problem is formulated as:

$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\}$ (16)

subject to

$\sum_{k=1}^{K} u_k(t) = y(t)$ (17)

Where:

$\omega_k$ is the center frequency of mode k

$\delta(t)$ is the Dirac delta and j is the imaginary unit, so that $\left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t)$ is the analytic signal of $u_k(t)$

∗ denotes convolution

$\|\cdot\|_2^2$ is the squared L₂-norm (energy)

Once the decomposition is complete, the denoised groundwater level signal $\hat{y}(t)$ is reconstructed from the relevant IMFs:

$\hat{y}(t) = \sum_{k \in \text{selected}} u_k(t)$ (18)

Typically, the low-frequency IMFs that capture trends are retained and the noisy high-frequency components are discarded.
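A compact, dependency-free sketch of VMD is shown below, assuming NumPy. It is a simplified frequency-domain version of the alternating updates used to solve Eqs. (16) and (17): each mode is updated as a Wiener-type filter of the residual, centered on its current center frequency, and each center frequency is then moved to the power-weighted mean frequency of its mode. Several details are assumptions rather than the study's settings: the helper name `vmd`, the penalty `alpha = 2000`, three modes, mirror extension at the boundaries, and omission of the Lagrangian dual-ascent step (τ = 0), so Eq. (17) is satisfied only approximately. Packages such as `vmdpy` provide fuller implementations.

```python
import numpy as np


def vmd(y, n_modes=3, alpha=2000.0, tol=1e-7, max_iter=500):
    """Simplified VMD (hypothetical helper), solved in the frequency domain.

    Assumptions: mirror extension at both ends, uniformly spaced initial
    centre frequencies and no dual-ascent step (tau = 0), so the modes only
    approximately sum to y. Returns modes sorted by centre frequency
    (cycles/sample), lowest first.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    half = n // 2
    # Mirror-extend to reduce boundary effects.
    f = np.concatenate([y[:half][::-1], y, y[n - half:][::-1]])
    T = f.size
    F = np.fft.rfft(f)          # one-sided (non-negative frequency) spectrum
    freqs = np.fft.rfftfreq(T)  # 0 ... 0.5 cycles/sample
    U = np.zeros((n_modes, freqs.size), dtype=complex)
    omega = 0.5 / n_modes * np.arange(n_modes)  # initial centre frequencies

    for _ in range(max_iter):
        U_prev = U.copy()
        for k in range(n_modes):
            # Residual after removing all other modes (Gauss-Seidel style).
            residual = F - U.sum(axis=0) + U[k]
            # Wiener-filter update centred on omega[k]; alpha sets the bandwidth.
            U[k] = residual / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            power = np.abs(U[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        change = np.sum(np.abs(U - U_prev) ** 2) / (np.sum(np.abs(U_prev) ** 2) + 1e-12)
        if change < tol:
            break

    # Back to the time domain, then crop away the mirrored extensions.
    modes = np.fft.irfft(U, n=T, axis=1)[:, half:half + n]
    order = np.argsort(omega)
    return modes[order], omega[order]


# Synthetic example (not the study's data): a slow oscillation plus noise.
rng = np.random.default_rng(2)
t = np.arange(256)
clean = np.sin(2 * np.pi * t / 64)
noisy = clean + rng.normal(0.0, 0.3, t.size)
modes, omega = vmd(noisy, n_modes=3)
denoised = modes[0]  # keep only the lowest-frequency mode
```

Keeping only the lowest-frequency mode (or the few lowest) corresponds to the selection step in Eq. (18); how many modes to keep, and the value of K itself, are judgments that depend on the data.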

A model is then trained that maps rainfall x(t) to the denoised GWL $\hat{y}(t)$:

$\hat{y}(t) \approx f\left(x(t)\right)$ (19)

where f represents the trained prediction model mapping rainfall to groundwater level.

3.3. Key Evaluation Metrics for Model Performance in Groundwater Level Prediction

In this study, the performance of various models developed for predicting groundwater levels was assessed using established statistical evaluation metrics. These indicators help in quantifying the models’ accuracy, consistency, and ability to generalize across varying datasets. The primary metrics used are described below:

3.3.1. R² (Coefficient of Determination)

The R² value indicates how much of the variability in the observed data the model explains; values closer to 1 indicate a better fit.

$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$ (20)

Where:

$y_i$ is the actual value,

$\hat{y}_i$ is the predicted value, and

$\bar{y}$ is the mean of the actual values.

R² = 1 means the model explains all the variance in the data (perfect fit).

R² = 0 means the model performs no better than predicting the mean of the observations; with this definition, R² can also be negative.

As defined here, Eq. (20) is the same expression as the NSE (Eq. 24), which is why the R² and NSE values reported in Table 1 coincide.

3.3.2. RMSE (Root Mean Squared Error)

RMSE, the square root of the MSE, estimates the typical prediction error in the same units as the target variable (e.g., groundwater level in meters). Because errors are squared before averaging, it is sensitive to large errors; it is widely used in model evaluation.

$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$ (21)

Where:

$y_i$ is the actual value,

$\hat{y}_i$ is the predicted value,

and n is the number of data points.

Lower RMSE values indicate a better fit of the model to the data.

3.3.3. MSE (Mean Squared Error)

MSE is a fundamental metric that calculates the average of the squared differences between predicted and observed values, penalizing larger errors more heavily. Although it is expressed in squared units and is therefore less directly interpretable, MSE is commonly used for model optimization and comparison.

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ (22)

3.3.4. MAE (Mean Absolute Error)

MAE measures the mean absolute difference between predicted and observed values; because errors are not squared, it is less sensitive to outliers than RMSE.

$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$ (23)

Where:

$\left| y_i - \hat{y}_i \right|$ is the absolute error for each observation.

Smaller MAE values indicate better accuracy, and because MAE is expressed in the same units as the data, it is straightforward to interpret.

3.3.5. Nash–Sutcliffe Efficiency (NSE)

Nash–Sutcliffe Efficiency (NSE) measures how well a model's predictions match observed data by comparing the prediction errors to the variability of the observations. Values close to 1 indicate good accuracy, while values near or below 0 indicate that the model is no better than the mean of the observations.

$\text{NSE} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$ (24)
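The five metrics can be computed together with a short NumPy function. The helper name `evaluate` and the example values are hypothetical. The example also illustrates why R² as defined in Eq. (20) differs from the squared Pearson correlation: a prediction offset by a constant 1 m is perfectly correlated with the observations, yet its R² (and NSE) is only 0.2.

```python
import numpy as np


def evaluate(obs, pred):
    """Return R², RMSE, MSE, MAE and NSE following Eqs. (20)-(24)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = obs - pred
    sse = np.sum(err ** 2)                 # sum of squared errors
    sst = np.sum((obs - obs.mean()) ** 2)  # total variability of observations
    mse = sse / obs.size
    return {
        "R2": float(1.0 - sse / sst),        # Eq. (20)
        "RMSE": float(np.sqrt(mse)),         # Eq. (21)
        "MSE": float(mse),                   # Eq. (22)
        "MAE": float(np.mean(np.abs(err))),  # Eq. (23)
        "NSE": float(1.0 - sse / sst),       # Eq. (24), same expression as Eq. (20)
    }


# Hypothetical observed and predicted groundwater levels (m).
obs = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 1.9, 3.2, 3.8])
scores = evaluate(obs, pred)

# A constant 1 m offset: perfectly correlated, but a poor fit by Eq. (20).
biased = evaluate(obs, obs + 1.0)
pearson_r2 = np.corrcoef(obs, obs + 1.0)[0, 1] ** 2
```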

4. Results and Discussion

In this study, three signal denoising methods, Wavelet Transform (WT), Singular Spectrum Analysis (SSA), and Variational Mode Decomposition (VMD), were used to improve the quality of the groundwater level (GWL) time series. These techniques aim to reduce short-term noise while preserving the key features of the signal that reflect hydrogeological processes. To assess the performance of each method, the GWL dataset was split into two subsets: 70% for training and 30% for testing. This ratio leaves most of the record for fitting while reserving a portion for assessing generalization, and the split preserves the chronological order of the data (no random shuffling), which is crucial in groundwater studies involving trends and lags.
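A chronological 70/30 split can be sketched as follows. The helper name is hypothetical, and the monthly date index is an assumption used only for illustration: if the record held one value per month from May 2013 to December 2024 (140 values), the first 98 months (May 2013 to June 2021) would form the training period and the remaining 42 months (July 2021 to December 2024) the testing period.

```python
import numpy as np


def chronological_split(series, train_frac=0.7):
    """Split a time-ordered series into an earlier training block and a
    later testing block, without shuffling (hypothetical helper)."""
    series = np.asarray(series)
    n_train = int(round(train_frac * series.size))
    return series[:n_train], series[n_train:]


# Assumed monthly index from May 2013 to December 2024 (end date exclusive).
months = np.arange("2013-05", "2025-01", dtype="datetime64[M]")
train, test = chronological_split(months)
```

Keeping the test block strictly after the training block means that any parameter tuned on the training period is judged on data it has not seen, which a random split would not guarantee for autocorrelated series.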

4.1. Denoising of Groundwater Level Data Using Wavelet Transform

Figure 1 shows the observed groundwater level (GWL) data alongside the Wavelet Transform denoised series, with the training and testing periods clearly marked. Figure 2 compares the observed and Wavelet-denoised groundwater levels, with the coefficient of determination (R²) quantifying their close agreement. Table 1 presents the evaluation metrics: the Wavelet Transform achieves an R² of 0.982 for training and 0.954 for testing, indicating strong consistency between the denoised and original signals. The low Root Mean Square Error (RMSE) values (0.229 for training, 0.206 for testing) and Mean Absolute Error (MAE) values (0.165 for training, 0.147 for testing) further indicate that the method reduces noise while maintaining signal fidelity. The Nash-Sutcliffe Efficiency (NSE) scores equal the R² values, as expected from their identical definitions (Eqs. 20 and 24). The superior performance of the Wavelet Transform is likely due to its multi-resolution capability, which decomposes the groundwater level data into different frequency components. This allows noise, concentrated primarily in the higher frequency bands, to be separated from meaningful hydrological signals concentrated in the lower frequencies. Consequently, Wavelet-based denoising preserves the trends and fluctuations critical for groundwater analysis while removing random noise. The similar error levels in the training and testing datasets also suggest good generalizability and stability. Overall, these results support the suitability of the Wavelet Transform for denoising groundwater level data, providing cleaner datasets for reliable hydrological assessment and resource management.

Figs. 1 and 2: Visualization of observed groundwater levels alongside Wavelet Transform denoised data, showing training/testing periods

4.2. Denoising of Groundwater Level Data Using Singular Spectrum Analysis (SSA)

Figure 3 presents the observed groundwater level (GWL) data alongside the Singular Spectrum Analysis (SSA) denoised series, highlighting the division between the training and testing periods. Figure 4 compares the observed and SSA-denoised groundwater levels, with the coefficient of determination (R²) illustrating their agreement. Table 1 summarizes the evaluation metrics for SSA, showing an R² of 0.906 for training but a much lower 0.590 for testing. The Root Mean Square Error (RMSE) values are higher than those of the Wavelet Transform, at 0.522 for training and 0.614 for testing. Similarly, the Mean Absolute Error (MAE) values (0.394 train, 0.345 test) reflect less precise noise reduction. The Nash-Sutcliffe Efficiency (NSE) scores follow the same pattern, indicating weaker performance in capturing groundwater dynamics, especially on unseen data. The lower performance of SSA may be due to its sensitivity to the choice of window length and of the components used for reconstruction. While SSA effectively extracts dominant patterns in the training data, it may inadequately filter noise or capture complex fluctuations in the testing period. This suggests that SSA's denoising capability is less robust and less generalizable for the groundwater level data considered. Nevertheless, SSA still reduces noise to some extent, though with less accuracy than the Wavelet Transform. Overall, these results indicate that although SSA can be used for groundwater level denoising, its performance is less consistent, and careful parameter tuning may be required to improve generalizability.

Figs. 3 and 4: Visualization of observed groundwater levels alongside Singular Spectrum Analysis (SSA) denoised data, showing training/testing periods

4.3. Denoising of Groundwater Level Data Using Variational Mode Decomposition (VMD)

Figure 5 illustrates the observed groundwater level (GWL) data alongside the Variational Mode Decomposition (VMD) denoised series, clearly distinguishing the training and testing periods. Figure 6 compares the observed and VMD-denoised groundwater levels, with the coefficient of determination (R²) reflecting their strong agreement. Table 1 summarizes the performance metrics for VMD, which shows an R² of 0.949 for training and 0.835 for testing, indicating a good fit to the original data. The Root Mean Square Error (RMSE) values, 0.384 for training and 0.389 for testing, are moderately low, while the Mean Absolute Error (MAE) values (0.286 train, 0.264 test) further indicate effective noise reduction. The Nash-Sutcliffe Efficiency (NSE) scores, which match the R² values, are consistent with these results. The strong performance of VMD can be attributed to its adaptive decomposition of the groundwater level data into intrinsic mode functions, which allows noise components to be separated from the underlying signal. Unlike fixed-basis methods, VMD adapts the center frequencies of its modes to the data, capturing non-stationary features and preserving important hydrological trends while filtering out noise. Although its performance is lower than that of the Wavelet Transform, particularly on the testing data (R² of 0.835 versus 0.954), VMD outperforms SSA in both accuracy and generalizability, as shown by its higher R² and NSE and lower error metrics. These results highlight VMD as a promising method for groundwater level denoising, offering a good balance between noise suppression and signal preservation, which is critical for subsequent hydrological analysis and water resource management.

Figs. 5 and 6: Visualization of observed groundwater levels alongside Variational Mode Decomposition (VMD) denoised data, showing training/testing periods

Table 1

Comparison of Model Performance Using Different Decomposition Techniques
Method	R² (train)	R² (test)	RMSE (train)	RMSE (test)	MAE (train)	MAE (test)	MSE (train)	MSE (test)	NSE (train)	NSE (test)
Wavelet	0.98	0.95	0.23	0.21	0.17	0.15	0.05	0.04	0.982	0.954
SSA	0.91	0.59	0.52	0.61	0.40	0.35	0.27	0.37	0.906	0.590
VMD	0.95	0.84	0.39	0.39	0.29	0.26	0.15	0.151	0.949	0.835

5. Conclusion and Applications

This research evaluated and compared the effectiveness of three signal decomposition techniques, Wavelet Transform, Variational Mode Decomposition (VMD), and Singular Spectrum Analysis (SSA), for denoising groundwater level data. The results show that the Wavelet Transform provided the most reliable denoising performance, with the highest coefficient of determination (R²) and Nash-Sutcliffe Efficiency (NSE) and the lowest error metrics (RMSE and MAE) in both the training and testing datasets. The multi-resolution nature of the Wavelet Transform allows it to separate noise at higher frequencies from meaningful hydrological signals at lower frequencies, preserving important groundwater trends and fluctuations. VMD also showed strong potential, adapting well to non-stationary signals and delivering better denoising performance than SSA. SSA was comparatively less effective, especially on the testing data, likely because of its sensitivity to parameter selection and its limited ability to generalize to complex noise patterns. These findings support the Wavelet Transform as the preferred denoising tool for the groundwater level data studied here, with VMD as a competitive alternative for handling nonlinear and non-stationary time series. Denoising groundwater level data plays a vital role in hydrological studies and water resource management. Clean datasets are crucial for improving the accuracy of groundwater modeling, trend analysis, drought prediction, and recharge assessment. By removing noise and preserving key fluctuations, these denoising techniques provide more reliable input data for machine learning models and physical simulation tools. In particular, the Wavelet Transform's capacity to maintain signal integrity makes it valuable for monitoring groundwater responses to climatic variability, human impacts, and seasonal changes.
Enhanced data quality can also strengthen early warning systems for groundwater depletion and support sustainable groundwater extraction policies. Moreover, these techniques facilitate better integration of data from groundwater monitoring networks, leading to more efficient water resource planning and management.

Building on these findings, future research could explore hybrid denoising frameworks that combine the strengths of the Wavelet Transform, VMD, and SSA. For instance, a hybrid model could pair the multi-resolution analysis of the Wavelet Transform with the adaptability of VMD to non-stationary components. Such a model could potentially achieve better noise removal while preserving complex hydrological patterns. Coupling these signal decomposition methods with advanced machine learning algorithms, such as deep learning or ensemble approaches, may also improve denoising by tuning parameters automatically and adapting to evolving data characteristics. Further investigation into adaptive parameter selection is needed to improve the robustness and generalizability of SSA and VMD across diverse hydroclimatic conditions. Extending this work to real-time groundwater monitoring systems would enable continuous data cleaning, supporting timely management decisions and rapid responses to groundwater fluctuations. Finally, expanding the analysis to include other noise sources, such as human interference or sensor errors, could broaden the applicability of these denoising methods in practical water resource monitoring scenarios.
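SSA's sensitivity to parameter selection was noted above as a likely reason for its weaker testing performance. That sensitivity comes down to two choices:

- the window length L used to build the trajectory (Hankel) matrix;
- the number of leading singular components r kept in the reconstruction.

The minimal NumPy sketch below exposes both choices as arguments. The grouping rule, which keeps the r largest components, is one common option among several. The parameter values in the usage example are hypothetical and are not those used in this study.

```python
import numpy as np


def _trajectory(x, window):
    """Hankel trajectory matrix: column j holds x[j : j + window]."""
    k = x.size - window + 1
    return np.column_stack([x[j:j + window] for j in range(k)])


def ssa_denoise(signal, window, n_components):
    """Basic SSA reconstruction from the leading singular components.

    window       -- embedding length L, with 1 < L < len(signal)
    n_components -- number r of leading singular triplets treated as signal
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if not 1 < window < n:
        raise ValueError("window must satisfy 1 < window < len(signal)")
    k = n - window + 1
    if not 1 <= n_components <= min(window, k):
        raise ValueError("n_components must be in [1, min(window, n - window + 1)]")

    # 1. Embedding
    traj = _trajectory(x, window)
    # 2. Decomposition: SVD of the trajectory matrix
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # 3. Grouping: keep the r largest components, discard the rest as noise
    r = n_components
    low_rank = (u[:, :r] * s[:r]) @ vt[:r, :]
    # 4. Diagonal averaging: entry (i, j) estimates x[i + j]
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        recon[i:i + k] += low_rank[i]
        counts[i:i + k] += 1
    return recon / counts


def singular_value_shares(signal, window):
    """Fraction of trajectory-matrix energy carried by each component."""
    s = np.linalg.svd(_trajectory(np.asarray(signal, dtype=float), window),
                      compute_uv=False)
    return s ** 2 / np.sum(s ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(400)
    clean = 10.0 + 2.0 * np.sin(2 * np.pi * t / 365)  # hypothetical seasonal series
    noisy = clean + rng.normal(0.0, 0.3, t.size)
    print("leading energy shares:", np.round(singular_value_shares(noisy, 60)[:5], 4))
    for r in (3, 20):
        est = ssa_denoise(noisy, window=60, n_components=r)
        print(f"r={r:2d}  RMSE vs clean:", np.sqrt(np.mean((est - clean) ** 2)))
```

A constant plus a single sinusoid produces a trajectory matrix of rank 3, so `n_components=3` is a natural choice for this toy series. Real groundwater records do not split so cleanly into signal and noise components, and that is where the parameter sensitivity arises.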

References

Adamowski J, Chan HF (2011) A wavelet neural network conjunction model for groundwater level forecasting. J Hydrol 407(1–4):28–40

Ahmadi F, Tohidi M, Sadrianzade M (2023) Streamflow prediction using a hybrid methodology based on variational mode decomposition (VMD) and machine learning approaches. Appl Water Sci 13(6):135

Bowes BD, Sadler JM, Morsy MM, Behl M, Goodall JL (2019) Forecasting groundwater table in a flood prone coastal city with long short-term memory and recurrent neural networks. Water 11(5):1098

Chang FJ, Chang LC, Huang CW, Kao IF (2016) Prediction of monthly regional groundwater levels through hybrid soft-computing techniques. J Hydrol 541:965–976

Chen HY, Vojinovic Z, Lo W, Lee JW (2023) Groundwater level prediction with deep learning methods. Water 15(17):3118

Cui X, Wang Z, Xu N, Wu J, Yao Z (2024) A secondary modal decomposition ensemble deep learning model for groundwater level prediction using multi-data. Environ Model Softw 175:105969

Daliakopoulos IN, Coulibaly P, Tsanis IK (2005) Groundwater level forecasting using artificial neural networks. J Hydrol 309(1–4):229–240

Gezici K, Katipoğlu OM, Şengül S (2024) Hybrid machine learning models for groundwater level prediction in a snow-dominated region: An evaluation of EEMD, VMD and EWT decomposition techniques. Hydrol Process 38(5):e15169

Gonzalez RQ, Arsanjani JJ (2021) Prediction of groundwater level variations in a changing climate: a Danish case study. ISPRS Int J Geo-Inf 10(11):792

Guo Z, Zhang QQ, Li N, Zhai YQ, Teng WT, Liu SS, Ying GG (2023) Runoff time series prediction based on hybrid models of two-stage signal decomposition methods and LSTM for the Pearl River in China. Hydrol Res 54(12):1505–1521

Hong YM (2017) Feasibility of using artificial neural networks to forecast groundwater levels in real time. Landslides 14(5):1815–1826

Hong YM, Wan S (2011) Forecasting groundwater level fluctuations for rainfall-induced landslide. Nat Hazards 57:167–184

Hsieh PC, Tong WA, Wang YC (2019) A hybrid approach of artificial neural network and multiple regression to forecast typhoon rainfall and groundwater-level change. Hydrol Sci J 64(14):1793–1802

Jan CD, Chen TH, Lo WC (2007) Effect of rainfall intensity and distribution on groundwater level fluctuations. J Hydrol 332(3–4):348–360

Jang S, Hamm SY, Yoon H, Kim GB, Park JH, Kim M (2015) Predicting long-term change of groundwater level with regional climate model in South Korea. Geosci J 19:503–513

Kajewska-Szkudlarek J, Kubicz J, Kajewski I (2022) Correlation approach in predictor selection for groundwater level forecasting in areas threatened by water deficits. J Hydroinform 24(1):143–159

Katipoğlu OM (2024) Integration of extreme learning machines with CEEMDAN and VMD techniques in the prediction of the multiscalar standardized runoff index and standardized precipitation evapotranspiration index. Nat Hazards 120(1):825–849

Kochhar A, Singh H, Sahoo S, Litoria PK, Pateriya B (2022) Prediction and forecast of pre-monsoon and post-monsoon groundwater level: using deep learning and statistical modelling. Model Earth Syst Environ 8(2):2317–2329

Kombo OH, Kumaran S, Sheikh YH, Bovim A, Jayavel K (2020) Long-term groundwater level prediction model based on hybrid KNN-RF technique. Hydrology 7(3):59

Ladouali S, Katipoğlu OM, Bahrami M, Kartal V, Sakaa B, Elshaboury N, Elbeltagi A (2024) Short lead time standard precipitation index forecasting: Extreme learning machine and variational mode decomposition. J Hydrol Reg Stud 54:101861

Natarajan VA, Tamizhazhagan V, Tangudu N, Kumar MS (2022) Analysis of groundwater level fluctuations and its association with rainfall using statistical methods. J Algebraic Stat 13(3):1895–1904

Nayak PC, Rao YS, Sudheer KP (2006) Groundwater level forecasting in a shallow aquifer using artificial neural network approach. Water Resour Manage 20:77–90

Nazari A, Jamshidi M, Roozbahani A, Golparvar B (2025) Groundwater level forecasting using empirical mode decomposition and wavelet-based long short-term memory (LSTM) neural networks. Groundw Sustain Dev 28:101397

Parsaie A, Ghasemlounia R, Gharehbaghi A, Haghiabi A, Chadee AA, Nou MRG (2024) Novel hybrid intelligence predictive model based on successive variational mode decomposition algorithm for monthly runoff series. J Hydrol 634:131041

Patle GT, Singh DK, Sarangi A, Rai A, Khanna M, Sahoo RN (2015) Time series analysis of groundwater levels and projection of future trend. J Geol Soc India 85:232–242

Pham QB, Kumar M, Di Nunno F, Elbeltagi A, Granata F, Islam ARMT, Anh DT (2022) Groundwater level prediction using machine learning algorithms in a drought-prone area. Neural Comput Appl 34(13):10751–10773

Sadat-Noori M, Glamore W, Khojasteh D (2020) Groundwater level prediction using genetic programming: the importance of precipitation data and weather station location on model accuracy. Environ Earth Sci 79:1–10

Sapitang M, Ridwan WM, Ahmed AN, Fai CM, El-Shafie A (2021) Groundwater level as an input to monthly predicting of water level using various machine learning algorithms. Earth Sci Inform 14(3):1269–1283

Seidu J, Ewusi A, Kuma JSY, Ziggah YY, Voigt HJ (2022) A hybrid groundwater level prediction model using signal decomposition and optimised extreme learning machine. Model Earth Syst Environ 8(3):3607–3624

Seo Y, Kim S, Singh VP (2018) Machine learning models coupled with variational mode decomposition: a new approach for modeling daily rainfall-runoff. Atmosphere 9(7):251

Shiri J, Kisi O, Yoon H, Kazemi MH, Shiri N, Poorrajabali M, Karimi S (2022) Prediction of groundwater level variations in coastal aquifers with tide and rainfall effects using heuristic data driven models. ISH J Hydraul Eng 28(sup1):188–198

Sidle RC (2021) Strategies for smarter catchment hydrology models: Incorporating scaling and better process representation. Geosci Lett 8(1):24

Sun AY (2013) Predicting groundwater level changes using GRACE data. Water Resour Res 49(9):5900–5912

Sun J, Hu L, Li D, Sun K, Yang Z (2022) Data-driven models for accurate groundwater level prediction and their practical significance in groundwater management. J Hydrol 608:127630

Suryanarayana C, Sudheer C, Mahammood V, Panigrahi BK (2014) An integrated wavelet-support vector machine for groundwater level prediction in Visakhapatnam, India. Neurocomputing 145:324–335

Tao H, Hameed MM, Marhoon HA, Zounemat-Kermani M, Heddam S, Kim S, Yaseen ZM (2022) Groundwater level prediction using machine learning models: A comprehensive review. Neurocomputing 489:271–308

Tsanis IK, Coulibaly P, Daliakopoulos IN (2008) Improving groundwater level forecasting with a feedforward neural network and linearly regressed projected precipitation. J Hydroinform 10(4):317–330

Van Gaalen JF, Kruse S, Lafrenz WB, Burroughs SM (2013) Predicting water table response to rainfall events, central Florida. Groundwater 51(3):350–362

Wang Y, Guo F, Chen S, Zhang H, Zhang Z, Li A (2024) Quantitative study of rainfall lag effects and integration of machine learning methods for groundwater level prediction modelling. Hydrol Process 38(5):e15171

Wei X, Chen M, Zhou Y, Zou J, Ran L, Shi R (2024) Research on optimal selection of runoff prediction models based on coupled machine learning methods. Sci Rep 14(1):32008

Wei ZL, Lü Q, Sun HY, Shang YQ (2019) Estimating the rainfall threshold of a deep-seated landslide by integrating models for predicting the groundwater level and stability analysis of the slope. Eng Geol 253:14–26

Yadav B, Gupta PK, Patidar N, Himanshu SK (2020) Ensemble modelling framework for groundwater level prediction in urban areas of India. Sci Total Environ 712:135539

Zhang X, Chen H, Wen Y, Shi J, Xiao Y (2023) A new water level prediction model based on ESMD – VMD – WSD – ESN. Stoch Environ Res Risk Assess 37(8):3221–3241

Statements and Declarations

Ethical Approval

Ethical approval was not required for this study, as it did not involve human participants or the use of animal data.

Consent to Participate

Not applicable, as this study did not involve any human participants.

Consent to Publish

Not applicable, as this study does not include any individual data requiring consent for publication.

Author’s Contribution

The author, K V Sumith, conceptualized and designed the study, performed the analysis using Python, and wrote the manuscript.

Funding

The author declares that no funding was received for this research.

Competing Interests

The author declares no competing interests.

Availability of Data and Materials

The data used in this study are publicly accessible through India-WRIS (India Water Resources Information System).