1. Introduction
Artificial intelligence (AI) has become integral to modern cybersecurity, particularly in intrusion detection systems (IDS) designed to identify malicious activity within increasingly complex and dynamic network environments. Traditional rule-based IDS rely on static signatures or handcrafted heuristics, making them effective against known threats but limited in detecting zero-day attacks, polymorphic malware, and subtle behavioral anomalies [1]. These limitations have motivated the transition toward machine learning (ML)-based IDS, which learn statistical patterns from network flows and offer greater adaptability in detecting previously unseen intrusions [2,3]. Despite these advances, AI-driven IDS introduce new security challenges, most notably the susceptibility of ML models to adversarial manipulation.
Recent studies in adversarial machine learning (AML) demonstrate that even small, carefully crafted perturbations can cause ML models to misclassify malicious traffic as benign while preserving domain-level semantics [6]. In flow-based IDS, adversarial examples may involve subtle modifications to packet counts, byte volumes, inter-arrival times, and directionality, changes that remain plausible within real network conditions yet significantly degrade classifier performance. This vulnerability has raised concerns about the operational reliability of AI-based IDS, particularly when adversaries operate under realistic constraints and lack full access to model internals.
1.1. Importance of AI in Intrusion Detection
AI-based IDS have improved detection capabilities through supervised learning, anomaly detection, and automated feature extraction. Classical models such as Random Forests and Logistic Regression offer transparent and computationally efficient baselines [1,7], while deep architectures, including multilayer perceptrons and convolutional neural networks, enable richer nonlinear representations of traffic flows [2, 8]. These approaches, combined with datasets such as CICIDS2017 and UNSW-NB15, have established strong performance benchmarks across diverse attack categories [9,10].
However, the effectiveness of these systems remains tightly coupled to data quality and robustness. Prior analyses highlight labeling inconsistencies, feature artifacts, and sampling biases in common IDS datasets [10,11], all of which may distort classifier behavior and amplify vulnerability to adversarial manipulation. As AI becomes increasingly embedded in high-risk domains, ranging from cybersecurity to education [12], ensuring robustness and trustworthiness is critical to preventing harmful decision failures.
1.2. Adversarial Machine Learning Threat
Adversarial machine learning represents a significant threat to AI-based IDS. In white-box scenarios, adversaries with full model access leverage gradient-based attacks such as the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) to produce highly effective perturbations [13,14]. In contrast, black-box adversaries infer decision boundaries through probing, statistical approximation, or surrogate modeling [15,16]. Because production IDS rarely expose architectural details or model outputs beyond final decisions, black-box threats are more representative of real-world attacker capabilities [17].
The broader cybersecurity landscape further amplifies these risks. Recent advances in generative AI enable adversaries to create realistic deepfake-based cyber threats [18], synthesize AI-generated malware with adaptive behaviors [19], and automate vulnerability discovery [20]. These developments illustrate the increasing sophistication of offensive AI tools and reinforce the need to rigorously evaluate IDS performance under adversarial conditions.
Emerging AML research also critiques the overreliance on gradient-dependent evaluations that assume unrealistic attacker capabilities. Consequently, transfer-based black-box attacks, where adversaries train local surrogate models to approximate the target IDS, have become a preferred methodology for assessing operationally relevant robustness [21]. Complementary approaches, such as GAN-based traffic generation [22], reinforcement learning-driven evasion [23], and protocol-aware semantic constraints, further emphasize realism and plausibility in adversarial perturbations.
1.3. Problem Statement
Despite widespread research on ML-based IDS, there remains a significant gap in understanding their adversarial robustness under realistic operational constraints. Many existing studies emphasize clean accuracy without examining model fragility under targeted perturbations or distribution shift. Furthermore, adversarial transferability across models and datasets remains underexplored, even though real IDS deployments routinely encounter heterogeneous traffic conditions. Addressing these gaps requires a systematic, semantically grounded framework for evaluating adversarial robustness under black-box assumptions.
1.4. Research Objectives
This study addresses the following research questions:
1.5. Contributions of the Paper
This study proposes a fully reproducible adversarial evaluation framework that integrates flow preprocessing, IDS training, surrogate-based attack generation, and semantic-validity constraints, establishing a unified pipeline for systematically stress-testing intrusion detection systems.
It conducts a multi-model adversarial robustness analysis across four representative IDS architectures (RF, LR, MLP, CNN-1D) and four attack families (FGSM, PGD, HSJA, ZOO), showing how architectural differences shape vulnerability patterns and revealing distinct robustness profiles across classical and deep models.
It combines cross-dataset evaluation on the CICIDS2018 Friday subset with deployment-oriented recommendations, providing insight into adversarial transferability under distribution shift and actionable guidance for integrating robustness assessment into real-world IDS development workflows.
1.6. Paper Organization
The remainder of this paper is organized as follows: Section II reviews prior work on intrusion detection, adversarial machine learning, and model robustness evaluation. Section III describes the dataset, preprocessing workflow, IDS models, surrogate-based attack generation, semantic constraints, evaluation metrics, and experimental setup. Section IV presents the results of clean and adversarial evaluation on CICIDS2017, as well as cross-dataset testing on CICIDS2018. Section V discusses the findings and their implications for operational IDS deployment. Section VI outlines limitations, future research directions, and concludes the paper.
2. Related Work
AI-driven intrusion detection has emerged as a prominent research topic for AI-based network intrusion detection systems (NIDS), especially in Internet of Things (IoT) contexts [32], driven by advances in machine learning (ML), deep learning (DL), and adversarial machine learning (AML). This section reviews three interrelated bodies of work: (1) ML and DL approaches for intrusion detection, (2) adversarial attacks targeting IDS models, and (3) robustness evaluation methodologies and dataset considerations. The goal is to position the present study within ongoing efforts to develop resilient AI-based IDS that remain reliable under adversarial pressure.
2.1. Machine Learning for Intrusion Detection
Traditional IDS relied heavily on static signatures or heuristic rules, limiting their ability to detect previously unseen threats or adapt to dynamic network behaviors [1]. ML-based IDS addressed these limitations by learning statistical patterns from flow and packet-level features. Classical models such as Random Forests, Support Vector Machines, Logistic Regression, and k-Nearest Neighbors have demonstrated strong performance across several benchmark datasets, with ensemble methods in particular showing robustness and interpretability advantages [24,25].
DL-based IDS extend these capabilities by automatically extracting high-level representations from raw traffic. Autoencoders, multilayer perceptrons (MLP), convolutional neural networks (CNN), and recurrent architectures (LSTM/GRU) have been widely applied to detect complex attacks and temporal dependencies [2,8,26]. Recent hybrid and attention-based architectures further improve granularity by capturing spatial and temporal correlations across flows [27].
Benchmark datasets underpinning IDS evaluation include the CICIDS2017 dataset [9], UNSW-NB15 [28], NSL-KDD [29], and newer IoT-oriented datasets such as BoT-IoT [30] and TONIoT [31,32]. These datasets vary in attack coverage, network design, and labeling fidelity, which affects model generalizability and robustness. A summary of commonly used IDS datasets is provided in Table 1.
TABLE I
Dataset Comparison

| Dataset | Year | Traffic Source | Features | Attack Types | Notes |
|---|---|---|---|---|---|
| NSL-KDD (Tavallaee et al.) | 2009 | Synthetic | 41 | 4 categories | Removes duplicate records from KDD’99; outdated for modern threats. |
| UNSW-NB15 (Moustafa & Slay, 2015) | 2015 | IXIA PerfectStorm | 49 | 9 families | Realistic traffic generation; modern attack taxonomy. |
| CICIDS2017 (Sharafaldin et al., 2018) | 2018 | Simulated enterprise environment | 80+ | Multiple | Most widely used; captures diverse attack scenarios. |
| CICIDS2018 (Sharafaldin et al., 2018) | 2018 | Emulated daily traffic | 80+ | Multiple | Offers improved realism; used for cross-dataset generalization. |
| BoT-IoT (Koroniotis et al., 2019) | 2019 | IoT environment | 46 | 4 | Large-scale IoT traffic; high imbalance. |
| TON_IoT (Moustafa, 2021) | 2021 | IoT and Edge | Multimodal | Multiple | Integrates system logs, telemetry, and network flows. |
This expanding ecosystem of IDS datasets helps benchmark detection capabilities, but it also introduces inconsistencies, particularly in data quality, class imbalance, and feature engineering, which complicate robustness assessment [10,33]. These limitations motivate the need for methodologically rigorous evaluations such as those undertaken in the present study.
2.2. Adversarial Attacks on IDS and Cybersecurity Systems
Adversarial attacks against ML models have demonstrated that small, imperceptible perturbations can be sufficient to cause misclassification across computer vision, NLP, and increasingly, network security domains [13, 15, 6]. In the context of flow-based IDS, attackers may modify packet counts, byte volumes, durations, or timing features to evade detection while maintaining realistic communication semantics.
White-box attacks such as FGSM and PGD exploit knowledge of model gradients to produce highly effective perturbations [13,14]. However, real-world adversaries rarely possess full model access. As a result, black-box attacks, where adversaries rely on query probing, zeroth-order approximation, or surrogate modeling, have become a core focus in IDS research [13,16,17].
Transfer-based attacks are among the most operationally realistic: adversaries train a surrogate classifier using locally collected or publicly available data and then optimize perturbations that transfer to the target model. Prior studies show varying degrees of transferability across architectures, with nonlinear deep networks often more vulnerable than tree ensembles [21]. Decision-based attacks such as HopSkipJump [34] and score-based attacks such as ZOO [35] further mimic practical constraints where only final decisions or soft outputs are observable.
Beyond handcrafted perturbations, generative models have emerged as powerful tools for adversarial traffic synthesis. IDS-GAN and similar frameworks use GANs to mimic network traffic patterns, injecting adversarial flows capable of triggering misclassification [22]. Reinforcement learning-based attack agents have also demonstrated success in learning query-efficient evasion strategies within constrained environments [23]. These methods highlight the rapid evolution of offensive AI techniques and the need for robust defenses.
2.3. Robustness Evaluation Methods and Research Gaps
Despite substantial progress, existing evaluations of IDS robustness often rely on unrealistic assumptions, limited datasets, or narrow attack models. Many studies report high clean accuracy without examining how models behave under adversarial stress or distribution shift [10]. Gradient-based attacks frequently assume white-box access that adversaries do not possess in practice [17]. Meanwhile, most robustness research evaluates models on a single dataset, even though real network environments involve diverse traffic patterns and evolving attack behaviors.
Recent work has called for more realistic threat modeling, emphasizing black-box constraints, semantic feature validity, and cross-dataset evaluation [21,17]. These gaps motivate the approach used in this study, which integrates:
1. Surrogate-based black-box adversarial attacks reflecting operational attacker capabilities.
2. Semantically constrained feature perturbations ensuring realistic network flow modifications.
3. Cross-dataset evaluation using both CICIDS2017 and CICIDS2018 to assess generalization.
4. Comparative robustness analysis across classical and deep architectures.
By addressing these gaps, the present study contributes a rigorous and practically relevant assessment of adversarial robustness for AI-based IDS.
3. Methodology
This section outlines the methodological framework used to evaluate the adversarial robustness of machine learning-based intrusion detection systems (IDS). The framework integrates a modern flow-based dataset, multiple IDS model families, a realistic black-box adversarial threat model, semantically constrained perturbation mechanisms, and an evaluation protocol aligned with contemporary adversarial machine learning (AML) research. The design of this methodology draws upon foundational work in AML [36, 13, 6] and established practices in intrusion detection [1,2].
3.1. Dataset and Preprocessing
All experiments are conducted using the CICIDS2017 dataset, one of the most comprehensive and widely adopted corpora for evaluating network intrusion detection systems. Developed by the Canadian Institute for Cybersecurity, CICIDS2017 [43] was created to overcome the limitations of earlier datasets such as KDD’99 by offering realistic, heterogeneous traffic patterns, modern attack classes, and high-quality flow-based features [9,29]. Its broad adoption in benchmarking studies [10,7] underscores its suitability for evaluating robustness under adversarial manipulation.
This study utilizes the MachineLearningCVE CSV files, which contain bidirectional flow records represented by 78 numerical features extracted with CICFlowMeter [9]. The features capture statistical and temporal properties of network flows, including packet counts, byte volumes, inter-arrival times, and TCP flag dynamics. Standard preprocessing procedures are applied, including removal of invalid or infinite values, elimination of duplicates, and normalization of all features to the [0,1] range using Min-Max scaling [37]. This normalization is essential for ensuring meaningful and controlled adversarial perturbations.
To preserve the natural class imbalance characteristic of intrusion datasets, where benign flows vastly outnumber many attack classes, the data is partitioned using a stratified 70/10/20 train-validation-test split. Maintaining this imbalance is consistent with prior IDS recommendations [10,38], and it also reflects realistic operating conditions in which minority attack samples often have weaker and more brittle decision boundaries [39]. Table 2 summarizes the dataset distribution.
Table 2

| Split | Samples | % of Total |
|---|---|---|
| Training | 1,979,512 | 70% |
| Validation | 282,788 | 10% |
| Test | 565,576 | 20% |
This preprocessing pipeline ensures clean, normalized, and representative input data for both baseline and adversarial robustness evaluation.
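For concreteness, the following sketch illustrates the preprocessing just described (cleaning, stratified 70/10/20 splitting, and Min-Max scaling). It is a minimal sketch under stated assumptions: the label column name, the random seed, and the exact column handling are illustrative rather than the study's actual implementation.

```python
# Minimal preprocessing sketch, assuming the CICIDS2017 MachineLearningCVE CSVs
# have been concatenated into a single DataFrame `df` with a "Label" column.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame):
    # Drop invalid/infinite values and duplicate flows.
    df = df.replace([np.inf, -np.inf], np.nan).dropna().drop_duplicates()

    X = df.drop(columns=["Label"]).astype(np.float32).values
    y = df["Label"].values

    # Stratified 70/10/20 train/validation/test split, preserving class imbalance.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=42)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=2 / 3, stratify=y_tmp, random_state=42)

    # Min-Max scaling to [0, 1], fit on the training split only.
    scaler = MinMaxScaler().fit(X_train)
    return (scaler.transform(X_train), y_train,
            scaler.transform(X_val), y_val,
            scaler.transform(X_test), y_test, scaler)
```

Fitting the scaler on the training split only avoids leaking test statistics into the normalization, which matters because the same [0,1] range later bounds the adversarial perturbations.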
To assess cross-dataset generalization and adversarial transfer, the CICIDS2018 Friday 02-03-2018 slice was additionally preprocessed using identical normalization and feature constraints as CICIDS2017, enabling a controlled external evaluation under real distribution shift.
3.2. IDS Models and Surrogate Architecture
The IDS models used in this study represent diverse algorithmic paradigms commonly employed in intrusion detection. As emphasized in several surveys [1,2], IDS performance varies by model family due to differences in inductive bias, feature sensitivity, and resilience to noise. To reflect this diversity, two classical machine learning classifiers and two deep learning architectures are selected:
Random Forest (RF)
Logistic Regression (LR)
Multilayer Perceptron (MLP)
1D Convolutional Neural Network (CNN1D)
Random Forest (RF) serves as a strong nonlinear baseline well suited for tabular intrusion detection tasks. Its ensemble structure provides robustness to noise and offers interpretable feature importance insights, contributing to its wide use in security analytics [39,17]. Logistic Regression (LR) represents a transparent linear model commonly employed in anomaly detection pipelines [1]. Its interpretability and predictable behavior under adversarial manipulation make it an informative baseline.
Deep learning models have shown increasing promise in IDS research due to their capacity to model nonlinear feature relationships. The Multilayer Perceptron (MLP) architecture used in this study follows designs previously shown to perform well on flow-based datasets [2,8]. Complementing the MLP, a 1D Convolutional Neural Network (CNN1D) is included to explore local structural patterns in the flow features. Prior studies have demonstrated the effectiveness of CNNs for both payload and flow-level intrusion detection tasks [26].
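The sketch below illustrates how these four model families could be instantiated. The layer widths, kernel sizes, and hyperparameters are assumptions for illustration, not the exact configurations trained in this study.

```python
# Illustrative definitions of the four IDS model families (hyperparameters assumed).
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
lr = LogisticRegression(max_iter=1000)

class MLP(nn.Module):
    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_classes))

    def forward(self, x):
        return self.net(x)

class CNN1D(nn.Module):
    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                # x: (batch, n_features)
        z = self.conv(x.unsqueeze(1))    # treat the feature vector as a 1D signal
        return self.fc(z.squeeze(-1))
```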
To generate adversarial examples in a realistic black-box threat setting, a separate surrogate model is trained. Following the methodology of [13,21], the surrogate is an independently trained MLP designed to approximate the decision boundaries of the target models. All adversarial perturbations are computed using gradients from this surrogate and then transferred to the target IDS. This approach reflects operationally realistic adversarial conditions, wherein attackers typically do not possess detailed knowledge of deployed IDS architectures [29,17].
3.3. Threat Model
This study assumes a black-box, transfer-based adversarial threat model, which has become increasingly relevant in AML research due to its alignment with real-world attacker constraints [35,6]. The adversary is assumed to have no access to the target model’s architecture, gradients, training data, or hyperparameters. Instead, the adversary trains a surrogate classifier on data drawn from an equivalent distribution and uses it to craft perturbations.
To clearly illustrate the attacker’s capabilities and the flow of surrogate-based perturbation generation, the overall threat model is depicted in Fig. 1. The figure shows how the adversary trains a local surrogate model on data drawn from the same distribution as the target IDS and subsequently uses this surrogate to craft adversarial perturbations for transfer. This visual representation clarifies the separation between the adversary’s knowledge and the target IDS internals and highlights the operational constraints driving the choice of a black-box, transfer-based threat model.
The adversary’s goal is to produce a perturbed flow x' from an original input x such that the predicted label changes while keeping the perturbation within a bounded norm region:

$$f(x') \neq f(x), \qquad \lVert x' - x \rVert_{\infty} \leq \epsilon$$

The ℓ∞ constraint is widely used in adversarial research because it restricts the maximum allowable change to any individual feature [13,14]. For flow-based IDS, it is a particularly suitable metric given that small changes to individual flow statistics can mimic realistic and stealthy evasion behavior.
3.4. Adversarial Attack Formulation
Adversarial perturbations are generated using two widely studied gradient-based evasion techniques: the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). Both attacks operate on the surrogate classifier and are subsequently transferred to the target IDS models.
FGSM generates adversarial examples by taking a single gradient-aligned step that maximizes the surrogate’s loss [13]. The perturbation is computed as:

$$x' = \mathrm{clip}_{[0,1]}\left(x + \epsilon \cdot \mathrm{sign}\left(\nabla_{x}\,\mathcal{L}\!\left(f_{s}(x), y\right)\right)\right)$$

where $f_{s}$ is the surrogate model, $\mathcal{L}$ is its classification loss, and the clipping function ensures that the perturbed values remain within the [0,1] range.
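A minimal PyTorch sketch of this single-step attack is given below, assuming a differentiable surrogate `f_s` that returns class logits and Min-Max-scaled inputs in [0,1]; names and default values are illustrative.

```python
# Minimal FGSM sketch against the surrogate model f_s (PyTorch).
import torch
import torch.nn.functional as F

def fgsm(f_s, x, y, eps=0.03):
    """Single gradient-sign step on Min-Max-scaled flows (values in [0, 1])."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(f_s(x_adv), y)   # surrogate loss to maximize
    loss.backward()
    x_adv = x_adv + eps * x_adv.grad.sign() # move each feature by +/- eps
    return torch.clamp(x_adv, 0.0, 1.0).detach()
```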
PGD extends FGSM through iterative refinement, applying multiple perturbation steps while enforcing the same norm constraint [14]. Each iteration updates the input as:

$$x^{t+1} = \Pi_{\mathcal{B}_{\epsilon}(x)}\left(x^{t} + \alpha \cdot \mathrm{sign}\left(\nabla_{x}\,\mathcal{L}\!\left(f_{s}(x^{t}), y\right)\right)\right)$$

with the projection operator defined as element-wise re-projection into the ε-ball around the original input, followed by clipping to the valid [0,1] range:

$$\Pi_{\mathcal{B}_{\epsilon}(x)}(z) = \mathrm{clip}_{[0,1]}\left(\max\left(\min\left(z,\; x + \epsilon\right),\; x - \epsilon\right)\right)$$
PGD is widely regarded as one of the strongest first-order attacks and provides a rigorous measure of IDS robustness.
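The iterative procedure can be sketched as follows; the step size `alpha` and iteration count are illustrative choices rather than the exact settings used in the experiments.

```python
# Minimal PGD sketch on the surrogate f_s, projecting each step back into the
# epsilon-ball around the original input x and into the [0, 1] feature range.
import torch
import torch.nn.functional as F

def pgd(f_s, x, y, eps=0.03, alpha=0.01, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(f_s(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Projection onto the L-infinity ball of radius eps, then into [0, 1].
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()
```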
3.5. Semantic Validity Constraints
A common limitation of feature-space adversarial attacks is the possibility of producing flows that violate protocol-level semantics or physical plausibility. Addressing this issue is essential for ensuring that adversarial examples reflect realistic network behavior [32,17]. Accordingly, each adversarial example undergoes a semantic validation stage.
First, non-negativity constraints preserve basic flow properties: count-, volume-, and duration-based features must satisfy $x'_i \geq 0$. Second, monotonicity constraints derived from dataset characteristics ensure preservation of feature relationships (for example, a maximum packet-length statistic must not fall below the corresponding mean or minimum).
Third, statistical plausibility is maintained by projecting feature values outside empirical training-set ranges back into valid bounds. Finally, features encoding TCP flag states are validated to avoid impossible flag combinations. This constraint layer ensures that adversarial examples remain sufficiently realistic to represent plausible evasion attempts in operational networks.
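A simplified sketch of this constraint layer is shown below. The index sets for count-based and TCP-flag features are placeholders, and the monotonicity checks are omitted for brevity; the real constraint sets would be derived from the CICFlowMeter feature dictionary.

```python
# Sketch of the semantic-validity projection applied to raw adversarial perturbations.
import numpy as np

def enforce_semantics(x_adv, x_min, x_max, count_idx, flag_idx):
    x = x_adv.copy()
    # Non-negativity for packet counts, byte volumes, and durations.
    x[:, count_idx] = np.clip(x[:, count_idx], 0.0, None)
    # Statistical plausibility: project back into empirical training-set ranges.
    x = np.clip(x, x_min, x_max)
    # Keep TCP flag indicators binary to avoid impossible flag combinations.
    x[:, flag_idx] = np.round(np.clip(x[:, flag_idx], 0.0, 1.0))
    return x
```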
To provide a clear visual summary of how these constraints are applied during attack generation, the semantic constraint enforcement pipeline is illustrated in Fig. 2. The figure shows how raw adversarial perturbations produced by the surrogate model are projected onto valid feature ranges, adjusted to maintain protocol consistency, and filtered to preserve domain-specific semantics before being evaluated by the target IDS models. This ensures that adversarial examples remain both realistic and operationally meaningful.
3.6. Evaluation Metrics
Model robustness is evaluated using metrics that capture both baseline classification performance and degradation under adversarial conditions. Given the inherent class imbalance in CICIDS2017, macro-averaged precision, recall, and F1-score provide balanced insights across minority and majority classes. Accuracy is included for completeness.
Adversarial performance is evaluated using three additional metrics. The attack success rate (ASR) measures the fraction of originally correct predictions that become incorrect:

$$\mathrm{ASR} = \frac{\left|\{\, i : f(x_i) = y_i \;\wedge\; f(x'_i) \neq y_i \,\}\right|}{\left|\{\, i : f(x_i) = y_i \,\}\right|}$$

The robust accuracy quantifies accuracy on adversarially perturbed samples:

$$\mathrm{Acc}_{\mathrm{rob}} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left[f(x'_i) = y_i\right]$$

Finally, the accuracy drop provides a direct indicator of adversarial degradation:

$$\Delta\mathrm{Acc} = \mathrm{Acc}_{\mathrm{clean}} - \mathrm{Acc}_{\mathrm{rob}}$$
These metrics align with established recommendations for AML robustness evaluation (Yuan et al., 2019; Zhang et al., 2021).
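For clarity, the sketch below shows how these metrics can be computed from clean and adversarial predictions; function and variable names are illustrative.

```python
# Metric computation sketch matching the definitions above: ASR is measured
# only over samples that were classified correctly before perturbation.
import numpy as np

def robustness_metrics(y_true, y_pred_clean, y_pred_adv):
    clean_acc = np.mean(y_pred_clean == y_true)
    robust_acc = np.mean(y_pred_adv == y_true)
    correct = y_pred_clean == y_true
    asr = np.mean(y_pred_adv[correct] != y_true[correct]) if correct.any() else 0.0
    return {"clean_acc": clean_acc,
            "robust_acc": robust_acc,
            "acc_drop": clean_acc - robust_acc,
            "asr": asr}
```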
3.7. Cross-Dataset Evaluation
To evaluate robustness under realistic distribution shift, adversarial transferability is assessed using the CICIDS2018 Friday 02-03-2018 slice. This dataset provides traffic collected under different temporal, operational, and environmental conditions from CICIDS2017, making it suitable for examining whether perturbations crafted on one dataset remain effective when applied to another. Using identical preprocessing and semantic constraint enforcement ensures that differences in performance reflect dataset drift rather than preprocessing artifacts.
In this evaluation, IDS models are trained solely on CICIDS2017. Adversarial examples are generated using the surrogate model trained on CICIDS2017 and then applied directly to the CICIDS2018 flows without retraining or fine-tuning. This setup reflects realistic operational settings in which deployed IDS must confront new traffic distributions without full model retraining. Performance is measured using clean accuracy, adversarial accuracy, accuracy drop (Δ accuracy), and attack success rate (ASR), enabling a direct comparison with in-distribution robustness.
The cross-dataset evaluation pipeline is illustrated in Fig. 3, which depicts the sequence of model training on CICIDS2017, adversarial example generation, and external validation on CICIDS2018. This visual summary clarifies how transferability is measured and how adversarial behavior changes under distribution shift.
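A condensed sketch of this transfer evaluation is given below. It reuses the hypothetical helpers from the earlier sketches (pgd, enforce_semantics, robustness_metrics) and assumes a scikit-learn-style predict interface for the target model; these names are assumptions, not the study's actual code.

```python
# Cross-dataset evaluation sketch: the target model and surrogate are trained on
# CICIDS2017 only; adversarial flows crafted on the surrogate are applied
# unchanged to CICIDS2018 data (X18, y18).
import torch

def cross_dataset_eval(target_model, surrogate, X18, y18, eps=0.03, **sem_kwargs):
    x = torch.tensor(X18, dtype=torch.float32)
    y = torch.tensor(y18, dtype=torch.long)
    x_adv = pgd(surrogate, x, y, eps=eps).numpy()          # craft on the surrogate
    x_adv = enforce_semantics(x_adv, **sem_kwargs)          # keep flows plausible
    return robustness_metrics(y18,
                              target_model.predict(X18),    # clean predictions
                              target_model.predict(x_adv))  # transferred attack
```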
3.8. Experimental Setup
All experiments are implemented using Python, with machine learning models built using scikit-learn [36] and deep learning models built using PyTorch. Training and evaluation are conducted on a CPU-only workstation equipped with an Intel i7 processor and 32 GB of RAM, reflecting constraints commonly encountered in lightweight or distributed IDS deployments [2]. Deep learning models use early stopping based on validation loss, and fixed random seeds ensure reproducibility. All models share identical data partitions, and adversarial examples are generated on validation and test subsets to ensure a fair and controlled comparison across models. This experimental configuration provides a consistent and reproducible environment for assessing adversarial robustness under realistic assumptions and resource conditions.
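As a minimal illustration of the reproducibility measures described above, random seeds can be fixed across libraries as follows; the specific seed value is an assumption.

```python
# Reproducibility sketch: fix seeds for Python, NumPy, and PyTorch.
import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```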
Figure 4 summarizes the end-to-end methodology used in this study, integrating dataset preprocessing, IDS model training, surrogate-based attack generation, semantic and protocol-aware constraint enforcement, and evaluation under both in-distribution (CICIDS2017) and cross-dataset (CICIDS2018) conditions. The architecture reflects the black-box, transfer-based threat model adopted in this work and illustrates how gradient-based (FGSM, PGD) and black-box (HSJA, ZOO) adversarial examples are generated from the surrogate and applied to the target IDS models. Evaluation includes clean and adversarial accuracy, Δ accuracy, ASR, per-class degradation, and cross-dataset transferability.
4. Experimental Results
This section presents the empirical evaluation of the four intrusion detection models, Random Forest (RF), Logistic Regression (LR), Multilayer Perceptron (MLP), and CNN1D, on the CICIDS2017 dataset under both clean and adversarial conditions. We first report training and baseline classification performance on the full validation and test splits, then analyze robustness under multiple adversarial threat models, including FGSM, PGD, HopSkipJump (HSJA), and Zeroth-Order Optimization (ZOO). The evaluation follows best practices in adversarial machine learning [13,14,6] and reflects the realistic black-box assumptions discussed earlier [16,17].
4.1. Training and Baseline Classification Performance
The first set of experiments evaluates all four models in a purely supervised setting, without adversarial perturbations, using the full CICIDS2017 validation and test splits. These results establish the clean-performance baseline against which adversarial degradation is later measured.
4.1.1. Classical Models
Table 3 summarizes the performance of the classical baselines, Random Forest and Logistic Regression, on the validation (282,788 flows) and test (565,576 flows) splits.
Table 3
Classical baseline performance on CICIDS2017

| Model | Split | Accuracy | Macro F1 | Weighted F1 |
|---|---|---|---|---|
| Random Forest | Validation | 0.9984 | 0.8617 | 0.9984 |
| Random Forest | Test | 0.9985 | 0.8714 | 0.9985 |
| Logistic Regression | Validation | 0.9596 | 0.3997 | 0.9560 |
| Logistic Regression | Test | 0.9591 | 0.4003 | 0.9562 |
Random Forest attains near-perfect accuracy on both validation and test splits while maintaining comparatively high macro F1-scores above 0.86. This indicates that RF not only models the dominant benign and frequent attack classes well but also performs reasonably across several minority classes despite the pronounced class imbalance, echoing prior observations about the strength of ensemble tree methods in IDS [39,17].
Logistic Regression achieves high overall accuracy (~ 0.96) but substantially lower macro F1 (~ 0.40). This discrepancy reflects a well-known phenomenon in intrusion detection: linear decision boundaries can capture the benign versus frequent-attack separation but do not adequately represent numerous rare attack types, leading to strong weighted averages yet poor macro-level performance [1].
4.1.2. Deep Models
The MLP converges quickly, achieving its best validation accuracy (0.9829) at epoch 7 and generalizing well to the test set with accuracy 0.9804 and a small gap between validation and test loss. This positions the MLP as a strong deep baseline, only slightly below RF in clean accuracy. The CNN1D achieves a best validation accuracy of 0.9528 and test accuracy of 0.9368, with a more noticeable loss gap, suggesting a somewhat higher sensitivity to class imbalance and a greater tendency to overfit the validation distribution. Nonetheless, CNN1D provides valuable architectural diversity, leveraging local patterns in the feature dimension, which becomes important in the robustness analysis. Table 4 presents the training dynamics and test performance of the deep baselines: MLP and CNN1D.
Table 4
Deep baseline performance on CICIDS2017

| Model | Best Val Epoch | Best Val Accuracy | Best Val Loss | Test Accuracy | Test Loss |
|---|---|---|---|---|---|
| MLP | 7 | 0.9829 | 0.0418 | 0.9804 | 0.0468 |
| CNN1D | 17 | 0.9528 | 0.1699 | 0.9368 | 0.2212 |
4.1.3. Comparative View
From a clean-data perspective, RF is the strongest model, followed closely by the MLP, with CNN1D and LR trailing. This diversity of inductive biases (nonlinear ensembles, linear models, fully connected deep networks, and convolutional architectures) creates a rich setting for studying adversarial robustness, allowing us to compare how different model families trade off between clean performance and robustness [2,20].
4.2. Clean Performance on the Adversarial Evaluation Subset
To enable controlled adversarial evaluation, a 20,000-flow subset of the CICIDS2017 test split is used for FGSM and PGD experiments. Table 5 reports the clean accuracy of all models on this subset.
Table 5
Clean Accuracy on 20,000-Flow Adversarial Evaluation Subset

| Model | Clean Accuracy |
|---|---|
| Random Forest | 0.9987 |
| Logistic Regression | 0.9610 |
| MLP | 0.9817 |
| CNN1D | 0.9411 |
The ranking is consistent with full-test results: RF > MLP > CNN1D > LR. These values serve as the reference point for computing robust accuracy, accuracy drops, and attack success rates (ASR) under adversarial perturbations.
4.3. FGSM Robustness (Single-Step, Surrogate-Based Attack)
We first evaluate robustness against the Fast Gradient Sign Method (FGSM), a single-step gradient-based attack computed on the surrogate model and transferred to all target models [10,13]. Table 6 summarizes the robust accuracy, accuracy drop, and ASR for perturbation budgets ε ∈ {0.01, 0.03, 0.05} on the 20,000-flow subset.
Table 6
FGSM Robust Accuracy and ASR (ε ∈ {0.01, 0.03, 0.05})

| Model | ε | Adv Accuracy | Δ Accuracy | ASR |
|---|---|---|---|---|
| RF | 0.01 | 0.8049 | −0.1938 | 0.195 |
| RF | 0.03 | 0.8049 | −0.1938 | 0.195 |
| RF | 0.05 | 0.8049 | −0.1938 | 0.195 |
| LR | 0.01 | 0.8308 | −0.1302 | 0.144 |
| LR | 0.03 | 0.7437 | −0.2173 | 0.238 |
| LR | 0.05 | 0.6149 | −0.3461 | 0.371 |
| MLP | 0.01 | 0.8059 | −0.1758 | 0.181 |
| MLP | 0.03 | 0.7095 | −0.2721 | 0.287 |
| MLP | 0.05 | 0.6435 | −0.3382 | 0.356 |
| CNN1D | 0.01 | 0.7853 | −0.1558 | 0.203 |
| CNN1D | 0.03 | 0.7246 | −0.2165 | 0.268 |
| CNN1D | 0.05 | 0.6245 | −0.3166 | 0.373 |
All models exhibit noticeable degradation under FGSM, with accuracy drops ranging from roughly 13 to 35 percentage points at higher ε values. RF shows a characteristic “saturation” effect: its robust accuracy decreases to ~ 0.80 even at the smallest ε and then plateaus, suggesting that a subset of flows is consistently vulnerable while the rest remain robust. LR, MLP, and CNN1D, by contrast, display monotonic vulnerability: robust accuracy deteriorates steadily as ε increases, reflecting how FGSM exploits both linear and nonlinear decision boundaries (Goodfellow et al., 2014; Yuan et al., 2019).
To visualize these patterns across perturbation strengths, Figs. 5 and 6 summarize the FGSM robustness behavior of all models, showing the corresponding trends in robust accuracy and attack success rate (ASR) as ε increases.
(Plots show RF’s flat robustness curve versus sharper declines for LR, MLP, and CNN1D.)
4.4. PGD Robustness (Iterative, Surrogate-Based Attack)
We next evaluate robustness under Projected Gradient Descent (PGD), an iterative refinement of FGSM known to be among the strongest first-order adversarial attacks [14]. Table 7 reports the results for ε ∈ {0.01, 0.03, 0.05} on the same 20,000-flow subset.
Table 7
PGD Robust Accuracy and ASR (ε ∈ {0.01, 0.03, 0.05})

| Model | ε | Adv Accuracy | Δ Accuracy | ASR |
|---|---|---|---|---|
| RF | 0.01 | 0.8049 | −0.1938 | 0.195 |
| RF | 0.03 | 0.8049 | −0.1938 | 0.195 |
| RF | 0.05 | 0.8049 | −0.1938 | 0.195 |
| LR | 0.01 | 0.8222 | −0.1388 | 0.151 |
| LR | 0.03 | 0.7544 | −0.2066 | 0.223 |
| LR | 0.05 | 0.6997 | −0.2613 | 0.279 |
| MLP | 0.01 | 0.6927 | −0.2890 | 0.294 |
| MLP | 0.03 | 0.3495 | −0.6322 | 0.644 |
| MLP | 0.05 | 0.3075 | −0.6741 | 0.687 |
| CNN1D | 0.01 | 0.8016 | −0.1395 | 0.185 |
| CNN1D | 0.03 | 0.8047 | −0.1364 | 0.185 |
| CNN1D | 0.05 | 0.7989 | −0.1422 | 0.190 |
PGD reveals more severe vulnerabilities than FGSM, especially for the MLP. At ε = 0.03, the MLP’s accuracy collapses from 0.9817 to 0.3495, with ASR ≈ 0.64; at ε = 0.05, accuracy further declines to 0.3075 with ASR ≈ 0.69. This catastrophic behavior is consistent with prior findings that iterative attacks can exploit sharp decision boundaries in deep networks [14].
In contrast, RF and CNN1D remain remarkably stable under PGD transfer: robust accuracy is ~ 0.80 across all ε values, and PGD offers no additional damage beyond FGSM, suggesting weak cross-architecture transferability from the MLP surrogate. LR shows intermediate behavior, with progressive but not catastrophic deterioration. These PGD-induced degradation patterns are summarized in Fig. 7, which illustrates the robust accuracy curves for all models across increasing ε values.
4.5. HSJA Robustness (Decision-Based Black-Box Attack)
To model realistic black-box attackers who only observe final model decisions, we evaluate the HopSkipJump Attack (HSJA) on a 1,000-flow subset of the test data with ε = 0.03, 10 refinement iterations, and 10 binary search steps. HSJA operates purely on hard-label outputs and does not rely on gradients or confidence scores [34].
Table 8
HSJA Robustness (ε = 0.03, 1,000-Flow Subset)

| Model | Clean Accuracy | Adv Accuracy | Δ Accuracy | ASR |
|---|---|---|---|---|
| RF | 0.997 | 0.817 | −0.180 | 0.1815 |
| LR | 0.955 | 0.909 | −0.046 | 0.0660 |
| MLP | 0.979 | 0.921 | −0.058 | 0.0725 |
| CNN1D | 0.938 | 0.833 | −0.105 | 0.1535 |
HSJA induces moderate but non-catastrophic degradation. RF loses about 18 percentage points of accuracy, while LR and MLP suffer relatively small drops (4–6 percentage points). CNN1D occupies a middle ground, with a ~ 10.5 percentage-point decline and ASR ~ 0.15. These results (reported in Table 8) are consistent with expectations that decision-based attacks are weaker than gradient-based methods, particularly in high-dimensional tabular spaces and under limited query budgets [16,34]. These patterns are visualized in Fig. 8, which compares clean and adversarial accuracy for all models under HSJA.
4.6. ZOO Robustness (Score-Based Black-Box Attack)
We also evaluate a score-based black-box attack using a lightweight Zeroth-Order Optimization (ZOO) implementation on a 1,000-flow subset with ε = 0.03, 10 iterations, and four coordinates per step [35]. Because ZOO requires access to model confidence scores or probabilities, it is applied only to RF and LR.
Table 9
ZOO Robustness (ε = 0.03, 1,000-Flow Subset)

| Model | Clean Accuracy | Adv Accuracy | Δ Accuracy | ASR |
|---|---|---|---|---|
| RF | 0.997 | 0.949 | −0.048 | 0.0481 |
| LR | 0.955 | 0.955 | 0.000 | 0.0000 |
Under the constrained query budget, ZOO achieves only mild degradation for RF and fails to significantly affect LR. This outcome aligns with prior observations that score-based black-box attacks struggle in high-dimensional, structured feature spaces when query budgets and coordinate sampling are limited [16,6]. These effects are illustrated in Fig. 9, which shows the clean versus adversarial accuracy for RF and LR under the ZOO attack.
4.7. Overall Interpretation
Taken together, the results highlight a nuanced robustness landscape across model families and threat models. From a clean-data standpoint, RF and MLP are the strongest performers, with CNN1D and LR trailing. Under adversarial conditions, however, their behaviors diverge markedly. RF demonstrates consistently strong robustness across FGSM, PGD, HSJA, and ZOO, with robust accuracy stabilizing around 80% under gradient-based attacks and only moderate degradation under decision- and score-based attacks. These findings reinforce the value of ensemble methods for robust tabular intrusion detection [39].
The MLP, in contrast, is highly vulnerable to PGD transfer from the surrogate. While it maintains strong clean accuracy, its robust accuracy collapses below 35% for moderate ε, with ASR approaching 0.70, demonstrating the deleterious effect of sharp decision boundaries when exploited by iterative attacks [2,6]. CNN1D exhibits mixed behavior: it is noticeably affected by FGSM but remains relatively robust to PGD transfer and experiences moderate degradation under HSJA. LR behaves as a predictable linear baseline, sensitive but not catastrophically fragile.
The contrast between white-box-style PGD transfer and realistic black-box attacks (HSJA and ZOO) underscores an important conceptual point: models that appear catastrophically vulnerable in idealized white-box settings may exhibit more moderate degradation under operationally realistic black-box assumptions [17]. Finally, the differing transfer patterns across RF, LR, MLP, and CNN1D suggest that architectural diversity can be leveraged as a defensive asset. Heterogeneous ensembles combining models with distinct inductive biases may provide defense-in-depth against transfer-based attacks [32], especially when combined with semantic constraints and protocol-aware validation as described in the methodology.
4.8. Cross-Dataset Generalization: CICIDS2018
To evaluate the generalization of the CICIDS2017-trained models under distribution shift, we conducted an external validation using the CICIDS2018 Friday (02-03-2018) slice, processed into a binary Benign-Attack classification task. The slice contains 731,167 training flows, 156,679 validation flows, and 156,679 test flows, each represented by 78 CICFlowMeter-derived features, with a consistent benign-attack ratio of approximately 72.6% to 27.4% across all splits. This dataset differs meaningfully from CICIDS2017 in traffic composition, attack behaviors, and feature distributions, making it suitable for assessing out-of-distribution robustness.
Table 10 reports clean accuracy on a 20,000-flow subset of the CICIDS2018 Friday test split. Random Forest and MLP retain moderate cross-dataset performance (0.72 and 0.67), while Logistic Regression and CNN1D collapse to near-random behavior (0.08 and 0.04). These results indicate that CICIDS2018 represents a genuinely different distribution rather than a trivial variant of the training data, highlighting the importance of cross-dataset evaluation for IDS robustness studies.
Table 10
CICIDS2018 Cross-Dataset Clean Accuracy (20,000 Flows)

| Model | Clean Accuracy |
|---|---|
| RF | 0.72045 |
| LR | 0.07950 |
| MLP | 0.67445 |
| CNN1D | 0.03600 |
We then applied adversarial examples generated on the CICIDS2017 surrogate model (MLP) using FGSM and PGD with ε = 0.03. Surprisingly, Random Forest remained almost entirely unaffected under both attacks (Δ < 0.0001; ASR < 0.001), indicating that perturbations crafted on CICIDS2017 do not transfer destructively to RF on CICIDS2018. The MLP exhibited modest deterioration under FGSM (Δ = 0.022) and PGD (Δ = 0.037), far smaller than the steep within-dataset degradation reported in Sections 4.3 and 4.4. In contrast, LR and CNN1D displayed substantial improvements in accuracy when perturbed, driven by the fact that both models had initially learned miscalibrated decision boundaries under the shifted distribution. In such cases, small ℓ∞ perturbations behave more like benign data augmentation than intentional adversarial distortions. These patterns are summarized in Tables 11 and 12.
Table 11
CICIDS2018 FGSM Robustness (ε = 0.03)

| Model | Clean Acc | Adv Acc | Δ Acc | ASR |
|---|---|---|---|---|
| RF | 0.72045 | 0.72045 | 0.00000 | 0.00000 |
| LR | 0.07950 | 0.32740 | −0.24790 | 0.02767 |
| MLP | 0.67445 | 0.65220 | 0.02225 | 0.05983 |
| CNN1D | 0.03600 | 0.18495 | −0.14895 | 0.86250 |
Table 12
CICIDS2018 PGD Robustness (ε = 0.03)

| Model | Clean Acc | Adv Acc | Δ Acc | ASR |
|---|---|---|---|---|
| RF | 0.72045 | 0.72040 | 0.00005 | 0.00007 |
| LR | 0.07950 | 0.32835 | −0.24885 | 0.03082 |
| MLP | 0.67445 | 0.63780 | 0.03665 | 0.08021 |
| CNN1D | 0.03600 | 0.34655 | −0.31055 | 0.78194 |
These results collectively demonstrate that adversarial perturbations optimized on CICIDS2017 do not automatically transfer to a distinct dataset. Instead, transferability depends strongly on both model architecture and distribution alignment. For well-aligned models (RF, MLP), transferred adversarial examples induce only limited degradation; for poorly aligned models (LR, CNN1D), perturbations often act as random noise that occasionally improves accuracy. This cross-dataset analysis therefore provides a more realistic assessment of adversarial robustness for flow-based IDS.
5. Discussion of Findings
The findings of this study reveal a complex adversarial robustness landscape across machine learning and deep learning intrusion detection models. Although all models achieved high clean accuracy on CICIDS2017, their susceptibility to adversarial perturbations varied substantially once evaluated under realistic black-box conditions. These outcomes underscore the limitations of evaluating IDS performance solely on clean datasets and highlight the necessity of robustness assessments that align with the operational threat environment.
5.1. Interpreting Adversarial Robustness
A clear distinction emerges between models with inherently stable decision boundaries and those whose representations are highly sensitive to small adversarial shifts. Random Forest, for example, consistently maintained approximately 80% accuracy under both FGSM and PGD transfer-based attacks. This resilience aligns with prior observations that ensemble tree methods produce piecewise-constant decision surfaces that are difficult for gradient-driven perturbations to exploit [39,17].
In contrast, the Multilayer Perceptron exhibited pronounced fragility under PGD, with accuracy dropping from 0.98 to below 0.35. This sharp degradation is consistent with adversarial machine learning research showing that iterative attacks can exploit curvature around decision boundaries in deep models [14,6]. Logistic Regression demonstrated predictable linear behavior, experiencing moderate declines without catastrophic failure, while CNN1D showed a mixed profile: comparatively stable under PGD but more exposed to FGSM and HSJA. These differences illustrate that architectural properties, rather than clean accuracy alone, drive adversarial susceptibility.
5.2. Model-Specific Vulnerabilities
The model-specific responses observed across attacks reflect both architectural design and dataset characteristics. Deep networks are particularly exposed to high-curvature adversarial manipulation in feature-rich, imbalanced domains where minority attack classes lie near decision boundaries [38]. CNN1D’s convolutional structure captures local feature patterns that can limit PGD transfer but remain sensitive to directionally coherent perturbations such as FGSM. Logistic Regression, by contrast, demonstrated significant degradation on CICIDS2018, reinforcing the limitations of linear models when confronted with distribution shift and heterogeneous attack behaviors.
These results highlight that IDS designers must evaluate model robustness beyond static benchmarking metrics. Models that perform competitively under clean conditions may be fragile in dynamic adversarial settings, especially when trained on datasets with labeling inconsistencies or imbalanced class distributions [10,32].
5.3. Operational Implications for IDS Design
The divergence between clean and adversarial performance has direct implications for operational IDS deployment. First, high clean accuracy is not a reliable indicator of real-world robustness. SOC engineers and MLOps pipelines should incorporate adversarial stress testing as part of standard model validation, particularly for systems deployed in high-risk environments such as cloud infrastructures, financial systems, and industrial networks.
Second, the severe PGD vulnerability of the MLP suggests caution when deploying deep models for flow-based IDS unless complemented by robustness-enhancing methods such as adversarial training or uncertainty quantification [40]. Conversely, the resilience of Random Forest highlights the value of non-differentiable or non-smooth architectures in resisting gradient-based attacks.
Finally, broader trustworthy AI considerations extend beyond intrusion detection. As shown in fairness-aware educational assessment systems [12], high-impact AI systems must balance accuracy with stability, transparency, and robustness. These parallels reinforce the importance of designing IDS models that maintain consistent performance under adversarial pressure.
5.4. Realistic Threat Models and Cross-Dataset Insights
The transfer-based black-box threat model adopted in this study aligns with real-world attacker capabilities. As emphasized by [17], adversaries typically lack access to model gradients or parameters and instead rely on probing or surrogate approximations. Our results validate this perspective: while PGD induces catastrophic degradation in idealized white-box contexts, its transferability is substantially limited in realistic black-box scenarios and across heterogeneous models.
The CICIDS2018 cross-dataset evaluation reinforces this observation. Perturbations optimized on CICIDS2017 transferred only minimally to CICIDS2018, with Random Forest showing no measurable degradation and even vulnerable models exhibiting sharply reduced susceptibility. These findings demonstrate that adversarial risk must be interpreted within the context of actual dataset alignment and operational conditions rather than assumed to generalize uniformly across traffic distributions.
6. Conclusion and Future Work
This study proposed a rigorous and realistic framework for evaluating the adversarial robustness of AI-based intrusion detection systems using surrogate-based attack generation, semantic feature constraints, and cross-dataset validation. The experiments demonstrate that adversarial vulnerability is highly dependent on model architecture, attack strength, and dataset alignment. While deep networks achieved strong clean accuracy, they exhibited sharp fragility under iterative perturbations, whereas Random Forest maintained stable performance across all threat models. The limited transferability of adversarial examples across datasets emphasizes the need for evaluating IDS under diverse traffic conditions and realistic adversarial assumptions.
By integrating multiple attack types, semantically aligned constraints, and cross-dataset analysis, this work provides practical insights and methodological guidance for developing IDS models that are not only accurate but robust, deployable, and aligned with modern adversarial threat landscapes.
6.1. Future Directions
Several research avenues can strengthen IDS robustness and extend this work:
Adversarial training: Incorporating FGSM, PGD, randomized smoothing, and distributionally robust optimization may improve model resilience without sacrificing clean accuracy [14].
Temporal and sequential modeling: LSTM, GRU, and Transformer-based architectures may better capture temporal dependencies in network flows, potentially stabilizing decision boundaries under adversarial shift.
Explainable AI (XAI): Techniques such as LIME, SHAP, and integrated gradients [41] can expose which features contribute most to vulnerability, supporting finer-grained hardening strategies.
Federated and privacy-preserving IDS: Federated learning and privacy-preserving techniques [42] offer promising pathways for enabling collaborative detection while mitigating exposure to poisoning and evasion attacks.
These directions reflect a broader movement toward trustworthy, transparent, and distributionally robust AI systems for securing next-generation IoT infrastructures [32,45]. As IoT settings expand, incorporating these advancements will be vital for providing robust, adaptable, and future-proof intrusion detection.