AI-Assisted Swarm Robotics for Autonomous Exploration Using Ant Colony Algorithms

B Sivakumar Reddy, S K Harisha^*, Jinka Ranganayakulu, M Krishna

RV College of Engineering

Bengaluru-560059, India

e-mail: krishnam@rvce.edu.in

Abstract

This study explores the integration of Artificial Intelligence (AI) techniques with Ant Colony Optimization (ACO) to enhance swarm robotic exploration in dynamic and uncertain environments. While traditional ACO provides a decentralized, pheromone-based mechanism for path planning, it suffers from slow convergence, stagnation, and limited adaptability when confronted with environmental changes. To address these challenges, an AI-assisted ACO framework was developed that incorporates reinforcement learning (RL) principles to adapt pheromone update rules in real time. Robots learn from past interactions, adjust strategies dynamically, and maintain efficient exploration under varying conditions. Python-based simulations were conducted across four grid environments (50×50, 100×100, 250×250, and 500×500) with swarm sizes ranging from 20 to 200 robots and obstacle densities of 10%, 20%, and 30%. Results show that AI-assisted ACO consistently outperformed traditional ACO, achieving 7–15% higher exploration coverage (≈ 52.1% vs. 44.6% in the baseline case and ≈ 61% vs. 48% in the largest grid), with ≈ 20–25% faster convergence and ≈ 25% higher adaptability under dynamic obstacle scenarios.

Keywords:

Swarm Robotics

Ant Colony Optimization (ACO)

Reinforcement Learning (RL)

Autonomous Exploration

Multi-Robot Systems

Artificial Intelligence (AI) Integration

1. Introduction

Swarm robotics has gained significant importance in production industries for performing complex tasks, owing to its collaborative behavior that resembles natural systems such as ants, bees, and termites [1, 2]. This collaboration enables decentralized task execution, making the system more flexible, scalable, and robust. During operation, if any robot in the swarm fails, the remaining robots can continue executing the task without interruption [3]. However, path planning and exploration remain challenging problems in swarm robotics and still require further optimization.

Many researchers have addressed this challenge using nature-inspired algorithms, particularly the Ant Colony Optimization (ACO) algorithm, which has been successfully applied to swarm robotics [4]. The ACO algorithm, developed based on bio-inspired principles, enables effective exploration and convergence toward optimal solutions [5–7]. Nevertheless, ACO often fails to reach the target in dynamic or unpredictable environments because of its reliance on predefined rules that cannot adapt to changing conditions [8]. Furthermore, ACO lacks memory or learning capability—it does not retain past experiences of success or failure—which limits its performance in real-world scenarios [9].

To overcome these limitations, Artificial Intelligence (AI) techniques such as Reinforcement Learning (RL) have been integrated with swarm robotics. RL enables robots to learn from previous experiences and continuously update their strategies to accomplish tasks more efficiently [10]. The integration of AI and ACO forms a hybrid model that significantly enhances the swarm’s adaptability in dynamic conditions, leading to improved path planning, obstacle avoidance, and time optimization.

In this hybrid approach, robots make decisions instantaneously based on real-time sensor data and the behavioral patterns learned from their neighboring agents [11]. This collective learning mechanism helps reduce conflicts, prevent collisions, and improve area coverage efficiency. Recent studies indicate that models combining ACO and RL identify optimal paths faster and navigate cluttered environments more effectively than either method used independently [13]. Moreover, advanced systems are now incorporating real-time sensor feedback to dynamically fine-tune robot behavior, reflecting adaptive mechanisms similar to those observed in intelligent biological systems [14].

This paper focuses on designing an AI-assisted swarm robotic system using ACO for autonomous exploration. The goal is to develop a system that can explore unknown areas efficiently while adapting to changes in the environment. The system leverages the strengths of ACO for path optimization and AI for adaptability and learning. This approach aims to improve exploration efficiency, reduce collisions, and enhance overall swarm performance in complex and dynamic environments.

2. Related research work

Research on applying ACO to multi-robot exploration and path planning has advanced significantly, evolving from purely bio-inspired algorithms to hybrid approaches incorporating learning and adaptive mechanisms. Early studies demonstrated that ACO could generate robust and distributed path solutions for swarms operating in partially known or dynamic environments. Agrawal et al. [15] proposed one of the foundational ACO-based implementations for swarm robot path planning in dynamic settings, showing improved obstacle avoidance and coverage efficiency compared to greedy strategies.

Later work addressed convergence speed and responsiveness to environmental changes. Ali et al. [16] enhanced ACO by integrating deterministic search heuristics, which improved global path smoothness and local obstacle negotiation in grid-based environments. Similarly, Morin et al. [17] framed exploration as an NP-hard search optimization problem, adapting pheromone update strategies to reduce redundant coverage and improve detection time in search-and-rescue scenarios.

Several authors have developed new algorithms that combine ACO and RL to enhance swarm performance in terms of area coverage, collision avoidance, and overall efficiency, thereby creating a more powerful synergy between the two approaches [18]. The proposed models incorporating adaptive Q-learning have demonstrated the ability to dynamically fine-tune pheromone deposition and evaporation rates, enabling better adaptability to changing environments [19]. Furthermore, the Adaptive Deep ACO framework integrates deep learning and parameter adaptation directly into pheromone control, allowing the swarm to efficiently converge toward optimal paths in complex navigation environments with flexible or moving obstacles [20–21].

Moving beyond pure algorithm design, researchers have also tackled the messy realities of real-world deployment, such as limited communication, sensor noise, and teams of different robot types. Work on multi-robot scheduling has proven that ACO can effectively manage these heterogeneous teams in extreme environments—from deep mining tunnels to the Antarctic ice—by merging pheromone-based cues with explicit scheduling rules [22, 23]. This body of work drives home a critical point: while simulations are promising, real-world robustness depends on designing systems that can gracefully handle these practical constraints.

3. Methodology

3.1 Ant Colony Optimization

ACO is a metaheuristic inspired by the foraging patterns of natural ant colonies, where ants deposit pheromones to communicate information about resource-rich paths. In swarm robotics, this principle is abstracted into a computational model where artificial ants represent mobile agents that traverse the environment in search of optimal paths for exploration or target acquisition. Each agent probabilistically selects its next step based on two key factors: (1) the pheromone intensity on potential paths and (2) a heuristic measure, such as distance or obstacle density.

Mathematically, the probability of an artificial ant moving from node i to node j at time t is expressed as:

$P_{ij}(t)=\frac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{k\in N_i}[\tau_{ik}(t)]^{\alpha}\,[\eta_{ik}]^{\beta}}$ (Eq. 1)

where τ_ij(t) represents the pheromone concentration on edge (i, j), η_ij is the heuristic desirability (e.g., the inverse of distance), α and β control the relative influence of pheromone and heuristic information, and N_i denotes the set of feasible next nodes from node i.
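The selection rule in Eq. 1 can be illustrated with a minimal Python sketch. The function name `transition_probabilities`, the default values of `alpha` and `beta`, and the uniform fallback for all-zero weights are assumptions made for illustration and are not taken from the simulation code used in this study.

```python
import numpy as np

def transition_probabilities(tau, eta, alpha=1.0, beta=2.0):
    """Eq. 1: probability of moving to each candidate node k in N_i.

    tau, eta: 1-D sequences holding the pheromone level and heuristic
    desirability of every candidate node (illustrative inputs).
    """
    tau = np.asarray(tau, dtype=float)
    eta = np.asarray(eta, dtype=float)
    weights = (tau ** alpha) * (eta ** beta)
    total = weights.sum()
    if total == 0.0:
        # Assumption: with no information at all, choose uniformly.
        return np.full(len(weights), 1.0 / len(weights))
    return weights / total

# Three candidates with equal pheromone; eta is the inverse of distance,
# so the nearest candidate (largest eta) receives the highest probability.
p = transition_probabilities(tau=[1.0, 1.0, 1.0], eta=[1.0, 0.5, 0.25])
print(p)
```

With β > α, as in the typical ranges of Table 1, the heuristic term dominates early in a run, when pheromone levels are still nearly uniform.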

As ants complete successful paths, pheromone trails are reinforced using a global update rule:

τ_ij(t + 1)= (1 − ρ)τ_ij(t)+Δτ_ij(t) (Eq. 2)

where ρ is the evaporation rate and Δτ_ij(t) is the deposited pheromone, proportional to the path quality. This evaporation mechanism prevents premature convergence by allowing exploration of alternative solutions. Over multiple iterations, the swarm collectively converges toward optimal or near-optimal exploration strategies.
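A short sketch of the global update in Eq. 2 is given here for concreteness. It assumes that pheromone is stored in a node-by-node matrix and that each ant deposits Q divided by its path cost on every edge it traversed, which is the usual Ant System convention; the function name `update_pheromone` and the example values are hypothetical.

```python
import numpy as np

def update_pheromone(tau, paths, costs, rho=0.3, Q=1.0):
    """Eq. 2: tau(t + 1) = (1 - rho) * tau(t) + delta_tau(t).

    tau:   square matrix, tau[i, j] is the pheromone on edge (i, j)
    paths: list of node sequences, one per ant
    costs: positive path cost per ant (lower cost means a better path)
    """
    delta = np.zeros_like(tau, dtype=float)
    for path, cost in zip(paths, costs):
        # Assumed deposit rule: Q / cost on every traversed edge.
        for i, j in zip(path[:-1], path[1:]):
            delta[i, j] += Q / cost
    return (1.0 - rho) * tau + delta

tau0 = np.ones((3, 3))
tau1 = update_pheromone(tau0, paths=[[0, 1, 2]], costs=[2.0], rho=0.3, Q=1.0)
# Edges on the path gain pheromone; all other edges only evaporate.
print(tau1[0, 1], tau1[1, 0])
```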

Table 1
Key Parameters of the ACO Algorithm
Parameter	Role	Typical Range
α	Controls influence of pheromone concentration on path selection	1–2
β	Controls influence of heuristic information (e.g., distance)	2–5
ρ	Pheromone evaporation rate (prevents unlimited accumulation)	0.1–0.5
Q	Pheromone deposit factor proportional to path quality	Depends on problem scale
τ_ij	Pheromone level on edge between node i and j	Dynamic (updated iteratively)

Figure 1 presents the workflow of the ACO algorithm, beginning with initialization of parameters, followed by ant movement, path evaluation, and pheromone updating. The process iterates until convergence, ensuring adaptive exploration and convergence to an optimal or near-optimal path. Table 1 summarizes the key parameters influencing this process. The factors α and β control the balance between pheromone intensity and heuristic desirability, while ρ regulates evaporation to maintain exploration. Parameter Q defines pheromone deposition, and τ_ij denotes the pheromone concentration on a path segment, updated iteratively to guide swarm decision-making.

Fig. 1

Flowchart of the ACO Process

3.2 AI Integration

While ACO provides an efficient decentralized mechanism for path planning and exploration, its effectiveness diminishes in highly dynamic and uncertain environments where rapid adaptation is essential. In this evolved framework, each robot functions as an autonomous explorer, continuously interacting with its surroundings and learning from the outcomes of its actions. Through this continuous cycle of trial, feedback, and refinement, the swarm collectively develops sophisticated and highly effective strategies over time.

For exploration missions, these learned behaviors translate into direct and measurable advantages: the swarm achieves more comprehensive area coverage, navigates obstacles with greater precision, and completes its objectives in significantly less time. Table 2 presents a comparison between the traditional ACO and the AI-assisted ACO models, highlighting their respective operational characteristics and performance metrics.

Table 2
Comparison of Traditional ACO vs AI-Assisted ACO
Aspect	Traditional ACO	AI-Assisted ACO
Adaptability	Limited, fixed pheromone update rules	High, adjusts dynamically via learning
Environment handling	Struggles with moving obstacles/dynamic terrain	Adapts efficiently to non-stationary changes
Learning capability	No learning from past experiences	Learns policies through reinforcement signals
Convergence speed	Slower, risk of stagnation	Faster due to adaptive pheromone control
Scalability & robustness	Good, but reduced under uncertainty	Enhanced scalability and robustness

Fig. 2

Conceptual Integration of RL with ACO

Fig. 2 illustrates how RL is integrated into the ACO framework to enhance swarm robotic exploration.

Left Panel (ACO process): The cycle begins with Ant Movement, where robots probabilistically choose paths based on pheromone intensity and heuristic information. This is followed by Path Evaluation, in which the quality of the chosen path (e.g., distance, coverage, obstacle avoidance) is assessed. Next, robots perform Pheromone Deposition and Evaporation, updating the pheromone trail strength. The loop continues until Path Convergence is reached, representing stabilization around an optimal or near-optimal path.

Integration Module (Center): Pheromone logic is influenced by RL signals. Specifically, pheromone updates τ_ij(t) are adjusted dynamically according to reward signals.

Right Panel (RL Process): The reinforcement learning (RL) cycle enables the robot to interact with its surroundings and receive rewards for positive behavior and penalties for negative behavior. This process updates the Q-values for subsequent actions, forming a continuous feedback loop between the RL and ACO mechanisms.

Interaction between ACO and RL: The figure highlights the two-way communication between the environment and the system, which involves performance feedback from the surroundings and reinforcement of pheromone rules based on previous learning outcomes.
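One possible form of this coupling is sketched below, purely as an illustration. It assumes a tabular Q-learning update for the RL side and a reward-proportional scaling of the pheromone deposit for the integration module; neither rule is specified in this section, and the names `q_update`, `reward_scaled_deposit`, `lr`, `gamma`, and `gain` are hypothetical.

```python
import numpy as np

def q_update(q_table, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """Standard tabular Q-learning update (modifies q_table in place)."""
    td_target = reward + gamma * q_table[next_state].max()
    q_table[state, action] += lr * (td_target - q_table[state, action])
    return q_table

def reward_scaled_deposit(base_deposit, reward, gain=0.5):
    """Assumed coupling: rewards enlarge the deposit, penalties shrink it."""
    return max(0.0, base_deposit * (1.0 + gain * reward))

q = np.zeros((4, 4))  # 4 illustrative states x 4 actions (up, down, left, right)
q = q_update(q, state=0, action=1, reward=1.0, next_state=2)

deposit_new_cell = reward_scaled_deposit(1.0, reward=+1.0)  # reached an unexplored cell
deposit_revisit = reward_scaled_deposit(1.0, reward=-1.0)   # revisited a known cell
print(q[0, 1], deposit_new_cell, deposit_revisit)
```

Under these assumptions, trails leading to newly explored cells are reinforced more strongly than trails that lead back to visited cells, which matches the qualitative behavior described for the integration module.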

3.3 Experimental Design

To evaluate the performance of the proposed AI-assisted ACO, simulations were designed across multiple grid sizes, swarm scales, and obstacle densities. The objective was to assess exploration coverage, adaptability, and scalability under increasingly complex conditions.

Environment Sizes & Swarm Scales:

50 × 50 grid with 20 robots (baseline, small-scale).

100 × 100 grid with 40 robots (medium-scale).

250 × 250 grid with 100 robots (large-scale).

500 × 500 grid with 200 robots (very large-scale, scalability test).

Obstacle Densities:

Low density (10%): sparse terrain, fewer obstructions.

Medium density (20%): balanced obstacle distribution.

High density (30%): cluttered, complex environments.

Performance Metrics:

Exploration Coverage (%): proportion of free space explored.

Convergence Speed: iterations required to reach 80% coverage.

Redundant Revisits (%): frequency of repeated exploration.

Adaptability: coverage recovery after introducing/removing obstacles mid-run.

Computation Time: execution time for scalability assessment.
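The first three metrics can be computed from simple grid bookkeeping, as in the following sketch. The array layout (boolean `explored` and `obstacles` grids, an integer `visit_counts` grid) and the function names are assumptions for illustration; the revisit measure shown here, the share of all visits that were not first visits, is one reasonable reading of "frequency of repeated exploration".

```python
import numpy as np

def coverage_percent(explored, obstacles):
    """Exploration coverage: explored free cells / all free cells, in percent."""
    free = ~obstacles
    return 100.0 * (explored & free).sum() / free.sum()

def iterations_to_coverage(history, threshold=80.0):
    """Convergence speed: first 1-based iteration reaching the threshold, else None."""
    for iteration, coverage in enumerate(history, start=1):
        if coverage >= threshold:
            return iteration
    return None

def revisit_percent(visit_counts):
    """Redundant revisits: share of all visits that were not first visits."""
    total = visit_counts.sum()
    if total == 0:
        return 0.0
    first_visits = (visit_counts > 0).sum()
    return 100.0 * (total - first_visits) / total

obstacles = np.zeros((4, 4), dtype=bool)
obstacles[0, :] = True                 # 4 obstacle cells, 12 free cells
explored = np.zeros((4, 4), dtype=bool)
explored[1, :] = True
explored[2, :2] = True                 # 6 explored free cells
cov = coverage_percent(explored, obstacles)
first_hit = iterations_to_coverage([10.0, 40.0, 85.0])
revisits = revisit_percent(np.array([[2, 1], [0, 1]]))
print(cov, first_hit, revisits)
```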

This experimental design provides a rigorous evaluation of both traditional and AI-assisted ACO, ensuring scalability and robustness are tested across varying levels of complexity.

4. Simulation Setup

To evaluate the effectiveness of the proposed AI-assisted ACO framework, a series of simulations were carried out in Python within a grid-based 2D environment. Each grid cell was defined as either free space or an obstacle. Robots were modeled as autonomous agents with limited sensing ability and decision-making capabilities driven by pheromone levels and heuristic information. The objective of the swarm was to maximize exploration coverage while minimizing collisions and redundant revisits.

4.1 Environment Configurations

Four cases of increasing complexity were considered to analyze scalability and robustness:

• Case 1: 50 × 50 grid with 20 robots.

• Case 2: 100 × 100 grid with 40 robots.

• Case 3: 250 × 250 grid with 100 robots.

• Case 4: 500 × 500 grid with 200 robots.

Each case was tested under three obstacle densities: low (10%), medium (20%), and high (30%). Obstacles were randomly distributed to emulate uncertain environments such as cluttered terrain or debris.
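A test environment of this kind can be generated as in the sketch below. It assumes that the obstacle density fixes the exact number of obstacle cells and that robots start on distinct, randomly chosen free cells; the start placement is not described in this section, and the helper names `make_grid` and `place_robots` are hypothetical.

```python
import numpy as np

def make_grid(size, obstacle_density, seed=0):
    """Return a size x size boolean grid in which True marks an obstacle."""
    rng = np.random.default_rng(seed)
    n_cells = size * size
    n_obstacles = int(round(obstacle_density * n_cells))
    flat = np.zeros(n_cells, dtype=bool)
    flat[rng.choice(n_cells, size=n_obstacles, replace=False)] = True
    return flat.reshape(size, size)

def place_robots(obstacles, n_robots, seed=0):
    """Assumed start rule: distinct random free cells, returned as (row, col) rows."""
    rng = np.random.default_rng(seed)
    free_cells = np.argwhere(~obstacles)
    chosen = rng.choice(len(free_cells), size=n_robots, replace=False)
    return free_cells[chosen]

# Case 1 at medium density: 50 x 50 grid, 20% obstacles, 20 robots.
grid = make_grid(50, 0.20)
robots = place_robots(grid, 20)
print(grid.sum(), robots.shape)
```

Fixing the seed makes each configuration repeatable, so both algorithms can be run on identical obstacle layouts.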

4.2 Simulation Parameters

All robots used pheromone-based probabilistic decision-making with a pheromone evaporation factor ρ = 0.99 to avoid stagnation. In the AI-assisted ACO approach, reinforcement-learning-inspired adaptability was introduced by dynamically adjusting pheromone deposition and evaporation based on exploration rewards.

Each simulation ran for 300 iterations, where robots performed the following steps at each iteration:

Generate candidate moves (up, down, left, right).

Select next position using pheromone-weighted probability rules.

Deposit pheromones while applying global evaporation.

Mark explored cells to minimize redundant revisits.
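The four steps above can be sketched as a single per-iteration function. This is an illustrative baseline under stated assumptions, not the simulation code used in this study: pheromone is stored per cell, the heuristic η is 1.0 for unexplored cells and 0.1 for explored ones, collisions between robots are not modeled, and evaporation is applied literally as in Eq. 2. The example call uses an illustrative `rho` from the typical range in Table 1.

```python
import numpy as np

MOVES = ((-1, 0), (1, 0), (0, -1), (0, 1))  # up, down, left, right

def step(robots, obstacles, pheromone, explored, rng,
         rho, alpha=1.0, beta=2.0, deposit=1.0):
    """Advance every robot by one cell; all arrays are modified in place."""
    size = obstacles.shape[0]
    delta = np.zeros_like(pheromone)
    for r, (row, col) in enumerate(robots):
        # Step 1: candidate moves that stay in bounds and avoid obstacles.
        cands = [(row + dr, col + dc) for dr, dc in MOVES
                 if 0 <= row + dr < size and 0 <= col + dc < size
                 and not obstacles[row + dr, col + dc]]
        if not cands:
            continue  # boxed in: the robot stays put this iteration
        # Step 2: pheromone-weighted choice in the form of Eq. 1.
        tau = np.array([pheromone[c] for c in cands]) + 1e-9
        eta = np.array([0.1 if explored[c] else 1.0 for c in cands])  # assumed heuristic
        weights = (tau ** alpha) * (eta ** beta)
        choice = cands[rng.choice(len(cands), p=weights / weights.sum())]
        robots[r] = choice
        # Step 3 (deposit part) and step 4: record the deposit, mark the cell.
        delta[choice] += deposit
        explored[choice] = True
    # Step 3 (evaporation part): global update as written in Eq. 2.
    pheromone *= (1.0 - rho)
    pheromone += delta

rng = np.random.default_rng(0)
obstacles = np.zeros((10, 10), dtype=bool)
obstacles[4, 2:8] = True                       # a small wall
robots = np.array([[0, 0], [9, 9], [0, 9]])
pheromone = np.ones((10, 10))
explored = np.zeros((10, 10), dtype=bool)
explored[robots[:, 0], robots[:, 1]] = True
for _ in range(300):
    step(robots, obstacles, pheromone, explored, rng, rho=0.3)
coverage = 100.0 * explored[~obstacles].sum() / (~obstacles).sum()
print(f"coverage after 300 iterations: {coverage:.1f}%")
```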

4.3 Simulation Configurations

Table 4.1 summarizes the simulation parameters across all test cases, including environment size, swarm scale, obstacle densities, and performance metrics considered.

Table 4.1
Simulation Configurations
Parameter	Case 1	Case 2	Case 3	Case 4
Environment Size	50 × 50	100 × 100	250 × 250	500 × 500
Number of Robots	20	40	100	200
Iterations	300	300	300	300
Obstacle Density	10%, 20%, 30%	10%, 20%, 30%	10%, 20%, 30%	10%, 20%, 30%
Pheromone Evaporation ρ	0.99	0.99	0.99	0.99
Metrics Recorded	Coverage, Convergence Speed, Revisits, Adaptability, Computation Time	Same	Same	Same

This structured setup ensures fair and reproducible comparisons between traditional ACO and AI-assisted ACO across multiple scales and complexities.

4.4 Algorithmic Workflows

Two flowcharts were developed to illustrate the difference between the AI-assisted and traditional approaches.

Fig. 3 (Algorithm 1: AI-Assisted ACO for Multi-Robot Exploration) describes the hybrid algorithm integrating reinforcement-learning-inspired adaptability. Robots dynamically adjust pheromone deposition and evaporation based on performance rewards, enabling faster convergence and better adaptability in dynamic environments.

Fig. 4 (Algorithm 2: Traditional ACO for Multi-Robot Exploration) represents the baseline ACO workflow with fixed pheromone rules. While effective in static setups, this approach lacks adaptability, leading to stagnation under dynamic obstacle conditions.

Fig. 3

Algorithm 1: AI-Assisted ACO for Multi-Robot Exploration

5. Results and Discussion

The experimental evaluation compared the performance of the proposed AI-assisted ACO framework with the traditional ACO algorithm under multiple experimental configurations. Simulations were carried out across four grid sizes (50×50, 100×100, 250×250, and 500×500), with swarm sizes ranging from 20 to 200 robots and obstacle densities of 10%, 20%, and 30%. The analysis focused on four critical performance metrics: exploration coverage, convergence speed, adaptability in dynamic environments, and scalability with robustness. Across all conditions, AI-assisted ACO consistently outperformed traditional ACO, with the performance gap widening in larger and more complex environments.

5.1 Exploration Coverage

Fig. 4

Algorithm 2: Traditional ACO for Multi-Robot Exploration

Exploration coverage measures the proportion of free space successfully visited by the swarm, providing insight into the efficiency of area mapping. Higher coverage correlates with better exploration and fewer blind spots. In the baseline case (50×50 grid with 20 robots), the traditional ACO algorithm achieved ≈ 45% coverage after 300 iterations. Under identical conditions, the AI-assisted ACO achieved ≈ 52% coverage, demonstrating the advantage of reinforcement-learning-inspired pheromone updates in reducing stagnation and promoting systematic exploration.

The benefits of AI integration became more pronounced in larger environments. In the 500×500 grid with 200 robots, AI-assisted ACO achieved ≈ 61% coverage, while traditional ACO stagnated at ≈ 48%. The growing performance divergence underscores a critical lesson: adaptability becomes paramount as environments grow larger and more cluttered. While conventional ACO, with its static pheromone rules, struggles to reorient the swarm's efforts, the AI-enhanced method dynamically steers robots into neglected zones, ensuring no area is left behind.

Fig. 5

Comparative Exploration Coverage: Traditional ACO vs. AI-Assisted ACO

The superior performance is illustrated in Fig. 5, which presents the simulation outcomes under the four cases described in Section 4.1. In the graph, the red dots represent the robot positions, the gray cells denote obstacles, and the blue areas indicate the total region explored by the swarm agents. The AI-assisted agents cover a larger area with minimal gaps, regardless of the map size. In contrast, traditional ACO frequently left conspicuous unexplored patches, particularly in expansive grids, highlighting its inability to manage robot distribution as effectively as its intelligent counterpart.

Fig. 6

Convergence Speed of Traditional vs. AI-Assisted ACO

To quantify these findings, Fig. 6 shows a comparative bar chart of exploration coverage for all grid sizes. The gray bars represent traditional ACO, and the blue bars represent AI-assisted ACO. Across all test cases, AI-assisted ACO consistently outperformed the baseline by ≈ 7–15%. This consistency confirms the scalability of the approach, validating its capability to handle increasingly complex exploration tasks.
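The reported gain can be read either as an absolute difference in coverage (percentage points) or as a change relative to the baseline. The short calculation below applies both readings to the coverage figures reported for the smallest and largest grids; the helper name `coverage_gain` is hypothetical.

```python
def coverage_gain(baseline, improved):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative

# Reported coverage (traditional ACO, AI-assisted ACO) in percent.
reported = {"50x50": (44.6, 52.1), "500x500": (48.0, 61.0)}

gains = {grid: coverage_gain(*pair) for grid, pair in reported.items()}
for grid, (absolute, relative) in gains.items():
    print(f"{grid}: +{absolute:.1f} points, +{relative:.1f}% relative")
```

The absolute differences are 7.5 and 13 percentage points, which fall inside the stated 7–15% band, so the band is most naturally read as a difference in percentage points.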

5.2 Convergence Speed

Convergence speed is defined as the rate at which the swarm stabilizes into efficient exploration patterns. Faster convergence reduces redundant movements, minimizing overall exploration time. Traditional ACO exhibited slower convergence due to its reliance on static pheromone reinforcement rules. Once suboptimal trails were reinforced, robots repeatedly revisited them, delaying discovery of unexplored regions. This inefficiency was particularly evident in large environments, where delayed convergence significantly reduced exploration performance.

The AI-assisted ACO framework addressed this limitation by incorporating reinforcement-learning-inspired reward signals. These signals dynamically influenced pheromone deposition and evaporation, strengthening trails that led to efficient exploration while suppressing ineffective paths. Consequently, robots were able to adapt more quickly to environmental feedback, achieving faster stabilization. Simulation results showed that AI-assisted ACO reduced convergence time by approximately 20–25% compared to traditional ACO. This advantage was amplified in the 250×250 and 500×500 grids, where convergence delays in traditional ACO caused significant stagnation.

Fig. 7

Convergence Trends of Traditional ACO vs. AI-Assisted ACO Across Different Grid Sizes

Figure 7 illustrates convergence trends over iterations. The AI-assisted ACO curve rises more steeply and reaches higher coverage earlier, while the traditional ACO curve grows slowly and plateaus at lower coverage levels. In practice, this means that AI-assisted ACO enables swarms to complete exploration tasks more rapidly, an essential advantage for time-critical missions such as search and rescue operations.

5.3 Adaptability in Dynamic Environments

Adaptability was tested by modifying the environment mid-simulation, introducing new obstacles and removing existing ones to replicate real-world dynamics such as collapsing structures or shifting debris. Traditional ACO struggled in these conditions. Robots continued following outdated pheromone trails that no longer led to valid routes, resulting in stagnation and large unexplored regions. In contrast, AI-assisted ACO recalibrated pheromone intensities using reinforcement-learning updates. Robots quickly adjusted their paths in response to environmental changes, redistributing coverage more effectively. This adaptability improved responsiveness by approximately 25%, allowing the swarm to maintain efficient exploration despite disruptions.
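A mid-run perturbation of this kind can be sketched as follows. The numbers of obstacles added and removed, the rule that new obstacles never land on a robot, and the function name `perturb_obstacles` are assumptions for illustration; the exact perturbation schedule is not given in this section.

```python
import numpy as np

def perturb_obstacles(obstacles, robots, n_add, n_remove, rng):
    """Return a new grid with n_remove obstacles cleared and n_add obstacles added."""
    shape = obstacles.shape
    flat = obstacles.flatten()                     # flatten() returns a copy
    occupied = np.zeros(shape, dtype=bool)
    occupied[robots[:, 0], robots[:, 1]] = True
    existing = np.flatnonzero(flat)                # cells that may be cleared
    # Cells that may be blocked: currently free and not under a robot.
    available = np.flatnonzero(~flat & ~occupied.ravel())
    cleared = rng.choice(existing, size=min(n_remove, len(existing)), replace=False)
    blocked = rng.choice(available, size=min(n_add, len(available)), replace=False)
    flat[cleared] = False
    flat[blocked] = True
    return flat.reshape(shape)

rng = np.random.default_rng(1)
obstacles = np.zeros((10, 10), dtype=bool)
obstacles[4, :] = True                             # 10 obstacle cells
robots = np.array([[0, 0], [9, 9]])
changed = perturb_obstacles(obstacles, robots, n_add=5, n_remove=3, rng=rng)
print(obstacles.sum(), changed.sum())
```

Coverage recovery can then be assessed by comparing the coverage curve before and after the call, with the coverage recomputed against the new set of free cells.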

Fig. 8

Adaptability Test of Traditional ACO and AI-Assisted ACO Under Dynamic Obstacle Conditions

Figure 8 compares adaptability performance between the two approaches. The AI-assisted ACO swarm rapidly redirected exploration around new obstacles, while the traditional ACO swarm exhibited significant delays and left large coverage gaps. The ability to adapt in real time is particularly valuable for applications such as disaster recovery, where the environment is unpredictable and continuously changing.

5.4 Scalability and Robustness

Scalability is crucial for swarm robotics, as practical deployments often involve large numbers of agents. In the experiments, swarm sizes were scaled proportionally with grid size to evaluate scalability. Traditional ACO performed adequately in smaller swarms but showed diminishing returns in larger groups. The main limitation was pheromone saturation: as more robots deposited pheromones in the same region, trails became congested, leading to inefficient exploration and redundant revisits. AI-assisted ACO, however, maintained strong scalability. By combining pheromone-based coordination with decentralized reinforcement-learning-inspired adaptability, robots were able to make independent decisions that reduced congestion and promoted balanced exploration. Performance remained consistent even as swarm size increased from 20 to 200 robots.

Fig. 9

Scalability Performance of Traditional ACO and AI-Assisted ACO Across Increasing Swarm Sizes

Robustness was evaluated by simulating robot failures and sensor noise. In traditional ACO, performance degraded significantly, as reduced pheromone distribution left unexplored regions. In contrast, AI-assisted ACO maintained exploration efficiency, with the remaining robots compensating for failures through adaptive learning and pheromone redistribution. Even with up to 15% of robots disabled, exploration coverage declined by less than 5% in AI-assisted ACO, compared to nearly 12% in traditional ACO. Figure 9 presents scalability results. AI-assisted ACO maintained consistent efficiency across increasing swarm sizes, while traditional ACO performance plateaued in larger swarms. This result underscores the robustness and scalability advantages of the proposed framework, confirming its suitability for large-scale, real-world deployments.

6. Conclusion

The simulations showed that AI-assisted ACO consistently outperforms traditional ACO in swarm robotic exploration tasks across the tested environments.

Exploration coverage improved consistently, rising from ≈ 44.6% with traditional ACO to ≈ 52.1% in the baseline 50×50 grid and reaching ≈ 61% in the 500×500 grid. These results highlight more efficient mapping of unknown and obstacle-rich environments.

Convergence speed was accelerated by ≈ 20–25% in AI-assisted ACO due to reinforcement-learning-inspired adaptability, which minimized redundant revisits and prevented stagnation in pheromone trails.

Adaptability was notably enhanced under dynamic obstacle scenarios. While traditional ACO struggled to recover from environmental changes, AI-assisted ACO rapidly recalibrated pheromone intensities, improving responsiveness by ≈ 25%.

Scalability and robustness were preserved even as the swarm scaled from 20 robots in small grids to 200 robots in a 500×500 grid. The decentralized adaptive framework also maintained stable performance under partial robot failures and sensor noise.

The results confirm that integrating pheromone-based coordination with reinforcement-learning mechanisms enables swarms to operate more intelligently, adaptively, and efficiently across obstacle densities of 10–30% and in large-scale deployments.

This hybrid approach demonstrates strong potential for real-world applications including search and rescue missions, disaster recovery, planetary exploration, and environmental monitoring, where speed, adaptability, and resilience are critical.

Author Contribution

B. Sivakumar Reddy – Ph.D. student; contributed to the development of all mathematical models and prepared the detailed research methodology for the study.

Dr. S. K. Harisha – Ph.D. supervisor and corresponding author; conceptualized the research, guided the overall study, drafted the manuscript, and managed all correspondence with the journal.

Dr. Jinka Ranganayakulu – Assisted in executing the Python code and generating visual graphs and diagrams for data analysis and result visualization.

Dr. M. Krishna – Contributed to the development and refinement of Python simulations and assisted in implementing and validating the computational models used in the research.

Declarations

Competing Interests

These results need to be verified on hardware swarm robots in further research work.

Acknowledgement

We are grateful to RSST Trust and the Principal, RV College of Engineering, for financing this research work and for their encouragement.

References

1. Y. Alqudsi and M. Makaraci, “Swarm Robotics for Autonomous Aerial Robots: Features, Algorithms, Control Techniques, and Challenges,” IEEE Access, 2024.

2. E. Sahin, “Swarm robotics: From sources of inspiration to domains of application,” Springer, 2005.

3. M. Brambilla, E. Ferrante, M. Birattari, and M. Dorigo, “Swarm robotics: a review from the swarm engineering perspective,” Swarm Intelligence, vol. 7, pp. 1–41, 2013.

4. M. Dorigo and T. Stützle, Ant Colony Optimization, MIT Press, 2004.

5. M. Dorigo, V. Maniezzo, and A. Colorni, “Ant system: Optimization by a colony of cooperating agents,” IEEE Trans. Syst. Man Cybern. B, vol. 26, no. 1, pp. 29–41, 1996.

6. A. Agrawal, S. Patel, R. Mehta, and D. Sharma, “Ant colony-based path planning for swarm robots,” ACM Digital Library, 2015.

7. M. Dorigo and C. Blum, “Ant colony optimization theory: A survey,” Theoretical Computer Science, vol. 344, pp. 243–278, 2005.

8. P. T. Bogdan, et al., “Dynamic path planning for swarm robotics using adaptive ACO,” Robotics and Autonomous Systems, vol. 101, pp. 1–12, 2018.

9. H. Pham and A. Wong, “Autonomous maneuver discovery for flight vehicles using deep reinforcement learning,” J. of Guidance, Control, and Dynamics, vol. 43, no. 12, 2020.

10. S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 4th ed., Pearson, 2020.

11. L. Liu, Z. Jiang, and L. Zhang, “Intelligent pilot assistance system based on deep reinforcement learning,” Aerospace Sci. Technol., vol. 118, 107026, 2021.

12. Y. Zhang and P. Li, “A pilot behavior prediction model using LSTM networks,” Aerospace, vol. 8, no. 5, 128, 2021.

13. Wang, H. Li, X. Zhang, and M. Chen, “A hybrid deep learning framework for adaptive flight control and pilot assistance,” Engineering Applications of Artificial Intelligence, vol. 116, p. 105455, 2022.

14. B. Zheng and Y. Lu, “Estimating cognitive load using multimodal physiological data fusion in a flight simulator,” IEEE Trans. Human-Machine Systems, vol. 52, no. 1, pp. 112–121, 2022.

15. A. Agrawal, A. Sudheer, and S. Ashok, “Ant colony based path planning for swarm robots,” ACM Digital Library, 2015.

16. H. Ali, S. K. Ghosh, and M. Pal, “Path Planning of Mobile Robot With Improved Ant Colony Optimization,” Frontiers in Neurorobotics, 2020.

17. M. Morin, X. Zhu, and J. Wang, “Ant colony optimization for path planning in search and optimal search problems,” 2023.

18. M.-A. Blais and M. A. Akhloufi, “Reinforcement learning for swarm robotics: An overview of applications, algorithms and simulators,” 2023.

19. W. Fang, Z. Liao, and Y. Bai, “Improved ACO algorithm fused with improved Q-Learning algorithm for Bessel curve global path planning of search and rescue robots,” Robotics and Autonomous Systems, vol. 182, article 104822, 2024.

20. X. Li, Z. Ruan, Y. Ou, D. Ban, Y. Sun, T. Qin, and Y. Cai, “Adaptive Deep Ant Colony Optimization–Asymmetric Strategy Network Twin Delayed Deep Deterministic Policy Gradient Algorithm: Path Planning for Mobile Robots in Dynamic Environments,” Electronics, vol. 13, no. 20, article 4071, 2024.

21. B. Fu, Y. Chen, Y. Quan, X. Zhao, and C. Li, “Bidirectional Artificial Potential Field-Based Ant Colony Optimization for Robot Path Planning,” 2025.

22. P. Zhang, Y. Li, C. Wu, and H. Zhao, “ACO-based scheduling for multi-robot heterogeneous systems in harsh environments,” 2022.

23. K. Kumar, R. Sharma, and T. Verma, “Task allocation in heterogeneous swarm robots using pheromone-inspired coordination,” 2021.

Funding

Declaration

The authors declare that no funding, grants, or other financial support were received in connection with this research work or the preparation of this manuscript.
