Dynamic Risk-Aware Lane Change Decision-Making for Autonomous Vehicles Using Deep Contextual Learning
Lakshmi Narayana I
Department of Information Technology, Seshadri Rao Gudlavalleru Engineering College, Andhra Pradesh, India
E-mail: ilnarayana1226@gmail.com
ORCID iD: https://orcid.org/0000-0002-5083-6813
TMN Vamsi
Department of Computer Science and Engineering, GITAM Deemed University, Andhra Pradesh, India
E-mail: mthalata@gitam.edu
ORCID iD: https://orcid.org/0000-0001-6454-3934
Abstract.
In real traffic, changing lanes safely depends largely on the vehicle's ability to judge its distance to the cars ahead and behind. Rigid rules with fixed thresholds are often too restrictive and offer little help in lane-change decisions. This research proposes a system that learns from driving context and refines its decisions with reinforcement learning, making it more accurate and reliable. To assess lane-change risk in current traffic, ResNet50 with transfer learning is combined with LSTM layers. To detect and track vehicles and anticipate their behavior, Mask R-CNN is paired with CNN and LSTM layers so that a single model performs all three tasks. Because traffic conditions vary constantly, weather, speed, acceleration, steering angle, and road-surface conditions are supplied as additional inputs. To make decisions safer, a Double Deep Q-Network is added, which proved steadier and faster to train than older reinforcement learning methods in heavy traffic. Simulation results show clearer and more accurate risk assessment, better decisions, and smoother lane changes, making the system safer and more reliable and moving one step closer to smarter transport systems.
Keywords:
Autonomous Vehicles
Lane Change Decision-Making
Risk Assessment
Inter-Vehicle Distance
Intelligent Transportation Systems
1. Introduction
1.1 Background and Motivation
Lane changing is one of the most fundamental and challenging tasks in autonomous driving. It demands that vehicles make fast and meaningful decisions in diverse traffic scenarios while navigating safely and efficiently. Unlike lane keeping, which simply concerns staying in the lane, lane changing is a highly interactive process in which vehicles need to cooperate with surrounding vehicles and exhibit predictive understanding of their future behaviors [1]. Failures in lane changing may lead to safety issues such as side-impact, rear-end, or pile-up accidents. Driven by the fast development of intelligent transportation systems and connected vehicles, solving the problem of safe and efficient lane changing in real traffic has become an urgent challenge [2]. This has motivated us to shift from rule-based heuristics to more data-driven and context-aware methods that can handle diverse traffic scenarios.
Traditional methods for lane-changing decision-making adopt heuristic rules with static thresholds, such as fixed minimum gaps or time-to-collision margins. Although these heuristics are computationally inexpensive, they cannot model variation in driver behavior, uncertainty in the external environment, or traffic heterogeneity [13]. For instance, aggressive drivers may attempt lane changes at smaller gaps, while cautious drivers keep larger ones, so simple threshold-based methods cannot generalize across different traffic participants. In addition, road conditions such as rain, fog, or sloped surfaces affect the vehicle's dynamic behavior. Simple rule-based models cannot cover all possible situations; robust predictive systems are therefore needed that can understand varied contextual information and output adaptive, risk-aware lane-change strategies.
Recent developments in deep learning have shown promising potential for solving these problems. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) such as LSTMs have shown remarkable performance in capturing spatial and temporal dependencies from traffic scenes. By combining models such as ResNet50 for visual perception and LSTM for sequential risk analysis, lane-change decisions can now be made with improved accuracy and adaptability [18], [10]. Similarly, detection approaches like Mask R-CNN have been shown to detect vehicles with high accuracy even under occlusion and across different traffic densities. With deep learning, autonomous vehicles are no longer limited to merely perceiving and tracking neighboring cars but can also infer their intentions, enabling them to avoid collisions during lane-change maneuvers.
However, perception cannot solve the decision-making problem. An autonomous vehicle also needs to know when and how to execute a lane change under competing pressures of safety, comfort and efficiency. Given this nature of sequential decision making, reinforcement learning (RL) provides a natural solution for the vehicle to learn policies that maximize long-term rewards instead of fixed rules. Double Deep Q-Networks (DDQNs) provide more stable and convergent reinforcement learning compared to standard Q-learning by mitigating overestimation bias in action-value function approximation. Furthermore, when coupled with contextual dimensions, such as vehicle speed, acceleration profile and other environmental conditions, autonomous systems can learn policies that map these context dimensions to safe and resilient action strategies that generalize over different traffic scenarios.
The motivation for this work arises from the desire to link what the vehicle sees with the choices it makes, and to unify both in one system that can adapt to different driving conditions. By combining computer vision methods with sequence learning and reinforcement learning, the system goes beyond simple rules and becomes easier to scale for real use. When extra information such as weather and road conditions is added, the vehicle can learn stronger lane-change strategies that work in normal as well as challenging situations. In summary, our work aims to advance autonomous driving by providing a context-aware, reliable, and risk-aware decision-making framework that enhances safety and efficiency in traffic with many moving agents.
1.2 Importance of Lane Detection in Autonomous Vehicles
Knowing, predicting, and deciding in rapidly changing traffic situations is one of the most challenging aspects of lane changing for self-driving cars. Real roads are full of unexpected situations: vehicles traveling at the same speed as you, erratic drivers, and sometimes even missing lane markings due to rain or fog [3]. As expected, rule-based systems fail here because they rely on fixed settings and cannot adapt to changing traffic conditions. On top of this, how drivers behave, whether cooperative, aggressive, or passive, is also unpredictable, making it difficult to assess risks and decide on safe trajectories.
Another challenge is fusing information from multiple sources in real time, such as cameras, LiDAR, radar, GPS, and vehicle signals like acceleration, braking, and steering angle [4]. To make safe lane-change decisions, the system has to handle all of this information in real time without delay. Most existing models still assume that sensors always work perfectly and that road conditions never change. However, this assumption breaks down in poor weather or heavy traffic [5]. As a result, today's lane-change systems lack powerful predictive models that can work in different traffic situations, adapt to them, and keep learning from them, which limits their reliability and practical use.
1.3 Research Objectives and Contributions
The aim of this work is to develop a context-aware lane-change decision system that makes autonomous cars safer, more dependable, and more flexible in changing traffic. Unlike conventional rule-based models with limited scope, our method integrates deep learning, prediction, and reinforcement learning to model both temporal and spatial patterns in driving. We apply a Double Deep Q-Network (DDQN) to reinforce step-by-step decisions in a stochastic environment, employ Mask R-CNN with CNN-LSTM to detect vehicles and predict their future behavior, and use ResNet50 with LSTM to evaluate risk in real time. To ensure the system adapts to road layout, traffic density, and weather conditions, we also model vehicle motion, environmental factors, and driving styles.
Our work makes three contributions. First, we design a multi-model architecture that connects perception and prediction with the decision-making used to plan lane changes. Second, we apply temporal modeling to represent aggressive and careful driving separately, which improves behavior prediction accuracy. Third, we improve the stability and reliability of lane-change decisions in heavy traffic by applying DDQN to fix overestimation issues in reinforcement learning. Our work demonstrates how strong contextual modeling and risk-aware decision-making can improve the safety, efficiency, and flexibility of autonomous transport systems and set a new benchmark for research toward self-driving cars.
2. Related Work
Autonomous vehicle research has long focused on lane-change prediction and decision-making. A dual-branch model that identified lane-change intent and vehicle status by combining trajectory prediction networks with behavior categorization modules was presented in a widely regarded study by Yuan et al. [1]. Although the system showed encouraging results in situations with less traffic, it was not well suited to high-density situations with unpredictable driving patterns. Similarly, to customize lane-change behavior, Liao et al. [2] created a driver digital twin architecture based on attention-based sequence modeling. However, real-time responsiveness and generalization to unknown drivers were hampered by the system's reliance on extensive historical data.
Patel et al. [3] implemented RNN-based deep models for future lane change prediction using past trajectories. Although it effectively captured temporal patterns, its exclusion of environmental variables such as weather and visibility reduced its real-world applicability. Scheel et al. [4] advanced interpretability through attention-based encoder-decoder architectures, but their models exhibited latency in reacting to dynamic traffic flows. Han et al. [5] proposed a rule-enhanced deep learning approach for highway intention recognition; however, the continued use of fixed decision thresholds impaired adaptability in congested and non-cooperative traffic environments.
Li et al. [6] introduced a computer vision-based driver intention recognition model using facial expression cues and lane proximity. Despite fair accuracy in structured lighting conditions, its heavy dependence on visual-only data made it ineffective under occlusion, shadows, or low-light conditions. Yu et al. [7] presented a machine learning-based vehicle intention and trajectory recognition model that utilized XGBoost and clustering algorithms. While computationally efficient, the model lacked spatial depth and contextual awareness from fused sensor modalities. Sun et al. [8] utilized a multi-layer LSTM for lane-change safety classification, reporting over 85% accuracy; yet, the model failed to maintain consistency under sudden trajectory perturbations or sensor noise.
Zhang et al. [9] provided a systematic survey on perception challenges under adverse weather. Although comprehensive in sensor comparison, their review lacked integration of predictive frameworks like behavior forecasting or planning under uncertainty. Wang and Chan [10] used dynamic game theory for lane-change negotiation among vehicles. While analytically robust, the model assumed rational agents and failed to model real-world unpredictability or latency in decision propagation.
Deo and Trivedi [11] proposed convolutional social pooling in conjunction with LSTM networks to predict trajectories in crowded highway settings, achieving significant gains in trajectory accuracy. Yet, the model’s focus was restricted to position prediction without explicit intent estimation. Altché and de La Fortelle [12] also used LSTM networks to forecast vehicle positions but did not integrate surrounding vehicle behavior or road semantics, limiting its contextual understanding. Similarly, Kim and Ghosh [13] used simple LSTM models for binary intent prediction (change or not), but their model suffered from overfitting due to a limited dataset and lacked interpretability.
Earlier works like Houenou et al. [14] developed a maneuver recognition system using motion models, yet it required finely tuned heuristics and lacked learning-based adaptability. Lefèvre et al. [15] surveyed motion prediction and risk estimation frameworks; however, the methods largely predated modern neural architectures and emphasized geometry-based strategies which are insufficient in highly dynamic or heterogeneous driving environments.
3. Proposed Methodology
The proposed system architecture for autonomous lane-change decision-making integrates multiple data processing pipelines, each designed to capture a specific aspect of real-world driving dynamics. At its core, the framework is divided into three synergistic modules: a behavioral risk profiling unit, a visual context perception system, and an inter-vehicle distance estimation component. These modules work in parallel to collect, process, and fuse temporal-spatial features to feed into the final decision engine. The end goal is to dynamically assess the feasibility and safety of executing a lane change, by aggregating multi-modal inputs and predicting near-future traffic behavior in real time.
Behavioral profiling employs both historical and real-time vehicle information collected from internal sensors. To model this, we apply a sequence learning model based on ResNet50 and LSTM layers that captures both spatial and temporal patterns. The pretrained ResNet50 extracts key features from compressed driving signals, while the LSTM models the order and flow of events, learns long-term driving styles, and flags suspicious behavior. The combined model outputs a dynamic risk score representing the vehicle's current driving context and updates this score as new data arrives. This score acts as a gate: whether the vehicle may perform a lane change depends on safety limits learned from past driving behavior.
In addition to the behavioral module, the visual perception pipeline applies Mask R-CNN, followed by CNN and LSTM layers, to model the road scene. Trained on traffic datasets, Mask R-CNN segments vehicles and lanes separately via instance segmentation. The CNN-LSTM stack then models motion trends and infers where vehicles are heading, which helps determine whether a neighboring car is accelerating aggressively or making room; this information in turn conditions whether the system may perform a lane change.
Complementing the above is a monocular vision-based inter-vehicle distance estimation system. This module utilizes YOLOv3 for real-time object detection, localizing surrounding vehicles in the camera frame. Once detected, the framework estimates the distance to each vehicle using the principle of similar triangles and geometric scaling, assuming fixed camera height and focal length. This lightweight technique allows for rapid, on-device processing without the need for LiDAR or stereo cameras, making the solution deployable on consumer-grade autonomous platforms. This continuous distance monitoring ensures the system maintains safe headways and avoids making decisions in situations with poor lead/follow gaps.
All three modules are orchestrated by a central decision fusion engine. This engine performs late fusion of outputs from each pipeline, combining the risk score, predicted behaviors, and proximity metrics. A rule-based and learning-driven hybrid strategy is applied here, where a neural policy model evaluates whether a lane change is both safe and optimal. The model adapts to traffic conditions such as congestion, weather anomalies, or unpredictable driver behavior by referencing long-term training from varied datasets. As a result, the system doesn’t rely on fixed thresholds, but dynamically adjusts its decision policy based on current context, ultimately aiming to mimic human-like adaptability while maintaining strict safety constraints.
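A minimal Python sketch of this late-fusion gate is given below. The thresholds, function name, and decision labels are hypothetical illustrations chosen for readability; the actual system uses a learned neural policy rather than these hand-written rules.

```python
# Illustrative sketch of the late-fusion decision gate described above.
# All thresholds here are hypothetical placeholders, not trained values.

def fuse_decision(risk_score, neighbor_yielding, lead_gap_m, follow_gap_m,
                  min_gap_m=12.0, risk_threshold=0.05):
    """Combine the three module outputs into a lane-change verdict.

    risk_score        -- behavioral risk from the ResNet50+LSTM profiler (0..1)
    neighbor_yielding -- True if the CNN-LSTM predicts the target-lane
                         vehicle is making room
    lead_gap_m / follow_gap_m -- monocular distance estimates (meters)
    """
    if risk_score > risk_threshold:
        return "hold"      # behavioral gate vetoes the maneuver
    if lead_gap_m < min_gap_m or follow_gap_m < min_gap_m:
        return "hold"      # insufficient headway in the target lane
    if not neighbor_yielding:
        return "wait"      # defer until the intent prediction clears
    return "change"

print(fuse_decision(0.02, True, 20.0, 15.0))   # -> change
print(fuse_decision(0.10, True, 20.0, 15.0))   # -> hold
```

In the paper's framework this gating is performed by the hybrid rule-based and learning-driven fusion engine, so the fixed thresholds above stand in for context-dependent, learned decision boundaries.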
Fig. 1
Architectural Diagram
3.2 Data Acquisition and Preprocessing
3.2.1 Sensory Data and Input Features
Autonomous cars need a steady stream of sensor data to create a snapshot of their environment. We use sensors such as LiDAR, radar, cameras, IMUs, and vehicle telemetry to provide inputs that describe both the car's internal state and the external traffic situation. This helps the car “see” what it needs to know to drive itself safely. Signals like speed, steering angle, heading, and lateral/longitudinal velocities describe how the car is moving. For instance, changes in steering angle reveal how the car handles curves, lane following, or quick turns, while velocity components along the x- and y-axes support precise trajectory tracking.
Table 1
Sensor Data Collection
Metric | Unit/Type | Metric | Unit/Type
Speed | km/h | Acceleration | m/s²
Steering Angle | Degrees (°) | Heading | Degrees (0°–360°)
Trip Duration | Minutes | Trip Distance | Kilometers (km)
Fuel Consumption | Liters (L) | RPM | rev/min
Brake Usage | Count/Intensity | Lane Deviation | Meters
Weather Conditions | Categorical | Road Type | Categorical
It is also important to know when things happen. To align data from different sensors, readings are stamped with timestamps or simulation step-time, which ensures that lane-change planning, risk evaluation, and trajectory prediction are based on the latest information. Temporal resolution matters too, because sequence models such as LSTMs must be trained with sequential input. These models can tell whether a spike in acceleration is simply an overtaking move or a sign of particularly risky, aggressive driving. As a result, building a context-understanding pipeline requires combining information about how things move over time with information about how they are positioned in space.
When it comes to lane changes, the distances to surrounding vehicles matter as much as the ego-vehicle's own state. Longitudinal distance tells the car how much space other vehicles occupy in front of or behind it, while lateral distance tells it how close a vehicle in the adjacent lane is. Smaller gaps mean a higher risk of collision, so the car must constantly measure gaps to brake, steer, and slow down appropriately. By continuously measuring these gaps, the car can predict interactions and conflicts and adjust its strategy in real time. Pairing this data with outside conditions such as the road surface or weather gives the car a complete picture of the driving environment.
Fig. 2
Visual input samples captured during autonomous driving under different traffic and environmental conditions
To evaluate decisions made by the car, we need to be able to render environment data. We represent the road network as nodes and connections: outer nodes represent entry and exit points, while inner nodes represent intersections. Lanes are designed to support both inbound and outbound traffic, creating realistic and natural vehicle flow. To create high-quality renderings, we combine structured road models with multimodal sensor data and camera vision. This lets us see vehicles, signs, lane markings, and more, creating a dynamic and realistic testbed for evaluating risk assessment, lane detection, and decision-making across a variety of conditions.
3.2.2 Exploratory Data Analysis
We explored steering and speed using boxplots and histograms. Most speeds clustered around moderate values, with occasional outliers for sudden braking or acceleration. These rare events are extremely important, since they represent aggressive driving and emergencies that strongly influence how and when the car decides to change lanes. Similarly, acceleration analysis showed sharp spikes when the car was overtaking or braking to avoid a collision, while most values stayed within small oscillations. These results show how important it is to normalize data so that rare but important cases are not obscured during training. We then applied correlation analysis to telemetry features such as speed, rpm, steering angle, and lane deviation. The heatmap shows that acceleration correlates strongly with rpm, and steering angle correlates strongly with lane deviation. This confirms that these features play a genuine role in the underlying dynamics and can be used as inputs for prediction.
Fig. 3a Distribution of steering_angle
Fig. 3b Correlation heatmap
Furthermore, persistent drift patterns that can indicate difficult road conditions or less-than-ideal management tactics could be found using temporal trend plots of lane departure over time. All things considered, this exploratory study not only validated the dataset's modeling applicability but also provided valuable insights for feature engineering processes that are necessary for reliable, risk-aware autonomous driving decision-making.
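The correlation computation behind the heatmap can be reproduced in a few lines. The telemetry below is synthetic, generated only to illustrate the calculation; the coupling strengths are chosen to mimic the relationships reported above, not taken from the paper's dataset.

```python
import numpy as np

# Synthetic telemetry mimicking the reported couplings:
# acceleration <-> rpm and steering <-> lane deviation.
rng = np.random.default_rng(0)
n = 500
acceleration = rng.normal(0, 1, n)
rpm = 2.0 * acceleration + rng.normal(0, 0.3, n)     # strongly coupled
steering = rng.normal(0, 1, n)
lane_dev = 1.5 * steering + rng.normal(0, 0.4, n)    # strongly coupled

# Stack features row-wise and compute the 4x4 correlation matrix,
# i.e. the data that a correlation heatmap would display.
features = np.vstack([acceleration, rpm, steering, lane_dev])
corr = np.corrcoef(features)

print(round(corr[0, 1], 2))  # acceleration vs rpm: close to 1
print(round(corr[2, 3], 2))  # steering vs lane deviation: close to 1
```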
3.2.3 Steering Angle and Lane Change Prediction Modeling
The steering angle prediction module is designed as a supervised regression task using eight dynamic and environmental covariates: vehicle speed, longitudinal acceleration, heading angle, lane deviation, categorical road type, weather, engine revolutions per minute, and traffic density. To reduce scale imbalance and improve numerical conditioning during optimization, these heterogeneous features undergo z-score standardization before being fed to the model, guaranteeing zero mean and unit variance. After normalization, the feature vectors enter a hybrid deep learning pipeline in which a Gated Recurrent Unit (GRU) layer captures long-range motion correlations and temporal dependencies across consecutive driving frames. By efficiently managing information flow through its update and reset gates, the GRU mitigates the vanishing gradient problem and preserves sequential cues that are essential for maneuver prediction. As with the first model, the data passes through a feature engineering pipeline that includes min-max scaling, temporal framing, and optional lag feature augmentation. The neural architecture integrates stacked GRU layers (e.g., with 64 and 32 units, respectively), followed by an attention block and a series of fully connected (Dense) layers with dropout regularization. The final output node uses a linear activation to yield a continuous prediction representing lane-change timing or risk probability.
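To make the gating mechanism concrete, here is a minimal numpy sketch of a single GRU step with its update and reset gates. The dimensions and random weights are illustrative stand-ins, not the trained 64/32-unit model described above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU time step. W, U, b each stack the (update, reset, candidate)
    parameters; shapes are illustrative, not trained weights."""
    Wz, Wr, Wh = W
    Uz, Ur, Uh = U
    bz, br, bh = b
    z = sigmoid(x @ Wz + h @ Uz + bz)        # update gate: how much to refresh
    r = sigmoid(x @ Wr + h @ Ur + br)        # reset gate: how much history to keep
    h_cand = np.tanh(x @ Wh + (r * h) @ Uh + bh)
    return (1.0 - z) * h + z * h_cand        # gated blend of old and new state

# Roll the cell over 8 synthetic frames of 8 z-scored covariates.
rng = np.random.default_rng(1)
d_in, d_h = 8, 4
W = [rng.normal(0, 0.1, (d_in, d_h)) for _ in range(3)]
U = [rng.normal(0, 0.1, (d_h, d_h)) for _ in range(3)]
b = [np.zeros(d_h) for _ in range(3)]
h = np.zeros(d_h)
for t in range(8):
    h = gru_step(rng.normal(0, 1, d_in), h, W, U, b)
print(h.shape)
```

Because each new state is a convex blend of the previous state and a tanh candidate, the hidden activations stay bounded, which is part of why the gating mitigates vanishing/exploding gradients.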
3.2.4 Distance Estimation to Nearest Autonomous Vehicle Using YOLOv3
The YOLOv3 object detection architecture leverages multi-scale feature maps to detect vehicles at varying distances and scales in real time. The network backbone, Darknet-53, extracts hierarchical spatial features using residual connections and convolutional blocks with Batch Normalization and LeakyReLU activation. The model outputs three detection heads at different resolutions, providing high accuracy for both small and large objects. The output tensors of shape (S, S, B×(5 + C)) per scale encode bounding box coordinates, objectness scores, and class probabilities.
Fig. 4
YOLOv3 for Inter-Vehicular Distance
For distance estimation, YOLOv3 first localizes all vehicles within the frame by calculating bounding box centers (x, y) and dimensions (w, h) using anchor boxes in the yolo_head function. These raw outputs are adjusted for scale and position with the yolo_correct_boxes() function, which performs spatial transformation from feature map coordinates to image pixel space. The corrected bounding boxes are then sorted by Euclidean distance from the ego vehicle’s center projection (typically assumed to be at the center-bottom of the frame). The object with minimum distance and class label “car,” “bus,” or “truck” is marked as the nearest autonomous vehicle.
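The nearest-vehicle selection step can be sketched as follows: corrected boxes are filtered by vehicle class and ranked by Euclidean distance from the ego projection at the center-bottom of the frame. The box values, frame size, and class names below are invented for illustration.

```python
import numpy as np

# Ego projection is assumed at the center-bottom of the image frame.
FRAME_W, FRAME_H = 1280, 720
ego = np.array([FRAME_W / 2, FRAME_H])

# (cx, cy, w, h, class) in pixel space, as produced after box correction.
detections = [
    (400.0, 500.0, 120.0, 90.0, "car"),
    (650.0, 640.0, 200.0, 150.0, "truck"),
    (900.0, 300.0, 60.0, 40.0, "person"),   # filtered out: not a vehicle class
]
VEHICLES = {"car", "bus", "truck"}

def nearest_vehicle(dets):
    """Return the vehicle-class detection closest to the ego projection."""
    candidates = [d for d in dets if d[4] in VEHICLES]
    return min(candidates,
               key=lambda d: np.hypot(d[0] - ego[0], d[1] - ego[1]))

print(nearest_vehicle(detections)[4])   # the truck is closest to the ego point
```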
Using the intrinsic camera parameters and assuming a flat road model, the pixel height h_p of the detected bounding box is used to estimate the real-world distance D based on the pinhole camera model:
D = (f · H) / h_p ............................................Eq. 1
where f is the camera focal length in pixels and H is the actual height of the vehicle. This estimation is calibrated using a lookup table or a vehicle-type-specific height prior. The bounding box with the smallest calculated D is considered the closest vehicle. The YOLO evaluation function (yolo_eval), including non-max suppression, ensures that only the most confident, non-overlapping predictions are considered, filtering out low-confidence detections. This real-time distance approximation is critical for autonomous vehicle behavior planning and collision avoidance systems.
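A worked instance of Eq. (1) under the flat-road pinhole assumption is shown below. The focal length and height prior are illustrative numbers, not calibrated values from the paper.

```python
# Eq. (1): D = f * H / h_p, under the flat-road pinhole camera assumption.

def estimate_distance(focal_px, vehicle_height_m, bbox_height_px):
    """Distance to a detected vehicle from its bounding-box pixel height."""
    return focal_px * vehicle_height_m / bbox_height_px

f = 1000.0   # focal length in pixels (assumed calibration)
H = 1.5      # typical passenger-car height in meters (lookup-table prior)
print(estimate_distance(f, H, 100.0))   # a 100 px tall box -> 15.0 m
print(estimate_distance(f, H, 50.0))    # half the pixel height -> twice as far
```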
3.3 Model using Transfer Learning and Temporal Encoding
3.3.1 ResNet50-Based Visual Feature Encoding
To capture rich visual semantics from video frames captured by a monocular camera, ResNet50 was employed as a pre-trained backbone network. ResNet50, a deep convolutional neural network with 50 layers, leverages residual learning via skip connections, allowing very deep networks to be trained without the vanishing gradient issue. The model was initialized with ImageNet weights and truncated at the penultimate layer (second-to-last fully connected layer), which outputs a high-dimensional feature vector for each frame. Mathematically, each input image frame I_t is transformed via a series of convolutional (Conv), batch normalization (BN), and ReLU activation (σ) operations with an identity shortcut around the residual function F, expressed as:
y = F(x, {W_i}) + x …………………………….Eq. (2)
where x is the input tensor, W_i represents the weights of the convolutional layers, and y is the feature map output after residual mapping.
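The residual mapping in Eq. (2) can be illustrated in numpy. The two-layer transform below stands in for F, and the random weights are placeholders, not ResNet50's trained parameters.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = F(x, {W_i}) + x, with F a two-layer transform (ReLU in between)."""
    out = relu(x @ W1)      # stand-in for the first Conv+BN+ReLU stage
    out = out @ W2          # second weight layer
    return relu(out + x)    # identity shortcut, then activation

rng = np.random.default_rng(0)
d = 16
x = rng.normal(0, 1, d)
W1 = rng.normal(0, 0.1, (d, d))   # small weights: F starts near zero,
W2 = rng.normal(0, 0.1, (d, d))   # so the shortcut path dominates early on
y = residual_block(x, W1, W2)
print(y.shape)
```

With small initial weights the block behaves nearly as an identity, which is the property that lets very deep residual networks train without vanishing gradients.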
3.3.2 Mask R-CNN for Object Detection
Mask R-CNN is employed for precise instance segmentation to isolate relevant road entities such as vehicles, lanes, and other dynamic objects, which are crucial for autonomous lane change risk evaluation. Initially, the raw monocular images are processed using the Detect Objects module that utilizes the Mask R-CNN pipeline. This model builds upon the Faster R-CNN architecture by adding a branch for predicting segmentation masks on each Region of Interest (RoI), in parallel with existing branches for classification and bounding box regression. The segmentation is mathematically modeled as a per-pixel classification task over a fixed spatial grid per object proposal. The key formulation includes the binary mask loss defined as:
L_mask = −(1/m²) Σᵢ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ] ....................Eq. (3)
where m×m is the size of the predicted mask, yᵢ is the ground truth for pixel i, and ŷᵢ is the predicted probability.
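A direct numpy transcription of the per-pixel binary mask loss in Eq. (3) follows; the toy masks are invented purely to exercise the formula.

```python
import numpy as np

def mask_loss(y_true, y_pred, eps=1e-7):
    """Eq. (3): average binary cross-entropy over an m x m mask."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    ce = y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)
    return -np.mean(ce)

m = 4
perfect = np.eye(m)                    # ground-truth 4x4 mask
good = np.clip(perfect, 0.05, 0.95)    # confident, mostly correct prediction
bad = 1.0 - good                       # confidently wrong prediction
print(mask_loss(perfect, good) < mask_loss(perfect, bad))   # True
```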
The masked output images are saved and later used as filtered visual inputs, thereby reducing irrelevant noise and focusing on regions directly affecting lane change decisions. These masked frames are resized using a fixed scaling factor (e.g., 0.1 along both axes), normalized, and then organized as sequential data with fixed frame length (e.g., 50 frames per sample).
The extracted video features are then fed into a CNN + LSTM hybrid architecture built using the Models class. The CNN layers extract spatiotemporal embeddings from each masked frame, which are sequentially passed through LSTM layers to capture temporal dependencies critical for anticipating evolving risk. This CNN-LSTM model is trained using cross-validation and class-weighting strategies to handle class imbalance (e.g., {0: 0.05, 1: 0.95}), enhancing generalization. The architecture leverages masked visual information and temporal modeling to improve the predictive accuracy of risky lane change behaviors, forming the backbone of real-time autonomous risk evaluation.
3.3.3 Temporal Sampling and Risk Label Encoding
To ensure temporal uniformity, each driving video is downsampled to a fixed frame count, such as 50 frames per trip. The extract_features() function uses the truncated ResNet50 model to process these frames into a temporal sequence of visual embeddings, with each feature vector representing spatiotemporal vehicle context. The risk assessment data for each trip, pre-recorded in "LCTable", is read and aligned to the video sequences. A binary one-hot encoding scheme is applied based on a risk threshold; here, a threshold of 0.05 was used. Formally, if the computed risk score r for a frame is greater than the threshold τ, then:
y = [0, 1] if r > τ, and y = [1, 0] otherwise. This converts each trip’s risk into a format compatible with categorical cross-entropy loss functions during supervised learning.
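The threshold-based one-hot encoding can be sketched as below; the scores are synthetic examples, and only the τ = 0.05 threshold comes from the text.

```python
import numpy as np

TAU = 0.05  # risk threshold from the text

def encode_risk(scores, tau=TAU):
    """Return an (N, 2) one-hot array: [1, 0] = safe, [0, 1] = risky."""
    risky = np.asarray(scores) > tau           # strict comparison with tau
    labels = np.zeros((len(scores), 2))
    labels[np.arange(len(scores)), risky.astype(int)] = 1.0
    return labels

print(encode_risk([0.01, 0.07, 0.05]))
# rows: safe, risky, safe (0.05 is not strictly greater than tau)
```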
3.3.4 Temporal Modeling via LSTM Network
Once the per-frame visual features are extracted, the sequence of embeddings is passed into a Long Short-Term Memory (LSTM) network. The LSTM captures dependencies across time, ideal for modeling vehicle risk evolution throughout a trip. The LSTM unit utilizes a gated memory mechanism composed of input, forget, and output gates that determine the contribution of past and present inputs:
fₜ = σ(Wf · [hₜ₋₁, xₜ] + bf)
iₜ = σ(Wi · [hₜ₋₁, xₜ] + bi)
oₜ = σ(Wo · [hₜ₋₁, xₜ] + bo)
C̃ₜ = tanh(WC · [hₜ₋₁, xₜ] + bC)
Cₜ = fₜ ⊙ Cₜ₋₁ + iₜ ⊙ C̃ₜ
hₜ = oₜ ⊙ tanh(Cₜ).....................................................................Eqs. (4)
Here, hₜ is the hidden state at time t, Cₜ is the cell state, and xₜ is the ResNet50-derived input feature at time t. These equations govern how the model propagates memory through each frame, enabling it to detect risk based on sequential environmental changes.
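Eqs. (4) can be transcribed directly into numpy as a single LSTM step over the concatenated [hₜ₋₁, xₜ] input. The dimensions are kept small for readability; xₜ stands in for a ResNet50 frame embedding, and the random weights are illustrative, not trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wo, Wc, bf, bi, bo, bc):
    """One LSTM step implementing Eqs. (4)."""
    hx = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f = sigmoid(Wf @ hx + bf)            # forget gate f_t
    i = sigmoid(Wi @ hx + bi)            # input gate i_t
    o = sigmoid(Wo @ hx + bo)            # output gate o_t
    c_cand = np.tanh(Wc @ hx + bc)       # candidate cell state C~_t
    c = f * c_prev + i * c_cand          # C_t
    h = o * np.tanh(c)                   # h_t
    return h, c

rng = np.random.default_rng(0)
d_x, d_h = 8, 4                          # toy embedding and hidden sizes
Ws = [rng.normal(0, 0.1, (d_h, d_h + d_x)) for _ in range(4)]
bs = [np.zeros(d_h) for _ in range(4)]
h, c = np.zeros(d_h), np.zeros(d_h)
for t in range(5):                       # roll over a 5-frame sequence
    h, c = lstm_step(rng.normal(0, 1, d_x), h, c, *Ws, *bs)
print(h.shape, c.shape)
```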
3.3.5 Double Deep Q-Network (DDQN) Algorithm
The Double Deep Q-Network (DDQN) extends the standard DQN by addressing the problem of overestimation bias in Q-learning. In the traditional DQN, the target for the Bellman update is computed as:
y^DQN = r + γ · max_{a′} Q_{θ⁻}(s′, a′) ………………………Eq. (5)
where r is the reward, γ is the discount factor, s′ is the next state, and Q_{θ⁻} represents the target network with parameters θ⁻. The issue with this formulation is that the same network is used both to select and to evaluate the next action, leading to a systematic positive bias in Q-value estimation.
To overcome this limitation, DDQN decouples the action selection and action evaluation steps by using the online network for selection and the target network for evaluation. The target in DDQN is defined as:
y^DDQN = r + γ · Q_{θ⁻}(s′, argmax_{a′} Q_θ(s′, a′)) ……………….Eq. (6)
Here, the online network Q_θ chooses the action that maximizes the Q-value at the next state, while the target network Q_{θ⁻} evaluates this action. This separation reduces overestimation and stabilizes training, particularly in the complex state-action spaces encountered in dynamic traffic simulations.
The optimization objective of DDQN minimizes the temporal-difference (TD) error between the predicted and target Q-values:
L(θ) = E_{(s,a,r,s′)∼D} [ (y − Qθ(s, a))² ] ............................ Eq. (7)
where D is the experience replay buffer storing transitions (s,a,r,s′). Mini-batches sampled from D decorrelate training samples and improve learning efficiency. By leveraging experience replay along with target network stabilization, DDQN achieves more reliable convergence compared to DQN. In autonomous lane-changing, this translates to safer and risk-aware decision-making, as the algorithm can better estimate long-term consequences of maneuvers under uncertain traffic conditions.
4. Methodology
4.1 Implementation
The proposed system begins with a multi-stage perception module that takes video streams from the vehicle's cameras as input. Nearby vehicles are detected in real time with the YOLOv3 detection network, which convolves the video frames through residual blocks and feature pyramid layers and outputs bounding boxes with confidence values at three different scales. Bounding boxes are represented as (xc, yc, w, h) and rescaled to the original frame size using anchor priors. Distances to other vehicles are computed from pixel-to-meter calibrated values. For more accurate results, Mask R-CNN performs instance segmentation, masking out the background region and leaving only the vehicle shape. Each segmented video sequence is cropped into 50 frames, bilinearly resized, and fed through a ResNet50 backbone pre-trained on ImageNet. The network outputs 2048-dimensional embeddings that preserve high-level information such as the vehicle's shape, heading, and context; these embeddings feed the temporal models.
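The pixel-to-meter step can be sketched with a pinhole-camera model. The focal length and assumed vehicle height below are illustrative calibration constants, not the paper's actual calibration values:

```python
def distance_from_bbox(bbox_h_px, real_height_m=1.5, focal_px=700.0):
    """Pinhole-camera range estimate from a bounding-box height in pixels:
    distance = focal_length * real_height / image_height.
    real_height_m and focal_px are illustrative assumptions obtained in
    practice from camera calibration."""
    return focal_px * real_height_m / bbox_h_px

# A 120-px-tall vehicle box maps to 700 * 1.5 / 120 = 8.75 m.
d = distance_from_bbox(120)
print(round(d, 2))  # 8.75
```

In a deployed system the constant would come from intrinsic calibration (e.g. a checkerboard procedure), and the box height would be taken from the rescaled YOLOv3 output.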
After perception, the features are passed into a CNN-LSTM risk classifier. The CNN layers learn spatial dependencies within neighboring frames, and the LSTM layers learn sequential changes between consecutive frames, helping the model understand traffic flow and risky behaviors in the past and present. Since the number of risky driving samples is relatively small, a weighted cross-entropy loss penalizes the model more heavily for misclassifying risky cases. The final classifier outputs a single safe/risky label, and a probability prisk > 0.05 triggers additional safety checks. Batch normalization, dropout, and early stopping prevent overfitting, and the stage is validated with k-fold cross-validation on a 90:10 train-test split to ensure calibrated performance estimates. This stage serves as the safety-critical backbone before control decisions are made.
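The class-weighting idea can be sketched as follows; the 5:1 weighting is an illustrative assumption (in practice it is tuned to the safe/risky imbalance ratio, which the paper does not specify):

```python
import numpy as np

def weighted_bce(p, y, w_risky=5.0, w_safe=1.0):
    """Weighted binary cross-entropy: errors on the risky class (y = 1)
    are penalized w_risky/w_safe times more than errors on the safe
    class. The 5:1 ratio here is an illustrative assumption."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    eps = 1e-12                              # guard against log(0)
    w = np.where(y == 1, w_risky, w_safe)    # per-sample class weight
    return float(np.mean(
        -w * (y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))))

# Missing a risky case costs 5x more than the symmetric safe mistake:
loss_missed_risky = weighted_bce([0.1], [1])  # risky predicted as safe
loss_missed_safe = weighted_bce([0.9], [0])   # safe predicted as risky
```

This asymmetry pushes the decision boundary toward recall on the risky class, which matches the conservative prisk > 0.05 trigger used by the safety checks.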
The control prediction module then takes over and focuses on steering angle estimation and lane change intent. Steering angle prediction uses a set of vehicle and environment features: speed, acceleration, heading, RPM, lane offset, traffic density, road type, and weather, all standardized with z-score normalization. These features are passed into a GRU-Attention network: the GRU learns the driving trend between consecutive frames, and the attention mechanism focuses on important moments, such as other cars braking suddenly or making sharp turns. A final dense regression layer outputs continuous steering angles, trained with Mean Squared Error (MSE) loss and the Adam optimizer. In parallel, a second GRU-Attention network predicts lane change intent; here, embeddings represent the categorical road-type and weather variables, and a sigmoid-activated output layer produces a probability score for a lane change, which the system uses to make proactive maneuvering decisions.
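Two pieces of this module, the z-score standardization and the attention pooling over recurrent hidden states, can be sketched in NumPy. The feature and hidden-state dimensions below are arbitrary assumptions, and the random score vector stands in for a learned attention parameter:

```python
import numpy as np

def zscore(X, mean=None, std=None):
    """Column-wise z-score normalization. At inference time, pass the
    train-set mean/std rather than recomputing them, to avoid leakage."""
    mean = X.mean(axis=0) if mean is None else mean
    std = X.std(axis=0) if std is None else std
    return (X - mean) / (std + 1e-8)

def attention_pool(H, w):
    """Soft attention over recurrent hidden states H of shape (T, d):
    score each time step, softmax the scores, and return the weighted
    sum as the context vector (plus the weights for inspection)."""
    scores = H @ w
    alpha = np.exp(scores - scores.max())  # numerically stable softmax
    alpha = alpha / alpha.sum()
    return alpha @ H, alpha

rng = np.random.default_rng(1)
X = rng.normal(5.0, 2.0, size=(100, 8))   # 8 driving features, 100 frames
Xn = zscore(X)
H = rng.standard_normal((50, 16))         # GRU hidden states over 50 frames
ctx, alpha = attention_pool(H, rng.standard_normal(16))
```

The attention weights alpha sum to one, so the context vector is a convex combination of hidden states; frames with high scores (e.g. a sudden brake) dominate the pooled representation.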
The adaptive decision-making module employs DDQN because traffic is highly unpredictable and demands reliable lane change execution [12]. Compared with DQN, DDQN yields more stable updates and less optimistic action values [13]. Past driving experiences are stored in a replay buffer, and small mini-batches are drawn from it at random to break the correlation between consecutive experiences. Training minimizes the difference between the predicted value of an action and its observed outcome.
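A uniform replay buffer of this kind can be sketched with the standard library; the capacity and transition format below are illustrative choices, not values from the paper:

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: storing transitions and sampling
    random mini-batches breaks the correlation between consecutive
    experiences (s, a, r, s', done)."""

    def __init__(self, capacity=10_000):
        # deque with maxlen silently evicts the oldest transition
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # uniform sampling without replacement within the batch
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

# Fill past capacity to show the oldest experiences are evicted.
buf = ReplayBuffer(capacity=50)
for i in range(100):
    buf.push(i, 0, 0.0, i + 1, False)
batch = buf.sample(8)
```

Each sampled batch would feed the TD update of Eq. (7); prioritized variants reweight the sampling by TD error, but uniform replay is what the standard DQN/DDQN formulation assumes.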
5. Results and Discussion
5.1 Performance Metrics
The experimental tests show how well the proposed multi-stage setup works by comparing different models on precision, recall, accuracy, and F1-score. YOLOv3 delivered the best results in object detection, with about 93.5% precision and 94.2% accuracy, proving it can handle real-time detection tasks reliably. R-CNN, while a bit behind at 88.4% precision and 87.5% accuracy, was still consistent, though slower in processing. The CNN + LSTM hybrid stood out for recognizing traffic patterns over time, scoring 92.7% recall and 93.1% accuracy, which shows its strength in handling both space and time-based features. Finally, the DDQN module came out on top with 94.6% precision and strong balance across all metrics, confirming its ability to make adaptive, risk-aware lane change decisions.
Table 2: Performance

Model         Precision (%)   Recall (%)   Accuracy (%)   F1-Score (%)
YOLOv3        93.5            91.8         94.2           92.6
R-CNN         88.4            86.2         87.5           87.3
CNN + LSTM    91.2            92.7         93.1           91.9
DDQN          94.6            93.7         94.5           92.3
Fig. 5a: Safe; Fig. 5b: Risky (confusion matrices of the inter-vehicular distance classifier)
As shown in Fig. 5a and 5b, the confusion matrix represents the binary classification outcomes of the inter-vehicular distance model, categorizing situations as either Safe or Dangerous. The model accurately identifies 91.69% of Safe cases and 86.05% of Dangerous cases, indicating a high level of reliability in real-world autonomous navigation scenarios. Moreover, the false positive rate for predicting Dangerous as Safe stands at only 13.95%, and the false negative rate for predicting Safe as Dangerous is merely 8.3%. These values reflect the model’s robust decision-making capacity in distinguishing between potentially hazardous and non-hazardous inter-vehicular distances, thereby enhancing the safety intelligence of autonomous driving systems.
Figure 6
YOLOv3-Based Real-Time Vehicle Detection and Inter-Vehicular Distance Estimation
In the observed result of Fig. 6, the YOLOv3 model successfully detected and localized surrounding vehicles in a real-time driving scenario using bounding boxes, as seen in the annotated frame. The car in the adjacent lane was identified with high confidence, and the inter-vehicular distance was accurately calculated and displayed as 7.10 ft, leveraging the bounding box parameters and geometric calibration techniques. This quantitative spatial measurement provides critical input for collision risk assessment and safe lane change decisions, demonstrating the capability of YOLOv3 not only in object detection but also in enabling real-time distance estimation essential for autonomous driving systems.
6. Conclusion and Future Work
6.1 Summary of Contributions
In this work we brought vision, prediction, and decision-making together in one coherent pipeline for autonomous driving. YOLOv3 and Mask R-CNN detect and segment the vehicles around the ego vehicle, and ResNet50 features provide a compact representation of the scene. A GRU-Attention block adjusts steering while accounting for lane-shift risk with a stronger sense of time and context. Finally, DDQN ties everything together, making lane changes less erratic and more aware of the surrounding traffic. The strength of the pipeline is that perception, prediction, and control work together toward safer driving rather than operating as separate modules.
6.2 Future Enhancements
There is considerable room to extend this work. First, training footage covering varied weather and night-time conditions would make the models robust to fog, glare, and rain. Second, sensor fusion across LiDAR, radar, and cameras would further improve accuracy in foggy and rainy conditions. Third, the GRU-Attention and DDQN modules could be scaled to multi-lane highways with mixed traffic types to test how far the approach generalizes. Another direction is explainability: providing clear reasons for each lane change so that human users need not trust a black box. In the long run, the system can be connected to real vehicles, taking it from the laboratory to the road.
7. Declarations
7.1 Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
7.2 Author Contribution
L.N. and T.M.N.V. conceptualized the study and designed the methodology. L.N. implemented the models, performed the simulations, and analyzed the experimental results. T.M.N.V. supervised the research, provided critical insights, and guided the interpretation of results. Both authors contributed to drafting and revising the manuscript. All authors reviewed and approved the final version of the manuscript.
8. References
1. Yuan, R., Abdel-Aty, M., Gu, X., Zheng, O., Xiang, Q.: Lane Change Intention Recognition and Vehicle Status Prediction for Autonomous Vehicles. arXiv preprint arXiv:2304.13732 (2023)
2. Liao, X., Zhao, X., Wang, Z., Zhao, Z., Han, K., Gupta, R., Barth, M.J., Wu, G.: Driver Digital Twin for Online Prediction of Personalized Lane Change Behavior. arXiv preprint arXiv:2211.01294 (2022)
3. Patel, S., Griffin, B., Kusano, K., Corso, J.J.: Predicting Future Lane Changes of Other Highway Vehicles using RNN-based Deep Models. arXiv preprint arXiv:1801.04340 (2018)
4. Scheel, O., Nagaraja, N.S., Schwarz, L., Navab, N., Tombari, F.: Attention-based Lane Change Prediction. arXiv preprint arXiv:1903.01246 (2019)
5. Han, T., Jing, J., Ozguner, U.: Driving Intention Recognition and Lane Change Prediction on the Highway. arXiv preprint arXiv:1908.10820 (2019)
6. Li, X., Chen, H., Hua, H., Wang, Y.: Driver Intention Recognition Based on Computer Vision. SAE Technical Paper 2022-01-7025 (2022)
7. Yu, H., Huo, S., Zhu, M., Gong, Y., Xiang, Y.: Machine Learning-Based Vehicle Intention Trajectory Recognition and Prediction for Autonomous Driving. arXiv preprint arXiv:2402.16036 (2024)
8. Sun, W., Pan, L., Xu, J., Wan, W., Wang, Y.: Automatic Driving Lane Change Safety Prediction Model Based on LSTM. arXiv preprint arXiv:2403.06993 (2024)
9. Zhang, Y., Carballo, A., Yang, H., Takeda, K.: Perception and Sensing for Autonomous Vehicles Under Adverse Weather Conditions: A Survey. arXiv preprint arXiv:2112.08936 (2021)
10. Wang, Y., Chan, C.Y.: Formulation of a Dynamic Lane-Changing Model Based on Game Theory. IEEE Trans. Intell. Transp. Syst. 18(3), 626–636 (2017)
11. Deo, N., Trivedi, M.M.: Convolutional Social Pooling for Vehicle Trajectory Prediction. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1468–1476 (2018)
12. Altché, F., de La Fortelle, A.: An LSTM Network for Highway Trajectory Prediction. IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), 353–359 (2017)
13. Kim, J., Ghosh, B.K.: Lane Changing Intent Prediction Using LSTM Network. IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), 1–6 (2016)
14. Houenou, A., Bonnifait, P., Cherfaoui, V., Yao, W.: Vehicle Trajectory Prediction Based on Motion Model and Maneuver Recognition. IEEE/RSJ International Conference on Intelligent Robots and Systems, 4363–4369 (2013)
15. Lefèvre, S., Vasquez, D., Laugier, C.: A Survey on Motion Prediction and Risk Assessment for Intelligent Vehicles. Robomech J. 1(1), 1 (2014)
16. Schulz, W., Stiefelhagen, R.: Probabilistic Driver Intention Recognition and Trajectory Prediction Based on a Vehicular Sensor Network. IEEE International Conference on Vehicular Electronics and Safety (ICVES), 208–213 (2015)
17. Zyner, A., Worrall, S., Nebot, E.: Naturalistic Driver Intention and Path Prediction Using Recurrent Neural Networks. IEEE Trans. Intell. Transp. Syst. 20(9), 3470–3480 (2018)
18. Mozaffari, A., Alizadeh, M., Kazemi, R.: A Novel Driver Behavior Recognition Model Based on LSTM Recurrent Neural Networks. IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), 1–6 (2017)
19. Jain, A., Koppula, H.S., Raghavan, B., Soh, S., Saxena, A.: Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models. Proceedings of the IEEE International Conference on Computer Vision, 3182–3190 (2015)
20. Park, S., Kim, H.: Driver Intention Prediction Based on LSTM Using Vehicle CAN Data. IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), 206–212 (2018)
21. Ding, Z., Wang, Y., Li, Z.: A Lane Change Prediction Method Based on Bidirectional LSTM. IEEE International Conference on Intelligent Transportation Systems (ITSC), 161–166 (2019)
22. Zhao, Y., Sun, J.: A Lane Change Prediction Method Based on Hidden Markov Model. IEEE International Conference on Intelligent Transportation Systems (ITSC), 1–6 (2018)
23. Chandra, R., Bhattacharya, U., Bera, A., Manocha, D.: Traphic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8483–8492 (2019)
24. Hou, Y., Qin, L., Chen, Y.: Driver Intention Prediction Based on Deep Learning Frameworks. IEEE Access. 8, 87940–87947 (2020)
25. Zhao, L., Sun, J.: A Lane Change Prediction Method Based on LSTM. IEEE International Conference on Intelligent Transportation Systems (ITSC), 1–6 (2019)
26. Wang, Y., Chan, C.Y.: A Game-Theoretic Framework for Autonomous Vehicles Merging Control: A Reinforcement Learning Approach. IEEE Trans. Intell. Veh. 3(4), 375–387 (2018)
27. Xu, Y., Li, Z.: A Lane Change Prediction Method Based on GRU. IEEE International Conference on Intelligent Transportation Systems (ITSC), 1–6 (2019)
28. Shou, Z., Wang, Z., Han, K., Liu, Y., Tiwari, P., Di, X.: Long-Term Prediction of Lane Change Maneuver Through a Multilayer Perceptron. arXiv preprint arXiv:2006.12769 (2020)
29. Zhang, Y., Zou, Y., Tang, J., Liang, J.: A Lane-Changing Prediction Method Based on Temporal Convolution Network. arXiv preprint arXiv:2011.01224 (2020)
30. Scheel, O., Schwarz, L., Navab, N., Tombari, F.: Situation Assessment for Planning Lane Changes: Combining Recurrent Models and Prediction. arXiv preprint arXiv:1805.06776 (2018)
31. Scheel, O., Nagaraja, N.S., Schwarz, L., Navab, N., Tombari, F.: Attention-based Lane Change Prediction. arXiv preprint arXiv:1903.01246 (2019)
32. Liu, H., Wu, K., Fu, S., Shi, H., Xu, H.: Predictive Analysis of Vehicular Lane Changes: An Integrated LSTM Approach. Appl. Sci. 13(18), 10157 (2023)
33. Sun, W., Pan, L., Xu, J., Wan, W., Wang, Y.: Automatic Driving Lane Change Safety Prediction Model Based on LSTM. arXiv preprint arXiv:2403.06993 (2024)
34. Prakash, D., Sathiyasekar, K.: An Effective Lane Changing Behaviour Prediction Model Using Optimized CNN and Game Theory. Automatika. 65(3), 982–996 (2024)
35. He, D., Zhao, M., Wang, Z.: Vehicle Driving Intent Recognition Based on Enhanced Bidirectional Long Short-Term Memory Network. J. Artif. Intell. Pract. 6, 20–27 (2023)
36. Zhang, Y., Carballo, A., Yang, H., Takeda, K.: Perception and Sensing for Autonomous Vehicles Under Adverse Weather Conditions: A Survey. arXiv preprint arXiv:2112.08936 (2021)
37. Wang, Y., Chan, C.Y.: Formulation of a Dynamic Lane-Changing Model Based on Game Theory. IEEE Trans. Intell. Transp. Syst. 18(3), 626–636 (2017)
38. Deo, N., Trivedi, M.M.: Convolutional Social Pooling for Vehicle Trajectory Prediction. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1468–1476 (2018)
39. Altché, F., de La Fortelle, A.: An LSTM Network for Highway Trajectory Prediction. IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), 353–359 (2017)
40. Kim, J., Ghosh, B.K.: Lane Changing Intent Prediction Using LSTM Network. IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), 1–6 (2016)
Authors' Profiles
Lakshmi Narayana
I. Lakshmi Narayana is currently pursuing his PhD in the Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam. He is working as Assistant Professor in the Department of Information Technology, Seshadri Rao Gudlavalleru Engineering College, Gudlavalleru. His research interests include the application of machine learning algorithms in autonomous vehicles, image segmentation and classification, and routing protocols for IoT communication. He has published research papers in various international journals and conferences.
TMN Vamsi
Dr. T.M.N. Vamsi is working as Associate Professor in the Department of Computer Science and Engineering at GITAM Deemed to be University, Visakhapatnam, Andhra Pradesh. He received his PhD in Computer Science and Engineering from JNTUH, Hyderabad in 2016. He has 25 years of teaching, research, and administrative experience in various technical higher-education institutions. His research interests are the development of protocols for the Internet of Things and vehicular networks, soft computing, and bioinformatics. He has authored 28 research articles in reputed international and national journals and conferences. He is a member of IEEE, CSI, and IEI.