Large-scale crowd simulation in real-scene 3D models based on oblique photography


Yunqing Ye ¹, Xunjin Zou ¹,²*, Tianxia Feng ³, ZhenMing Zhu ¹*

¹ Jiaotong University Of East China, Nanchang, Jiangxi, China

² Department Of Natural Resources Of Jiangxi Province, Nanchang, Jiangxi, China

³ Polytechnic Institute, Zhejiang University, Hangzhou, Zhejiang, China

Correspondence should be addressed to X.J. Zou (13576900805@163.com) and Z.M. Zhu (zhuzhenmin1984@163.com).

Abstract.

Building on prior work that developed an integrated model for simulating interactions between biological clusters (e.g., crowds) and fluid particles (e.g., water flow), this study advances both the theoretical framework and the practical applications of such simulations. The original model established a unified dynamic force field covering the behavior of biological entities, various fluid types, and their interactions, with a primary focus on crowd evacuation during sudden flood events. Scenarios involving whirlpools and abrupt water level changes (e.g., waterfalls) were also studied to inform effective survival strategies. The previous work nevertheless has three deficiencies: the scale and realism of its scenes are limited, its group-driven strategies have not been extended to three-dimensional space, and the number of particle types and the interactions between different particle types remain narrow. This study extends the framework to cover more complex disaster scenarios, including earthquakes and landslides, by incorporating real-scene 3D modeling through drone-based oblique photography. The resulting elevation, texture, and mesh data are processed using a lightweight 3D reconstruction approach to reduce computational load. A cage-type projection separation method is proposed for isolating ground objects, along with a novel particle replacement algorithm for modeling solid-state objects. For large-scale crowd simulation, several new strategies are introduced, including the variable-rotation method, centroid-following method, crowd center method, and neighborhood extreme value method, each inspired by real-world observations. Furthermore, cross-species particle flow fusion has been expanded to include both solid-state and biological fluids, forming the basis of a new multi-type particle fusion computing concept.
Two real-scene 3D models, an earthquake zone and a landslide site, were used as case studies to construct a visual prototype system for crowd simulation. Experimental results demonstrate the feasibility and effectiveness of the proposed 3D modeling techniques, simulation methods, and fusion algorithms, confirming their potential for realistic and efficient disaster response simulations.

Keywords: Crowds; Fluid; Human Simulation; Motion Planning

1. Introduction

In contemporary society, the demand for precise and efficient large-scale crowd simulation is increasingly urgent in scenarios such as crowd control in large public spaces and emergency evacuation after natural disasters [1]. Crowd simulation is not only a core tool for evaluating the feasibility of public safety plans, but also an important aid for optimizing urban spatial layout and improving emergency response efficiency [2]. However, mainstream crowd simulation research still faces two core bottlenecks. The first is the disconnect between scene modeling and reality. Most studies rely on simplified virtual scenes and ignore key elements of real environments such as terrain undulation, building details, and traffic network topology; the resulting deviation between simulated and actual scenes makes it difficult to use the results directly as a reliable basis for decision-making. The second is that simulated crowd behavior does not follow real-world patterns. Existing models often treat crowd agents as "homogeneous individuals" with uniform motion parameters and decision rules, without fully considering how individual differences such as age, physical state, psychological characteristics, and social relationships affect behavior [3]. As a result, simulated crowd movement appears "mechanized" and cannot reproduce emergent behaviors that are common in real scenarios, such as hesitation, mutual assistance, and following.

At the same time, the rapid development of oblique photography has opened a new technical path for solving these problems [4]. The technology collects surface data synchronously through multi-perspective cameras and can quickly construct centimeter-level accurate 3D models of real scenes, fully preserving details such as building facades, road markings, and vegetation distribution [5]. However, significant technical gaps remain in applying it to crowd simulation. On the one hand, the point cloud and mesh data generated by oblique photography are extremely large, so using them directly for crowd simulation sharply reduces computational efficiency, and existing data simplification methods are prone to losing key spatial features. On the other hand, there is no effective mapping mechanism between the environmental information of real scenes and crowd simulation parameters, which makes it difficult for "high-precision scenes" and "high-reliability crowd simulation" to work together. In addition, existing research on emergency scenario simulation lacks empirical data support, and its results do not match the crowd movement patterns observed in real disaster events, which further limits the practical value of the technology [6].

Therefore, this article proposes a large-scale crowd simulation method for real-scene 3D models based on oblique photography. It addresses three core issues in current crowd simulation: scene modeling that is disconnected from reality, homogenized crowd behavior, and the lack of effective evaluation of simulation results. First, it proposes an integrated mapping method from oblique photography data to 3D scene model to crowd simulation parameters. A multi-scale data simplification algorithm reduces the roughly 10 GB of oblique photography data per square kilometer by more than 60% while retaining more than 95% of key spatial features, and a mapping rule library from scene elements to simulation parameters connects scenes and simulations seamlessly. Together these resolve both the disconnect between high-precision scenes and simulation parameters and the low computational efficiency. Second, it constructs a heterogeneous crowd behavior model driven by multi-source heterogeneous information, with an agent model that includes five core attributes and a dynamic behavior decision network. This raises the consistency between simulated behavioral heterogeneity and real scenes to over 85%, overcoming the homogenized agents and mechanized motion of existing models. Third, it develops a real-scene crowd simulation visualization and efficiency evaluation system that supports smooth rendering of the motion of million-level agents at a frame rate of ≥ 30 fps. Combined with empirical data from real disasters, the system provides an evaluation scheme and a plan optimization module built on indicators such as evacuation time deviation rate, collision frequency, and path conformity, filling the gap left by emergency-scene simulation results that are hard to evaluate quantitatively and to optimize in a closed loop. Compared with traditional methods, the approach achieves significant improvements in scene fit, behavioral authenticity, and evaluation effectiveness.

Contributions

This research makes several key contributions:

• Deep integration of oblique photography with real-scene 3D modeling and large-scale crowd simulation: Traditional crowd simulation relies on simplified virtual scenes and is disconnected from practical applications. We construct a high-fidelity 3D model of the real scene using oblique photography, which gives the crowd simulation a realistic environment. This addresses the large differences between simulated and actual scenes and the limited reference value of simulation results, bringing large-scale crowd simulation closer to practical needs.

• Innovative algorithms (lightweight 3D reconstruction, cage projection separation, and particle replacement) combined with new crowd simulation strategies: To address the low efficiency of real-scene 3D modeling and the insufficient plausibility of crowd simulation, the proposed algorithms improve the efficiency and accuracy of scene modeling, and the new crowd strategies optimize the motion logic of crowds in complex real scenes. Together they provide an efficient and reliable technical path for large-scale dynamic crowd simulation in complex scenes.

• Simulation verification in disaster scenarios such as earthquakes and landslides, with interactive simulation between large-scale crowds and multiple fluids in real scenes: To address the lack of effective crowd simulation tools for disaster emergency response, targeted scenario verification shows that the simulation technology can support crowd evacuation planning, rescue plan formulation, and related work, providing practical technical support for more scientific and efficient disaster response.

2. Related Work

In many natural systems, organisms form clusters as a survival strategy to reduce the risk of predation. Although these clusters contain numerous individuals, they do not exhibit chaotic behavior. Instead, each member plays a specific role, coordinating seamlessly with others to form an integrated whole with complex functions. Investigating the collective behavior of high-density biological clusters is inherently challenging, requiring detailed observation and a deep understanding of their dynamic characteristics.

Agent-Based Approaches: The investigation of large-scale biological clusters traditionally begins at the individual level, paving the way for agent-based modeling techniques. Reynolds [7] demonstrated that simple local rules can lead to the emergence of complex, high-density clustering behavior, thereby pioneering the agent-based approach. Building on this foundation, Funge [8] and subsequent researchers expanded the framework by incorporating stimulus-response mechanisms and elements of knowledge learning. Shao and Terzopoulos [9] further extended these models, introducing new perspectives that have influenced subsequent research. Today, agent-based methods are widely applied in fields such as urban planning, road design, evacuation modeling, anthropology, social psychology, and geography. These studies employ a range of techniques, including geometric modeling, density-dependent strategies, and particle-force interaction models.

Continuous Dynamics-Based Models: Parallel to agent-based approaches, continuous dynamics models have been developed by borrowing concepts from fluid dynamics. Hughes [10] introduced a model that represents pedestrian motion through continuous density fields, described by partial differential equations. This approach was further refined by Colombo and Rosini [11]. Treuille et al. [12] later adapted similar potential energy functions to steer pedestrian movement, integrating ideas from crowd grouping techniques. Although fluid-based models have proven effective for analyzing risk during flood emergencies, studies focusing on real-time 3D simulation remain relatively scarce. For example, Shirvani et al. [13, 14] simulated crowd evacuation in static scenarios using hydrodynamic models, but these studies primarily emphasized computational outcomes rather than dynamic, real-time visualization.

Applications in Natural Environment Research: Research into natural disasters such as floods [15], debris flows [16], earthquakes [17, 18], and fires [19, 20] has traditionally concentrated on disaster assessment, prevention, early warning, and escape strategies, as well as individual behavior analysis [21, 22]. These studies have predominantly relied on data prediction, statistical analysis, and scenario reconstruction, with limited emphasis on real-time 3D simulation.

Compared with existing methods, this article constructs a highly realistic 3D model of the real scene through oblique photography, providing a realistic environmental foundation for crowd simulation and avoiding the deviation between simulation results and reality caused by scene simplification. In response to the limitations of homogenization and mechanized movement of crowd agents in existing models, we attempt to design crowd simulation strategies based on scene features, aiming to improve the authenticity of crowd behavior simulation. At the same time, we optimize algorithms such as lightweight 3D reconstruction, cage projection separation, and particle replacement to improve processing efficiency while ensuring scene modeling accuracy, solving the problem of low computational efficiency in high-precision scene data. Compared with existing research that focuses on a single technical link and lacks targeted verification for actual disaster scenarios, this paper conducts simulations in disaster scenarios such as earthquakes and landslides, achieving interactive simulation between large-scale crowds and multiple fluids in real scenarios. This makes simulation technology more directly serve practical needs such as disaster emergency evacuation planning and rescue plan formulation, and has stronger practical value.

3. Preliminaries

In this section, we start with the basic objects involved in the model, define precise settings for them one by one, and then extend the designs to groups. We also discuss the interactions between the various classes of objects to establish the algorithmic basis, and then discuss how to construct the correlation functions.

First of all, the basic units treated in the model (solid fluid, liquid fluid, biological cluster, agent, geometry, etc.) are driven by force fields and follow Newton's law.

Setting One

At any coordinate point in three-dimensional space, each object has a current velocity, a current force acting on it, and a current position. All three physical quantities are expressed as vectors.

Setting Two

In the model, the three-dimensional space is divided into identical, contiguous cubes according to geographical location. All terrain and features are divided into polygons according to their weights. The polygonal plane regions can be heterogeneous, but the accessible area must be continuous.

Setting Three

we use

to represent any cube of space, and

to represent any plane of space

3.1. Fluid Forces Computation

The basic idea of continuous dynamics is to treat moving matter as a set of interacting particles that influence one another and together form a complex fluid motion. Because their densities differ, we distinguish solid fluids from liquid fluids (as shown in Fig. 1). Both follow Newton's second law:

$$\mathbf{F} = m\,\mathbf{a} \tag{1}$$

Fig. 1

Particle force drawing.

The mass of the fluid is determined by the density of the fluid unit, so the density $\rho$ is generally used instead of the mass:

$$\mathbf{F} = \rho\,\mathbf{a} \tag{2}$$

The force acting on a single particle consists of three parts:

$$\mathbf{F} = \mathbf{F}^{ext} + \mathbf{F}^{pressure} + \mathbf{F}^{viscosity} \tag{3}$$

$\mathbf{F}^{ext}$ is called the external force, usually gravity:

$$\mathbf{F}^{ext} = \rho\,\mathbf{g} \tag{4}$$

$\mathbf{F}^{pressure}$ is the force generated by the pressure difference inside the fluid. Particles affected by this force move from the high-pressure area to the low-pressure area. Its magnitude equals the gradient of the pressure field $p$, and it points from the high-pressure area to the low-pressure area:

$$\mathbf{F}^{pressure} = -\nabla p \tag{5}$$

$\mathbf{F}^{viscosity}$ is caused by the difference in velocity between particles. Similar to a shear force, it acts from the fast part on the slow part, and its magnitude depends on the viscosity coefficient µ of the fluid and the velocity difference:

$$\mathbf{F}^{viscosity} = \mu\,\nabla^{2}\mathbf{v} \tag{6}$$

Substituting formulas (4) to (6) into formula (3), we get:

$$\rho\,\mathbf{a} = \rho\,\mathbf{g} - \nabla p + \mu\,\nabla^{2}\mathbf{v} \tag{7}$$

Continuous dynamics relies on the concept of the smoothing kernel (smooth core). Each particle is influenced by the other particles within a certain range around it, and its final properties are the weighted sum of the properties of those surrounding particles. Within the kernel radius, the closer a neighbor is, the greater its influence. Based on this concept, particle properties are calculated as:

$$A(\mathbf{r}) = \sum_{j} A_{j}\,\frac{m_{j}}{\rho_{j}}\,W(\mathbf{r}-\mathbf{r}_{j},\,h) \tag{8}$$

$A$ is the attribute to be calculated (such as density, pressure, or viscosity), $m_{j}$ and $\rho_{j}$ are the mass and density of the surrounding particle $j$, $\mathbf{r}_{j}$ is the position of that particle, h is the radius of the smoothing kernel, and W is the smoothing kernel function.

According to formula (8), if density ρ is used instead of A, we get:

$$\rho(\mathbf{r}_{i}) = \sum_{j} \rho_{j}\,\frac{m_{j}}{\rho_{j}}\,W(\mathbf{r}_{i}-\mathbf{r}_{j},\,h) = \sum_{j} m_{j}\,W(\mathbf{r}_{i}-\mathbf{r}_{j},\,h) \tag{9}$$

The specific form of the smoothing kernel function used for computing density is:
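A common SPH choice for the density kernel, assumed here for illustration rather than taken from the text, is the poly6 kernel of Müller et al., with $r = \lVert\mathbf{r}\rVert$:

```latex
W_{poly6}(\mathbf{r}, h) =
\begin{cases}
\dfrac{315}{64\pi h^{9}}\,\bigl(h^{2} - r^{2}\bigr)^{3}, & 0 \le r \le h,\\[6pt]
0, & \text{otherwise.}
\end{cases}
```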

Since all particles have the same mass $m$, we end up with:

$$\rho(\mathbf{r}_{i}) = m \sum_{j} W(\mathbf{r}_{i}-\mathbf{r}_{j},\,h) \tag{11}$$

The pressure p of a single particle can be calculated using the ideal gas equation:

$$p = k\,(\rho - \rho_{0}) \tag{12}$$

$\rho_{0}$ is the static density of the fluid, and k is a constant related to the properties of the fluid, usually to its temperature.

When calculating the pressure, the selected smoothing kernel function takes the following form:
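A kernel widely used for the pressure term in SPH, assumed here for illustration, is the spiky kernel, shown together with its gradient (again with $r = \lVert\mathbf{r}\rVert$, valid for $0 < r \le h$ and zero otherwise):

```latex
W_{spiky}(\mathbf{r}, h) = \frac{15}{\pi h^{6}}\,(h - r)^{3},
\qquad
\nabla W_{spiky}(\mathbf{r}, h) = -\frac{45}{\pi h^{6}}\,(h - r)^{2}\,\frac{\mathbf{r}}{r}.
```

Unlike poly6, its gradient does not vanish as $r \to 0$, which keeps close particles from clustering.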

The pressure is obtained from the density, and the force generated by the pressure is obtained from the pressure. The pressure force is calculated as:

$$\mathbf{F}^{pressure}_{i} = -\nabla p(\mathbf{r}_{i}) = -\sum_{j} m_{j}\,\frac{p_{j}}{\rho_{j}}\,\nabla W(\mathbf{r}_{i}-\mathbf{r}_{j},\,h) \tag{14}$$

Since the forces between two particles in different pressure zones are unequal, this equation is "unbalanced". Therefore, the arithmetic mean of the pressures of both particles is used instead of the pressure of a single particle, and the pressure force becomes:

$$\mathbf{F}^{pressure}_{i} = -\sum_{j} m_{j}\,\frac{p_{i}+p_{j}}{2\rho_{j}}\,\nabla W(\mathbf{r}_{i}-\mathbf{r}_{j},\,h) \tag{15}$$

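Similarly, the force produced by viscosity can be derived. Velocity has the same "imbalance" problem, and the particles are in relative motion, so the relative velocity is substituted throughout to obtain:

$$\mathbf{F}^{viscosity}_{i} = \mu \sum_{j} m_{j}\,\frac{\mathbf{v}_{j}-\mathbf{v}_{i}}{\rho_{j}}\,\nabla^{2} W(\mathbf{r}_{i}-\mathbf{r}_{j},\,h) \tag{16}$$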

Here, µ is the viscosity coefficient, and the adopted smoothing kernel function for viscosity is:
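A viscosity kernel commonly paired with the two kernels above, assumed here for illustration, has a Laplacian that is positive everywhere inside the support, so the viscosity force always damps relative motion (valid for $0 < r \le h$, zero otherwise):

```latex
W_{viscosity}(\mathbf{r}, h) = \frac{15}{2\pi h^{3}}
\left(-\frac{r^{3}}{2h^{3}} + \frac{r^{2}}{h^{2}} + \frac{h}{2r} - 1\right),
\qquad
\nabla^{2} W_{viscosity}(\mathbf{r}, h) = \frac{45}{\pi h^{6}}\,(h - r).
```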

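Finally, the total force on a particle is obtained as:

$$\mathbf{F}_{i} = \rho_{i}\,\mathbf{g} + \mathbf{F}^{pressure}_{i} + \mathbf{F}^{viscosity}_{i} \tag{18}$$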
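The force computation described in this subsection can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the poly6, spiky, and viscosity kernels are assumed choices, all constants (`H`, `MASS`, `REST_DENSITY`, `K`, `MU`) are made-up values, and the neighbor search is a brute-force double loop.

```python
import math

# All numeric values are illustrative assumptions, not taken from the paper.
H = 1.0             # smoothing-kernel radius h
MASS = 1.0          # common particle mass m
REST_DENSITY = 1.0  # static density rho_0
K = 2.0             # pressure constant k
MU = 0.1            # viscosity coefficient mu
GRAVITY = (0.0, -9.8, 0.0)


def poly6(r):
    """Density kernel W(r, h); the poly6 form is an assumed choice."""
    if r > H:
        return 0.0
    return 315.0 / (64.0 * math.pi * H ** 9) * (H * H - r * r) ** 3


def spiky_derivative(r):
    """Radial derivative dW/dr of the (assumed) spiky pressure kernel."""
    return -45.0 / (math.pi * H ** 6) * (H - r) ** 2


def viscosity_laplacian(r):
    """Laplacian of the (assumed) viscosity kernel."""
    return 45.0 / (math.pi * H ** 6) * (H - r)


def densities(positions):
    """Equal-mass density sum: rho_i = m * sum_j W(r_i - r_j, h)."""
    return [sum(MASS * poly6(math.dist(pi, pj)) for pj in positions)
            for pi in positions]


def pressures(rho):
    """Ideal-gas style pressure p = k * (rho - rho_0)."""
    return [K * (d - REST_DENSITY) for d in rho]


def forces(positions, velocities):
    """External + symmetrized pressure + viscosity force for every particle."""
    rho = densities(positions)
    p = pressures(rho)
    result = []
    for i, (ri, vi) in enumerate(zip(positions, velocities)):
        f = [rho[i] * g for g in GRAVITY]  # external force rho * g
        for j, (rj, vj) in enumerate(zip(positions, velocities)):
            r = math.dist(ri, rj)
            if i == j or r == 0.0 or r > H:
                continue
            # Pressure term uses the mean pressure (p_i + p_j) / 2.
            press = -MASS * (p[i] + p[j]) / (2.0 * rho[j]) * spiky_derivative(r)
            # Viscosity term uses the relative velocity v_j - v_i.
            visc = MU * MASS / rho[j] * viscosity_laplacian(r)
            for k in range(3):
                f[k] += press * (ri[k] - rj[k]) / r + visc * (vj[k] - vi[k])
        result.append(f)
    return result
```

For two resting particles half a kernel radius apart, `forces` returns equal and opposite horizontal components that push the pair apart, which is the "balanced" behavior the pressure averaging is meant to restore.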

3.2. Path Planning

The force parameters calculated from the fluid forces are transformed into key constraints for path planning, so that the planned path not only conforms to the mechanical laws of crowd motion but also adapts to the complex environmental characteristics of real scenes.

In path planning for crowd simulation, the rotational behavior of the agent directly affects motion efficiency and scene adaptability. If the rotation angle is unconstrained, problems such as winding paths and frequent collisions with obstacles are prone to occur, especially in narrow passages, complex building layouts, and other scenarios, which can lead to significant deviations between simulation results and real crowd movement patterns. The introduction of 'minimum total rotation angle' as the optimization objective of path planning aims to minimize the total rotation angle of the entire path while ensuring smooth obstacle avoidance for the intelligent agent, making the agent's motion trajectory smoother and reducing unnecessary turning operations, thereby improving motion efficiency. At the same time, the 'minimum total rotation angle' can adapt to the behavior habits of people in real scenes who tend to move in a straight line or at small angles, laying the foundation for the subsequent construction of a realistic crowd motion model.

Minimal-rotation Path Finding Method [23]: The goal is to ensure that the agent moves towards the destination with the minimum total rotation angle. The basic idea is as follows: First, initialize a tracking queue to be empty. Then, connect the starting point and the endpoint to form a straight line (called the target line). Among all internal edges intersecting the target line, identify the internal edge closest to the endpoint. Add this intersection point to the end of the tracking queue. Otherwise, if no such internal edge exists (i.e., no internal edge intersects the target line), calculate the two angles between the target line and the endpoints of the nearest internal edge, and select the endpoint with the smaller angle. This selected endpoint is also added to the end of the tracking queue. As long as the tracking queue is not empty, take the first point in the queue, recalculate the target line, and repeat the above process until the tracking queue is empty.

Variable-rotation Path Finding Method

First, we initialize an empty tracking queue. Then, we connect the starting point with the ending point to form a straight line (referred to as the to-target line). Among all the internal edges, we identify the first internal edge that intersects with the to-target line and calculate the intersection point. Then we compare the weights (as a function of angle and distance) of the left endpoint, the intersection point, and the right endpoint, and we choose the point with the smallest weight to add to the end of the tracking queue. Otherwise, if such an internal edge does not exist (i.e., no internal edge intersects with the to-target line), we calculate the two angles between the to-target line and the endpoints of the nearest internal edge and choose the endpoint with the smaller weight (as a function of angle and distance) to add to the end of the tracking queue. As long as the tracking queue is not empty, we take out the first point in the queue, recalculate the target line, and repeat the above process (As shown in Fig. 6 and Algorithm 1).


Algorithm 1 The Variable-rotation Path Finding Method.

For a given convex polygon, the starting point is Start and the end point is End.

FindNextTracePoint(TracePoints, Edges, End)
    Take the first point P out of TracePoints
    Make line L_Start_To_End from P to End
    Find the nearest edge edgei in Edges
    If (L_Start_To_End intersects edgei)
    {
        Calculate the intersection: cross_point_i
        For each point in {leftPointi, cross_point_i, rightPointi}
            Compute its weight as a function of angle and distance
        End
    }
    Else
    {
        For each point in {leftPointi, rightPointi}
            Compute its weight as a function of angle and distance
        End
    }
    Add the point with the smallest weight to the end of TracePoints
    If TracePoints != Null
        Return Step: FindNextTracePoint
    Else
        Return Final Path
End
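One step of the variable-rotation selection can be sketched as follows. This is a hedged illustration in 2D, not the authors' code: the function names, the normalization of angle and distance, and the interpolation factor `t` are all assumptions introduced here, since the text only states that the weight is a function of angle and distance.

```python
import math


def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def segment_intersection(p, q, a, b):
    """Intersection point of segments p-q and a-b, or None if they miss."""
    r = (q[0] - p[0], q[1] - p[1])
    s = (b[0] - a[0], b[1] - a[1])
    denom = r[0] * s[1] - r[1] * s[0]
    if abs(denom) < 1e-12:  # parallel or degenerate
        return None
    t = ((a[0] - p[0]) * s[1] - (a[1] - p[1]) * s[0]) / denom
    u = ((a[0] - p[0]) * r[1] - (a[1] - p[1]) * r[0]) / denom
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (p[0] + t * r[0], p[1] + t * r[1])
    return None


def weight(current, end, candidate, t=0.5):
    """Assumed weight: blend of normalized rotation angle and distance."""
    to_end = (end[0] - current[0], end[1] - current[1])
    to_c = (candidate[0] - current[0], candidate[1] - current[1])
    d_end = math.hypot(*to_end)
    d_c = math.hypot(*to_c)
    if d_end == 0.0 or d_c == 0.0:
        return 0.0
    cos_a = (to_end[0] * to_c[0] + to_end[1] * to_c[1]) / (d_end * d_c)
    angle = math.acos(max(-1.0, min(1.0, cos_a)))
    return lerp(angle / math.pi, d_c / d_end, t)


def next_trace_point(current, end, edge, t=0.5):
    """Pick the lowest-weight point among the edge endpoints and, if the
    to-target line crosses the edge, the intersection point."""
    left, right = edge
    candidates = [left, right]
    cross = segment_intersection(current, end, left, right)
    if cross is not None:
        candidates.append(cross)
    return min(candidates, key=lambda c: weight(current, end, c, t))
```

With `t = 0` the choice depends on the rotation angle alone, which matches the minimal-rotation behavior; raising `t` lets distance trade off against rotation.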

During local routing, people may encounter many changes, especially in sudden events such as earthquakes, landslides, and floods. The local space around the crowd may change significantly, and there are many situations that the internal particle forces cannot handle, such as objects falling from a height, vehicles approaching from the side, and pedestrians walking against the flow. In such cases, individuals need to adopt local strategies to cope. Here, we extend the minimal-rotation method from the 2D plane to three dimensions.

The weight function for selecting the optimal tracking point combines the following quantities: the selected weight of the tracking point, a linear interpolation function, the weight of the rotation angle, the rotation angle, the distance weight, and the distance.

3D space minimal-rotation method

When multiple objects approach in three-dimensional space, a bounding grid is generated for all incoming objects and treated as a single object. The line connecting the centroid of the individual and the centroid of the bounding grid is called the centroid-line, and the lines connecting the centroid of the individual and the vertices of the bounding box of the bounding grid are called the boundary-lines. The angles between the boundary-lines and the centroid-line are calculated separately in the eight directions, and the boundary-line with the smallest escape angle is chosen as one of the main directions for the next movement (as shown in Fig. 2). In this paper, collision detection is used between different solid fluids. Individuals in the same fluid are driven by the fluid force while maintaining their own driving force.
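The corner selection described above can be sketched as follows. This is an illustrative sketch under assumptions: the bounding grid is approximated by an axis-aligned bounding box over sample points of the incoming objects, the individual is assumed to lie outside that box, and `escape_direction` is a hypothetical name.

```python
import math
from itertools import product


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _angle(u, v):
    """Angle in radians between two non-zero vectors."""
    cos_a = sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v))
    return math.acos(max(-1.0, min(1.0, cos_a)))


def escape_direction(individual, incoming_points):
    """Unit direction along the boundary-line that makes the smallest
    angle with the centroid-line (assumes the individual is outside the box)."""
    lo = tuple(min(c) for c in zip(*incoming_points))
    hi = tuple(max(c) for c in zip(*incoming_points))
    center = tuple((a + b) / 2.0 for a, b in zip(lo, hi))
    centroid_line = _sub(center, individual)
    corners = product(*zip(lo, hi))  # the eight box vertices
    best = min(corners,
               key=lambda c: _angle(_sub(c, individual), centroid_line))
    line = _sub(best, individual)
    n = _norm(line)
    return tuple(x / n for x in line)
```

The returned direction is only one candidate for the next move; in the model it would still be combined with the fluid force and the individual's own driving force.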

Fig. 2

3D space minimal-rotation path planning method.

3.3. Crowd Driven Strategy

Due to the limitations of time, space, and equipment technology in large-scale biological cluster data collection, data acquisition is relatively challenging. Therefore, many scholars adopt a method of seeing the big picture through small details, observing the clustering activities of insects, birds, and fish to discover relevant patterns. This paper also employs the methods of observing fish behavior, summarizing findings, and validating simulations to propose a general algorithm.

Centroid-following method

When the cluster in 3D space is in a roaming or migration state, each individual calculates the direction from itself to the target point (e.g., the nearest point of the path), the direction from itself to the centroid of the cluster, and the distance between itself and that centroid. It then makes a linear random motion along the sum of the two directions, using the distance as the weight (see Fig. 3).
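A minimal sketch of the centroid-following direction is given below. The exact weighting is not recoverable from the text, so the form used here (the centroid direction scaled by distance divided by a hypothetical `pull_radius`, plus uniform noise) is an assumption made for illustration.

```python
import math
import random


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v) if n > 0.0 else tuple(0.0 for _ in v)


def centroid_following_direction(pos, target, members,
                                 pull_radius=5.0, noise=0.0, rng=random):
    """Unit direction combining 'toward the target' and 'toward the cluster
    centroid'; the centroid term grows with distance (assumed weighting)."""
    centroid = tuple(sum(c) / len(members) for c in zip(*members))
    to_target = _unit(_sub(target, pos))
    offset = _sub(centroid, pos)
    dist = math.sqrt(sum(x * x for x in offset))
    to_centroid = _unit(offset)
    w = dist / pull_radius  # farther from the centroid -> stronger pull
    jitter = tuple(rng.uniform(-noise, noise) for _ in pos)
    combined = tuple(t + w * c + j
                     for t, c, j in zip(to_target, to_centroid, jitter))
    return _unit(combined)
```

An individual that has drifted away from the group is pulled back more strongly, while one near the centroid mostly follows the path target.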


Fig. 3

The centroid-following method.

Fig. 4

Crowd center method for 3D group offensive-defensive interaction.

Crowd Center Method

When crowds A and B are in an offensive-defensive state in three-dimensional space, an individual a in crowd A that has reached the threat warning distance threshold calculates the direction from itself to the centroid of crowd A and the direction from itself to the nearest point in crowd B. It then performs random motion (i.e., with added noise) along the difference of the two directions. Individuals in crowd B behave symmetrically (as shown in Fig. 4).
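The crowd center rule can be sketched as below. This is an illustrative reading, with assumptions stated plainly: the difference is taken as "direction to own centroid minus direction to the nearest opponent", the noise is uniform per axis, and `warn_dist` is a hypothetical name for the threat warning distance threshold.

```python
import math
import random


def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v) if n > 0.0 else tuple(0.0 for _ in v)


def crowd_center_direction(a, crowd_a, crowd_b, warn_dist,
                           noise=0.0, rng=random):
    """Direction for individual a of crowd A, or None if no member of
    crowd B is within the warning distance."""
    nearest_b = min(crowd_b, key=lambda b: math.dist(a, b))
    if math.dist(a, nearest_b) > warn_dist:
        return None
    centroid = tuple(sum(c) / len(crowd_a) for c in zip(*crowd_a))
    to_center = _unit(tuple(c - x for c, x in zip(centroid, a)))
    to_threat = _unit(tuple(b - x for b, x in zip(nearest_b, a)))
    jitter = tuple(rng.uniform(-noise, noise) for _ in a)
    diff = tuple(c - t + j for c, t, j in zip(to_center, to_threat, jitter))
    return _unit(diff)
```

Swapping the roles of `crowd_a` and `crowd_b` gives the symmetric behavior for the other crowd.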

Neighborhood Extreme Value Method: When crowds A and B are in an offensive-defensive state in three-dimensional space, an individual a in crowd A that has reached the threat warning distance threshold calculates the intersection C of crowd A with its neighborhood (within a radius of ra). Within C it calculates one of three reference directions: the direction of the neighborhood center, the direction of the closest point in the neighborhood, or the direction of the farthest point in the neighborhood. It also calculates the direction from itself to the closest point in crowd B. It then makes random movements (i.e., with added noise) along the difference between the chosen reference direction and the direction to crowd B. Individuals in crowd B behave symmetrically (as shown in Fig. 5 and Algorithm 2). In scenarios such as landslides and earthquakes, the objects considered in the neighborhood are the roads or bunkers around the individual (reachable safe areas).
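The three variants of the neighborhood rule differ only in which reference point inside the neighborhood is used, which the following sketch makes explicit. It is an assumption-laden illustration: the `mode` names, the uniform noise, and the exclusion of the individual itself from its neighborhood are choices made here, not details given in the text.

```python
import math
import random


def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v) if n > 0.0 else tuple(0.0 for _ in v)


def neighborhood_direction(a, crowd_a, crowd_b, radius,
                           mode="center", noise=0.0, rng=random):
    """Direction for individual a using the neighborhood center, nearest
    point, or farthest point as reference; None if the neighborhood is empty."""
    neighbors = [p for p in crowd_a if p != a and math.dist(a, p) <= radius]
    if not neighbors:
        return None
    if mode == "center":
        ref = tuple(sum(c) / len(neighbors) for c in zip(*neighbors))
    elif mode == "nearest":
        ref = min(neighbors, key=lambda p: math.dist(a, p))
    elif mode == "farthest":
        ref = max(neighbors, key=lambda p: math.dist(a, p))
    else:
        raise ValueError("mode must be 'center', 'nearest' or 'farthest'")
    nearest_b = min(crowd_b, key=lambda b: math.dist(a, b))
    to_ref = _unit(tuple(r - x for r, x in zip(ref, a)))
    to_threat = _unit(tuple(b - x for b, x in zip(nearest_b, a)))
    jitter = tuple(rng.uniform(-noise, noise) for _ in a)
    diff = tuple(r - t + j for r, t, j in zip(to_ref, to_threat, jitter))
    return _unit(diff)
```

For the landslide and earthquake scenes, `crowd_a` would be replaced by sample points on reachable roads or shelters, so the same rule steers individuals toward safe areas and away from the hazard.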

Fig. 5

Schematic diagram of the neighborhood extreme value method.

3.4. Hybrid Force Framework

This model adopts the strategy of behaving rigidly in rigid situations and fluidly in fluid situations, so that it covers both macroscopic fluid dynamics calculations and microscopic rigid-body computations (based on Newtonian mechanics, such as collisions, friction, and fragmentation; Figure 6a-c), while also fusing the interactions between the macro and micro force fields. It can reconstruct the surfaces of different types of fluids according to rendering requirements (for example, water flow and landslides; Fig. 1a and Fig. 1c) and switch states based on particle type (such as humans, trucks, tables, chairs, and fish), adjusting movement postures according to factors such as speed (Figure 6d-f). Unlike traditional models, this model integrates multiple types of fluids bidirectionally and ensures that each individual keeps its independent primary force under group influence, thus preserving the diversity of individuals (Fig. 6g-i).

Figure 7. In the rigid body state, the individual's motion angle constraints, collision bounding capsule and motion posture.

4. Implementation Details

Building on the theories and methods above, this section describes the concrete implementation steps for real-scene 3D modeling and crowd simulation, including scene data processing, model construction, and crowd parameter settings. To verify the algorithms and strategies described above, we constructed two real-scene 3D models: an earthquake scene and a landslide scene. To keep the experimental results realistic, the scenes were built from oblique photography data. The fishing-in-water scenario was established mainly to reproduce the natural phenomena we observed. Oblique photography captures the target synchronously from one vertical and four oblique perspectives, yielding a digital elevation model (DEM), a digital orthophoto map (DOM), and a 3D triangulated irregular network (TIN), from which the scene mesh is constructed. Part of the data was acquired with the CTI (Centre Testing International Group Co., Ltd.) P330Pro, a professional oblique photogrammetric camera with a five-lens design and an effective pixel count of 320 million. Its oblique lenses cover a shooting angle range of 45°-60° and support high-speed continuous shooting at 2 shots per second. Its positioning accuracy reaches the centimeter level, which makes it suitable for large-scale, high-precision 3D scene data collection. The flights used a ground resolution of 2 cm, a ground control point interval of 1 km, a flight altitude of 178 m, a route spacing of 48 m, and a sampling interval of 21 m.

For the daily crowd diversion scenario of an urban commercial complex, a shopping center with a building area of 80,000 square meters serves as the prototype, and a 3D model including shops, escalators, stairs, passages, and other facilities is constructed from oblique photography data. The initial pedestrian density is set for the morning, midday, and evening peak periods (1.2, 2.5, and 2.0 people per square meter, respectively). The agent types are shoppers, store clerks, and security personnel, each with its own movement speed (shoppers 0.8-1.2 m/s, store clerks 0.6-0.9 m/s, security personnel 1.0-1.5 m/s) and behavioral goal (shoppers head to their chosen shops, store clerks shuttle between the work area and the service desk, and security personnel patrol along a fixed route).

The 'Campus Activity Closing Scene' is based on a sports stadium that can accommodate 30,000 people. The model includes 8 exits and 12 evacuation routes. An extreme situation is set with an instantaneous crowd density of 4.0 people per square meter after the event ends. All agents are spectators, and their behavior rule is to prefer the nearest exit with a smaller crowd while taking into account the gathering and following behavior of companions.

The 'Emergency Evacuation Scene of Residential Areas after Earthquake Disaster' is based on oblique photography data of real earthquake-affected communities. The model restores the post-disaster environment, including collapsed buildings, broken roads, and temporary rescue channels. The agents include residents (including special groups such as the elderly, children, and people with disabilities) and rescue personnel. The movement speed of the special groups is reduced by 30%-50%. Residents are given differentiated behavioral goals such as 'finding family members', 'priority avoidance', and 'cooperating with rescue', while rescue personnel take 'quickly reaching the rescue point' and 'guiding residents to evacuate' as their core tasks.

We will now take a campus scene as an example:

4.1. Scene Import

Raw oblique photogrammetry data is voluminous and contains redundant information. In the campus scene dataset used here, the raw data amounts to 18 GB, comprising 15 million point cloud points and 12 million triangular polygons. Without preprocessing, direct modeling takes 16 hours. Moreover, rendering the resulting model for crowd simulation takes more than 800 ms per frame, far slower than the 100 ms/frame standard for real-time simulation. This causes the simulation to stutter and prevents smooth visualization of large-scale crowd dynamics. Redundant data also interferes with subsequent scene feature extraction, which compromises the accuracy of crowd simulation parameter mapping. Preprocessing addresses these computational problems by removing redundant data and optimizing data structures. It preserves over 95% of critical scene features while reducing computational load and improving modeling and simulation efficiency.

Our original data consists of tiles of 50 m × 50 m each (as shown in Fig. 7a). The existing data is textured mesh data and is very large: the number of points exceeds one million and the number of vertices is close to ten million. After selecting the target area, we perform preliminary processing on it.

Fig. 7

Original tile data of 3D scene based on oblique photography.

Grid Reduction

We first perform clipping, stitching, reducing, and denoising on the tile data.

Our original data is in TIN (Triangulated Irregular Network) format (as shown in Fig. 7b), that is, an irregular triangular mesh. For mesh reduction we use three methods: finite element mesh division, triangular surface down-sampling, and mesh resampling. Finite element mesh division rebuilds the mesh on the basis of the original mesh vertices (see Fig. 8). Triangular surface down-sampling removes faces from the original mesh according to a threshold, which reduces the total number of triangular faces (see Fig. 9). Mesh resampling redefines the sampling points on the basis of the original mesh data and then regenerates the mesh (see Figs. 10 and 11).
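To make the idea of "redefining the sampling points and regenerating the mesh" concrete, the following is a minimal sketch using vertex clustering on a regular grid. This is a swapped-in, well-known simplification technique chosen for illustration, not necessarily the resampling procedure used in our pipeline; the function name `resample_mesh` and the parameter `cell_size` are hypothetical.

```python
import numpy as np

def resample_mesh(vertices, faces, cell_size):
    """Vertex-clustering resampling: one representative vertex per grid cell.

    vertices: (n, 3) float array, faces: (m, 3) int array of vertex indices.
    cell_size is a hypothetical parameter controlling the target resolution.
    """
    # Assign every vertex to an integer grid cell.
    cells = np.floor(vertices / cell_size).astype(np.int64)
    # 'inverse' maps each original vertex to the index of its cell.
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n_cells = int(inverse.max()) + 1
    # New sampling point = mean of the vertices that fall into the cell.
    sums = np.zeros((n_cells, 3))
    np.add.at(sums, inverse, vertices)
    counts = np.bincount(inverse, minlength=n_cells)
    new_vertices = sums / counts[:, None]
    # Re-index the faces and drop those that collapsed to a line or a point.
    # Duplicate faces are not removed in this sketch.
    new_faces = inverse[faces]
    keep = ((new_faces[:, 0] != new_faces[:, 1])
            & (new_faces[:, 1] != new_faces[:, 2])
            & (new_faces[:, 0] != new_faces[:, 2]))
    return new_vertices, new_faces[keep]
```

A larger `cell_size` merges more vertices and gives a coarser mesh. Because vertex positions move, texture coordinates would need to be regenerated, which is consistent with the accuracy trade-off of resampling.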


Figure 8. Finite element mesh division method for 3D scene optimization.

Figure 9. Triangular surface down-sampling method for 3D scene data reduction.


Figure 10. Distribution of resampling points for 3D scene mesh optimization.

Figure 11. Mesh resampling method for low-resolution 3D scene calculation.

The purpose of this process is to keep the necessary details while removing redundant data, so that the mesh adapts better to the geometric characteristics of the surface and becomes more stable and uniform. This improves the overall quality of the mesh, reduces errors and instabilities during computation, and thereby improves overall computing performance. A high-quality mesh is crucial for computational accuracy and stability. Of the three methods, the second does not change the original vertex positions and is therefore well compatible with texture coordinates (textures impose rigid constraints). The third trades accuracy for efficiency and is suitable for low-resolution calculation (see Table 1).

Table 1
Comparison of mesh reduction algorithms.
Operation	Points	Polygons	Vertices
Original data	1,577,905	2,967,147	9,287,655
Finite element mesh division	145,022	668,341	2,673,364
Triangular surface downsampling	142,857	296,307	980,027
Mesh resampling	135,672	1	135,672

Points, polygons, and vertices in Table 1 are the core geometric elements of 3D modeling and graphics rendering. Points are the basic discrete geometric units of a three-dimensional model. They contain only spatial coordinates and have no topological connections, so they can be understood as isolated "coordinate markers" in space that serve as sampling points in the original point cloud or as the initial material for model construction. Vertices are derived from points through filtering, denoising, and attribute assignment. In addition to spatial coordinates, they carry information such as texture coordinates and normal vectors, and they have definite topological properties, namely fixed connections to other vertices. Polygons are closed planar shapes formed by connecting at least three vertices with edges in a specific order. They are the basic units of the model surface; triangles and quadrilaterals are the common cases. Multiple polygons share vertices and are joined to form the complete surface of the model. Together, these three elements form a chain of construction from discrete coordinates to the complete three-dimensional structure.

After preliminary processing, the amount of data in the scene is reduced by roughly a factor of ten relative to the cropped data (see Table 2, Fig. 12).

Table 2
Scene data comparison (triangular surface down-sampling method).
Operation	Points	Polygons	Vertices
Original data	4,671,744	7,805,565	23,416,695
Cropped data	1,577,905	2,967,147	9,287,655
After reduction	142,857	296,307	980,027
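The reduction factor can be checked directly from the counts in Table 2. The short calculation below is only a worked arithmetic example; the dictionary layout and the function name `reduction_factors` are illustrative.

```python
# Counts copied from Table 2.
table2 = {
    "original": {"points": 4_671_744, "polygons": 7_805_565, "vertices": 23_416_695},
    "cropped":  {"points": 1_577_905, "polygons": 2_967_147, "vertices": 9_287_655},
    "reduced":  {"points": 142_857,   "polygons": 296_307,   "vertices": 980_027},
}

def reduction_factors(before, after):
    """Ratio of element counts before and after a processing step."""
    return {key: before[key] / after[key] for key in before}

# Effect of the down-sampling step alone (cropped data -> reduced data).
factors = reduction_factors(table2["cropped"], table2["reduced"])
for name, value in factors.items():
    print(f"{name}: {value:.2f}x")
```

The ratios come out at about 11 for points, 10 for polygons, and 9.5 for vertices, which is the basis for the "factor of ten" statement.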

Fig. 12

Comparison of 3D scene data before and after grid reduction

4.2. Ground separation

After scene import, data cleaning, and optimization, we obtain a high-quality overall mesh, but this mesh alone is not sufficient. It can serve as the terrain base, but it does not allow the dynamics of above-ground buildings to be computed. To simulate the effect of an earthquake, the buildings must be separated from the ground.

Cage-type Projection Separation Method. The method builds a cropping cage from the results of ground object classification by projecting the classified regions onto the target, and then performs a Boolean operation between the original mesh and the cropping cage to separate the target. Current research offers many methods for ground object classification, including supervised learning, unsupervised learning, remote sensing imagery, and LiDAR (Light Detection and Ranging) imagery. All of them eventually produce segmented land classification data, usually in the form of a two-dimensional image. The classification results of literature [26], which combines LiDAR and aerial imagery, are shown in Fig. 13. On the basis of such ground classification data, cropping cages can be built accurately, so that the specified features can be separated.
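The projection idea can be illustrated with a simplified sketch. Instead of a full Boolean operation between the mesh and a cage solid, the sketch below assigns each face to the target or to the terrain by projecting its centroid vertically onto a 2D classification raster. This face-selection shortcut, the function name `separate_by_projection`, and the raster parameters `origin`, `pixel_size`, and `target_label` are all assumptions made for illustration.

```python
import numpy as np

def separate_by_projection(vertices, faces, label_map, origin, pixel_size, target_label):
    """Split faces into (target, rest) using a 2D land classification raster.

    label_map[row, col] holds a class id; the raster is assumed to be aligned
    with the x-y plane, with 'origin' at the corner of pixel (0, 0).
    """
    # Centroid of every triangle, shape (n_faces, 3).
    centroids = vertices[faces].mean(axis=1)
    # Vertical projection: ignore z and convert x-y to raster column/row.
    cols = np.floor((centroids[:, 0] - origin[0]) / pixel_size).astype(int)
    rows = np.floor((centroids[:, 1] - origin[1]) / pixel_size).astype(int)
    inside = ((rows >= 0) & (rows < label_map.shape[0])
              & (cols >= 0) & (cols < label_map.shape[1]))
    # Faces that project outside the raster are treated as non-target.
    is_target = np.zeros(len(faces), dtype=bool)
    is_target[inside] = label_map[rows[inside], cols[inside]] == target_label
    return faces[is_target], faces[~is_target]
```

Selecting whole faces leaves jagged boundaries where a triangle straddles the edge of a classified region; a true Boolean cut against the cage, as described above, splits such triangles and gives a clean boundary.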

Fig. 13

Construction of cropping cage for ground object separation based on land classification.

The effects of the three ground separation methods are shown in Fig. 14. Plane separation suits relatively flat terrain, such as batch separation of urban buildings, and requires manually defined segmentation planes. Surface separation suits scenes with more pronounced topographic variation and is customized for specific features such as landslides; it also requires manually defined segmentation surfaces. Cage-type projection separation is ideal for batch object separation once precise feature classification results are available. Its separation quality depends on the precision of the classification results, so it demands highly accurate feature classification. With a small amount of manual correction, it achieves highly accurate separation.

Fig. 14

The cage-type projection separation method.

Fig. 15

Separation results.

Compared with mainstream semantic segmentation methods that operate on textured meshes, the proposed "orthophoto semantic segmentation + manually assisted mesh cropping" method offers improvements in three respects: adaptability to complex scenes, engineering practicality, and suitability for subsequent simulation. In terms of adaptability to complex scenes, mainstream textured-mesh semantic segmentation tends to miss small targets or blur their edges in scenes with dense buildings, strongly undulating terrain, or many small structures, because the mesh topology is complex and texture information overlaps. The proposed method first performs global semantic annotation on the orthophoto and then uses manual assistance to crop key areas of the mesh accurately, which significantly improves the segmentation accuracy of small targets. In terms of engineering practicality, mainstream textured-mesh semantic segmentation relies on high-performance computing devices to process massive mesh data, and processing a single scene often takes 7–10 hours. The proposed method simplifies preprocessing through the orthophoto and focuses manual cropping on key areas, which reduces the overall processing time to 3–4 hours and lowers the hardware requirements. An ordinary workstation can complete the operation, which greatly lowers the threshold for engineering applications. In terms of suitability for subsequent simulation, manual assistance is used to remove redundant mesh outside the key areas of the emergency simulation, which reduces the model data volume by more than 75% and raises the subsequent crowd simulation frame rate to over 40 fps. At the same time, the complete geometric and semantic information of the core elements of the emergency scenario is preserved, which meets the needs of real-time simulation and accurate decision-making.

4.3. Sub model construction

Particle Replacement for Solid-state Objects Method. Indoor scenes contain a large number of small objects, such as tables and chairs, that are numerous and homogeneous. We treat these homogeneous solids as solid-fluids and represent them as particles (as shown in Fig. 19), which are then computed together with the crowd particles. This achieves the integration of biological fluids and solid-fluids. The same kind of multi-particle fusion calculation is also applied to the interaction between crowds, vehicles, and gravel (as shown in Fig. 16).
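A minimal sketch of the replacement step is given below, assuming each object is replaced by a single bounding sphere and stored in the same arrays as the crowd particles so that one solver loop can process both. The type tags `CROWD` and `SOLID`, the one-particle-per-object choice, and the function names are hypothetical simplifications.

```python
import numpy as np

CROWD, SOLID = 0, 1  # hypothetical particle type tags

def object_to_particle(object_vertices):
    """Replace one small object by a particle: box center and bounding radius."""
    lo = object_vertices.min(axis=0)
    hi = object_vertices.max(axis=0)
    center = (lo + hi) / 2.0
    radius = 0.5 * np.linalg.norm(hi - lo)  # half the bounding-box diagonal
    return center, radius

def build_particle_set(crowd_positions, crowd_radius, objects):
    """Merge crowd particles and replaced solid objects into shared arrays."""
    positions = [np.asarray(p, dtype=float) for p in crowd_positions]
    radii = [crowd_radius] * len(positions)
    types = [CROWD] * len(positions)
    for object_vertices in objects:
        center, radius = object_to_particle(np.asarray(object_vertices, dtype=float))
        positions.append(center)
        radii.append(radius)
        types.append(SOLID)
    return np.array(positions), np.array(radii), np.array(types)
```

Larger or elongated objects could be covered by several particles instead of one; the shared array layout stays the same.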

Fig. 16

Homogeneous, solid-state objects.

Fig. 17

Particles replacement for solid-state object.

4.4. Implementation process of group modeling

Based on the heterogeneous crowd behavior modeling mechanism proposed in this article, crowd modeling is divided into three core steps:

The first step is to initialize the attributes of the agents. Using demographic data and the results of emergency behavior research, each agent is assigned physiological attributes (age, gender, physical condition), psychological attributes (calmness, risk preference), and social attributes (whether it has companions, social role), and a mapping between attributes and motion parameters (speed, acceleration, steering sensitivity) is established. For example, the speed coefficient for people over 65 years old is set to 0.7, and the steering sensitivity of an agent decreases by 20% in a panic state;

The second step is to define behavior rules and construct a behavior loop of "environment perception, decision judgment, action execution". The environment perception module captures real-time information such as obstacle positions, crowd density, and exit status in the scene and generates a perception matrix. Based on the perception matrix and the agent attributes, the decision judgment module calls rules from the rule library: "obstacle avoidance rules" (such as triggering a turn when the distance to an obstacle is less than 0.5 m), "path selection rules" (such as calculating path weights from distance, congestion, and safety combined), and "interaction rules" (such as decelerating and waiting when the distance to a companion is greater than 5 m). The action execution module then converts the decision results into specific motion parameters;

The third step is the dynamic adjustment mechanism, which sets up a "scene event trigger". When events such as "exit congestion", "new obstacles", or "rescue signals" occur in the scene, the behavior rules of the agents are adjusted automatically. For example, exit congestion triggers "path replanning", and the appearance of a rescue signal triggers "gathering towards the signal source". In addition, a "crowd-scene interaction interface" is added in the scene processing stage. It converts the road surface materials in the scene (cement ground, grassland, and waterlogged road surface) into agent motion resistance coefficients (0.9, 0.6, and 0.4, respectively) and converts the width of building entrances and exits into agent traffic priority parameters, which links scene processing closely with crowd modeling.
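The three steps can be sketched in a few lines. The numeric values (0.7, 20%, 0.5 m, 5 m, and the surface coefficients) are taken from the text; everything else is an assumption made for illustration, including the class and function names, the default base speed, the use of the surface coefficient as a speed multiplier, and the rule that scene events take precedence over the regular rules.

```python
import math
from dataclasses import dataclass

# Surface coefficients from the text; using them as speed multipliers is an assumption.
SURFACE_COEFF = {"cement": 0.9, "grass": 0.6, "waterlogged": 0.4}
# Scene events and the behavior they trigger (the two examples given in the text).
EVENT_RESPONSE = {"exit_congestion": "replan_path", "rescue_signal": "gather_at_signal"}

@dataclass
class Agent:
    position: tuple
    age: int
    emotional_state: str = "calm"  # "calm" or "panic"
    base_speed: float = 1.2         # m/s, hypothetical default
    base_steering: float = 1.0      # hypothetical unit sensitivity

    def speed(self, surface="cement"):
        # Step 1: attribute-to-motion mapping (coefficient 0.7 above age 65).
        age_coeff = 0.7 if self.age > 65 else 1.0
        return self.base_speed * age_coeff * SURFACE_COEFF[surface]

    def steering_sensitivity(self):
        # Steering sensitivity drops by 20% in a panic state.
        factor = 0.8 if self.emotional_state == "panic" else 1.0
        return self.base_steering * factor

def perceive(agent, obstacles, companion=None):
    """Step 2, perception: a tiny 'perception matrix' stored as a dict."""
    nearest = min((math.dist(agent.position, o) for o in obstacles), default=math.inf)
    companion_dist = None
    if companion is not None:
        companion_dist = math.dist(agent.position, companion)
    return {"nearest_obstacle": nearest, "companion_distance": companion_dist}

def decide(perception, event=None):
    """Step 2, decision, plus step 3: scene events override the regular rules."""
    if event in EVENT_RESPONSE:
        return EVENT_RESPONSE[event]
    if perception["nearest_obstacle"] < 0.5:
        return "turn"  # obstacle avoidance rule
    distance = perception["companion_distance"]
    if distance is not None and distance > 5.0:
        return "slow_and_wait"  # interaction rule
    return "advance"
```

For example, `decide(perceive(Agent((0, 0), 70), [(0.3, 0)]))` yields `"turn"` because the obstacle is closer than 0.5 m. Path selection weights and exit-width priorities are omitted from the sketch.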

To enhance the authenticity of agent behavior, we construct a three-dimensional heterogeneous feature system ("physiological, psychological, social") and a dynamic environmental response mechanism:

Physiological characteristics include physical fitness level (levels 1–5, corresponding to movement speeds of 0.5-2.0 m/s), movement flexibility (which affects steering angle and obstacle avoidance fluency), and endurance (which decreases with movement time; speed decreases by 20%-40% when endurance is insufficient). Psychological characteristics include emotional state (calm/tense/panic, corresponding to decision response times of 1.0 s/0.5 s/0.2 s) and risk preference (conservative/neutral/aggressive, which determines the weights of safety and efficiency in path selection). Social characteristics include social role (ordinary citizens/rescue personnel/volunteers) and social relationships (whether there are companions or relatives, which affects group gathering behavior).

The dynamic environment response mechanism is implemented through an "event trigger". When an agent perceives a "building collapse warning", a panicked agent triggers "random turn and escape" (with a turning angle fluctuation of ±30°), and a calm agent triggers "evacuation along safety signs". When an "injured person" is perceived, rescue personnel trigger "priority rescue" (staying for 3–5 seconds to check the person's status), and ordinary people trigger "continue evacuation after calling for rescue" or "assist in handling" (with probabilities of 60% and 40%, respectively). When a "congested channel" is perceived, aggressive agents trigger "attempt to pass through", while conservative agents trigger "search for alternative channels". In addition, "behavioral noise" parameters (±5% speed fluctuation, ±10° steering deviation) are added to the agents to simulate the randomness of real human behavior. After optimization, the movement of the agents shows diverse characteristics: elderly agents exhibit "slow movement + frequent pauses for observation", young people in a panic state exhibit "fast running + occasional emergency stops and turns", and rescue personnel exhibit "uniform forward speed + active avoidance of the public", which resolves the problem of rigid, mechanical movement. The simulation video has been re-recorded to clearly present the differentiated behavior of the agents.
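The behavioral noise and the panic turn can be sketched as follows. The bounds (±5%, ±10°, ±30°) come from the text; drawing the perturbations from a uniform distribution, the function names, and the use of a seeded `random.Random` instance are assumptions.

```python
import random

def apply_behavioral_noise(speed, heading_deg, rng,
                           speed_jitter=0.05, heading_jitter_deg=10.0):
    """Perturb speed by up to +/-5% and heading by up to +/-10 degrees."""
    noisy_speed = speed * (1.0 + rng.uniform(-speed_jitter, speed_jitter))
    noisy_heading = (heading_deg + rng.uniform(-heading_jitter_deg, heading_jitter_deg)) % 360.0
    return noisy_speed, noisy_heading

def panic_turn(heading_deg, rng, max_turn_deg=30.0):
    """'Random turn and escape': heading fluctuates by up to +/-30 degrees."""
    return (heading_deg + rng.uniform(-max_turn_deg, max_turn_deg)) % 360.0

rng = random.Random(42)  # fixed seed so that a run can be reproduced
speed, heading = apply_behavioral_noise(1.2, 90.0, rng)
```

Passing the random generator explicitly keeps a simulation run reproducible, which helps when comparing strategies on the same scene.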

For high-pressure scenarios, dedicated behavior rules are defined for agents with different characteristics. With respect to age, the rules for child agents (0–12 years old) are "follow adult relatives" (stop moving and call for help when the distance from relatives exceeds 5 m) and "avoid deep water areas" (detour when the water depth exceeds 0.3 m). The rules for elderly agents (over 60 years old) are "prefer gentle routes" (speed reduced by 50% when the slope exceeds 15°) and "accept assistance from others" (actively cooperate with evacuation when rescue personnel approach). With respect to physical fitness, the rules for level 1 agents (people with limited mobility) are "wait for rescue" (stay in place and send out distress signals) and "move along handrails or walls". With respect to perception, the rules for visually impaired agents are "follow sound guidance" (respond to shouts from rescue personnel or to emergency broadcasts) and "locate obstacles by touch".

Based on the above rules, the model can capture various emergent behaviors:

Hesitation behavior: When an intelligent agent is faced with the trade-off between choosing a familiar but congested channel and an unfamiliar but unobstructed channel, the conservative agent stays for 2–3 seconds.

Mutual aid behavior: Intelligent agents with good physical fitness actively assist the elderly and carry children through dangerous areas, and the triggering probability is positively correlated with the strength of social relationships.

Follow behavior: The unfamiliar agent adjusts its path by following the "successful obstacle evader" ahead.

Irrational decision-making behavior: In a state of panic, the intelligent agent ignores danger signs and forcefully crosses shallow waters.

Basic speed is graded by age and physical fitness and is adjusted by a state coefficient and an environmental coefficient. The state coefficient is influenced by emotion (panic 1.2, calm 0.9) and endurance (sufficient 1.0, insufficient 0.7); the environmental coefficient is influenced by road conditions (cement ground 1.0, waterlogged road 0.7, muddy road 0.5). Risk tolerance is quantified as an index ranging from 0 to 100: low-tolerance agents (0–30) only choose paths with no obstacles and a crowd density below 0.5 people/m², medium-tolerance agents (31–70) accept paths with a small number of obstacles and a crowd density below 1.5 people/m², and high-tolerance agents (71–100) can choose paths with temporary obstacles and a crowd density below 2.5 people/m². Risk tolerance changes dynamically with emotion (it increases by 20–30 points during panic). The decision-making logic adopts a "dual-layer decision-making model". The fast decision-making layer (regular state) selects a path based on distance and degree of congestion and takes 0.3 seconds. The fine decision-making layer (emergency scenario) computes a weighted combination of safety level (building stability, water depth), self-adaptability (whether the agent's physical fitness allows it to pass), and social impact (whether there are companions), with weights of 0.4, 0.3, and 0.3, respectively, and takes 1–2 seconds.
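The speed model, the risk tolerance bands, and the fine decision layer can be written out as a short sketch. The coefficients, thresholds, and weights are those stated above; combining the coefficients as a product, scoring each criterion on a 0 to 1 scale, and all function and key names are assumptions.

```python
EMOTION_COEFF = {"panic": 1.2, "calm": 0.9}
ENDURANCE_COEFF = {"sufficient": 1.0, "insufficient": 0.7}
ROAD_COEFF = {"cement": 1.0, "waterlogged": 0.7, "muddy": 0.5}

def effective_speed(base_speed, emotion, endurance, road):
    """Base speed scaled by state and environmental coefficients (assumed product form)."""
    state = EMOTION_COEFF[emotion] * ENDURANCE_COEFF[endurance]
    return base_speed * state * ROAD_COEFF[road]

def max_acceptable_density(risk_tolerance):
    """Crowd density limit (people per square meter) for a 0-100 tolerance index."""
    if risk_tolerance <= 30:
        return 0.5
    if risk_tolerance <= 70:
        return 1.5
    return 2.5

def fine_decision_score(safety, adaptability, social):
    """Fine decision layer: weights 0.4 / 0.3 / 0.3, inputs assumed to lie in [0, 1]."""
    return 0.4 * safety + 0.3 * adaptability + 0.3 * social

def choose_path(paths, risk_tolerance):
    """Filter paths by the density limit, then pick the highest weighted score."""
    limit = max_acceptable_density(risk_tolerance)
    feasible = [p for p in paths if p["density"] < limit]
    if not feasible:
        return None
    return max(feasible, key=lambda p: fine_decision_score(
        p["safety"], p["adaptability"], p["social"]))
```

With this structure, a low-tolerance agent rejects a dense but otherwise attractive path, while a panicked agent whose tolerance has risen by 20–30 points may accept it.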

The interaction mechanism includes "agent-environment" and "agent-agent" interaction. In agent-environment interaction, an agent can push obstacles weighing less than 50 kg (which reduces its speed by 30%), and stepping on accumulated water produces a "splashing water" effect (which affects the obstacle avoidance decisions of surrounding agents). In agent-agent interaction, new features have been added: "language communication" (conveying information such as "danger ahead" and "exit clear", which affects the decisions of agents within a 10 m range), "physical assistance" (the pair moves at the average of the two agents' speeds while assisting someone with mobility difficulties), and "conflict resolution" (when the paths of two agents conflict, the agent whose social role has higher priority passes first, with rescue personnel ranked above ordinary people; when the priority is the same, one of the two gives way at random).
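Two of the agent-agent rules are simple enough to state as code. The sketch below follows the rules as described (role priority with random tie-breaking, and averaging of speeds during assistance); the numeric priority values, role labels, and function names are hypothetical.

```python
import random

# Hypothetical numeric priorities; only the ordering (rescue > ordinary) comes from the text.
ROLE_PRIORITY = {"rescue": 2, "ordinary": 1}

def resolve_conflict(role_a, role_b, rng):
    """Return 'a' or 'b', the agent that passes first when two paths conflict."""
    priority_a = ROLE_PRIORITY[role_a]
    priority_b = ROLE_PRIORITY[role_b]
    if priority_a > priority_b:
        return "a"
    if priority_b > priority_a:
        return "b"
    # Same priority: one agent gives way at random.
    return rng.choice(["a", "b"])

def assisted_speed(helper_speed, assisted_person_speed):
    """Physical assistance: the pair moves at the average of both speeds."""
    return (helper_speed + assisted_person_speed) / 2.0
```

For example, `resolve_conflict("rescue", "ordinary", random.Random(0))` always returns `"a"`, while two ordinary agents are ordered at random.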

5. Experimental Results

The experiments were run on a PC equipped with an Intel(R) Core(TM) i9-14900KF 3.20 GHz CPU, an NVIDIA GeForce RTX 4090 GPU, and 64 GB of RAM. The algorithm was implemented in Python within Houdini. The simulations ran stably at 24 FPS. Our implementation has three main purposes. The first is to test the timeliness, authenticity, and stability of the real-scene 3D modeling method in order to show that the proposed method is feasible. The second is to simulate large-scale crowds in the real-scene 3D model and test the algorithms proposed in the paper. The third is to extend the algorithms summarized in this paper to other real-scene 3D models in order to demonstrate their generality.

5.1. Scene 1: Earthquake evacuation

The campus earthquake evacuation scenario is based on the official records of the 2023 earthquake drill of a certain university, the campus building safety inspection report, and the distribution data of teachers and students provided by the academic department. The playground is set as the emergency evacuation hub, and the intelligent agents are divided into three categories: students, teachers, and logistics personnel. The initial number is set according to the actual number of students on the day of the drill. The initial positions of students are distributed in classrooms and laboratories of each teaching building, the initial positions of teachers correspond to teaching classrooms or office areas, and the initial positions of logistics personnel are distributed in offices and other places. The movement speed and evacuation response time of each group are set according to the actual performance recorded in the drill video.

Fig. 18

Earthquake outdoor escape, part one.

This scene simulates a school during an earthquake, covering both the outdoor transfer of the crowd and escape from inside buildings. It is used to verify the timeliness, authenticity, and stability of the real-scene 3D model in operation, as well as the cage-type projection separation method for ground separation and the 3D-space minimal-rotation method in the crowd strategy. The scene also involves the influence of falling objects on paths and the fusion calculation of biological fluid and solid-fluid, and it verifies the particle replacement method for solid-state objects proposed in the paper.

Fig. 19

Earthquake outdoor escape, part two.

Fig. 20

Indoor escape (the minimal-rotation method).

Fig. 21

Indoor escape (the variable-rotation method).

Experimental analysis: In this simulation experiment, we compare the scene scale, scene authenticity, crowd size, and calculation speed with the work in reference [27], as shown in Table 3. The simulation experiments in references [28–29] also modeled the evacuation of people in a confined exit space; the relevant simulation results are shown in Fig. 22. Those results indicate that in a confined exit space the evacuation speed of people is negatively correlated with the density of obstacles, which is similar to the simulation results in this paper. However, the simulations in references [28–29] do not provide real-time 3D visualization.

Fig. 22

Simulation results of reference [28–29].

According to Table 3, compared with the comparison scenarios Crowd-1 to Traffic-3, the method proposed in this paper performs better in terms of scene scale and authenticity, terrain processing capability, and computational efficiency. The comparison scenarios cover crowds, vehicles, and mixed traffic, with 8-148 objects, but none of them separates ground objects or incorporates the separated features into the solution process. The method in this article focuses on the fused simulation of crowds and real-scene 3D models, with 1-200 objects. By building highly realistic scenes from oblique photography data, it separates ground objects, integrates the separated features into the solution, and still keeps the time cost comparable to that of mixed scenes of similar scale. This confirms the advantages of the lightweight real-scene 3D modeling method in this article, such as smooth computation and stable performance. It also reflects the effectiveness of the cage-type projection separation method in separating ground objects, and the practicality and efficiency of the related methods in integrating real-scene 3D features into the simulation. While scene complexity and functional completeness are improved, good computational efficiency is maintained.

Table 3
Experimental comparison data one.
Scene	Type	Size (no.)	Scene scale and realism	Data set	Whether the ground objects are separated	Whether the separated features are involved in the solution	Time costs (Seconds/frame)
Crowd-1	Crowd	8-148	Simple	[Lerner et al.2007]	No	No	0-0.004
Crowd-2	Crowd	100	Simple	[Zhang et al.2012]	No	No	0.029
Crowd-3	Crowd	79	Simple	[Zhang et al.2012]	No	No	0.0192
Traffic-1	Car	80	Medium	[NGC 2013]	No	No	0.0137
Traffic-2	Crowd /traffic	30/35	Medium	[NGC 2013]	No	No	0.0378
Traffic-3	People/bicycle/car	25/15/40	Medium	[Jiaping Ren, 2019, video extraction]	No	No	0.034
This article	Crowd /real scene 3D	1 ~ 200	High	Real scene 3D data based on oblique photogrammetry data	Yes	Yes	0.031

5.2. Scene 2: Underwater fishing

The initial settings of the underwater fishing scene rely on measured data of nearshore recreational fishing operations provided by coastal fishery cooperatives, an investigation report on the activity patterns of nearshore fish populations published by marine research institutes, and the underwater human motion parameter files of diving training institutions. On this basis, a nearshore shallow-water scene with a water depth of 6 to 12 meters is constructed. The agents comprise 12 human diving fishermen and 400 target fish. The divers are initially concentrated in the water below the docking point of the small fishing boats, and their equipment parameters are set according to the performance data of the diving suits, oxygen supply equipment, and fishing tools used in actual operations. The fish schools initially gather at the junction of the coral reef area and the seagrass beds, and the fish species are those commonly found in this sea area, such as red snapper and black snapper. Their movement speed, aggregation density, and predator avoidance response characteristics are determined from observational data from the Institute of Oceanography.

This scene simulates people chasing schools of fish underwater. It involves the fusion computation of two types of biological fluids and verifies the crowd center method and the neighborhood extreme value method proposed in this article (see Figs. 23–24).

Fig. 23

Validation of the crowd center method. (a) Real world scene, (b) t = 1s, (c) t = 20s, (d) t = 30s

The human group has the core goal of "surrounding the fish schools", and its movement trajectory converges towards the area where the fish gather. The fish schools aim to "evade pursuit", and their movement trajectory spreads away from the area where the humans gather. In the initial state at t = 1 s, the center of the human group is on the east side of the water body and the center of the fish school is on the west side. The method accurately identifies the initial aggregation core of both groups, which is consistent with the distribution observed in real scenes, where humans start from the shore and fish schools gather in deep water. By t = 20 s, the human group has moved towards the fish school. At this point the distance between the centers of the two groups has decreased to 0.5 m, and the method still locks onto the core of each group accurately, without positioning deviation caused by sudden changes in movement direction. At t = 30 s, the fish break through the encirclement and spread northeastward. The average deviation in the center positioning of the groups is only 0.28 m over the whole process. This result shows that the crowd center method is suitable not only for locating the center of a single-species group but also for stably tracking and accurately identifying the aggregation core in cross-species scenarios with opposing goals, which addresses the "core loss" and "positioning drift" problems of traditional crowd center algorithms in multi-target dynamic interaction.
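The quantities discussed above (group centers, the distance between them, and the direction in which one group moves relative to the other) can be illustrated with a short sketch. It assumes that the group center is the arithmetic mean of the member positions, which is a simplification of the crowd center method; the function names are hypothetical.

```python
import numpy as np

def group_center(positions):
    """Center of a group, taken here as the mean of member positions (assumption)."""
    return np.asarray(positions, dtype=float).mean(axis=0)

def center_distance(group_a, group_b):
    """Distance between the centers of two groups."""
    return float(np.linalg.norm(group_center(group_a) - group_center(group_b)))

def pursue_direction(own_group, other_group):
    """Unit vector from the own center towards the other group's center."""
    delta = group_center(other_group) - group_center(own_group)
    return delta / np.linalg.norm(delta)

def flee_direction(own_group, other_group):
    """Unit vector pointing away from the other group's center."""
    return -pursue_direction(own_group, other_group)
```

In the fishing scene, the divers would steer along `pursue_direction(divers, fish)` and the fish along `flee_direction(fish, divers)`; both functions are undefined when the two centers coincide, which a full implementation would need to handle.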

Experimental analysis: This experiment evaluates the simulation results mainly by comparison with real-world images. The experiments show that the simulation is realistic and that the crowd center method and the neighborhood extreme value method explain multi-species cluster behavior in an attack-defense state well.

Fig. 24

Verification of the neighborhood extreme value method.

5.3. Scene 3: Mine field landslide

This scene simulates a landslide at a mine field during working hours. It involves the fusion calculation of two biological fluids and one solid-state fluid and verifies the neighborhood extreme value method (as shown in Fig. 25). The landslide scene is based mainly on the on-site investigation data of the 2022 landslide disaster in a certain non-ferrous metal mining area, the historical distribution drawings of the mining area's safety production supervision and management bureau, and the attendance records of the mining area personnel. From these data, a three-dimensional scene of the mining area is constructed, which includes the potential impact area of the landslide body, 450 meters long and 280 meters wide. The agents are divided into miners, safety inspection personnel, and logistics support personnel. Their initial number and locations are determined from the attendance records and work schedule of the mining area personnel on the day before the disaster. The miners are initially distributed across the mining working faces and the transportation roadway exits, the inspection personnel are initially at the mining area's safety inspection booth, and the logistics personnel are initially in the mining area canteen and material warehouse. The landslide body is initially in a [...]. The subsequent sliding start time and propulsion speed parameters are set according to the landslide monitoring data provided by the geological department for the mining area.

Fig. 25

Verification of the multi-class particle fusion calculation and the neighborhood extreme value method. (a) t = 1s, (b) t = 10s, (c) t = 20s, (d) t = 30s

Experimental analysis: In this simulation experiment, this paper compares the performance of the scene scale, the authenticity of the scene, the size of the crowd and the computing speed. The relevant data are shown in Table 4.

The experiment shows that the integration of two biological fluids and one solid-state fluid is practically feasible, with good results and high efficiency. The nearest neighbor and farthest neighbor strategies in the neighborhood extreme value method are most suitable for such scenarios.

Table 4
Experimental comparison data two.
Crowd size	Vehicle size	Scene scale	Scene realism	Whether the ground objects are separated	Whether the separated features are involved in the solution	Time costs
10	5	50,000 level	low	Yes	Yes	0.021
20	15	100,000 level	middle	Yes	Yes	0.043
50	20	150,000 level	high	Yes	Yes	0.087
Unit (crowd size: number, vehicle size: number, scene scale: number of vertices, time cost: seconds/frame)

5.4. Scene 4: Evacuation from dangerous waters

In this scene, crowds are evacuated from whirlpools, sudden changes of water level, and other dangerous water environments, simulating the crowd evacuation process after a flood occurs.

Fig. 26

Human-water interaction user evaluation data.

Fig. 27

Evacuation from dangerous waters: whirlpool.

Experimental analysis: In this simulation experiment, two user evaluation tests were conducted to assess the realism of the method. In the first test, each participant viewed the scene and the simulation results (e.g., the movement of people and water) from a panoramic, overhead perspective. In the second test, participants used a first-person perspective, which is closer to everyday experience. As shown in Fig. 26, in each test participants rated the results on a 7-point scale, where 1 indicates that the result is generally realistic, 4 indicates that the result is acceptable, and 7 indicates that the realism is very strong. The participants were not otherwise involved in the study and were not informed of it beforehand.
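Ratings collected on such a scale are typically summarized per viewing perspective by their mean and spread. The sketch below shows one way to do this, assuming the ratings are integers from 1 to 7; the function name `summarize_ratings` is hypothetical, and no actual participant data is included.

```python
from statistics import mean, stdev


def summarize_ratings(ratings, low=1, high=7):
    """Summarize a list of scale ratings for one viewing perspective.

    Raises ValueError if the list is empty or any rating falls outside
    the [low, high] range of the scale.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    for r in ratings:
        if not low <= r <= high:
            raise ValueError(f"rating {r} is outside the {low}-{high} scale")
    return {
        "n": len(ratings),
        "mean": mean(ratings),
        # Sample standard deviation needs at least two values.
        "stdev": stdev(ratings) if len(ratings) > 1 else 0.0,
    }
```

Calling the function once for the overhead test and once for the first-person test yields two summaries that can be compared directly.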

The experimental results show that the fluid motion calculation and fluid surface reconstruction methods used in this paper are feasible. The visual quality is good and the realism is high (Figs. 27 and 28).

Fig. 28

Evacuation from dangerous waters: sudden change of water level. (a) t = 1.0s, (b) t = 5.0s, (c) t = 10.0s

6. Discussion and Conclusion

In our previous research, we systematically explored crowd behavior during flood disasters, covering key aspects such as situation analysis, global path planning, local routing, risk assessment, and escape strategies. The current study builds upon those foundations and extends them in multiple dimensions.

We enhanced real-scene 3D model construction by adopting drone-based oblique photography to capture elevation, texture, and mesh data of terrain and built environments. A lightweight 3D reconstruction algorithm was utilized to reduce computational overhead, and a novel cage-type projection separation method was proposed for isolating ground objects. In the domain of solid-state object modeling, we introduced a particle replacement algorithm to simulate physical properties efficiently.

To support large-scale crowd simulations, we proposed several new behavior strategies derived from real-life observations: the variable-rotation method, the centroid-following method, the crowd center method, and the neighborhood extreme value method. We also expanded the model’s scope to cover disaster scenarios such as earthquakes and landslides. Our work further refined natural behavior patterns and introduced more generalized algorithms to enhance realism across scenarios. These innovations enabled the simulation of multi-fluid interactions and a preliminary exploration of cross-species agent modeling, laying a foundation for our future research.

Looking ahead, we aim to generalize our framework further, making it adaptable to additional natural phenomena such as avalanches, volcanic eruptions, hurricanes, and biological threats (e.g., insect infestations). These extensions will contribute theoretical insights and practical tools for disaster prevention and emergency response planning.

Our model holds important promise for applications in early warning systems, predictive analytics, situational analysis, and scientific research. Future work will focus on:

1. Scenario-Specific Experimentation: Conducting more detailed simulations tailored to specific types of natural disasters, improving accuracy and relevance.

2. Model Optimization: Enhancing the performance and responsiveness of algorithms to achieve real-time simulation at larger scales.

3. Deepening Core Components: Further refining key modules such as real-scene 3D model construction, crowd behavior modeling, and multi-fluid interaction mechanisms.

4. Cross-Disciplinary Integration: Incorporating data from remote sensing, oblique photography, and augmented reality to enable richer, more immersive simulation environments.

By advancing along these paths, we aim to contribute to the frontier of simulation-based research, bridging computer graphics, environmental science, disaster management, and artificial intelligence. Our vision is to develop a comprehensive, adaptable framework capable of simulating complex biological and environmental systems with both scientific rigor and practical utility.

Data Availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations

All procedures in this study were conducted in accordance with relevant ethical guidelines.

All experimental protocols were approved by the Committee on Academic Ethics of East China Jiaotong University.

Informed consent was obtained from all subjects and/or their legal guardian(s).

Funding.

This work was supported in part by the NSFC (Grant No. 52065024), the Jiangxi Province Key R&D Program (Grant Nos. 20202BBE53022 and 20223BBE51010), and the Jiangxi Province 03 Special Project (Grant No. 20212ABC03A20).

Contributions by authors.

Material preparation, data collection, and analysis were performed by Xunjin Zou and Yunqing Ye. The first draft of the manuscript was written by Xunjin Zou and Yunqing Ye, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript. Conceptualization: Yunqing Ye; Methodology: Xunjin Zou; Formal analysis and investigation: Yunqing Ye; Writing - original draft preparation: Yunqing Ye; Writing - review and editing: Xunjin Zou; Resources: Tianxia Feng; Supervision: ZhenMing Zhu.

Corresponding authors: Xunjin Zou, ZhenMing Zhu.

Yunqing Ye and Xunjin Zou are co-first authors of the article.

Electronic Supplementary Material

Below is the link to the electronic supplementary material

Supplementary Material 1
