Simulation-Based Machine Learning for Detection of Rarely Occurring Behaviors Based on Human Trajectories

1 Introduction

Security in public spaces has become a growing concern, leading to increased interest in intelligent surveillance and autonomous security robots. In particular, robotic bodyguards designed to protect high-profile individuals require the ability to detect potential threats based on human motion cues. Recognizing hostile intentions in public environments is challenging due to the rarity of such behaviors, which makes data collection difficult.

Previous studies on human behavior recognition often focus on standard pedestrian dynamics or anomaly detection in general settings. However, in security contexts, specific high-risk behaviors must be detected with high accuracy. These include cases such as an individual rapidly closing the distance toward a target, loitering near restricted areas with intent to breach security, or abruptly changing direction after being detected. These behaviors, while infrequent, are critical for identifying threats before they materialize. For the purposes of this study, we define rarely occurring behaviors (or rare behaviors) as intentional, goal-directed actions that are infrequent in public settings, deviate from typical social conduct, and can be identified through their trajectory characteristics.

To address the scarcity of real-world training data for such events, we propose a simulation-based machine learning approach that generates synthetic data representing these rarely occurring behaviors. This allows models to be trained effectively even when real-world observations are limited. The study explores how simulated trajectories, informed by human movement patterns, can enhance behavior recognition capabilities in robotic security applications.

We hypothesize that rare behaviors, such as hostile intent, can be detected using 2D motion trajectories under controlled settings.

Therefore, we investigate the use of simulation-only data versus combined simulation and real data for a simulation-based trajectory learning approach. The purpose of this research is to study the parameters that influence machine learning performance using simulated trajectories, taking into account the following question: “How can the detection performance of rarely occurring behaviors be optimized by following an effective simulation-based trajectory learning approach?”

To investigate this question, this study proposes a simulation-based solution to the lack of rarely occurring behavioral data: a 3D interactive simulator, based on the social force model, of pedestrian movement behavior in public places (i.e., human trajectories). The simulator generates rarely occurring behavioral trajectories while maintaining realistic interactions between all pedestrians and objects in the space, with the possibility to:

Integrate a rarely occurring behavioral model so that it’s possible to simulate the behavior while interacting with other people.
Build the targeted public space in the simulation environment so that it can reflect the actual environment where the behavior occurs in real life.
Tune physical parameters to reduce the reality gap.

To our knowledge, the literature does not include simulations of rarely occurring behaviors. Up to now, related research contributions consider simulating general movement behavior in public places (e.g., simulation of future trajectories based on past ones [1,2,3]), except for a few research works that discuss panic behaviors during emergency situations in public places [4]. Also, the pre-built simulation environments in current approaches are general ones, dependent on training using a large amount of real data, and they are not built and tuned precisely for complex interaction behaviors in public places. In addition, this is the first research to discuss potential approaches to tune physical parameters in human simulation interactive environments with the aim of reducing the reality gap. The difference between these approaches and ours is that instead of training a generative model based on a large amount of real data, we offer a tuned simulation environment where it’s possible to generate large amounts of rarely occurring data from an interactive simulator that is developed based on real-life observations.

The overall goal of this research is to develop a simulation-based approach to deal with the lack of rarely occurring behavioral data, while the purpose of using a real data set is to validate the performance of the proposed simulation-based approach. In this context, the real data set should reflect a general, rarely occurring behavior. The specific type of rarely occurring behavior or to what extent it reflects its occurrence in real life are not the focus of this study.

The main contributions of this research work are as follows:

We confirmed that it’s possible to achieve successful prediction performance for “rarely” occurring behaviors by using simulated behavioral trajectories.
We adopted an effective representation method for simulated trajectories, which improved the prediction performance of rarely occurring behaviors in a simulation-based trajectory learning approach, and we confirmed this improvement through a real-case study.
We propose a novel guideline that optimizes performance by investigating the modeling parameters in the simulation environment (e.g., artificial noise) and the amount of data used for simulation-based trajectory learning. The modeling parameters are investigated over ranges wider than the well-known real values.

The remaining sections of this paper are organized as follows: Section 2 introduces the related work. Section 3 illustrates the proposed methodology along with steps to follow towards successful simulation-based trajectory learning of rarely occurring behaviors. Section 4 explains a case study of social trajectory with limited data, data sets, a simulation environment, and a behavioral model. Section 5 shows the case study results following the proposed methodology. Section 6 discusses the research findings. Finally, Section 7 concludes the paper.

Understanding human movement and social interactions is critical for various fields, including robotics, intelligent surveillance, and behavioral analysis [5, 6]. Recent deep learning models have significantly advanced human trajectory prediction by capturing temporal dependencies and social interactions [7]. Techniques such as graph neural networks (GNNs) and transformer-based models have shown promising results for long-term prediction of pedestrian motion in dynamic environments [8, 9].

One notable advancement is the Social Attention Graph Neural Network (SA-GAT), which explicitly models pedestrian interactions for better motion forecasting [10]. These methods allow for more context-aware prediction systems, enabling real-time trajectory forecasting with higher accuracy and adaptability [11].

Recent research has also focused on large-scale datasets and multimodal approaches to further improve pedestrian trajectory prediction. Multi-modal trajectory prediction networks incorporating traffic signals and environmental constraints have been developed to handle complex real-world scenarios such as urban navigation and pedestrian flow analysis [12]. Additionally, Large Language Models (LLMs) like LG-Traj are being explored for pedestrian behavior prediction, leveraging semantic reasoning from textual and visual cues [13].

In contrast to the works above, which typically rely on abundant real-world data, our study addresses the data scarcity problem for rarely occurring behaviors by using simulations to generate synthetic yet realistic human trajectories.

2.2 The Role of Proxemics in Human-Robot Interaction (HRI)

Proxemics, the study of how individuals maintain personal space in social interactions, has gained importance in robotics and AI-based surveillance [14]. Human behavior in social settings is highly influenced by proxemic factors such as distance, approach angles, and interaction duration [15]. Studies have shown that deviations in proxemic patterns can be strong indicators of abnormal behaviors, including hostile intentions [14].

The integration of proxemics in AI models has led to the development of proxemic-aware augmented reality (AR) systems, which enhance human-robot interaction (HRI) by dynamically adapting robot behavior based on human movement patterns [16]. Moreover, the impact of robot personality on human proxemic responses has been explored, with findings suggesting that different robotic personas influence human comfort and personal space maintenance [17].

Recent work has also examined emotion-aware navigation, where robots can adjust their motion strategy based on inferred human emotions, leading to more socially acceptable robot navigation [18]. This approach ensures that robots behave in a contextually aware and non-intrusive manner, particularly in crowded spaces.

2.3 Simulation-Based Learning for Rare Behavior Detection

Rarely occurring human behaviors, such as hostile approaches, suspicious loitering, and sudden deviations in movement, pose significant challenges for real-time security applications. Collecting real-world data for such scenarios is often difficult due to privacy and ethical constraints, leading researchers to rely on simulation-based learning to augment datasets [19].

Several research studies [20,21,22] have proposed approaches using 3D simulation data for tracking and action recognition via computer vision training. For instance, the authors of [23] used randomized simulation data for human behavior recognition. Zeng et al. [24] proposed adversarial attacks by using 3D simulation. The authors of [25] proposed an evaluation for multi-future trajectory prediction using the ForkingPaths dataset, which is created by human annotators who control agents in a 3D simulator.

On the other hand, the research in [3, 4] trained classification or generative models using synthetic and semi-synthetic data. For example, the authors of [4] used a synthetic dataset created from the Grand Theft Auto V engine to train a generative model of a crowd in panic or a crowd in a fight. The research in [3] used a semi-synthetic data set created by combining real group data with trajectories generated by their social-aware navigation method, and then trained a neural network model called App-LSTM to generate behavioral interactions with groups of agents and individuals in a crowd.

Unlike previous work, our approach simulates rare behaviors rather than general or emergency crowd scenarios.

2.4 Challenges in Simulation-Based Trajectory Detection

Despite recent advances, several challenges remain in the real-world deployment of trajectory prediction and behavior analysis models. One major limitation is Sim-to-Real adaptation, where models trained on simulated data struggle to generalize to domain shifts between simulated and real-world environments [26]. Emerging solutions, such as reinforcement learning and domain adaptation techniques, have shown promise in bridging the simulation-reality gap [16, 27].

Additionally, recent studies have introduced Human Trajectory Prediction via Neural Social Physics, a framework that combines physics-based motion modeling with deep learning techniques, offering improved generalization capabilities [28]. Similarly, adaptive HRI models are being developed to improve robot perception and decision-making in dynamic social environments [18, 29].

As AI-powered surveillance and robotic security systems become more prevalent, ethical considerations regarding data privacy, bias mitigation, and transparent decision-making must also be addressed to ensure responsible deployment of predictive technologies [30, 31].

Our study addresses these challenges by focusing on sim-to-real transfer through physical parameter tuning and structured trajectory representation, without relying on large real-world datasets.

3 Methodology

Rarely occurring low-moral behaviors (e.g., stalking or preparing for a theft in front of a shop) and their associated trajectory features are challenging to detect among crowds of pedestrians in public places due to a lack of data. Simulation is a recent solution for generating human trajectories when real ones are limited. However, building a simulator that reflects actual real-world behavior is challenging due to the complexity of the behavior and a lack of knowledge about its actual characteristics and features. The simulation environment should include a pedestrian model for common interactions in public places, along with a reasonably realistic approximate model of the rarely occurring behavior. In addition, physical parameters within simulation environments, e.g., positional noise and velocity, play an important role in achieving realistic and helpful simulated trajectories for machine learning. Fine-tuning (transfer learning) a simulation-trained model with limited real data can also help it learn complex features that can’t be covered within the simulation environment. By contrast, related studies generate clean simulated trajectories in which pedestrians follow constant-speed linear paths, and these fail to reflect the noisy interactions between pedestrians in real-world public places [1, 3]. In this context, our methodology involves designing simulated and real scenarios for behavior recognition, specifically to model and identify rarely occurring human behaviors such as hostile approaches. The methodology is divided into the following components:

3.1 Process Steps

We propose the following process steps, illustrated in Fig. 1 and detailed in the following subsections:

1. Observation and collection of limited human behavior trajectory-related data from real life. If certain behavioral data are difficult to collect, a role-playing experiment is an alternative. The collected data include normal interactions and examples of rare hostile behavior [32].
2. Mathematical modeling of rarely occurring behavior as a 2D trajectory. The model should reasonably match the observed data and be imported into a pre-built simulation environment. A pedestrian simulator based on the social force model [33] is recommended, where regular pedestrians follow attraction/repulsion dynamics, ensuring realistic proxemic behavior while maintaining diversity in speed and trajectory.
3. Reduction of the reality gap in the simulation environment and optimization of the learning process using simulated trajectories, so that trajectories can be successfully classified as rarely occurring or normal behavior. This can be achieved by tuning the physical parameters in the simulated environment and applying a simplified representation to the simulated trajectories before they are imported into the learning process.
4. Consideration of the minimum applicable simulation and real data sizes for pure simulation training and fine-tuning with limited real data by following the appropriate learning structure for the trained model [34].

Fig. 1

Proposed process steps towards fine-tuned simulation-trained models for prediction of rarely occurring trajectories

3.2 Collecting Data

Data collection for rare human behaviors should be conducted in real-life public places through actual cases. This depends on several factors, e.g., the chance of the behavior occurring in public places, the possibility of collecting a reasonable amount of data (at least hundreds of samples), and permits for the targeted locations. If these conditions cannot be met, a role-playing experiment should be considered as an alternative solution. The collection process requires a reasonable selection of the place to collect data based on the targeted rarely occurring behavior, an appropriate setup of the collection system, and careful observation of the behavior’s possible features. The collected data help to observe actual behavioral characteristics and can be used to validate and fine-tune the simulation-trained models. Trajectory data in 2D (X, Y) can be collected using vision- or non-vision-based tracking methods. However, due to privacy concerns, we recommend following a non-vision-based method using sensors, e.g., a LiDAR sensor. These data should be filtered out from the rich information gained through the sensor, e.g., point cloud data of the objects and pedestrians.

3.3 Integrating Model in Simulation Environment

The simulator should be built so that it reflects the real-world environment structure of a public place, where normal pedestrians exist and rare behaviors occur. Instead of using basic settings, i.e., constant speed, the selected settings and parameters for pedestrian movements and interactions should be optimized using a well-known pedestrian model (based on a social force model [33]) to ensure realistic interactions between all pedestrians. The targeted, rarely occurring behavior should be mathematically modeled based on trajectory-related observations to be integrated within the simulation environment.

In other words, collected observations should be formulated into a simple and approximate model to reflect the targeted behavior. To achieve this, the physical parameters and movement directions for the specified behavior should be chosen within the simulator code, for example, preferred velocities, accelerations, and movement goals. Additionally, a pedestrian who exhibits unusual behavior should primarily navigate through the environment according to the given mathematical behavioral model, although on other occasions they should interact with other pedestrians and obstacles in accordance with the fundamental pedestrian model.

The behavioral agent should initially be modeled using the same social force parameters as normal pedestrians. Under certain conditions, such as approaching the target, the assigned behavioral model becomes active, and the agent dynamically switches to a guided hostile action. This ensures that the agent is indistinguishable from normal pedestrians under normal conditions but shifts toward an identifiable rare trajectory profile once those conditions are met.
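As a rough illustration of the attraction/repulsion dynamics that the fundamental pedestrian model relies on, the sketch below implements one Euler step of a basic social force update for a single pedestrian. It is not the calibrated simulator used in this study: the function name and the values of `tau` (relaxation time), `a` and `b` (repulsion strength and range), and `dt` (time step) are hypothetical and chosen only for illustration.

```python
import numpy as np

def social_force_step(pos, vel, goal, v_pref, others,
                      dt=0.1, tau=0.5, a=2.0, b=0.3):
    """One Euler step of a basic social force model for one pedestrian.

    pos, vel, goal: arrays of shape (2,); others: array of shape (n, 2).
    dt, tau, a, b are illustrative values, not calibrated parameters.
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    goal = np.asarray(goal, dtype=float)
    others = np.asarray(others, dtype=float).reshape(-1, 2)

    # Attraction: relax towards the preferred velocity pointing at the goal.
    to_goal = goal - pos
    dist_goal = np.linalg.norm(to_goal)
    direction = to_goal / dist_goal if dist_goal > 1e-9 else np.zeros(2)
    force = (v_pref * direction - vel) / tau

    # Repulsion: exponentially decaying push away from each neighbour.
    for other in others:
        diff = pos - other
        d = np.linalg.norm(diff)
        if d > 1e-9:
            force += a * np.exp(-d / b) * diff / d

    new_vel = vel + force * dt
    new_pos = pos + new_vel * dt
    return new_pos, new_vel
```

In such a setup, the rare-behavior agent would reuse the same update and differ only in how its `goal` and `v_pref` are chosen.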

3.4 Guidelines for Effective Simulation-Based Learning of Rare Trajectories

Due to the reality gap and the difficulty of predicting behaviors through trajectories, generating realistic behavioral trajectories through simulation is challenging. In this context, we present a guideline with methods to follow towards effective simulation-based trajectory learning. To do so, we will discuss it in two directions:

1. Optimizing the accuracy of training using simulation data by tuning physical parameters in the simulation environment: we propose reasonable configurations of physical parameters in the simulator environment. The parameter exploration range should deviate around real values so that it’s possible to explore their effect on performance.
2. Accurate and efficient training: we propose a trajectory representation method that can enhance the performance of trajectory-based learning.

3.4.1 Introducing Realism into Simulated Trajectories

Real trajectories are always noisy, but simulated ones are by default clean. Based on that, we propose to consider reasonable configurations for simulation physical parameters (positional and velocity noises) in the simulation environment in order to optimize the performance of trajectory models.

Firstly, we recommend adding uniformly distributed artificial noise “point by point” to the simulated trajectories, both to reduce the reality gap “towards realistic noisy trials” and to test the learning performance under several noise ranges. The raw simulated data already contain some noise because the simulation environment includes perception noise plus pedestrian noise due to the dynamic models in the simulation, i.e., human interactions. Figure 2 shows a sample trajectory from our raw simulation data in comparison with a resultant noisy trajectory (artificial noise added point by point independently using several ranges). The overall noise range for real and simulated trajectories is calculated in a later section and compared per second. The noise is estimated from the distance between the actual middle point and the hypothetical (straight-path) middle point of every 1-second (11-point) section of the trajectory.

Fig. 2

Raw vs. artificially noisy simulation trajectory trail
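The point-by-point noise injection and the per-second noise estimate described above can be sketched as follows. This is one possible reading of the procedure, not the exact implementation used in the study: the function names, the non-overlapping 11-point sections, the noise half-range `amplitude`, and the synthetic straight path are all assumptions made for illustration.

```python
import numpy as np

def add_uniform_noise(traj, amplitude, rng):
    """Add independent uniform noise in [-amplitude, amplitude] to every (x, y) point."""
    traj = np.asarray(traj, dtype=float)
    return traj + rng.uniform(-amplitude, amplitude, size=traj.shape)

def midpoint_noise(traj, window=11):
    """Per-section noise estimate.

    For each non-overlapping section of `window` points, return the distance
    between the actual middle point and the midpoint of the straight segment
    joining the first and last points of that section.
    """
    traj = np.asarray(traj, dtype=float)
    deviations = []
    for start in range(0, len(traj) - window + 1, window):
        section = traj[start:start + window]
        straight_mid = 0.5 * (section[0] + section[-1])
        actual_mid = section[window // 2]
        deviations.append(np.linalg.norm(actual_mid - straight_mid))
    return np.array(deviations)

# Illustrative data: a clean straight path of 44 points (four 11-point sections).
rng = np.random.default_rng(0)
clean = np.column_stack([np.linspace(0.0, 4.0, 44), np.zeros(44)])
noisy = add_uniform_noise(clean, amplitude=0.05, rng=rng)
```

Calling `midpoint_noise` on real and simulated trials with the same window length gives per-second values that can be compared directly across the two data sources.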

Secondly, usually, preferred velocities are randomly determined by considering a Gaussian distribution with a specific mean and standard deviation. Their values are usually recommended based on collected pedestrian data to obtain a realistic macroscopic velocity model and density for pedestrian distributions [35]. However, we intend to explore the effectiveness of using different velocity standard deviations in order to see whether there is a possibility for an enhancement in the performance using other values than the recommended ones.
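A minimal sketch of this exploration is given below. It draws preferred speeds from a Gaussian distribution for several candidate standard deviations. The default mean of 1.28 m/s and standard deviation of 0.2 m/s are the values reported for the simulator in Section 4.2.1; the function name, the other candidate values, and the lower clipping bound `min_speed` are hypothetical.

```python
import numpy as np

def sample_preferred_speeds(n, mean=1.28, std=0.2, rng=None, min_speed=0.1):
    """Draw n preferred walking speeds (m/s) from a Gaussian distribution.

    Speeds are clipped at `min_speed` (an assumed safeguard) so that no
    simulated pedestrian receives a zero or negative preferred speed.
    """
    rng = np.random.default_rng() if rng is None else rng
    speeds = rng.normal(mean, std, size=n)
    return np.clip(speeds, min_speed, None)

# Explore several standard deviations around the recommended 0.2 m/s.
rng = np.random.default_rng(0)
for std in (0.1, 0.2, 0.4):  # hypothetical candidate values
    speeds = sample_preferred_speeds(1000, std=std, rng=rng)
    print(f"std={std:.1f} m/s -> sample std {speeds.std():.3f} m/s")
```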

Finally, although our main objective is to reduce the reality gap between clean simulation trajectories and real human movement, a direct quantitative comparison between artificially noisy simulated trajectories and real-world data is not feasible. This is mainly due to the lack of available datasets containing fully matching conditions, such as starting positions, relative velocities, and pedestrian interactions. Instead, we rely on qualitatively comparing sample noisy trajectories with real data to assess plausibility, complemented by empirical evaluation of whether adding controlled noise improves model robustness and recognition accuracy under realistic variability. This approach prioritizes practical improvements in behavior detection over trajectory similarity metrics alone.

3.4.2 Trajectory Alignment

To optimize trajectory learning efficiency, we should find the best way to represent trajectories before they are imported into the learning process. In this subsection, we follow the concept of trajectory alignment, which helped to optimize the learning performance significantly. Each trajectory is first shifted so that its final point lies on the origin and is then rotated so that its first and end points are aligned with the x-axis. Figure 3 shows a simulated trial with the proposed trajectory representation process.

Fig. 3

Proposed trajectories alignment process: shifting coordinates to the origin with respect to the final position and rotating to align with x-axis
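The alignment step can be sketched in a few lines of NumPy. The function name is hypothetical, and placing the first point on the positive side of the x-axis is an assumption, since the text only states that the first and end points are aligned with the x-axis.

```python
import numpy as np

def align_trajectory(traj):
    """Shift a 2D trajectory so its final point is at the origin, then rotate
    it so that its first point lies on the positive x-axis (assumed side)."""
    traj = np.asarray(traj, dtype=float)
    shifted = traj - traj[-1]  # final point moves to the origin

    # Angle of the first point as seen from the (now zero) final point.
    angle = np.arctan2(shifted[0, 1], shifted[0, 0])

    # Rotate every point by -angle so the first point has y = 0.
    c, s = np.cos(-angle), np.sin(-angle)
    rotation = np.array([[c, -s], [s, c]])
    return shifted @ rotation.T

aligned = align_trajectory([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
```

Because the transform is a rigid motion, distances between consecutive points are preserved; only the absolute position and heading of the trial are removed.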

3.5 Optimizing Data Size

Deep learning generally requires big data to achieve high performance. However, we intend to explore the minimum efficient simulation data size for successful prediction of rarely occurring behaviors. To do so, we propose checking the performance of the trained models using different size ranges of simulated data. In addition, it’s important to explore the minimum acceptable real data size for successful transfer learning by fine-tuning the simulation-trained models using a limited number of real trajectories. Transfer learning is needed to learn the complex human behaviors that can’t be mathematically modeled and are not covered in the simulation environment.

Celebrities are often accompanied by human bodyguards who walk beside them because they frequently face high-risk situations. We would like to study the possibility of replacing human bodyguards with robot guards in the future. Our case study involves a walking person, whom we refer to as the “partner,” accompanied by a guard robot, while another person, whom we refer to as the “enemy,” approaches to do some harm, for example, following someone to prepare for a theft or harassment. We assume that the robot guard walks with its partner in a public place where normal pedestrians are also moving. The robot should be able to distinguish abnormal behaviors from normal ones within an appropriate recognition range so that it’s possible to identify the behavior at an early stage, before the suspected person reaches the target person.

4.1 Human Trajectory Collection

Due to a lack of data on enemy behaviors, we performed human data collection for the proposed case study. The field experiment (shown in Fig. 4) was carried out in a shopping mall using a Robovie robot equipped with a Velodyne LiDAR sensor (HDL-32E). Two participants were asked to play the roles of enemy and partner, and we collected 63 human trajectory trials that were long enough. The Velodyne sensor captures points over the whole body of every pedestrian, and these data are saved as a 3D point cloud. In our study, we preprocess the data and consider only a single (X, Y) point in the middle of the body to represent the trajectory over time. An example of trajectory-based point cloud data can be found online here. The video overlays text labels indicating “Enemy” and “Partner” above each trajectory’s initial position throughout the playback. The participant who played the role of the enemy was asked to target the partner and follow her trajectory freely from any location based on their interpretation of the enemy’s behavior, e.g., walking straight ahead when it is possible to walk freely, moving around obstacles or columns in the place, and moving randomly when in a face-to-face situation with the target person.

4.2 Model-Based Simulation

4.2.1 Simulation Environment

The simulation environment is built based on the structure of the experiment environment at a shopping mall, and the entire simulator code follows a social force model calibrated using case studies of various situations occurring in public places. The technical details can be found in [36]. The motions and trajectories of simulated pedestrians are computed based on that model. The walking speed of each simulated pedestrian (including enemies and partners) is defined by a “preferred speed” drawn from a Gaussian distribution with a mean of 1.28 m/s and a standard deviation of 0.2 m/s. The distributions are scaled in such a way that simulated and observed ones always have the same average value [35]. A complete trajectory of a pedestrian in the environment is defined by a sequence of subgoals towards the final destination. All pedestrians in the same group have the same preferred speed and subgoals. Pedestrians predict the place and time of their next collision with each other in order to avoid it, so their trajectories are influenced by the surrounding people. Figure 5 shows the simulation environment along with partner, enemy, and pedestrian movement directions. The enemy model is explained in the following subsection.

4.2.2 Enemy Model

Data on the behavioral characteristics of rarely occurring incidents that can be classified as enemy behavior (for example, theft or harassment) are scarce. Real-world observations are the only source of information about such behavior, and they are limited for rare behaviors. Potential models should therefore be based on the major behavioral observations that occur frequently. It’s clear from most related incidents that the suspicious person (enemy) tries to approach and follow the position of a walking person (partner) from several perspectives.

In this context, we chose two different behavioral ideas (models) to imitate the behavior of people observed in the collected data. In the first one, we set the enemy goal and preferred velocity so that they continuously change according to the partner’s position (in other words, the enemy is always forced to move towards the partner’s position) (online video: [model 1]), while in the second one, the enemy moves towards an estimated goal position by estimating the partner’s goal based on walking speed or a fixed distance prediction ahead of the target (online video: [model 2]).

To determine which enemy behavior model produces more realistic and effective training data, we performed comparative tests using cross-validation. Specifically, we generated synthetic datasets using each model separately and trained identical trajectory classification networks. Performance was measured using 50-fold cross-validation on real test trajectories, with simulation datasets of the same size for both models (approximately five times the size of the real data). Model 1 achieved a higher mean accuracy (84.47%) than Model 2 (79.81%), a difference of approximately 4.7 percentage points. These results supported the selection of Model 1 as the preferred generator for the main experiments.

We initially expected model 2 to perform better since it was modeled via observation and appeared conceptually more plausible, but its test results were worse than those of the simple “pointing to target” approach (model 1). Based on that, we decided to use the simplified behavioral approach (model 1), where the enemy always targets the position of the partner.

To simulate a realistic transition from normal to hostile behavior, we implemented a distance-based activation mechanism. Specifically, the enemy agent begins with general pedestrian movement using normal social force parameters. Once the distance between the enemy and the partner drops below 7 meters, the hostile behavior model is triggered. This transition causes the enemy to begin directly targeting the partner’s position, as defined in model 1. This setup reflects observations from our real-world data, where individuals exhibiting suspicious intent began to change their trajectory patterns within a similar spatial range.
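The distance-based activation can be sketched as a small goal-selection function that would be called once per simulation step. The 7-meter threshold is the value stated above; the function name, the `default_goal` argument (standing for the agent's ordinary subgoal), and the choice to keep the hostile behavior latched once triggered are assumptions made for illustration.

```python
import numpy as np

ACTIVATION_DISTANCE_M = 7.0  # trigger distance stated in the text

def enemy_goal(enemy_pos, partner_pos, default_goal, activated):
    """Return (goal, activated) for the enemy agent at the current step.

    Before activation the agent heads for its ordinary subgoal. Once the
    partner is closer than the activation distance, the agent targets the
    partner's current position (model 1). Activation is assumed to latch.
    """
    enemy_pos = np.asarray(enemy_pos, dtype=float)
    partner_pos = np.asarray(partner_pos, dtype=float)
    distance = np.linalg.norm(partner_pos - enemy_pos)
    if activated or distance < ACTIVATION_DISTANCE_M:
        return partner_pos, True
    return np.asarray(default_goal, dtype=float), False
```

The returned goal would then be passed to the pedestrian model, so that collision avoidance with other pedestrians and obstacles stays active in both phases.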

Fig. 4

Field experiment: a role-play case for human data collection on the rarely occurring scenario that one suspicious person (the enemy) approaches a walking person (the partner) in a public space, e.g., a shopping mall. The mobile robot moves next to the human partner and collects the human trajectories

Fig. 5

Simulation environment of the rarely occurring scenario in which a suspicious person approaches other people in a shopping mall

4.3 Behavioral Recognition Range Design

The recognition range defines the spatial boundary within which the behavior of an approaching agent (e.g., enemy) must be detected relative to the partner. Its purpose is twofold: (1) to prevent unrealistic behavior where the enemy appears too close to the partner at the start, and (2) to ensure detection occurs within a practical and safety-critical window before physical proximity is reached.

During real-world trials, we observed that enemy actors typically began exhibiting goal-directed movement toward the partner from an approximate distance of 5 meters. On average, these actors reached within 1–2 meters of the partner at the endpoint of their trajectory. To ensure early yet realistic detection, we set a recognition boundary of 7 meters.

This 7-meter range enables the model to observe the complete initiation and progression of enemy behavior during both training and testing, while excluding ambiguous cases where behavior might not yet be distinctive. It also provides a safety buffer, allowing for proactive responses (e.g., alerts or robot reactions) before the agent reaches a critical proximity.

Consequently, this range affects model training by shaping the data labeling window and ensures that evaluation metrics reflect timely detection, which is vital for real-world deployment in surveillance or robotic systems.
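One way to apply the recognition range when preparing data is to crop each trial to the part that starts at the first frame in which the approaching agent is within 7 meters of the partner. The sketch below assumes frame-synchronized (X, Y) trajectories of equal length; the function name and the rule of keeping every frame after the first entry into the range are assumptions, not details given in the text.

```python
import numpy as np

RECOGNITION_RANGE_M = 7.0  # recognition boundary stated in the text

def crop_to_recognition_range(agent_traj, partner_traj,
                              max_range=RECOGNITION_RANGE_M):
    """Keep the frames from the first moment the agent is within range.

    agent_traj, partner_traj: arrays of shape (T, 2), frame-synchronized.
    Returns the cropped (agent, partner) pair; both are empty if the agent
    never enters the recognition range.
    """
    agent = np.asarray(agent_traj, dtype=float)
    partner = np.asarray(partner_traj, dtype=float)
    distances = np.linalg.norm(agent - partner, axis=1)
    inside = np.flatnonzero(distances <= max_range)
    if inside.size == 0:
        return agent[:0], partner[:0]
    start = inside[0]
    return agent[start:], partner[start:]
```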

5 Training Results

We present in this section the training results using the proposed DNN-LSTM training structure, which is explained in the next subsection. A series of one-second windows (11 frames/window) of enemy trajectory, along with the partner’s trajectory, are fed to the network. We split simulation and real data into independent trajectory trials within the recognition range as follows: 88% for training, 10% for validation, and 2% for testing, with a 50-fold cross-validation (CV). In addition, to avoid overfitting, a dropout of 0.1 is used after each LSTM layer.
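The input preparation described above can be sketched as follows. The 11-frame window and the 88%/10%/2% split with 50 folds come from the text; the function names, the use of non-overlapping windows, the four-feature layout (agent X, Y followed by partner X, Y), and the way validation trials are drawn from the non-test trials are assumptions.

```python
import numpy as np

FRAMES_PER_WINDOW = 11  # one second of trajectory, as stated in the text

def paired_windows(agent_traj, partner_traj, window=FRAMES_PER_WINDOW):
    """Cut a trial into non-overlapping windows of shape (window, 4).

    Each frame holds the agent's (x, y) followed by the partner's (x, y).
    Trailing frames that do not fill a whole window are dropped.
    """
    agent = np.asarray(agent_traj, dtype=float)
    partner = np.asarray(partner_traj, dtype=float)
    paired = np.hstack([agent, partner])  # shape (T, 4)
    n = len(paired) // window
    return paired[:n * window].reshape(n, window, 4)

def split_trials(n_trials, fold, n_folds=50, val_fraction=0.10, seed=0):
    """Return (train, val, test) trial indices for one cross-validation fold.

    With 50 folds, each test set holds about 2% of the trials; 10% of all
    trials are used for validation and the remaining 88% for training.
    """
    order = np.random.default_rng(seed).permutation(n_trials)
    folds = np.array_split(order, n_folds)
    test = folds[fold]
    rest = np.concatenate([f for i, f in enumerate(folds) if i != fold])
    n_val = int(round(val_fraction * n_trials))
    return rest[n_val:], rest[:n_val], test
```

Splitting by trial rather than by window keeps all windows of one trajectory in the same subset, which matches the statement that the data are split into independent trajectory trials.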

To assess the model’s predictive performance, we used standard evaluation metrics, including categorical accuracy and confusion matrix analysis (TP, FP, FN, TN), where TP = true positives (correctly predicted enemy), TN = true negatives (correctly predicted pedestrian), FP = false positives, and FN = false negatives.

During training, the model was compiled using categorical cross-entropy loss and evaluated on both training and validation datasets using the categorical accuracy metric provided by Keras. We monitored this accuracy during training with early stopping based on validation loss (\(\textrm{patience} = 50\) epochs).

For the final evaluation on the test set, we computed the classification accuracy using accuracy_score() from scikit-learn. To provide deeper insight into prediction errors, we converted the softmax outputs into binary class predictions and compared them against the true labels, capturing:

False Positives (FP): predicted “enemy” when ground truth is “pedestrian”.

False Negatives (FN): predicted “pedestrian” when ground truth is “enemy”.
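The conversion from softmax outputs to error counts can be sketched as follows. This is a minimal NumPy illustration, assuming class index 1 denotes "enemy" and index 0 denotes "pedestrian"; the index assignment, the function name `confusion_counts`, and the toy values are assumptions.

```python
import numpy as np

ENEMY, PEDESTRIAN = 1, 0  # assumed class indices


def confusion_counts(softmax_out, true_labels):
    """Turn softmax outputs into hard predictions and count TP, TN, FP, FN."""
    pred = np.argmax(np.asarray(softmax_out, dtype=float), axis=1)
    true = np.asarray(true_labels)
    tp = int(np.sum((pred == ENEMY) & (true == ENEMY)))
    tn = int(np.sum((pred == PEDESTRIAN) & (true == PEDESTRIAN)))
    fp = int(np.sum((pred == ENEMY) & (true == PEDESTRIAN)))
    fn = int(np.sum((pred == PEDESTRIAN) & (true == ENEMY)))
    accuracy = (tp + tn) / len(true)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn, "accuracy": accuracy}


# Toy example (assumed values): four windows, two classified correctly.
probs = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
truth = [ENEMY, PEDESTRIAN, PEDESTRIAN, ENEMY]
counts = confusion_counts(probs, truth)
```

The accuracy computed here is the same quantity that accuracy_score() returns for the hard predictions.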

5.1 Network Architecture and Training Setup

In recent years, LSTMs have shown promising performance on time-series data [34]. We therefore followed a recommended DNN-LSTM structure (a supervised training network) implemented in Python with Keras, which is shown in Fig. 6. The network receives a series of one-second windows (11 frames/window) of 2D (X, Y) trajectories. Each input is a paired window consisting of the suspected agent’s trajectory (enemy or normal pedestrian) together with the partner’s trajectory. The training data (real and simulated) are balanced (50% normal, 50% rare), and the model parameters were tuned to achieve the best possible performance. The learning rate is 0.00001, and the optimizer is RMSprop. Two LSTM layers (1024 units) are followed by classifier layers with “ReLU” (512 units) and “Softmax” (2 classes) activation functions. Finally, the classifier distinguishes between targeted and normal behaviors.
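A possible Keras realization of this description is sketched below. It is not the authors' code: the input layout of four features per frame (x, y for the suspected agent and the partner), the use of 1024 units in each LSTM layer, the reading of the optimizer as Keras `RMSprop`, and the names `HPARAMS` and `build_model` are all assumptions made for illustration.

```python
# Hyperparameters taken from the text; the dictionary itself is a hypothetical container.
HPARAMS = {
    "window_len": 11,       # frames per one-second window
    "n_features": 4,        # assumed: x, y of suspect and x, y of partner
    "lstm_units": 1024,     # assumed to apply to each of the two LSTM layers
    "dense_units": 512,
    "n_classes": 2,
    "dropout": 0.1,
    "learning_rate": 1e-5,
}


def build_model(hp=HPARAMS):
    """Build and compile a two-layer LSTM classifier matching the described setup."""
    from tensorflow import keras  # imported lazily so the sketch loads without TensorFlow

    model = keras.Sequential([
        keras.layers.Input(shape=(hp["window_len"], hp["n_features"])),
        keras.layers.LSTM(hp["lstm_units"], return_sequences=True),
        keras.layers.Dropout(hp["dropout"]),
        keras.layers.LSTM(hp["lstm_units"]),
        keras.layers.Dropout(hp["dropout"]),
        keras.layers.Dense(hp["dense_units"], activation="relu"),
        keras.layers.Dense(hp["n_classes"], activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=hp["learning_rate"]),
        loss="categorical_crossentropy",
        metrics=["categorical_accuracy"],
    )
    return model
```

Early stopping on validation loss with a patience of 50 epochs could then be passed to `model.fit` through `keras.callbacks.EarlyStopping(monitor="val_loss", patience=50)`.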

Fig. 6

The DNN-LSTM supervised network structure

5.2 Baseline

The baseline is trained using only experiment data (372 paired windows) with the presented DNN-LSTM training structure. First, we followed the solution proposed in [34] and augmented the data by overlapping the enemy’s approaching behavior. However, the average testing accuracy reached only around 75%. We then applied our concept of trajectory alignment, and the average testing accuracy reached 80.99% (50-fold CV). This is the baseline against which we compare the models trained using only simulation data, which are explained in the following subsections.

5.3 Statistical Analysis

To assess whether differences in model performance across conditions were statistically significant, we performed the following tests using results from 50-fold cross-validation:

One-way repeated measures ANOVA: This test was used to assess the impact of positional and velocity noise levels. Assumptions of normality and sphericity were checked using the Shapiro-Wilk test and Mauchly’s test, respectively. Non-sphericity corrections were applied where necessary.
Paired t-test: This test was used to compare fine-tuned and non-fine-tuned models. Normality of differences was assessed using the Shapiro-Wilk test.

Statistical significance was set at \(p < 0.05\) for all tests.
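As a worked illustration of the paired t-test, the statistic can be computed from per-fold accuracies as \(t = \bar{d} / (s_d / \sqrt{n})\), where \(d\) are the per-fold differences. The sketch below uses only the Python standard library; the function name `paired_t_statistic` and the four toy accuracies are assumptions, not results from the study.

```python
import math
import statistics


def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees of freedom) for a paired t-test on per-fold accuracies."""
    if len(acc_a) != len(acc_b):
        raise ValueError("both conditions must have the same number of folds")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Toy example (assumed values): accuracies of two models on the same four folds.
fine_tuned = [0.90, 0.88, 0.92, 0.91]
not_fine_tuned = [0.86, 0.85, 0.87, 0.88]
t_value, dof = paired_t_statistic(fine_tuned, not_fine_tuned)
```

With 50 folds the degrees of freedom are 49, which matches the t(49) values reported in the following subsections.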

5.4 Positional Noise

We applied uniformly distributed artificial noise of several ranges to our raw simulation data (2,500 paired one-second windows), independently for each frame, and trained the model on each resulting dataset. Figure 7 shows that the simulation-trained model (using simulated raw data) reaches 84.47%, while our real-data baseline is 80.99%.

The results indicate that a noise range of \(\pm\)6 cm yields the best performance, reaching 85.28%. To substantiate this, a one-way repeated measures ANOVA was conducted to compare accuracy across different positional noise levels (\(\pm\)2 cm, \(\pm\)4 cm, \(\pm\)6 cm, \(\pm\)8 cm, \(\pm\)10 cm, \(\pm\)20 cm). The analysis showed significant differences (F(6, 343) = 4.06, \(p <.001\)), confirming that positional noise levels significantly affect accuracy. The \(\pm\)6 cm noise level was found to be optimal.

Table 1 gives the resulting average overall noise value for each dataset, based on the middle-point calculation. The simulation dataset with an additive noise range of \(\pm\)6 cm has an overall average noise value close to that of our real data (6.9 cm).
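The noise injection can be sketched as follows. This is a minimal example, assuming positions are in meters and that noise is added to every coordinate of every frame in a window; whether the partner's coordinates were also perturbed is not stated in the text, and the function name `add_uniform_position_noise` is hypothetical.

```python
import numpy as np


def add_uniform_position_noise(windows, half_range_m, rng=None):
    """Add independent uniform noise in [-half_range_m, +half_range_m) to each coordinate."""
    rng = np.random.default_rng() if rng is None else rng
    windows = np.asarray(windows, dtype=float)
    # A fresh noise value is drawn for every frame and coordinate.
    noise = rng.uniform(-half_range_m, half_range_m, size=windows.shape)
    return windows + noise


# Toy example (assumed data): five paired windows of 11 frames, perturbed by +/- 6 cm.
clean = np.zeros((5, 11, 4))
noisy = add_uniform_position_noise(clean, 0.06, rng=np.random.default_rng(0))
```

Repeating this call with half-ranges of 0.02 to 0.20 m would produce one training dataset per noise level compared in Fig. 7.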

Fig. 7

The effect of artificial positional noise on the performance. A one-way ANOVA confirmed significant differences in accuracy across positional noise levels (F(6, 343) = 4.06, \(p <.001\))

Table 1 Real and simulation data average noise values


5.5 Velocity Noise

We explored the effect of using velocity standard deviations (SDs) in the simulator environment other than the recommended value of 0.2 m/s, while keeping a velocity mean of 1.28 m/s [35]. A one-way repeated measures ANOVA was performed to evaluate accuracy across different velocity SDs (0.1 m/s, 0.2 m/s, 0.3 m/s, 0.4 m/s, 0.5 m/s). The analysis revealed significant differences (F(5, 294) = 36.47, \(p <.001\)), with 0.3 m/s being the optimal velocity SD, reaching an overall accuracy of 86.18%, as shown in Fig. 8.

Fig. 8

The effect of velocity noise on the performance. Velocity SD of 0.3 m/s yielded the highest accuracy. ANOVA results: F(5, 294) = 36.47, p <.001

An approximate calculation on our real data gave an average velocity of 1.19 m/s and a standard deviation of 0.322 m/s. This is consistent with the best performance being obtained at an SD of 0.3 m/s in the simulator environment, which is the tested value closest to our real data.
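One way to obtain such velocity statistics from a recorded trajectory is sketched below. The frame interval `dt` is an assumption (a value of about 0.1 s would be implied if the 11 frames of a window span exactly one second), and the function name `speed_stats` and the toy positions are hypothetical.

```python
import numpy as np


def speed_stats(xy, dt):
    """Return (mean speed, sample SD of speed) in m/s for a 2D trajectory sampled every dt seconds."""
    xy = np.asarray(xy, dtype=float)
    # Distance travelled between consecutive frames, divided by the frame interval.
    step = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    speeds = step / dt
    return float(speeds.mean()), float(speeds.std(ddof=1))


# Toy example (assumed data): per-frame speeds of 1, 2, and 1 m/s along the x-axis.
track = np.array([[0.0, 0.0], [0.1, 0.0], [0.3, 0.0], [0.4, 0.0]])
mean_v, sd_v = speed_stats(track, dt=0.1)
```

Pooling these per-frame speeds over all real trajectories gives values that can be compared directly with the simulator's velocity mean and SD.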

5.6 Combining Positional and Velocity Noise

We then applied the best positional and velocity noise settings at the same time and compared the performance with that obtained using each setting separately, as shown in Fig. 9. Combining both parameters did not lead to a significant improvement over using a velocity SD of 0.3 m/s alone. Based on that, we concluded that the best performance is obtained with no positional noise and a velocity SD of 0.3 m/s.

Fig. 9

Positional and velocity noise optimization. A combination of positional noise (\(\pm\)6 cm) and velocity noise (SD = 0.3 m/s) showed no significant additional performance improvements over velocity noise optimization alone (F(3, 196) = 16.43, \(p <.001\))

5.7 Data Size and Proposed Trajectory Alignment

We applied a range of simulation data sizes (1K–5K windows) to determine the size required for successful prediction. A simulation dataset of 3000 samples appears to be the minimum sufficient size, because the performance did not improve significantly with larger sizes. Based on that, we explored a new method to optimize performance through trajectory representation. The results are compared with those of the models trained on the original simulated trajectories to show the effectiveness of the proposed trajectory alignment method. The proposed method yielded a significant additional improvement in performance, up to 86.20%. To assess this improvement, a paired t-test was conducted using the accuracies from 50-fold cross-validation. The test indicated a significant improvement using aligned data (t(49) = 63.36, \(p <.001\)). Figure 10 shows the effect of data size and the proposed method of trajectory alignment.

Fig. 10

The effect of data size and the proposed method of trajectory alignment. Aligned data resulted in significantly higher accuracy (t(49) = 63.36, \(p <.001\)), improving detection performance compared to original data models

5.8 Fine-Tuning via Transfer Learning

To capture complex behavioral features that are not covered by the simulation environment, we fine-tuned (via transfer learning) the simulation-trained models (around 100 trajectories for each batch of 1000 samples) with limited real-world data. Only the last layer was retrained, using 60 real trajectories (360 samples) and 50-fold CV. Figure 11 compares the fine-tuned models with the non-fine-tuned ones. These results use a velocity SD of 0.3 m/s with no positional noise added.

Fine-tuning the models resulted in enhanced performance, with an accuracy of 90.21% compared to 86.20% for non-fine-tuned models trained on 3000 samples. To assess this improvement, a paired t-test was conducted using the accuracies from 50-fold cross-validation. The test indicated a significant improvement with fine-tuning (t(49) = 2.68, p = 0.010).

In addition, we explored the minimum real data size needed for successful transfer learning for rarely occurring behaviors by fine-tuning the simulation-trained models with fewer real trajectories, in the range of 10 to 60. Figure 12 shows that the performance trend is linear. However, using very limited real data sizes for fine-tuning (fewer than 20 trajectories) appeared to reduce performance and, in some cases, resulted in worse outcomes than the non-fine-tuned models.
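The last-layer retraining described above can be sketched as follows. This is a minimal illustration, assuming a Keras-style model object that exposes `layers` and `compile`; the helper name `freeze_all_but_last_layer` is hypothetical, and the optimizer string uses the Keras default settings rather than the learning rate reported in Sect. 5.1.

```python
def freeze_all_but_last_layer(model):
    """Freeze every layer except the final classifier layer, then recompile the model."""
    for layer in model.layers[:-1]:
        layer.trainable = False  # simulation-trained weights stay fixed
    model.layers[-1].trainable = True  # only the output layer is retrained
    # Keras expects a recompile after `trainable` flags change.
    # Assumption: "rmsprop" with default settings; pass an optimizer object to set the rate.
    model.compile(
        optimizer="rmsprop",
        loss="categorical_crossentropy",
        metrics=["categorical_accuracy"],
    )
    return model
```

After this step, calling `model.fit` on the real paired windows updates only the final layer, which is what keeps the number of trainable parameters small relative to the 60 available real trajectories.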

Fig. 11

The effect of fine-tuning using most of the real data. Fine-tuning resulted in significantly higher accuracy (t(49) = 2.68, p = 0.010), improving detection performance compared to non-fine-tuned models

Fig. 12

The effect of fine-tuning using a small number of real trajectories. The size of the fine-tuning dataset did not significantly affect the performance (F(5, 294) = 1.37, p = 0.235)

The testing confusion matrices for the non-fine-tuned and fine-tuned models are shown in Fig. 13. The matrices show that both models predict enemy behavior reasonably well. The majority of failure cases (false negatives) are caused by normal pedestrians’ confusing behavior, which closely resembles the approaching trajectories of potential enemies. A minority of failure cases (false positives) are caused by potential enemies’ confusing behavior, which resembles the trajectories of normal pedestrians, for example, moving in parallel with the partner at a certain distance or changing the direction of movement on certain occasions. These are challenging cases that could not be detected by the simulation-trained model but were successfully classified by the model fine-tuned on real data. Figure 14 shows a couple of trials from the false positives and negatives of the non-fine-tuned model, where four cases were successfully predicted by the fine-tuned model. This means that the fine-tuning process helped the model to successfully predict the complex behavioral trials and achieve a reasonable additional improvement in performance.

Fig. 13

Confusion matrices for testing results using real data

Fig. 14

Examples of real data trials from false positives and negatives for a non-fine-tuned model

6 Discussion

Based on the results of this study, we confirmed that our proposed methods help achieve successful detection performance through effective simulation-based trajectory learning:

The proposed way of generating simulated noisy trials helped to reduce the reality gap to a specific range. However, investigating other velocity distributions showed an improvement in performance.

The trajectory alignment method, which shifts the trajectory coordinates to the origin with respect to the final position and rotates them to align with the x-axis, significantly improved learning performance.
Due to the complexity of the behavior, fine-tuning the simulation-trained models using most of our real data resulted in enhanced performance. However, using very limited real data sizes for fine-tuning can degrade performance.
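The trajectory alignment summarized in the list above can be sketched as follows. The text states only that coordinates are shifted to the origin with respect to the final position and rotated to align with the x-axis; taking the direction from the first to the last point as the axis to rotate onto, as well as the function name `align_trajectory`, are assumptions made for this illustration.

```python
import numpy as np


def align_trajectory(xy):
    """Shift a 2D trajectory so it ends at the origin and rotate it onto the x-axis."""
    xy = np.asarray(xy, dtype=float)
    shifted = xy - xy[-1]  # final position becomes (0, 0)
    # Assumed reference direction: overall displacement from first to last point.
    heading = xy[-1] - xy[0]
    angle = np.arctan2(heading[1], heading[0])
    c, s = np.cos(-angle), np.sin(-angle)
    rotation = np.array([[c, -s], [s, c]])  # rotates by -angle
    return shifted @ rotation.T


# Toy example (assumed data): a walk along +y becomes a walk along +x ending at the origin.
track = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
aligned = align_trajectory(track)
```

For a paired window, the same shift and rotation would have to be applied to the partner's trajectory so that the relative geometry between the two agents is preserved.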

In this study, adding artificial noise to the simulated trajectories and exploring other velocity ranges served two purposes. The first is to confirm which parameter ranges are closer to real-life data, while the second is to confirm which ranges optimize the training and detection performance, whether higher than, lower than, or equal to those of real-life data. In addition, we applied uniformly distributed artificial noise ranges to our raw simulation data with the aim of generating realistic trajectories, and we achieved enhanced performance. However, we did not investigate whether other distributions (e.g., Gaussian) would perform better. We report these findings and hope that they can be generalized to other trajectory-related behaviors that rarely occur.

This research study focuses on the simulated physical parameters that affect the shape of the human trajectory and its associated characteristics in public places through a case study. In this context, the findings of the research study should generalize to other behavioral cases if behavior characteristics are related to the trajectory of movement.

The other existing simulation-based machine learning models consider CNN structures [20,21,22] or LSTM [3] by using skeleton data [4] or trajectory data [25]. However, they tend to simulate individual or group behaviors separately and do not consider simulating rarely occurring or low-moral behaviors based on trajectories. Therefore, our study is considered the first research to focus on rarely occurring behaviors by following a simulation-based trajectory learning approach that considers a base social force model along with integration of the targeted rarely occurring trajectory model.

Although the accuracy reported in [34] to detect rarely occurring behaviors based on human trajectories and their physical parameters is rather high (around 91.33%), it depends on detecting a single person, and the solution does not show similar detection accuracy if the targeted action includes more people with possible behavioral interactions. To our knowledge, this is the first focused research on simulation trajectory-related, rarely occurring behaviors that could be suspicious or lead to low-moral behaviors. The accuracy reported from our case study is promising and reaches 90.2% with a few false positives. It also shows the importance of tuning the simulation environment’s physical parameters for the training and detection performance of such behaviors.

The limitations of the proposed approach include the following:

The accuracy of simulation-based machine learning models is affected by the quality of the simulation environment and the tuned physical parameters involved.
Several environmental structures may be needed based on the targeted behavior and the locations where it may occur.

However, the proposed guideline is expected to improve detection performance by tuning the simulation environment parameters and achieving trustworthy simulated trajectories.

Since there is no real dataset available for suspicious approaching behavior in public spaces collected by LiDARs or any other sensors, we decided to role-play the behavior. The role-playing participants were asked to move in the space and behave based on their own understanding of the investigated behavior, so that different scenarios could be considered and the data made as diverse as possible.

We believe that additional validation scenarios should be performed before applying our current model to the bodyguard application. However, this is a future step because of the limited resources of real data. Despite this limitation, we believe that the rare-behavior scenario represents a meaningful first test case because it highlights the potential of simulation-based learning in cases where data are scarce.

To improve the generalizability of the “real-world” data, the authors recommend future plans to expand the subject pool for the experiments, including experts who can identify and stage realistic hostile scenarios and investigating the possibility of incorporating data from real-world aggression events captured on CCTV systems, subject to ethical and legal considerations.

In addition, future work includes an improvement of the simulation environment by calibrating other physical parameters or considering other learning inputs that can affect the learning performance using human trajectory data. It could include friction factors, acceleration, deceleration, allowable distances between pedestrians, and collision-avoiding behaviors.

7 Conclusion

We proposed a novel guideline with methods to follow towards effective simulation-based trajectory learning for successful prediction of rarely occurring behaviors. The research study findings are validated through a real-case study in a public space. It includes effective representation of human trajectories before being imported into the learning process; consideration of proper data size; and probing physical parameters in the simulation environment (positional and velocity noises), which helped to reduce the reality gap and achieve effective performance. Finally, we show an extra enhancement in performance by fine-tuning the simulation-trained models using most of the available real data, with caution advised when real data availability is limited. Examples of test results from our models can be found online here.

References

Alahi A, Goel K, Ramanathan V, Robicquet A, Fei-Fei L, Savarese S (2016) Social lstm: human trajectory prediction in crowded spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 961–971
Lee N, Choi W, Vernaza P, Choy CB, Torr PH, Chandraker M (2017) Desire: distant future prediction in dynamic scenes with interacting agents. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 336–345
Yang F and Peters C (2019) App-lstm: data-driven generation of socially acceptable trajectories for approaching small groups of agents. In Proceedings of the 7th International Conference on Human-Agent Interaction, pp 144–152
Lazaridis L, Dimou A, Daras P (2018) Abnormal behavior detection in crowded scenes using density heatmaps and optical flow. In 2018 26th European Signal Processing Conference (EUSIPCO), IEEE, pp 2060–2064
Quan R, Zhu L, Wu Y, Yang Y (2021) Holistic lstm for pedestrian trajectory prediction. IEEE Trans Image Process 30:3229–3239
Zamboni S, Kefato ZT, Girdzijauskas S, Norén C, Col LD (2022) Pedestrian trajectory prediction with convolutional neural networks. Pattern Recognit 121(108252)
Korbmacher R, Tordeux A (2022) Review of pedestrian trajectory prediction methods: comparing deep learning and knowledge-based approaches. IEEE Trans Intell Transp Syst 23(12):24126–24144
Gil Ó, Garrell A, Sanfeliu A (2021) Social robot navigation tasks: combining machine learning techniques and social force model. Sensors 21(21):7087
Graser A, Jalali A, Lampert J, Weißenfeld A, Janowicz K (2024) Mobilitydl: a review of deep learning from trajectory data. GeoInformatica 1–33
Pei L, Wang W, Wang Y, Zhang Y, Mingliang X, Changsheng X (2023) Ssagcn: social soft attention graph convolution network for pedestrian trajectory prediction. In IEEE transactions on neural networks and learning systems
Shi H, Weiwei Y, Madani K (2024) A review on social awareness navigation for service robots. In International Conference on Social Robotics, Springer, pp 143–152
Lee S, Park H, You Y, Yong S, Moon I-Y (2023) Deep learning-based multimodal trajectory prediction with traffic light. Appl Sci 13(22):12339
Chib PS, Singh P (2024) Lg-traj: Llm guided pedestrian trajectory prediction. arXiv preprint arXiv:2403.08032.
Samarakoon SBP, Muthugala MVJ, Jayasekara ABP (2022) A review on human–robot proxemics. Electronics 11(16):2490
Camara F, Fox C (2021) Space invaders: pedestrian proxemic utility functions and trust zones for autonomous vehicle interactions. Int J Soc Robot 13(8):1929–1949
Millán-Arias C, Fernandes B, Cruz F (2023) Proxemic behavior in navigation tasks using reinforcement learning. Neural Comput Appl 35(23):16723–16738
Liu L, Liu Y, Gao X-Z (2021) Impacts of human robot proxemics on human concentration-training games with humanoid robots. In Healthcare 9:894. MDPI
Bilen B, Kivrak H, Uluer P, Kose H (2024) Social robot navigation with adaptive proxemics based on emotions. arXiv preprint arXiv:2401.17663
Liang J, Jiang L, Hauptmann A (2020) Simaug: learning robust representations from simulation for trajectory prediction. In European Conference on Computer Vision, Springer, pp 275–292
de Souza CR, Gaidon A, Cabon Y, López AM (2017) Procedural generation of videos to train deep action recognition networks
Gaidon A, Wang Q, Cabon Y, Vig E (2016) Virtual worlds as proxy for multi-object tracking analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4340–4349
Qiu W, Zhong F, Zhang Y, Qiao S, Xiao Z, Kim TS, Wang Y (2017) Unrealcv: virtual worlds for computer vision. In Proceedings of the 25th ACM international conference on Multimedia, pp 1221–1224
Zhang Y, Wei X, Qiu W, Xiao Z, Hager GD, Yuille A (2019) Rsa: randomized simulation as augmentation for robust human action recognition. arXiv preprint arXiv:1912.01180
Zeng X, Liu C, Wang Y-S, Qiu W, Xie L, Tai Y-W, Tang C-K, Yuille AL (2019) Adversarial attacks beyond the image space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4302–4311
Liang J, Jiang L, Murphy K, Ting Y, Hauptmann A (2020) The garden of forking paths: towards multi-future trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10508–10518
Jelonek K, Kunde S, Simms N, Uriarte G, Duncan B (2024) Comparison of human-drone distancing studies across in-person and online modalities. ACM Trans Hum Robot Interact 14(1):1–19
Millán-Arias C, Fernandes B, Cruz F (2021) Learning proxemic behavior using reinforcement learning with cognitive agents. arXiv preprint arXiv:2108.03730
Yue J, Manocha D, Wang H (2022) Human trajectory prediction via neural social physics. In European conference on computer vision, Springer, pp 376–394
Hang S, Wen Q, Chen J, Yang C, Sandoval J, Laribi MA (2023) Recent advancements in multimodal human–robot interaction. Front Neurorobot 17(1084000)
Gupta V, Sharma S, Tyagi S (2024) Adaptive multi-modal deep learning framework for proactive crime detection and behavioral analysis in smart city surveillance networks. In 2024 IEEE 8th International Conference on Information and Communication Technology (CICT), IEEE, pp 1–6
Singh T (2023) Ai-driven surveillance technologies and human rights: balancing security and privacy. In International Conference on Smart Systems: Innovations in Computing, Springer, pp 703–717
Kanghui D, Kaczmarek T, Brščić D, Kanda T (2020) Recognition of rare low-moral actions using depth data. Sensors 20(10):2758
Helbing D, Molnar P (1995) Social force model for pedestrian dynamics. Phys Rev E 51(5):4282
Shehata HM, Nam D, Shunl Inaoka, Trung Tran Quang (2023) Detection of rarely occurring behaviors based on human trajectories and their associated physical parameters. In International Conference on Social Robotics, Springer, pp 276–293
Zanlungo F, Ikeda T, Kanda T (2012) A microscopic “social norm” model to obtain realistic macroscopic velocity and density pedestrian distributions. PLoS One 7(12):e50720
Zanlungo F, Ikeda T, Kanda T (2011) Social force model with explicit collision prediction. EPL (Europhysics Letters), 93(6):68005

