Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives

Chen Gao Xiaochong Lan Nian Li Yuan Yuan Jingtao Ding Zhilun Zhou
Fengli Xu Yong Li
Tsinghua University, Beijing, China
{chgao96, fenglixu, liyong07}@tsinghua.edu.cn

Abstract

Agent-based modeling and simulation has evolved as a powerful tool for modeling complex systems, offering insights into emergent behaviors and interactions among diverse agents. Integrating large language models into agent-based modeling and simulation presents a promising avenue for enhancing simulation capabilities. This paper surveys the landscape of utilizing large language models in agent-based modeling and simulation, examining their challenges and promising future directions. In this survey, since this is an interdisciplinary field, we first introduce the background of agent-based modeling and simulation and large language model-empowered agents. We then discuss the motivation for applying large language models to agent-based simulation and systematically analyze the challenges in environment perception, human alignment, action generation, and evaluation. Most importantly, we provide a comprehensive overview of the recent works of large language model-empowered agent-based modeling and simulation in multiple scenarios, which can be divided into four domains: cyber, physical, social, and hybrid, covering simulation of both real-world and virtual environments. Finally, since this area is new and quickly evolving, we discuss the open problems and promising future directions.

1 Introduction

Simulation, as a computational tool, encompasses the emulation of real-world processes or systems by employing mathematical formulas, algorithms, or computer-generated representations to imitate their behaviors or characteristics. Agent-based modeling and simulation focuses on modeling complex systems by simulating individual agents and their interactions within an environment macal2005tutorial . It operates by assigning specific behaviors, attributes, and decision-making capabilities to these agents, enabling the examination of emergent phenomena resulting from agents’ interactions and environment dynamics. The significance of simulation spans various domains, serving as a valuable tool for understanding, analyzing, and predicting intricate phenomena that might be impractical or impossible to observe directly in real life. It facilitates experimentation, hypothesis testing, and scenario analysis, offering insights into systems’ behaviors under diverse conditions and aiding in decision-making processes across fields like economics, biology, sociology, and ecology. The capacity to acquire and use language is a key aspect that distinguishes humans from other beings hauser2002faculty . The advent of Large Language Models (LLMs) represents a recent milestone in machine learning, showcasing immense capabilities in natural language processing tasks and textual generation zhao2023survey . Leveraging their formidable abilities, LLMs have shown promise in enhancing agent-based simulations by enabling more nuanced and realistic representations of agents’ decision-making processes, communication, and adaptation within simulated environments. Integrating LLMs into agent-based modeling and simulation holds the potential to enrich the fidelity and complexity of simulations, potentially yielding deeper insights into system behaviors and emergent phenomena for the following reasons:

First, the LLM agent can adaptively react and perform tasks based on the environment without predefined explicit instructions autogpt ; babyagi . Second, the LLM agent has strong intelligence to respond like a human and even actively take actions with self-oriented planning and scheduling wang2023survey ; xi2023rise . Nor is the action space of the LLM agent limited to text, since tool usage and internal action modules allow the agent to take various actions schick2023toolformer . Last, the LLM agent can interact and communicate with humans or other AI agents park2023generative . With the above three strengths, LLM agents have been embraced for usage in a wide array of areas park2022social ; li2023you ; li2023quantifying ; park2023generative ; kovavc2023socialai ; lin2023agentsims ; gao2023s ; jinxin2023cgmi ; boiko2023emergent ; bran2023chemcrow . From this perspective, it is clear that LLM agents can serve as a new paradigm for simulation, endowing agents with human-level intelligence.

As a result of the massive potential of LLM agents, there has recently been a boom in research efforts in this area. However, as yet, there is no survey that systematically summarizes the relevant works, discusses the unresolved issues, and provides a glimpse into important research directions. In this survey, we analyze why large language models are essential in the fundamental problem of simulation, especially for agent-based simulation. After discussing how to design agents in this new paradigm, we carefully and extensively discuss and introduce the existing works in various areas, most of which have been published recently. The contribution of this survey can be summarized as follows.

2 Background

In this section, we introduce the background of agent-based modeling and simulation, and of large language model-empowered agents.

2.1 Agent-based Simulation

2.1.1 Basic concepts of agent-based simulation

Agent-based simulation captures the intricate dynamics inherent in complex systems by concentrating on individual entities referred to as agents macal2005tutorial . These agents are heterogeneous, with specific characteristics and states, and adaptively behave according to context and environment, making decisions and taking actions elsenbroich2014agent . The environment, whether static or evolving, introduces conditions, instigates competition, defines boundaries, and occasionally supplies resources influencing agent behaviors cipi2011simulation . Interaction covers both the environment and other agents, and the goal is to mirror the behaviors in reality based on predefined or adaptive rules elliott2002exploring ; macal2005tutorial . To summarize, the basic components of agent-based simulation include:

Agents. Agents are the fundamental entities in an agent-based simulation. They represent individuals, entities, or elements in the system being modeled. Each agent has its own set of attributes, behaviors, and decision-making processes.

Environment. The environment is the space in which agents operate and interact. It includes the physical space, as well as any external factors, e.g., weather conditions, economic changes, political shifts, and natural disasters, that influence agent behavior. Agents may be constrained or influenced by the environment, and their interactions can have effects on the environment itself.

Interaction. Agents interact with each other and their environment through predefined mechanisms. Interactions can be direct (agent-to-agent) or indirect (agent-to-environment or environment-to-agent).

With the above components, agent-based modeling and simulation provides a bottom-up perspective for studying macro-level phenomena and dynamics that arise from individual interactions.
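The three components above can be made concrete with a minimal sketch. The wealth-exchange rule, the class names, and the parameter values below are illustrative assumptions and are not taken from any of the surveyed works; the point is only to show how agents, an environment, and an interaction rule fit together and how a macro-level quantity (the wealth distribution) arises from micro-level exchanges.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """An individual with one attribute (wealth) and one behavior (act)."""
    name: str
    wealth: int

    def act(self, others, rng):
        # Hypothetical behavior rule: give one unit to a random other agent.
        if self.wealth > 0 and others:
            receiver = rng.choice(others)
            self.wealth -= 1
            receiver.wealth += 1


class Environment:
    """Holds the agents and advances the simulation clock."""

    def __init__(self, n_agents, initial_wealth, seed=0):
        self.rng = random.Random(seed)
        self.agents = [Agent(f"a{i}", initial_wealth) for i in range(n_agents)]
        self.step_count = 0

    def step(self):
        # Interaction: each agent, in random order, acts on the other agents.
        order = self.agents[:]
        self.rng.shuffle(order)
        for agent in order:
            others = [a for a in self.agents if a is not agent]
            agent.act(others, self.rng)
        self.step_count += 1


def run(n_agents=10, initial_wealth=5, steps=100, seed=0):
    """Run the toy simulation and return the sorted wealth distribution."""
    env = Environment(n_agents, initial_wealth, seed)
    for _ in range(steps):
        env.step()
    return sorted(a.wealth for a in env.agents)
```

Calling `run()` returns the final wealth of each agent in ascending order. Total wealth is conserved by construction, so any inequality in the result comes purely from the repeated pairwise interactions.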

2.1.2 Agent capability

To achieve realistic simulation in a wide range of application domains, agents should have the following capabilities in terms of perception, decision and action wooldridge1995intelligent :

Autonomy. Agents should be able to operate without the direct intervention of humans or others, which is important in real-world applications such as microscopic traffic flow simulation lopez2018microscopic and pedestrian movement simulation batty2003agent .

Social ability. Agents should be able to interact with other agents (and possibly humans) to complete the assigned goals. When studying social phenomena, group behavior, or social structures, the sociability of agents is key. This includes simulating the formation of social networks, the dynamics of opinions, the spread of culture, and more. The social interactions between agents can be either cooperative or competitive, which are critical when simulating economic activities such as market behavior, consumer decisions, etc.

Reactivity. Agents should be able to perceive their environment and respond quickly to changes in the environment. This capability is especially important in systems that need to simulate real-time responses, such as traffic control systems and automated production lines, and in disaster response scenarios where agents need to be able to respond to environmental changes immediately to effectively conduct early warning and evacuation. More importantly, agents should be able to learn from previous experience and adaptively improve their responses, similar to the idea of reinforcement learning lin1992self .

Pro-activeness. Agents should be able to exhibit goal-directed behavior by taking the initiative instead of just responding to their environment. For example, agents need to proactively provide help, advice, and information in applications such as intelligent assistants and actively explore their environment, plan paths, and perform tasks in fields such as autonomous robots and self-driving cars.

It is worth mentioning that, like humans, agents cannot make perfectly rational choices due to limitations of knowledge and computational capacity simon1997models . Instead, they can make suboptimal yet acceptable decisions based on imperfect information. This capability is particularly critical in achieving human-like simulations in the economic market arthur1991designing and management organizations puranam2015modelling . For example, considering agents’ bounded rationality when simulating consumer behavior, market transactions, and business decisions can more accurately reflect real economic activities. In addition, in simulating decision-making, teamwork, and leadership within organizations, bounded rationality helps reveal behavioral dynamics in real work settings.
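One common way to operationalize bounded rationality is satisficing: the agent inspects only a limited number of options and accepts the first one that is good enough. The sketch below is an assumption-laden illustration rather than a method from the cited works; the function name, the aspiration threshold, and the inspection budget are all hypothetical, and `options` is assumed to be non-empty.

```python
import random


def satisfice(options, utility, aspiration, max_inspected, rng):
    """Return the first inspected option meeting the aspiration level.

    Only `max_inspected` randomly chosen options are examined (limited
    computation). If none is good enough, the best inspected option is
    returned, which may be suboptimal over the full option set.
    """
    sample = rng.sample(options, min(max_inspected, len(options)))
    best = sample[0]
    for option in sample:
        if utility(option) >= aspiration:
            return option  # acceptable: stop searching
        if utility(option) > utility(best):
            best = option
    return best
```

For instance, a simulated consumer could call `satisfice(prices, lambda p: -p, aspiration=-20, max_inspected=5, rng=random.Random(0))` to accept the first of five inspected prices that is at most 20.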

2.1.3 Applications of agent-based modeling and simulation

The flexibility of agent-based modeling and simulation allows for the exploration of diverse scenarios and the study of emergent phenomena in a controlled simulation environment. Therefore, it offers researchers and practitioners a versatile tool for understanding and predicting the behavior of complex systems across various domains.

Based on the four categories of the target systems, current applications of agent-based simulation can be divided into four domains:

Physical domain. This category refers to the natural system in the physical environment an2012modeling . Typical applications include ecology and biology zhang2020overview ; pereira2004agent , such as modeling ecological systems heckbert2010agent ; lippe2019using , species interactions mclane2011role , and the impact of environmental changes pertoldi2004impact ; beltran2017agent . Many simulation problems in urban environments also belong to the physical domain an2012modeling , such as transportation, human mobility, etc. Specifically, for urban planning gaube2013impact , agent-based modeling and simulation can aid in simulating urban growth arsanjani2013spatiotemporal ; barros2004urban , traffic patterns mastio2018distributed ; de2019mesoscopic , and the impact of urban policies maggi2016understanding ; widener2013agent ; ma2013agent . Another application is engineering and manufacturing barbosa2011simulation ; rolon2012agent , in which agent-based modeling and simulation can be applied to model supply chain dynamics schieritz2003emergent , production processes parv2019agent , and the interactions of entities within manufacturing systems.

Social domain. The social domain mainly covers the social behavior simulation, which can be further divided into 1) social interaction that focuses on social networks, community interactions, or organizational behavior macy2002factors ; wall2016agent , and 2) economic system that simulates economic systems, market dynamics, or financial interactions samanidou2007agent . Specifically, for social sciences conte2014agent ; gilbert2007computational ; gilbert2000build ; terna1998simulation , agent-based modeling and simulation is widely used to model social phenomena such as crowd behavior luo2008agent ; kountouriotis2014agent , opinion dynamics li2020opinion ; banisch2012agent , and social network interactions madey2003agent ; el2012social ; gilbert2004agent . The agent-based modeling can simulate the emergence of societal patterns and trends helbing2012social . As for the research of economics leombruni2005economists ; van2008agent ; hamill2015agent , agent-based models are employed to study economic systems deguchi2011economics , market dynamics rouchier2017agent ; wang2018agent , and the behavior of individual economic agents mueller2016economic .

Cyber domain. Besides the physical world and human society, our daily life has been further extended into cyberspace. Therefore, agent-based simulation has also been applied in wide areas like web-based behaviors guyot2006agent and cyber-security applications alluhaybi2019survey .

Hybrid domain. This category includes hybrid systems combining components covering the physical world, social life, and cyberspace. For example, an urban environment is a socio-physical environment that integrates social behavior with physical infrastructure. Moreover, it is also multi-layered after taking online social networks into account. That is, these applications involve more than one domain of physical, social, or cyber domains. Therefore, agent-based simulations within an urban environment, such as urban planning chen2012agent and epidemic control silva2020covid , are far more complex and challenging than those in unitary environments. Moreover, for healthcare cabrera2011optimization ; barnes2013applications , agent-based modeling and simulation can be used to model the spread of infectious diseases perez2009agent , healthcare systems silverman2015systems , and the effectiveness of interventions beheshti2017comparing , which help in understanding and planning for public health scenarios.

2.1.4 Methodologies of agent-based modeling and simulation

The development of modeling technologies utilized in agent-based simulation has also gone through the early stage of knowledge-driven approaches and the recent stage of data-driven approaches. Specifically, the former includes various approaches based on predefined rules or symbolic equations, and the latter includes stochastic models and machine learning models.

2.1.5 Limitations

Early works on agent-based simulation are keen to design “deliberative architectures” that rely on explicit, often complex, internal models to make decisions, emphasizing the importance of planning, reasoning, and decision-making processes wooldridge1995intelligent . However, optimizing the internal world model and planning-reasoning module based on symbolic AI approaches is generally intractable in practice. This leads to the prevalence of “reactive architectures” in agent-based simulations, which rely primarily on direct sense-action loops rather than complex internal models of the world or deep reasoning processes to make decisions. The subsequent development of AI, especially deep learning technology, has not fundamentally changed this paradigm of agent-based simulation due to its poor interpretability and generalization capability. Facing the need for realistic simulation of real-world processes or systems, current approaches therefore still have several limitations, as described below.

Simple agent architecture is not enough to cope with complex tasks. Although “reactive architectures” are able to adapt to different environmental conditions, they may be limited in handling complex tasks or situations that require long-term planning. To achieve human-like simulation in real-world complex problems, current agent architecture requires redesigns that solve challenges in processing speed, resource efficiency and task complexity. Specifically, agents should be capable of complex planning and reasoning processes, like using internal models to predict the consequences of different courses of action and choose the best one, and able to develop and execute complex strategies to achieve long-term goals.
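The contrast between the two architectures can be sketched in a few lines. The toy world, the function names, and the utility values below are assumptions made for illustration only. The reactive agent maps its current observation directly to an action, while the deliberative agent uses an internal model to predict where each action sequence leads and picks the first action of the best predicted outcome.

```python
def reactive_policy(observation):
    """Sense-action rule: head for whichever side is less congested now."""
    return "left" if observation["left"] <= observation["right"] else "right"


def deliberative_policy(state, actions, model, utility, horizon=2):
    """Look ahead `horizon` steps with an internal model.

    The score of an action is the utility of the best state reachable
    after `horizon` steps (final-state utility only, not cumulative).
    """
    def value(s, depth):
        if depth == 0:
            return utility(s)
        return max(value(model(s, a), depth - 1) for a in actions)

    return max(actions, key=lambda a: value(model(state, a), horizon - 1))


# Hypothetical 1-D world: moving right first costs something (state 1)
# but leads to a large reward one step later (state 2).
def toy_model(state, action):
    return state + action


def toy_utility(state):
    return {-1: 1, 1: -1, 2: 5}.get(state, 0)
```

With `horizon=1` the deliberative agent behaves greedily and moves to state -1; with `horizon=2` it accepts the short-term cost and moves toward state 2. The lookahead cost grows exponentially with the horizon, which hints at why such planning becomes intractable in richer environments.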

It is difficult to develop a general agent that can support simulations across environments. Different environments vary in dimensions like complexity, dynamics, and uncertainty. Due to this diversity, a specific agent that is effective in one environment (like a financial market simulation) might be completely ineffective in another (like a social campaign simulation). In real-world applications where the target environment is often hybrid with significant dynamics and uncertainty, developing specific agents case by case is highly inefficient and costly.

Existing methods cannot support integrative simulation in real-world problems. A versatile agent-based simulation model should be able to describe how systems operate under known conditions, explain why certain patterns emerge, predict future states based on existing observations, and explore the outcomes of hypothetical scenarios. However, existing methods cannot support the above tasks simultaneously: rule-based methods are useful in descriptive problems, while symbolic or stochastic methods can provide explanations regarding underlying mechanisms that drive the system. Comparatively, machine learning models are better at predictive problems by learning hidden patterns from data but with less interpretability. Therefore, there remain challenges in developing methods that simultaneously capture the accuracy of behavioral modeling, interpretability of mechanisms, adaptability, and reliability under environmental changes.

2.2 Large language models and LLM-empowered agents

Large language models (LLMs), such as ChatGPT chatgpt , Gemini gemini , LLaMA touvron2023llama , Alpaca alpaca , and GLM zeng2023glm , are the latest paradigm of language models, which evolve from early statistical language models bellegarda2004statistical to neural language models melis2017state , then to pre-trained language models brown2020language , and finally to large language models zhao2023survey . With billions of parameters and extensive pre-training corpus, LLMs have shown astonishing abilities not only in natural language processing tasks li2023seed ; zhang2023benchmarking such as text generation, summarization, translation, etc., but also in complex reasoning and planning tasks, such as solving mathematical problems arora2023have , etc. Pre-training on large-scale corpora lays the foundation for zero-shot generalization. Moreover, pre-trained models can be further fine-tuned for specific tasks, adapting to particular application scenarios jiang2023health . In addition, advances in large language models over the past year, such as ChatGPT and GPT-4, have demonstrated human-like reasoning ability, a milestone that some now consider the seed of artificial general intelligence (AGI). Specifically, the capacity to acquire and use language is a key aspect of how we, humans, distinguish ourselves from other beings tomasello2010origins . Language is one of the most important mechanisms we have to interact with the environment, and language provides the basis for high-level abilities hauser2002faculty .

Thus, it is promising to construct large language model-empowered agents wang2023survey ; xi2023rise due to their human-like intelligence in perceiving the environment and making decisions. First, the LLM agent is able to adaptively react and perform tasks based on the environment without predefined explicit instructions autogpt ; babyagi . In addition, during the simulation process, the LLM agent can even form new ideas, solutions, goals, etc. franceschelli2023creativity . For example, AutoGPT autogpt can automatically schedule plans when given a set of available tools and the final task goal, exemplifying the significant potential of LLMs in constructing autonomous agents. Meanwhile, BabyAGI babyagi created an LLM-driven script running an infinite loop, which continuously maintains a task list, in which each task is completed by the ChatGPT API chatgpt based on the task context. Second, the LLM agent has enough intelligence that it can respond like a human and even actively take actions with self-oriented planning and scheduling wang2023survey ; xi2023rise . The environment input is not limited to text; rather, recent multi-modal fusion models can be fed other types of information, such as image or audio zhu2023minigpt . Nor is the action space of the LLM agent limited to text, since the tool-usage ability allows the agent to take more actions schick2023toolformer . Lastly, the LLM agent has the ability to interact and communicate with humans or other AI agents park2023generative . In simulation, especially agent-based simulation, the agent’s communication ability elevates individual simulation to the community level gilbert2005simulation . An LLM-driven agent can generate text, which can be received and understood by another agent, in turn providing the basis for interpretable communication among agents or between humans and agents park2023generative .
Moreover, the simulation at the community level requires heterogeneity of agents, and the LLM agents can meet these requirements for playing different roles in society qian2023communicative . An artificial society constructed by LLM agents can further reveal the emergence of swarm intelligence with collective agent behaviors gao2023s ; park2023generative , similar to wisdom-of-crowds in human society surowiecki2005wisdom .
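The task-list loop mentioned above for LLM-driven scripts can be sketched as follows. This is not the code of any cited project: `stub_llm` is a hypothetical stand-in for a real LLM API call, the follow-up rule is invented so that the example terminates, and the loop is capped at `max_iterations` rather than running forever.

```python
from collections import deque


def stub_llm(task, context):
    """Hypothetical stand-in for an LLM call: returns a result and new tasks."""
    result = f"done: {task}"
    # Spawn one follow-up for non-review tasks only, so the loop terminates.
    follow_ups = [] if task.startswith("review") else [f"review {task}"]
    return result, follow_ups


def run_task_loop(objective, initial_tasks, llm=stub_llm, max_iterations=10):
    """Maintain a task queue; complete each task using the accumulated context."""
    tasks = deque(initial_tasks)
    context = [f"objective: {objective}"]
    completed = []
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()
        result, follow_ups = llm(task, context)
        context.append(result)      # earlier results become context for later tasks
        completed.append(task)
        tasks.extend(follow_ups)    # the list is updated as the agent works
    return completed
```

Replacing `stub_llm` with a function that queries an actual model, while keeping the same `(result, follow_ups)` return shape, would turn this sketch into a simple autonomous agent loop.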

As mentioned above, the simulation system has widely utilized the paradigm of agent-based modeling, which requires agents with high-level abilities, well motivating the use of large language model-empowered agents in simulation scenarios.

3 Critical abilities of LLM for agent-based modeling and simulation

As mentioned above, agent-based modeling and simulation serves as a basic approach for simulation in many areas macal2005tutorial ; elsenbroich2014agent , but it still suffers from several key challenges. Large language model-empowered agents not only meet the requirements for agent-based simulation but also address these limitations by relying on their strong abilities in perception, reasoning, decision-making, and self-evolution, as illustrated in Figure 1.

Figure 1: Illustration of how large language model agents meet the requirements of agent-based modeling and simulation.

3.1 Perception

The core of agent-based modeling and simulation is to model how an individual agent interacts with an environment macal2005tutorial , which requires the agent to accurately sense various types of information from said environment. As for the large language model-empowered agents, the ability of language enables agents to comprehend and respond to diverse environments directly or indirectly. On the one hand, the basic ability to understand and generate text enables agents to engage in complex dialogues, negotiate, and exchange information, which supports direct interaction. On the other hand, the interface between the agent and environment can be operated via text xagent2023 , which leads to indirect interaction. Of course, such ability also supports the communication between different agents, besides the agent-environment perspective.

It is worth mentioning that the ability to interact with the environment and other agents is not adequate to achieve human-like simulations. To be more specific, large language model-based agents are also required to “put themselves in real humans’ shoes”, thereby allowing the agent to imagine that it is indeed in the environment. That is, LLM agents should be able to comprehend, perceive, and respond to diverse needs, emotions, and attitudes within different contexts, from a first-person view shanahan2023role . This capability enables models to better understand the information from the environment or other agents and generate more realistic responses.

3.2 Reasoning and decision making

One critical challenge in traditional agent-based simulation is that rule-based or even neural network-based agents are not intelligent enough cipi2011simulation . That is, the agent may fail to make correct or optimal decisions, for example by choosing a crowded road in a transportation simulation or sending an incorrect message in a social network simulation. This can be explained by the fact that traditional neural network-based artificial intelligence is still not as intelligent as a real human hoshen2017iq ; liu2019well ; mandziuk2019deepiq ; hernandez2016computer . In contrast, large language model-empowered agents exhibit heightened reasoning capabilities, enabling them to make more informed decisions and choose suitable actions within the simulation. Besides making suitable decisions, another critical advantage of large language model-empowered agents in supporting better agent-based modeling and simulation is autonomy fu2023drive . With only limited guidance, regulations, and goals, agents equipped with large language models can autonomously take actions, make plans for the given goal, or even achieve new goals without the need for explicit programming or predefined rules park2023generative . That is, autonomy enables LLM agents to dynamically adjust their actions and strategies based on real circumstances, contributing to the realism of the simulation.

3.3 Adaptive learning and evolution

For agent-based modeling and simulation, the system always has uncertainty and uncontrollability macal2005tutorial . In other words, the environment and the agent’s state may be completely different compared with the initial stage of the simulation. As the old story of Rip Van Winkle tells, a man falls asleep in the mountains and awakens to find that the world around him has drastically changed during his slumber. That is, the environment is continuously changing in a long-term social network simulation gao2023s ; the agent should be able to adapt to the new environment, formulating decision policies that may deviate significantly from their original strategies. Obviously, adaptive learning and evolution are challenging for traditional approaches, but luckily, this can be addressed by large language model-based agents lu2023self . Specifically, with the ability to continually learn from new data and adapt to changing contexts, LLM agents can evolve behaviors and decision-making strategies over time. Agents can assimilate new information, analyze emerging patterns in data, and modify their responses or actions accordingly based on in-context learning dong2022survey , mirroring the dynamic nature of real-world entities. This adaptability contributes to the simulation’s realism by simulating the learning curve and evolution of agents’ behaviors in response to varying stimuli.

3.4 Heterogeneity and personalizing

As the saying goes, one man’s meat is another man’s poison. Heterogeneity of agents is critical for agent-based simulation, since a complex society brown2006effects or economic system bohlmann2010effects consists of heterogeneous individuals. Specifically, in agent-based modeling and simulations, the heterogeneity of agents involves representing diverse characteristics, behaviors, and decision-making processes among individuals. Agent-based simulation stands out for its capacity to accommodate varied rules or parameters compared to traditional simulation methods, but existing approaches face two challenges, discussed as follows.

The first one is the extremely high complexity of parameter settings of the existing methods elliott2002exploring ; macal2005tutorial . In these models, the vast array of variables influencing an agent’s behavior—from personal traits to environmental factors—makes selecting and calibrating these parameters daunting. This complexity often leads to oversimplification, compromising the simulation’s accuracy in portraying true heterogeneity macal2005tutorial . Moreover, acquiring accurate and comprehensive data to inform parameter selection is another challenge. That is, real-world data capturing diverse individual behaviors across various contexts might be limited or challenging to collect. Furthermore, validating the chosen parameters against real-world observations to ensure their reliability adds another layer of complexity.

Second, the rule or the model cannot cover all dimensions of heterogeneity, as real-world individuals are very complex macal2005tutorial . Using rules to drive agent behaviors only captures certain aspects of heterogeneity but could lack the depth to encapsulate the full spectrum of diverse behaviors, preferences, and decision-making processes. Furthermore, given limited model capacity, trying to cover all dimensions of heterogeneity within a single model is too idealistic. Thus, balancing model simplicity against accurate modeling of agents becomes a critical challenge in agent-based modeling and simulation, and failing to do so results in oversimplification or neglect of certain aspects of agent heterogeneity.

Different from the traditional methods, the LLM-based agents support 1) capturing complex internal characteristics with internal human-like cognitive complexity, and 2) specialized and customized characteristics with prompting, in-context learning, or fine-tuning.

4 Challenges and approaches of LLM agent-based modeling and simulation

The core of agent-based modeling and simulation is how the agent reacts to the environment and how agents interact with each other; agents should behave as closely as possible to real-world individuals, in line with human knowledge and rules. Therefore, when constructing large language model-empowered agents for simulation, there are four major challenges: perceiving the environment, aligning with human knowledge and rules, choosing suitable actions, and evaluating the simulation. We will discuss the solutions from a high-level perspective, and how the existing works address them will be elaborated on in detail in the next section.

4.1 Environment construction and interface

For agent-based simulation with large language models, the first step is to construct the environment, virtual or real, and then design how the agent interacts with the environment and other agents. Thus, we need proper methods for building an environment that the LLM can perceive and interact with.

4.1.1 Environment: define the world and rules

The external environment in agent-based simulation varies for different domains. In general, the environment built by existing works can be divided into two categories: virtual and real.

4.1.2 Interface

The interface has two aspects: how the agent interacts with the environment, and how agents communicate with each other.
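A text-based agent-environment interface can be sketched as a pair of translations: the environment state is rendered as a textual observation, and the agent's free-form reply is parsed back into a valid action. The grid world, the action vocabulary, and all function names below are assumptions chosen for illustration and do not correspond to any specific surveyed system.

```python
# Hypothetical action vocabulary mapping words to grid displacements.
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


def describe(state):
    """Render the environment state as a text observation for the agent."""
    x, y = state["position"]
    return (
        f"You are at ({x}, {y}). The goal is at {state['goal']}. "
        f"Valid actions: {', '.join(MOVES)}."
    )


def parse_action(reply):
    """Extract the first valid action word from a free-form reply, else None."""
    cleaned = reply.lower().replace(".", " ").replace(",", " ")
    for word in cleaned.split():
        if word in MOVES:
            return word
    return None


def apply_action(state, action):
    """Return the next state; an unparseable reply leaves the state unchanged."""
    if action is None:
        return state
    dx, dy = MOVES[action]
    x, y = state["position"]
    return {**state, "position": (x + dx, y + dy)}
```

In a full simulation loop, the string returned by `describe` would be sent to the LLM, and its reply would pass through `parse_action` before `apply_action` updates the world. Treating unparseable replies as a no-op is one possible design choice; re-prompting the model is another.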

4.2 Human alignment and personalization

Although LLMs have already demonstrated remarkable human-like characteristics in many aspects, agents based on LLMs still lack the necessary domain knowledge in specific areas, leading to irrational decisions. Therefore, aligning LLM agents with human knowledge and values, especially those of domain experts, is an essential challenge to achieve more realistic domain simulations. In addition, the heterogeneity of agents, as a fundamental characteristic of agent-based modeling (ABM), is both an advantage and a challenge for traditional models. While LLMs possess a powerful capability to simulate heterogeneous agents, keeping that heterogeneity controllable and enabling LLMs to play different roles to meet personalized simulation requirements is a significant challenge. Next, we will explain the methods and technologies to address these two challenges from two perspectives: prompt engineering and tuning, and introduce the existing related work in these areas.

4.2.1 Human alignment

Prompt engineering. When simulating specific agents, we can provide task instructions, background knowledge, generation patterns, and task examples specific to certain domains or scenarios, thereby aligning LLMs’ output with human knowledge and values when deployed. For example, providing the agent with detailed descriptions of game rules and examples allows it to weigh the factors that humans care about when making decisions, such as self-interest and fairness akata2023playing . In addition, constructing modules such as reflection and memory can improve agents’ planning and reasoning capabilities, thereby giving them stronger game-playing capabilities and creating a possible path towards game playing with human-like intelligence guo2023suspicion .

Tuning. Tuning requires constructing a training dataset for specific domains or scenarios, or hiring domain experts. Based on the dataset or expert feedback, fine-tuning the LLM can also empower the agents with more domain-specific knowledge, producing outputs more in line with human knowledge and values. For example, Singhal et al. singhal2023large propose to achieve knowledge alignment in clinical medicine. The proposed MultiMedQA benchmark combines six existing medical question-and-answer datasets covering professional medicine, research, and consumer inquiries. Additionally, Med-PaLM singhal2023large , an LLM for the medical field, is trained based on the foundation model PaLM chowdhery2023palm . In terms of implementation, the authors incorporate medical question-and-answer examples and modify model prompts under the guidance of professional clinicians (involving five clinical doctors) for fine-tuning. This guides the model to generate text consistent with clinical requirements. With this domain-specific LLM, we can simulate agents (e.g., medical assistants) in real-world medical environments. In addition to collecting large-scale datasets with domain knowledge, other research dubois2023alpacafarm directly uses LLMs to generate “human feedback”, specifically pair-wise feedback for instructions, for LLM fine-tuning. Results show that the generated feedback enables the LLM to achieve high human alignment at a cost 45× lower than hiring crowd workers to give feedback in experiments.

4.2.2 Personalization

Prompt engineering. The basic idea is to adapt to personalized needs by providing LLM agents with individual preferences, expected output patterns, background knowledge, etc., thereby making the output closer to the specific needs or preferences of individuals when deployed. For example, in the well-known LLM-based social activity simulation, AI Town park2023generative , introducing professions, behavioral preferences, and interpersonal relationships in the prompts yields personalized interaction behaviors that vary with the scenario, the time, and the other agents involved. In economic simulation, specifically the simulation of canonical games, the agent’s preference (cooperative, selfish, altruistic, etc.) can be specified in the prompt, so that the agent exhibits different levels of cooperative tendency during game play phelps2023investigating .
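As a rough illustration of the prompt-based personalization described above, the following sketch assembles a persona prompt from a profession, a behavioral preference, and interpersonal relationships. The `Persona` class, the `build_persona_prompt` function, and the prompt wording are all hypothetical and are not taken from park2023generative or phelps2023investigating; sending the resulting string to an actual LLM is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Persona:
    """Hypothetical container for the personal attributes placed in a prompt."""
    name: str
    profession: str
    preference: str  # e.g. "cooperative", "selfish", "altruistic"
    relationships: Dict[str, str] = field(default_factory=dict)


def build_persona_prompt(persona: Persona, situation: str) -> str:
    """Compose a plain-text prompt that conditions the agent on its persona."""
    lines = [
        f"You are {persona.name}, a {persona.profession}.",
        f"Your disposition in this scenario is: {persona.preference}.",
    ]
    # Sort relationships so the prompt is reproducible across runs.
    for other, relation in sorted(persona.relationships.items()):
        lines.append(f"{other} is your {relation}.")
    lines.append(f"Current situation: {situation}")
    lines.append("Respond in character and state the action you take.")
    return "\n".join(lines)


if __name__ == "__main__":
    agent = Persona("Isabella", "cafe owner", "altruistic", {"Tom": "neighbor"})
    print(build_persona_prompt(agent, "You are playing a repeated prisoner's dilemma."))
```

Changing only the `preference` field while keeping the situation fixed is one simple way to obtain the heterogeneous agents that such game simulations compare.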

Tuning. Tuning for personalization requires selectively constructing datasets or fine-tuning multiple models based on feedback from different users, with each model corresponding to one personalized need or one type of need. This can also be achieved by combining them in specific ways to meet the relevant personalized requirements. Some research attempts to efficiently align LLMs with the distinct preferences of different users jang2023personalized . Specifically, user preferences are decomposed into criteria along multiple aspects, and RLHF-based personalized optimization is targeted at each aspect. In practical applications, the LLM’s response-generation strategy is obtained by linearly weighting the per-aspect strategies according to user preferences. When simulating agents with individual preferences (e.g., users in recommender systems), this approach achieves a more accurate match for different preferences and is also easily generalizable to scenarios with a broader range of preferences.
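The linear weighting mentioned above can be pictured with a small sketch. It assumes, purely for illustration, that each preference aspect is represented by a parameter vector of the same length and that "linearly weighting" means a normalized weighted average of those vectors; the function name `merge_aspect_parameters` and the aspect names are hypothetical, and the actual procedure in jang2023personalized may differ.

```python
from typing import Dict, List


def merge_aspect_parameters(
    aspect_params: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return the weighted average of per-aspect parameter vectors.

    Weights are normalized to sum to one; aspects without a weight count as 0.
    """
    total = sum(weights.get(name, 0.0) for name in aspect_params)
    if total <= 0:
        raise ValueError("at least one aspect needs a positive weight")
    lengths = {len(vector) for vector in aspect_params.values()}
    if len(lengths) != 1:
        raise ValueError("all parameter vectors must have the same length")

    merged = [0.0] * lengths.pop()
    for name, vector in aspect_params.items():
        share = weights.get(name, 0.0) / total
        for index, value in enumerate(vector):
            merged[index] += share * value
    return merged


if __name__ == "__main__":
    # Toy two-dimensional "policies", one per hypothetical preference aspect.
    params = {"concise": [1.0, 0.0], "friendly": [0.0, 1.0]}
    print(merge_aspect_parameters(params, {"concise": 3.0, "friendly": 1.0}))
```

A simulated user is then described only by a weight dictionary, so new preference mixes need no additional training in this simplified picture.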

4.3 How to simulate actions

This section aims to delve into how LLM agents are designed to exhibit complex behaviors that are reflective of real-world cognitive processes. This involves understanding and implementing the mechanisms by which these artificial agents can retain and utilize past experiences (memory) park2023generative ; gao2023s ; zhu2023ghost , introspect and adjust their behavior based on their outcomes (reflection) park2023generative ; shinn2023reflexion , and execute a sequence of interconnected tasks that mimic human workflows (planning) wei2022chain .

4.3.1 Planning

Here, we introduce the methodology by which LLM agents approach complex tasks through decomposition. Initially, an LLM assesses the task to understand its main objectives and context. It then breaks down the task into smaller, manageable subtasks, each contributing towards the overall goal. This segmentation leverages the LLM’s training corpus to recognize patterns and apply relevant knowledge efficiently zhu2023ghost ; wang2023voyager ; sun2023adaplanner ; park2023generative .

Each subtask is executed sequentially, with the LLM agent applying its knowledge base to ensure logical progression and coherence. This approach not only simplifies complex tasks but also enhances the LLM’s accuracy and adaptability. By tackling tasks incrementally, the LLM agent can adapt its strategies and ensure that each step is contextually relevant and logically structured. For example, GITM zhu2023ghost showcases an LLM agent that decomposes the overarching goal of “Mining Diamond” into a series of sub-goals, constructing a sub-goal tree. The agent uses its text-based knowledge and memory to navigate a virtual environment, making strategic decisions at each tree node to achieve the main objective. Voyager wang2023voyager employs an automatic curriculum to help the LLM agent understand the sequence of actions required to reach a goal. By reasoning about the available resources, the LLM agent can plan an efficient course of action, such as upgrading tools for better efficacy, demonstrating adaptive problem-solving skills. AdaPlanner sun2023adaplanner introduces an LLM agent that refines its action plan based on feedback: an in-plan refiner aligns actions with predictions, and an out-of-plan refiner adjusts the plan when predictions do not match outcomes. This showcases the model’s ability to adapt and revise its plan dynamically in response to changing scenarios.
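To make the idea of a sub-goal tree concrete, the sketch below represents a goal and its prerequisites as a tree and derives an execution order in which every prerequisite precedes the goal that needs it (a post-order traversal). The `SubGoal` class and the example sub-goals are hypothetical; in a system such as GITM the decomposition would be produced by the LLM rather than written by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubGoal:
    """A goal together with the sub-goals that must be completed first."""
    name: str
    children: List["SubGoal"] = field(default_factory=list)


def execution_order(goal: SubGoal) -> List[str]:
    """Post-order traversal: finish all prerequisites, then the goal itself."""
    order: List[str] = []
    for child in goal.children:
        order.extend(execution_order(child))
    order.append(goal.name)
    return order


if __name__ == "__main__":
    # Hand-written toy decomposition, standing in for LLM output.
    tree = SubGoal("mine diamond", [
        SubGoal("craft iron pickaxe", [
            SubGoal("smelt iron ingot", [SubGoal("mine iron ore")]),
            SubGoal("craft stick"),
        ]),
    ])
    for step in execution_order(tree):
        print(step)
```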

In summary, these advancements represent significant strides in task decomposition and strategic planning. They highlight the capability of LLMs to not only break down complex tasks into manageable sub-goals but also to dynamically adapt their strategies and refine plans based on ongoing feedback and changing scenarios, thereby enhancing decision-making and problem-solving efficiency in various contexts.

4.3.2 Memory

Human behavior is largely influenced by past experiences and insights, which are stored in memory. If LLM agents aim to mimic this aspect of human behavior, they also need to reference past experiences and insights when acting. However, the volume of this information is often immense, frequently exceeding the context window length of LLMs. Therefore, it’s necessary to design a memory system that functions as an external database for LLM agents. This system should have appropriate mechanisms for organizing, updating, and retrieving information, enabling LLM agents to reference these memories for future actions.
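A minimal sketch of such an external memory is given below. It stores text records with a time step and retrieves the records that share the most words with a query, breaking ties in favor of more recent records. The `MemoryStore` class and this word-overlap scoring are simplifying assumptions made for illustration; the systems surveyed here use richer retrieval signals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MemoryRecord:
    step: int  # simulation step at which the record was written
    text: str  # natural-language description of the experience


class MemoryStore:
    """Toy external memory: append-only storage with keyword retrieval."""

    def __init__(self) -> None:
        self._records: List[MemoryRecord] = []

    def add(self, step: int, text: str) -> None:
        self._records.append(MemoryRecord(step, text))

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        """Return up to k records, ranked by word overlap and then recency."""
        query_words = set(query.lower().split())

        def score(record: MemoryRecord) -> Tuple[int, int]:
            overlap = len(query_words & set(record.text.lower().split()))
            return (overlap, record.step)

        ranked = sorted(self._records, key=score, reverse=True)
        # Drop records that share no word with the query.
        return [r.text for r in ranked[:k] if score(r)[0] > 0]


if __name__ == "__main__":
    memory = MemoryStore()
    memory.add(1, "Alice likes coffee")
    memory.add(2, "Bob went fishing")
    memory.add(3, "Alice met Bob at the cafe")
    print(memory.retrieve("what does alice like"))
```

Only the retrieved records, not the full history, would then be placed in the prompt, which is how such a store keeps the prompt within the context window.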

Generative Agents park2023generative showcases an LLM agent that develops a generative memory system, integrating sensory perceptions with a continuous stream of experiences. This system not only stores information but actively engages in planning and reflection, adapting its behavior based on past outcomes. Chen et al. chen2023put illustrate an LLM agent’s strategic prowess in auction scenarios, where it adapts its bidding strategy by synthesizing new information with existing memories to maximize profits or meet specific objectives.

GITM zhu2023ghost and Voyager wang2023voyager each curate a skill library, updating the agent’s capabilities through practice and feedback. This approach reflects an understanding of task requirements and environmental challenges, where the LLM’s memory serves as a dynamic repository of actions and strategies. The distinction between explicit and implicit memory comes into play in simulations that require the LLM to navigate complex tasks, such as resource management and goal-oriented action planning in open-world environments. Here, the LLM’s memory functions extend beyond simple recall, enabling the agent to perform with a sense of history and progression.

Lastly, the role of memory in social interactions is explored through simulations in S3 gao2023s that mimic the intricacies of human behavior. LLMs track and adapt to changing social cues and demographic shifts, employing memory not just as a record of past interactions but as a tool for future social navigation and decision-making. Li et al. li2023large demonstrate how a memory module in LLMs can be crucial for understanding and adapting to dynamic social environments. They show that LLMs, equipped with a memory of past social interactions and trends, can more effectively predict and respond to future economic changes, enhancing their decision-making in complex social landscapes.

Collectively, these studies contribute to our understanding of LLMs as agents capable of sophisticated memory management, crucial for their function in dynamic and unpredictable environments. They highlight the remarkable potential of LLMs to transcend traditional data storage, moving towards a more integrated and intelligent use of memory in artificial cognition.

4.3.3 Reflection

This section explores how LLM agents incorporate feedback mechanisms to enhance their memory systems, improving decision-making and learning processes. This reflection encompasses both short-term and long-term memory facets, enabling LLMs to adapt their behaviors and strategies dynamically.

An exemplary implementation of this reflective cycle is Reflexion shinn2023reflexion . In this work, the LLM leverages an integrated evaluator to internally assess the efficacy of actions based on the rewards received. It also utilizes a prompt-based approach to self-reflection, allowing the agent to internally simulate and critique its performance. This dual feedback system enables the agent to refine its memory and behavior in a nuanced and continuous learning process. The model captures short-term memory as trajectories of actions and observations, while long-term memory encompasses accumulated experiences. The interaction between these memory types and the reflective loop ensures that the agent’s memory is not only a repository of past events but also a dynamic foundation for future improvement and learning. This system exemplifies how LLMs can evolve from static knowledge bases to dynamic entities capable of self-improvement through iterative reflection and adaptation. In S3 gao2023s , the LLMs’ ability to reflect is intricately tied to their simulation of human social interactions, where they continuously adjust their understanding and responses based on evolving social dynamics and cues. This reflective capacity enables them to navigate complex social environments with greater finesse. In the work of Li et al. li2023large , reflection is leveraged to refine the LLMs’ approach to socio-economic predictions. By reflecting on past interactions and trends, these models can adapt their predictive algorithms, leading to more accurate responses to future social and economic shifts.
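The act, evaluate, and reflect cycle described above can be sketched as a short loop. This is a schematic under stated assumptions, not the Reflexion implementation: `act`, `evaluate`, and `reflect` are hypothetical callables that would wrap LLM calls in practice, and the toy stand-ins below are deterministic so that the control flow is easy to follow.

```python
from typing import Callable, List, Tuple


def reflection_loop(
    act: Callable[[List[str]], str],
    evaluate: Callable[[str], float],
    reflect: Callable[[str, float], str],
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Retry a task, carrying verbal reflections from one trial to the next."""
    reflections: List[str] = []  # long-term memory: accumulated lessons
    trajectory = ""
    for _ in range(max_trials):
        trajectory = act(reflections)   # short-term memory: the latest attempt
        reward = evaluate(trajectory)   # evaluator scores the attempt
        if reward >= 1.0:
            break
        reflections.append(reflect(trajectory, reward))
    return trajectory, reflections


# Deterministic toy stand-ins for the LLM-backed components.
def toy_act(reflections: List[str]) -> str:
    tried = {note.split()[-1] for note in reflections}
    return next(option for option in ["A", "B", "C"] if option not in tried)


def toy_evaluate(trajectory: str) -> float:
    return 1.0 if trajectory == "C" else 0.0


def toy_reflect(trajectory: str, reward: float) -> str:
    return f"avoid {trajectory}"


if __name__ == "__main__":
    print(reflection_loop(toy_act, toy_evaluate, toy_reflect))
```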

In summary, in the realm of simulating actions, LLMs stand out for their ability to integrate planning, memory, and reflection. They employ a cyclical approach where planning dictates the course of action, memory provides a knowledge base derived from past experiences, and reflection adjusts strategies based on feedback. This dynamic interplay allows LLMs to not only execute actions within varied simulations but also to continuously learn and adapt. By simulating these cognitive processes, LLMs demonstrate an advanced capacity for autonomous decision-making, which is increasingly indistinguishable from human-like behavior in complexity and adaptability.

4.4 Evaluation of LLM agents

4.4.1 Realness validation with real human data

The basic evaluation protocol for LLM-based agents is to compare the simulation’s output with existing real-world data. The evaluation can be conducted at two levels: micro-level and macro-level. Specifically, micro-level evaluation assesses how realistically the simulation reproduces an individual agent’s behavior or actions. For example, in S3 gao2023s , the authors test the performance of the LLM agents in predicting the individual agent’s next state, given the current state and the environment context. On the other hand, since agent-based simulation pays particular attention to the emergent phenomena of the population, macro-level evaluation is of great significance; it aims to evaluate whether the simulated process exhibits the same patterns and regularities as the real-world data. In S3 gao2023s , one of the main goals is to accurately predict the dynamics of information, opinion, and attitude based on the collected real-world social media data. In economics simulation li2023large , the simulated economic system is evaluated on whether the most representative macroeconomic regularities, such as Okun’s law plosser1979potential , emerge. Furthermore, the rationality of the generated behaviors can also be evaluated, for example through logical consistency, adherence to established common sense, or compliance with the given rules of the simulation environment. In addition, we can assess the simulated agent’s performance against established benchmarks or standardized tasks relevant to its domain, for example, whether the agent can reach human-level evaluation scores in a web-browsing or game environment chang2023survey .
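The two evaluation levels can be illustrated with a small sketch. At the micro level it computes the fraction of individual next-state predictions that match the observed data; at the macro level it checks one regularity, using the fact that Okun’s law describes a negative relationship between output growth and the change in unemployment. The function names and the numbers in the usage example are made up for illustration and do not come from gao2023s or li2023large.

```python
from typing import List, Sequence


def micro_accuracy(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Fraction of individual agent states predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


def ols_slope(x: List[float], y: List[float]) -> float:
    """Slope of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    if variance == 0:
        raise ValueError("x must not be constant")
    return covariance / variance


def shows_okun_pattern(gdp_growth: List[float], unemployment_change: List[float]) -> bool:
    """Macro-level check: does unemployment fall when output grows faster?"""
    return ols_slope(gdp_growth, unemployment_change) < 0


if __name__ == "__main__":
    # Synthetic numbers, for illustration only.
    print(micro_accuracy(["angry", "calm", "calm"], ["angry", "angry", "calm"]))
    print(shows_okun_pattern([1.0, 2.0, 3.0, 4.0], [0.5, 0.0, -0.5, -1.0]))
```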

4.4.2 Provide explanations for simulated behaviors

One of the main advantages of the large language model-based agent over the traditional rule-based or neural network-based agent is its strong ability to engage in interactive conversation and textual reasoning. Therefore, to evaluate whether the agent has understood the simulation rules well, accurately perceived the environment, made its choices rationally, etc., we can directly obtain explanations from the large language model-based agent. We can judge the quality of the agent-based simulation by analyzing these explanations and comparing them with human data or with a well-established theory or model. For example, in economic simulation li2023large , the authors query the large language model agent about the reasons behind its economic decisions, and the answers explain the simulated actions and behaviors well.

4.4.3 Ethics evaluation

Besides the simulation accuracy and explainability of large language model-empowered agent-based simulation, ethical issues are also of great importance. The first concern is bias and fairness: it is essential to assess the simulation for biases in language, culture, gender, race, or other sensitive attributes to evaluate whether the generated content perpetuates or mitigates societal biases. Another concern is harmful output detection, since the output of generative artificial intelligence is harder to control than that of traditional approaches. Thus, practitioners of large language model agent-based simulation should scrutinize the simulation’s output for potentially harmful or inappropriate content, including hate speech, misinformation, or offensive material.

5 Recent advances of LLM agent-based modeling and simulation

Figure 2: Illustration of LLM agent-based modeling and simulation in different domains.

In the following, we elaborate on the recent advances in using large language models for agent-based modeling and simulation in social, physical, cyber, and hybrid domains. The typical applications in the first three domains are illustrated in Figure 2, and the details are shown in Table 1.

Table 1: A list of representative works of agent-based modeling and simulation with large language models.

5.1 Social domain I: social sciences

This section discusses the application of LLM agent-based modeling and simulation in social sciences. Specifically, the existing works examine and explore LLM agents’ effectiveness in replicating human behaviors and interactions and their role in validating social theories. They focus on how LLM agents can serve as tools for understanding complex social dynamics, enhancing collaborative problem-solving, etc., offering insights into both individual and collective social behaviors.

5.1.1 Simulation of social network dynamics

This part discusses whether LLM agents, due to their human-like behavior, can be used to recreate and validate established social laws and patterns. This involves analyzing how closely these agents can mimic human behavior and whether their actions can be quantified to validate or challenge existing theories in social science.

S3 gao2023s utilizes LLM-empowered agents to simulate individual and collective behaviors within a social network. This system effectively replicates human behaviors, including emotion, attitude, and interaction behaviors. It leverages real-world social network data to initialize the simulation environment, where information influences users’ emotions and subsequent behaviors. The study particularly focuses on scenarios of gender discrimination and nuclear energy, demonstrating the ability of LLMs to simulate complex social dynamics. The results underscore LLM’s ability to capture real-world social phenomena. Williams et al. williams2023epidemic study whether LLM agents can accurately reproduce the trend of epidemic spread. The results show that the LLM agent-based simulation system can replicate complex phenomena observed in the real world, such as multi-peak patterns.

With the Werewolf game as the environment, Xu et al. xu2023exploring examine LLMs’ capabilities in simulating individual and collective behaviors in such a rule-based social environment. The study reveals that LLMs can effectively engage in strategic social interactions, generating behaviors such as trust and confrontation, thus offering insights into their potential for social simulations. Acerbi et al. acerbi2023large demonstrate that the information transmitted by large language models mirrors the biases inherent in human social communication. Specifically, LLMs exhibit preferences for stereotype-consistent, negative, socially oriented, and threat-related content, reflecting biases in their training data. The observation underscores that LLMs are not neutral agents; instead, they echo and potentially amplify existing human biases, shaping the information they generate and transmit. Zhang et al. zhang2023exploring study the impact of collaboration strategies on the performance of LLM agents. Specifically, three agents with distinct personalities (easy-going or overconfident) form four different societies, employing eight collaboration strategies over three rounds to solve mathematical problems. The results show that strategies initiating a debate perform better than those relying solely on memory reflection. The study also highlights LLM agents’ capability to exhibit human-like social phenomena such as conformity and the Wisdom of Crowds effect, where collective intelligence tends to surpass individual capabilities. Kim et al. kim2023ai assess the boundaries and effectiveness of LLMs in modeling personal actions and societal dynamics, shedding light on their applicability for believable social simulations. Suzuki et al. suzuki2023evolutionary and Zarza et al. de2023emergent construct simulation systems with multiple agents, employing LLMs as generators of social strategy variations to simulate changes in cooperative and selfish strategies among agents in social cooperation and variations in social network structures, among other factors. Park et al. park2022social investigate LLM agents’ capacity to simulate online behaviors within forums. The work demonstrates how LLMs can predict user interactions and responses by generating scenarios based on specific forum rules and descriptions. This simulation assists in refining forum regulations, highlighting the potential of LLMs in understanding and shaping digital social environments.

Figure 3: Taxonomy of LLM-based modeling and simulation in social sciences. Some materials are from S3 gao2023s , ChatDev qian2023communicative , Humanoid Agents wang2023humanoid .

5.1.2 Simulation of cooperation

Some other works pay attention to the human collaboration replicated by LLM agents. Specifically, they focus on how these agents, assigned distinct roles and functions, can mimic the cooperative behaviors observed in real human societies. The mechanisms and cooperative frameworks designed for these agents can enable them to work together efficiently toward achieving goals.

COLA lan2023stance proposes to organize LLM agents to discuss and finally decide the stance of social media text, with three types of role-played agents: analyzers, debaters, and a summarizer. The analyzers dissect texts from linguistic, domain-specific, and social media perspectives; the debaters propose logical links between text features and stances; finally, the summarizer considers all these discussions and determines the text’s stance. The framework achieves SOTA performance on stance detection tasks. MAD liang2023encouraging proposes to use LLM agents to engage in reasoning-intensive question answering through structured debates. LLMs adopt the roles of opposing debaters, each arguing for a different perspective on the solution’s correctness. MAD enforces a “tit for tat” debate dynamic, wherein each agent must argue against the other’s viewpoint, leading to a more comprehensive exploration of potential solutions. A judge agent then evaluates these arguments to arrive at a final conclusion. This work fosters divergent thinking and deep contemplation, addressing the degeneration-of-thought issue common in self-reflection methods. ChatDev qian2023communicative is a virtual software development company where LLM agents collaborate to develop computer software, with different roles for agents including CEO, CTO, designers, and programmers. The cooperation process encompasses designing, coding, testing, and documenting, with agents engaging in role-specific tasks like brainstorming, code development, GUI designing, and documentation. MetaGPT hong2023metagpt also introduces a novel framework for collaborative software development with LLM agents, simulating a software company. The agents play roles including Product Manager, Architect, Project Manager, Engineer, and QA Engineer, and follow standardized operating procedures. 
Each role contributes sequentially, from requirement analysis and system design to task distribution, coding, and quality assurance, showcasing LLMs’ potential in efficiently mimicking human cooperative behaviors and workflows in complex software development. ChatEval chan2023chateval presents a multi-agent framework for text quality evaluation, employing LLMs as diverse role-playing agents, in which key roles include the public, critics, journalists, philosophers, and scientists, each contributing unique perspectives. The agents engage in sequential debates, with access to all communication history. Finally, a judge gives a final decision. It results in more accurate and human-aligned evaluations compared to single-agent methods. CAMEL li2023camel introduces a cooperative role-playing framework with communicative agents, focusing on tool development. It involves roles like Task Detailing Assistant, Commander, and Executor. Specifically, Task Detailing Assistant specifies tasks in detail, Commander provides step-by-step instructions based on these specifics, and Executor carries out these instructions.
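The debate-then-judge pattern described for MAD above can be sketched as a simple turn-taking loop. This is an illustrative skeleton rather than the MAD implementation: the debaters and the judge are hypothetical callables that would wrap LLM calls in practice, and the toy versions below only produce fixed strings so that the flow of the transcript is visible.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker role, argument text)
Debater = Callable[[str, List[Turn]], str]
Judge = Callable[[str, List[Turn]], str]


def run_debate(
    question: str,
    affirmative: Debater,
    negative: Debater,
    judge: Judge,
    rounds: int = 2,
) -> Tuple[str, List[Turn]]:
    """Alternate two opposing debaters, then let a judge read the transcript."""
    transcript: List[Turn] = []
    for _ in range(rounds):
        # Each debater sees a copy of everything said so far.
        transcript.append(("affirmative", affirmative(question, list(transcript))))
        transcript.append(("negative", negative(question, list(transcript))))
    return judge(question, list(transcript)), transcript


# Deterministic toy stand-ins for LLM-backed agents.
def toy_affirmative(question: str, transcript: List[Turn]) -> str:
    return f"Round {len(transcript) // 2 + 1}: the proposed answer is correct."


def toy_negative(question: str, transcript: List[Turn]) -> str:
    last_speaker, _ = transcript[-1]
    return f"I disagree with the {last_speaker} side; it overlooks a case."


def toy_judge(question: str, transcript: List[Turn]) -> str:
    return f"Decision after {len(transcript)} turns."


if __name__ == "__main__":
    verdict, log = run_debate("Is 91 prime?", toy_affirmative, toy_negative, toy_judge)
    for speaker, argument in log:
        print(f"{speaker}: {argument}")
    print(verdict)
```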

The above efforts involve designing specific types of agents, their roles, and the collaboration framework for certain tasks. Their limitation lies in a lack of versatility, as the design of the agents is not flexible or adaptable. To address this, some works focus on adaptively performing tasks with automatically generated LLM agents and cooperation frameworks. AgentVerse chen2023agentverse simulates human group problem-solving, focusing on adaptively generating LLM agents for diverse tasks. It involves four stages: 1) Expert Recruitment, where agent composition is determined and adjusted; 2) Collaborative Decision-Making, where agents plan problem-solving strategies; 3) Action Execution, where agents implement these strategies; 4) Evaluation, where progress is assessed to guide improvements. It can effectively enhance agents’ capabilities across various tasks, from coding to embodied AI, demonstrating their versatility in collaborative problem-solving. Wang et al. wang2023unleashing introduce Solo Performance Prompting (SPP) to emulate human-like cognitive synergy, which transforms a single LLM into a multi-persona agent, enhancing problem-solving in tasks requiring complex reasoning and domain knowledge. For tasks like trivia creative writing and logic grid puzzles, SPP significantly outperforms standard methods, showcasing its effectiveness in collaborative problem-solving. CoELA zhang2023building integrates LLMs’ critical capabilities, including natural language processing, reasoning, and communication, into a novel cognitive-inspired modular framework. The authors evaluate CoELA in various embodied environments like C-WAH and TDW-MAT, demonstrating its proficiency in perceiving, reasoning, communicating, and planning. The results show that CoELA surpasses traditional planning methods and exhibits effective cooperation and communication behaviors.

In conclusion, simulating collaborative behaviors among LLM agents in various frameworks has shown their potential in emulating human cooperative behaviors to tackle a wide range of problem-solving tasks.

5.1.3 Simulation of individual social behavior

In the simulation of social dynamics and cooperative problem-solving, LLM agents show a strong ability to replicate human behavior. However, achieving a closer approximation to real human responses from the individual perspective is also of great significance. In this section, we discuss how recent works approach the problem of better simulating individual human behavior in a social context with LLM agents, enhancing their decision-making processes, interaction patterns, and emotional responses.

Humanoid Agents wang2023humanoid proposes a novel approach to enhancing the realism of LLM agent simulations. By incorporating elements of human cognitive processing (System 1 daniel2017thinking ), such as basic needs, emotions, and relational closeness, Humanoid Agents are designed to behave more like humans. These dynamic elements allow agents to adapt their activities and interactions based on their internal states, thereby bridging the gap between simulated and real human behavior. The platform also facilitates immersive visualization and analysis of these behaviors, advancing the field of social simulation and cooperative problem-solving. This approach demonstrates a significant leap in individual agent design, moving closer to replicating the complexities of human decision-making and interaction patterns. SocioDojo liu2023sociodojo is a lifelong learning environment using real-world data for training agents in societal analysis and decision-making. It introduces an innovative Analyst-Assistant-Actuator framework and Hypothesis-Proof prompting, resulting in notable improvements on the time series forecasting task. Liu et al. liu2023training present a novel approach to optimizing LLM agents by refining agents’ decision-making processes, interaction patterns, and emotional responses through a three-stage alignment learning framework, Stable Alignment. This framework, which efficiently teaches social alignment to LLMs, is based on simulated social interactions, detailed feedback, and progressive refinement of responses by autonomous social agents.

Moreover, some studies use LLM agents to simulate human responses in social science research. Argyle et al. argyle2023out use LLM agents as proxies for specific human populations to generate responses in social science research. The authors show that, conditioned on socio-demographic profiles, LLM agents can generate outputs similar to human counterparts. Hamalainen et al. hamalainen2023evaluating construct LLM agents to simulate real participants to fill in open-ended questionnaires and analyze the similarity between the response and real data. The results show that synthetic responses generated by large language models cannot be easily distinguished from human data. In short, these works indicate that LLM agents can be useful in social science experiments to simulate human responses with much lower costs.

Some researchers have also studied LLM agents’ ability to simulate human behavior in social psychological experiments. Specifically, these works singh2023mind ; binz2023using administer psychological tests to LLM agents to assess their cognitive ability, emotional intelligence elyoseph2023chatgpt , and psychological well-being li2022gpt , demonstrating that LLM agents have human-like intelligence to a certain degree.

5.2 Social domain II: economic system

Figure 4: Taxonomy of LLM-based modeling and simulation in economic systems.

This section discusses another important field in the social domain, the economic system. Currently, LLM-driven economic simulations can be categorized into three types based on the number of agents involved: individual behavior, interactive behavior, and economic system-level simulations. For individual behavior simulations, the primary goal of related research is to examine the human-like economic decision-making capabilities of LLMs horton2023large ; geerling2023chatgpt ; bauer2023decoding ; chen2023emergence or their understanding of economic phenomena bybee2023surveying ; xie2023wall ; faria2023artificial . This provides an empirical foundation for the latter two types of economic simulations and is currently a more extensively researched area. In interactive behavior simulations, the focus is mainly on game theory, exploring widely studied behaviors of LLMs during game-playing, such as cooperative and reasoning behaviors phelps2023investigating ; anonymous2023large ; akata2023playing ; guo2023suspicion . For system-level simulations, the research primarily targets market simulations, such as consumption markets or auction markets, and investigates the rationality or optimality of LLMs’ economic behaviors within these markets zhao2023competeai ; anonymous2023rethinking ; chen2023put ; li2023large . The taxonomy is illustrated in Figure 4.

5.2.1 Individual economic behavior simulation

Considering the human-like characteristics of LLMs, many researchers have attempted to replace humans in behavioral economics experiments with LLMs to observe the rational and irrational factors in their economic decision-making. Horton horton2023large replicated classic behavioral economics experiments using LLMs, including unilateral dictator games, fairness constraints kahneman1986fairness , and status quo bias samuelson1988status , confirming the human-like nature of LLMs in aspects such as altruism, fairness preferences, and status quo bias. Although the experiment was conducted simply by asking GPT questions and analyzing responses, it represents a preliminary attempt to explore the use of LLMs for simulating human economic behavior. Chen et al. chen2023emergence employ a standard framework, revealed preference theory, to assess the rationality of GPT’s economic decisions. Results show that GPT behaves largely rationally in budgetary decisions across the risk, time, social, and food preference domains. Additionally, Geerling et al. geerling2023chatgpt use the Test of Understanding in College Economics to assess LLMs’ comprehension of microeconomics and macroeconomics, with results indicating that LLMs outperform most students who have taken economics courses.

Another research line for testing the economic capabilities of LLMs examines whether they accurately understand certain socio-economic phenomena, specifically by using external text information (such as news) to predict future economic changes. Xie et al. xie2023wall used LLMs to predict stock market movements from historical stock data and related tweets, relying on the models’ perception of investor sentiment. However, the predictive performance of LLMs is worse than that of state-of-the-art methods, and in some cases, it is even inferior to traditional linear regression. Faria et al. faria2023artificial utilized LLMs for quarterly inflation forecasts, achieving accuracy comparable to, if not surpassing, the results of the Survey of Professional Forecasters (SPF). Bybee et al. bybee2023surveying tested LLMs on their predictions of finance and macroeconomics after reading specific sections of The Wall Street Journal, with results equivalent to those of SPF. These results suggest that LLMs possess a basic understanding of economic and financial markets but still lack sufficient and precise perception for accurate prediction, requiring more domain-specific data for additional fine-tuning.

5.2.2 Interactive economic behavior simulation

These simulations mainly focus on game theory, where there are only two or a few agents as opponents. Observing and analyzing the interactive behavior and capabilities of LLMs in various classic games is a current research hotspot. Guo guo2023gpt studied the behavior of large language model agents in the ultimatum game and the prisoner’s dilemma and found that the agents exhibit patterns similar to those of humans, such as the positive correlation between offered amounts and acceptance rates in the ultimatum game. Phelps et al. phelps2023investigating found that incorporating individual preferences into prompts can influence the level of cooperation of LLMs. Specifically, they construct LLM agents with different personalities, such as competitive, altruistic, and self-interested, via prompts. Then, they let the agents play the repeated prisoner’s dilemma against bots with fixed strategies (e.g., always cooperate, always defect, or tit-for-tat) and analyze the agents’ cooperation rates. They find that competitive and self-interested LLM agents show a lower cooperation rate, while altruistic agents demonstrate a higher cooperation rate, indicating the feasibility of constructing agents with different preferences through natural language. However, LLM agents also have limitations in some capabilities, such as the inability to respond reasonably to opponents’ actions, which may lead to a higher tendency to cooperate with defecting opponents. Understanding LLM social behaviors is very important for subsequent developments in artificial intelligence and its impact on human social behavior. Other research anonymous2023large measured LLMs’ rationality and strategic reasoning ability using the second-price auction and the Beauty Contest game. In such games, fully rational players are assumed to choose the most beneficial option from their point of view, which results in the Nash equilibrium. Therefore, the authors measure the rationality degree of LLMs by the deviation of their behavior from the Nash equilibrium. Moreover, they measure the strategic reasoning ability of LLMs by the ratio of the actual payoff to the optimal payoff. Experiments show that LLMs generally demonstrate rationality to some degree, although they often cannot reach the Nash equilibrium. Among them, GPT-4 shows better strategic reasoning ability and converges to the Nash equilibrium faster than other LLMs such as GPT-3.5 and text-davinci. The authors present this as a benchmark with which the research community can test the economic capabilities of LLMs. Akata et al. akata2023playing discovered through experiments in multiple game scenarios that LLMs are skilled in games that reward self-interest but not as adept at coordinating with others. Specifically, in the prisoner’s dilemma, GPT-4 cooperates well with a cooperative opponent but always chooses to defect after the opponent defects once. In the Battle of the Sexes, GPT-4 cannot coordinate well with the opponent’s choices to obtain the maximum payoff.
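The repeated prisoner’s dilemma protocol described above (an agent playing against fixed-strategy bots, with the cooperation rate as the outcome measure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the setup of any cited work: the LLM agent is replaced by a hand-written `grim_trigger` policy that mimics the reported GPT-4 behavior of defecting forever after a single opponent defection, and `defect_once` is a hypothetical bot introduced only for this example.

```python
from typing import Callable, List, Tuple

# A strategy maps (own_history, opponent_history) to a move: "C" or "D".
Strategy = Callable[[List[str], List[str]], str]

def always_cooperate(own: List[str], opp: List[str]) -> str:
    return "C"

def always_defect(own: List[str], opp: List[str]) -> str:
    return "D"

def tit_for_tat(own: List[str], opp: List[str]) -> str:
    # Cooperate first, then copy the opponent's previous move.
    return opp[-1] if opp else "C"

def grim_trigger(own: List[str], opp: List[str]) -> str:
    # Stand-in for an LLM agent (assumption): never forgive a defection.
    return "D" if "D" in opp else "C"

def defect_once(own: List[str], opp: List[str]) -> str:
    # Hypothetical bot: defects only in the third round.
    return "D" if len(own) == 2 else "C"

def play(agent: Strategy, bot: Strategy, rounds: int = 10) -> Tuple[List[str], List[str]]:
    """Play a repeated game with simultaneous moves and return both histories."""
    agent_hist: List[str] = []
    bot_hist: List[str] = []
    for _ in range(rounds):
        a = agent(agent_hist, bot_hist)
        b = bot(bot_hist, agent_hist)
        agent_hist.append(a)
        bot_hist.append(b)
    return agent_hist, bot_hist

def cooperation_rate(history: List[str]) -> float:
    return history.count("C") / len(history)

if __name__ == "__main__":
    for name, agent in [("grim_trigger", grim_trigger), ("tit_for_tat", tit_for_tat)]:
        hist, _ = play(agent, defect_once)
        print(name, cooperation_rate(hist))
```

In this toy setting, the unforgiving policy ends with a much lower cooperation rate than tit-for-tat against the same opponent, which illustrates why a single cooperation-rate number can separate forgiving from unforgiving agents.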

In addition to the observations on the cooperative behavior and reasoning abilities of LLM agents during gaming, there are also a few studies attempting to construct strong game-playing agents. Guo et al. guo2023suspicion go beyond simple measurement and enhance LLMs’ gaming abilities through prompt engineering. Specifically, in an incomplete information game (namely Leduc Hold’em), this work creates agents with higher-order theory of mind that can significantly outperform traditional algorithm-based opponents without any training. Meta Fundamental AI Research Diplomacy Team (FAIR) et al. meta2022human proposed Cicero, the first AI agent that combines a language model and reinforcement learning to play the Diplomacy game. After Cicero competed anonymously with real humans in online games, results show that it can outperform 90% of players. This demonstrates that, even without LLMs, earlier language models can approach or even surpass human capabilities in the realm of strategic gaming.

Moreover, Mao et al. mao2023alympics developed a simulation framework named Alympics, consisting of a sandbox playground and several agent players. The sandbox playground serves as the environment that stores and executes game settings, and agent players interact with the environment. The framework enables controlled, scalable, and reproducible simulation of game theory experiments.

The results from these simple simulation environments further validate the perception, reasoning, and planning capabilities of LLMs. In order to maximize their goals, LLMs consider their own benefits and opponents’ strategies when making economic decisions. It is worth noting that these goals can be customized through prompts, such as maximizing returns or maximizing fairness.

5.2.3 Economic system-level simulation

In an economic system, agents often interact with each other, trade goods, and form a market. These agents may not be limited to individuals but can also represent entities such as companies and banks, as these are also important components of the market. Zhao et al. zhao2023competeai , through simple consumption market simulations, uncovered competitive behaviors of LLM agents in managing restaurants, which are aligned with well-known sociological and economic theories. Specifically, the dish prices in the two simulated restaurants tend to converge. The Matthew effect also emerges during the simulation, i.e., one restaurant becomes more and more popular while the other has few consumers. Moreover, restaurants imitate competitors’ behaviors while, at the same time, trying to differentiate themselves to attract more consumers. Similarly, Han et al. han2023guinea studied collusion in firms’ pricing strategies. They simulated the product pricing process of two firms in a market environment (i.e., a Bertrand duopoly game) based on LLMs. The results show that in the absence of communication, prices tend to approach the Bertrand equilibrium price. However, with communication, collusion between the companies tends to bring prices closer to the monopoly price. Nascimento et al. nascimento2023self simulated a simple online book marketplace and observed interesting phenomena such as price negotiation between sellers and buyers. Some research anonymous2023rethinking attempts to have LLMs act as intermediaries in information trading markets to address the issue of information asymmetry between buyers and sellers. Specifically, when a seller presents information and quotes a price in response to a query from a buyer, an LLM agent, acting as an intermediary, can decide whether to purchase and, if it chooses not to, forget the information it has seen, thus protecting the seller’s interests. In experiments, the information to be exchanged consists of passages from arXiv documents on the topic of LLMs. The results show that LLMs can not only make rational purchasing decisions in this information market but also ensure the rationality of the overall market dynamics; for example, a higher budget can improve the quality of purchased answers (responses to queries). Chen et al. chen2023put have developed LLM agents with planning capabilities in constructed virtual auction markets to achieve higher profits given limited budgets. Experiments show that LLMs have the crucial abilities needed to participate in auctions, including managing budgets and considering long-term returns, even with only simple prompts.
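The two reference prices mentioned in the Bertrand duopoly discussion above can be made concrete with a small worked example. The sketch below assumes a textbook setting that is not taken from the cited study: homogeneous goods, a constant marginal cost `c` shared by both firms, and linear market demand `Q = a - b * p`. Under these assumptions the competitive Bertrand price equals marginal cost, while the collusive (monopoly) price maximizes joint profit. All parameter values are illustrative.

```python
def demand(p: float, a: float, b: float) -> float:
    """Linear market demand Q = a - b * p, floored at zero."""
    return max(a - b * p, 0.0)

def joint_profit(p: float, a: float, b: float, c: float) -> float:
    """Total industry profit when both firms charge the same price p."""
    return (p - c) * demand(p, a, b)

def bertrand_price(c: float) -> float:
    """With homogeneous goods and equal costs, undercutting drives price to cost."""
    return c

def monopoly_price(a: float, b: float, c: float) -> float:
    """Maximizer of (p - c) * (a - b * p): set the derivative a - 2bp + bc to zero."""
    return (a + b * c) / (2 * b)

if __name__ == "__main__":
    a, b, c = 100.0, 1.0, 20.0  # illustrative parameters (assumption)
    p_comp = bertrand_price(c)
    p_coll = monopoly_price(a, b, c)
    print("Bertrand price:", p_comp, "joint profit:", joint_profit(p_comp, a, b, c))
    print("Monopoly price:", p_coll, "joint profit:", joint_profit(p_coll, a, b, c))
```

A simulated price trajectory can then be located between these two benchmarks to judge how collusive the agents’ pricing is.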

5.3 Physical domain

For the physical domain, the applications for LLM agent-based modeling and simulation include mobility behaviors, transportation, wireless networks, etc.

5.3.1 LLM agents for simulating mobility behaviors

Understanding real-world space and time is crucial to harnessing LLMs for agent-based modeling and simulation of human mobility behaviors. Researchers have delved into this issue through various investigations gurnee2023language ; manvi2023geollm . Gurnee et al. gurnee2023language focus on probing LLMs to extract representations of real-world locations and temporal events, and the results demonstrate that these models build spatial and temporal representations in their neural layers. Manvi et al. manvi2023geollm delve into the geospatial knowledge embedded in LLMs. By fine-tuning LLMs on map-based prompts, they reveal substantial geospatial knowledge within LLMs and show improvements in tasks related to population density, asset wealth, and education. These investigations contribute valuable insights into the nuanced understanding of real-world space and time by LLMs, laying the groundwork for their application in agent-based simulations.

Based on these fundamental abilities, LLMs have showcased remarkable capabilities in simulation for the physical domain. To simulate human-like navigation behaviors in the physical environment, LM-Nav shah2023lm combines large language models with image-language alignment algorithms. Following it, LLM-Planner song2023llm harnesses large language models to achieve few-shot planning for embodied agents. Moving into the domain of real-world planning with large language models, Chen et al. chen2023open introduce NLMap, which creates an open-vocabulary and queryable scene representation, allowing language models to gather and integrate contextual information for context-conditioned planning. Additionally, Shah et al. shah2023gnm study training a general goal-conditioned model to simulate human-like vision-based navigation, demonstrating the broad generalization capabilities of LLMs in complex physical environments.

5.3.2 LLM agent-based modeling and simulation for transportation

The possibility of using LLM agents for other applications in the physical domain, like transportation, has also been explored. Jin et al. jin2023surrealdriver design an LLM agent to simulate the driving behavior of human drivers. Specifically, the agent interacts with a simulator named CARLA: it receives information about the state of the car and the environment from the simulator and decides what to do next, such as stopping, speeding up, or changing lanes, and this decision is fed back to the simulator. During the decision process, the agent considers its recent behaviors using a memory module and also takes into account safety criteria as well as guidelines learned from human expert drivers. Experiments show that the agent design can significantly reduce the collision rate and make the agent’s behavior more human-like. Moreover, the agent manages to perform complex driving tasks such as overtaking.
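The perceive-decide-act loop described above, with a short-term memory and a safety check, can be sketched as follows. This is a minimal illustration under stated assumptions: it does not use the CARLA API, the `Observation` fields are hypothetical, `propose_action` is a rule-based stand-in for the LLM call, and the one-second headway rule in `safety_filter` is an illustrative safety criterion rather than the one used in the cited work.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List

@dataclass
class Observation:
    speed: float      # ego speed in m/s (hypothetical simulator field)
    gap_ahead: float  # distance to the lead vehicle in metres (hypothetical)

def propose_action(obs: Observation, recent_actions: List[str]) -> str:
    """Stand-in for the LLM decision (assumption): a few hand-written rules."""
    if obs.gap_ahead < 15 and "change_lane" not in recent_actions:
        return "change_lane"
    return "speed_up" if obs.speed < 20 else "keep"

def safety_filter(obs: Observation, action: str) -> str:
    """Override the proposal with 'stop' when the time headway is under 1 s."""
    if obs.speed > 0 and obs.gap_ahead / obs.speed < 1.0:
        return "stop"
    return action

@dataclass
class DrivingAgent:
    # Short-term memory holding the most recent actions.
    memory: Deque[str] = field(default_factory=lambda: deque(maxlen=5))

    def step(self, obs: Observation) -> str:
        proposal = propose_action(obs, list(self.memory))
        action = safety_filter(obs, proposal)
        self.memory.append(action)
        return action  # in a real setup this would be sent back to the simulator

if __name__ == "__main__":
    agent = DrivingAgent()
    for obs in [Observation(10, 50), Observation(10, 12), Observation(20, 10)]:
        print(obs, "->", agent.step(obs))
```

Keeping the safety check outside the decision function means that an unsafe proposal can be overridden regardless of how the proposal was generated.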

5.3.3 LLM agent-based modeling and simulation for wireless network

In addition, some researchers focus on deploying LLM agents to simulate device users in the city infrastructure, such as the wireless network. Zou et al. zou2023wireless propose a framework where multiple on-device LLM agents can interact with the environment and exchange knowledge to solve a complex task together. Specifically, intents from humans or machines are provided to agents through wireless terminals, and the tasks are divided and planned collaboratively among multiple agents by leveraging the knowledge of different LLMs and device capabilities. On each device, the agent observes the environment and actors to execute decisions. On-device LLMs can extract semantic information from various data types and store it for future task planning. To deal with a specific task, the agent can retrieve relevant information or create lower-level tasks and send them to other agents to achieve the goal. The authors demonstrate the ability of the framework with an example of a wireless energy-saving task, where four users aim to reduce the network energy consumption while maintaining the transmission rate. In the experiment, the agents gradually decrease their own power levels based on the previous actions of other users and manage to achieve the target after a few iterations, which shows the potential of LLM agent-based modeling and simulation in solving wireless network problems.
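The energy-saving example above, in which each user lowers its power in response to the others’ previous actions while keeping its rate, has the same shape as classical distributed power control. The sketch below is not the LLM-based procedure of the cited work; it swaps in a Foschini-Miljanic-style update, in which each user picks the smallest power that meets a target rate given the interference observed in the previous round. The channel model, the interference coefficient `k`, the noise level, and the target rate are all illustrative assumptions.

```python
import math
from typing import List

def rate(i: int, powers: List[float], noise: float = 1.0, k: float = 0.1) -> float:
    """Rate of user i in bits/s/Hz: log2(1 + SINR), with cross-interference k."""
    interference = k * (sum(powers) - powers[i])
    return math.log2(1 + powers[i] / (noise + interference))

def update(powers: List[float], target_rate: float = 1.0,
           noise: float = 1.0, k: float = 0.1) -> List[float]:
    """Each user sets the minimum power meeting the target rate, based on
    the other users' powers from the previous round."""
    gamma = 2 ** target_rate - 1  # SINR needed for the target rate
    total = sum(powers)
    return [gamma * (noise + k * (total - p)) for p in powers]

def simulate(powers: List[float], iterations: int = 50) -> List[List[float]]:
    history = [powers]
    for _ in range(iterations):
        powers = update(powers)
        history.append(powers)
    return history

if __name__ == "__main__":
    history = simulate([10.0] * 4, iterations=6)
    for step, powers in enumerate(history):
        print(step, [round(p, 3) for p in powers], round(sum(powers), 3))
```

With these parameters the total power falls from round to round and settles near a fixed point, while every user’s rate stays at or above the target.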

5.4 Cyber domain

Agent-based modeling and simulation in cyberspace mainly involves various human behaviors such as information access, website visitation, and network attack/defense.

WebAgent gur2023real is introduced as an LLM-driven agent capable of learning from its experiences to simulate human behaviors on real websites based on natural language instructions. It strategizes by breaking down instructions into manageable sub-parts, condenses lengthy HTML documents into relevant sections for the task at hand, and interacts with websites using Python programs derived from this information. Mind2Web deng2023mind2web further used large language models (LLMs) to construct such generalist web agents. While the sheer size of raw HTML from real websites poses a challenge for LLMs, Mind2Web demonstrates that pre-filtering this data with a smaller language model substantially enhances the effectiveness and efficiency of the LLMs in generating human-like web browsing behaviors. Zhou et al. zhou2023webarena further addressed the discrepancy between current language-guided autonomous agents, often tested in simplified synthetic environments, and the complexity of real-world scenarios. The authors build a highly realistic and reproducible environment specifically tailored for language-guided agents simulating human behaviors on the web. Park et al. park2023choicemates simulate online decision-making scenarios, exploring the challenges individuals face when they lack domain expertise while searching for and making decisions with online information. Wang et al. wang2023recagent proposed to build large language model agents that interact with recommender systems by selecting from recommendation results and providing positive or negative feedback. This serves as a testing protocol for evaluating the recommender system’s performance, i.e., whether it can satisfy the agents’ preferences well.
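The two-stage idea described above, in which a cheap model first narrows a long HTML page down to a few candidate elements before the LLM is queried, can be sketched as follows. This is a minimal illustration under stated assumptions: the small language model is replaced by a token-overlap scorer, and the function names and example elements are hypothetical.

```python
import re
from typing import List, Set

def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; tags and attributes count as tokens too."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(task: str, element: str) -> float:
    """Stand-in for a small ranking model (assumption): the fraction of task
    tokens that also occur in the element."""
    task_tokens = tokenize(task)
    if not task_tokens:
        return 0.0
    return len(task_tokens & tokenize(element)) / len(task_tokens)

def prefilter(task: str, elements: List[str], k: int = 2) -> List[str]:
    """Keep only the k highest-scoring elements for the downstream LLM prompt."""
    return sorted(elements, key=lambda el: score(task, el), reverse=True)[:k]

if __name__ == "__main__":
    page = [
        '<input id="search" placeholder="Search flights">',
        '<a href="/careers">Careers</a>',
        "<button>Book flights to Tokyo</button>",
        "<footer>Copyright 2023</footer>",
    ]
    for element in prefilter("search for flights to Tokyo", page):
        print(element)
```

Only the retained elements would then be placed in the LLM prompt, which keeps the prompt short even when the raw page is very large.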

In RecAgent wang2023recagent , the researchers explore the potential of LLMs in simulating user behaviors within online environments, particularly recommender systems. By creating an LLM-based autonomous agent framework, the study investigates how these agents can simulate complex human interactions and decisions in a virtual environment. This approach enables a novel method for studying user behavior, offering insights into how users might react to different scenarios on digital platforms, thus advancing our understanding of user dynamics in virtual spaces. Zhang et al. zhang2023generative proposed generative agents for recommender systems, designing LLM-empowered agents equipped with user profile, memory, and action modules specifically tailored for the recommender system. The proposed agents can emulate the filter bubble effect and discover the underlying causal relationships in recommendation tasks.

5.5 Hybrid domain

In some studies, simulations are conducted that simultaneously consider more than one domain, such as physical and social, and we refer to these simulations as being within a hybrid domain.

As a pioneering work, Generative Agents park2023generative offers a compelling insight into the generation of believable individual and social behaviors. The research focuses on a central question: how can generative agents reliably produce human-like individual actions and social dynamics? It delves into an agent architecture that integrates memory (for storing past experiences), reflection (to make rational present decisions), and planning (for future actions). This architecture is critically evaluated through crowd-sourced assessments, affirming the effectiveness of the memory, reflection, and planning modules in generating rational behaviors. Notably, this approach led to complex social scenarios, such as Valentine’s Day parties and mayoral elections, underscoring the agents’ proficiency in simulating nuanced human interactions and societal events. This research offers a substantial contribution to social simulations, demonstrating the advanced potential of LLM agents in replicating the depth and complexity of human social behaviors.
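To make the memory component of such an architecture concrete, the sketch below ranks stored memories for a query by combining recency, importance, and relevance. It is a minimal illustration loosely inspired by the retrieval idea in Generative Agents, not a reproduction of it: the equal weighting, the decay constant, the hand-assigned importance values, and the word-overlap relevance (in place of embedding similarity) are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Memory:
    text: str
    created_at: float  # simulation time in hours
    importance: float  # in [0, 1]; hand-assigned here (assumption)

def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def recency(memory: Memory, now: float, decay: float = 0.99) -> float:
    """Exponential decay in the hours elapsed since the memory was stored."""
    return decay ** (now - memory.created_at)

def relevance(memory: Memory, query: str) -> float:
    """Jaccard word overlap, a stand-in for embedding similarity (assumption)."""
    a, b = tokens(memory.text), tokens(query)
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(memories: List[Memory], query: str, now: float, k: int = 2) -> List[Memory]:
    """Return the k memories with the highest combined score."""
    def total(m: Memory) -> float:
        return recency(m, now) + m.importance + relevance(m, query)
    return sorted(memories, key=total, reverse=True)[:k]

if __name__ == "__main__":
    stream = [
        Memory("ate breakfast at home", created_at=9, importance=0.1),
        Memory("Isabella is planning a Valentine's Day party", created_at=5, importance=0.8),
        Memory("discussed the mayoral election with Sam", created_at=8, importance=0.6),
    ]
    for m in retrieve(stream, "who is planning the party", now=10):
        print(m.text)
```

The retrieved memories would then be inserted into the prompt that drives reflection or planning, so that an older but important event can outrank a recent trivial one.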

Williams et al. williams2023epidemic conduct an epidemic simulation within a hybrid domain. In this simulation, social relationships influenced individuals’ perception of the epidemic, while individuals’ physical movements within spatial contexts affected their susceptibility to infection. Welfare Diplomacy mukobi2023welfare sets a benchmark, a nation-to-nation war/welfare equilibrium tabletop game designed to evaluate the collaborative capabilities of large language models.

Hua et al. hua2023war proposed to use LLM agents to represent countries and simulate their decisions and the resulting consequences. Historical international conflicts, including World War I, World War II, and the Warring States Period in Ancient China, are selected for evaluation. In the LLM agent-based war simulations, the emergent interactions among countries help explain why the wars occurred.

Li et al. li2023large simulate a hybrid macroeconomic system and expand the scale of simulation environments from tens to hundreds. Specifically, they simulate LLM-empowered agents’ work and consumption behaviors in a macroeconomic market. The proposed perception, memory, and action modules endow the agents with real-world heterogeneity, the ability to grasp market dynamics, and decision-making that considers multiple economic factors, respectively. Experimental results show the emergence of more reasonable and stable macroeconomic indicators (price inflation, unemployment rate, GDP, and GDP growth rate) and regularities (Phillips curve and Okun’s law) compared with traditional rule-based ABM lengnick2013agent ; gatti2011macroeconomics and RL-based approaches zheng2022ai . In particular, only the simulation based on LLM agents can produce the correct Phillips curve, i.e., a negative relationship between the unemployment rate and inflation. This advantage is owed to the LLM’s accurate perception of market dynamics, such as the deflation of labor markets.
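Whether a simulated economy reproduces the Phillips curve can be checked with a simple correlation test on the simulated series. The sketch below is a minimal illustration, not the evaluation procedure of the cited work: it computes the Pearson correlation between unemployment and inflation and treats a negative sign as consistent with the Phillips curve. The two yearly series are made-up toy values used only to exercise the code.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

def consistent_with_phillips_curve(unemployment: Sequence[float],
                                   inflation: Sequence[float]) -> bool:
    """True when unemployment and inflation are negatively correlated."""
    return pearson(unemployment, inflation) < 0

if __name__ == "__main__":
    # Toy yearly series (illustrative values, not simulation output).
    unemployment = [0.04, 0.05, 0.06, 0.07, 0.08]
    inflation = [0.035, 0.030, 0.022, 0.020, 0.012]
    print("correlation:", round(pearson(unemployment, inflation), 3))
    print("Phillips-consistent:", consistent_with_phillips_curve(unemployment, inflation))
```

The same function can be reused for other regularities that are stated as a signed relationship between two indicators.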

Urban Generative Intelligence (UGI) ugi is a platform that constructs a real-world urban environment provided by digital twins, which provide various interfaces for embodied agents to generate many behaviors, supported by a foundation model named CityGPT, which is trained on city-specific multi-source data. In this platform, multiple categories of LLM-based agents can simulate human-like behaviors, including social interactions, economic activities, mobility, street navigation, etc., showing promising abilities in simulating city activities based on embodied agents.

6 Open problems and future directions

6.1 Efficiency of Scaling Up

Many studies of LLM agents find it advantageous to simultaneously simulate multiple personas and exploit the synergy effect by allowing them to communicate and vote for the final output yao2023tree . For example, researchers find LLM-based software development can be significantly improved by simulating a virtual software company with diverse social identities, including chief officers, professional programmers, test engineers, and art designers qian2023communicative . This virtual company is capable of streamlining the development of complex software solutions in the stages of designing, coding, testing, and documenting. Moreover, researchers generally find scaling up the number of simulated agents and deploying more diverse personas are beneficial in various tasks zhuge2023mindstorms .

However, simulating societies of large-scale LLM agents is very computationally expensive. Extensive research efforts are dedicated to optimizing the memory footprint sheng2023high and operation subroutines aminabadi2022deepspeed of language models. Researchers also develop several effective model compression techniques zhu2023survey , such as knowledge distillation and quantization. In the context of LLM agent simulation, batch prompting cheng2023batch is a highly relevant technique that is capable of simulating multiple agents in batches. Experiments show batch prompting can achieve up to 5× efficiency improvement in inference token and time costs. Besides, MetaGPT is proposed to improve the efficiency of multi-agent collaboration in virtual software companies hong2023metagpt . They leverage a shared message pool and subscribe mechanism to reduce the time and token cost of generating one line of code. Despite the previous efforts of accelerating LLM agents, simulating large-scale LLM agents remains a highly challenging task, which significantly hinders LLM agent simulation from reaching its full potential. Simulating large societies of LLM agents not only can effectively improve the performance in downstream tasks, but also has the potential to mimic the emergence properties of human societies, and hence reveal the underlying mechanisms caldarelli2023role . Therefore, it is an important open problem to achieve full-process acceleration of LLM agent simulations.
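The core idea of batch prompting, namely packing the queries of several agents into one indexed prompt and splitting the single response back into per-agent answers, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Q[i]` / `A[i]` markers, the function names, and the example response are hypothetical, and no real model is called.

```python
import re
from typing import List, Optional

def build_batch_prompt(instruction: str, agent_queries: List[str]) -> str:
    """Pack several agents' queries into one prompt with indexed markers."""
    lines = [instruction]
    for i, query in enumerate(agent_queries, start=1):
        lines.append(f"Q[{i}]: {query}")
    lines.append("Answer every question on its own line as A[i]: <answer>.")
    return "\n".join(lines)

def parse_batch_response(response: str, n: int) -> List[Optional[str]]:
    """Split one response into n per-agent answers; missing ones become None."""
    answers = {}
    for match in re.finditer(r"^A\[(\d+)\]:[ \t]*(.*)$", response, flags=re.MULTILINE):
        answers[int(match.group(1))] = match.group(2).strip()
    return [answers.get(i) for i in range(1, n + 1)]

if __name__ == "__main__":
    queries = [
        "You are a thrifty consumer. Prices rose 5%. Do you buy or wait?",
        "You are a wealthy consumer. Prices rose 5%. Do you buy or wait?",
    ]
    print(build_batch_prompt("You simulate independent consumers.", queries))
    # Hypothetical model output, hard-coded for illustration:
    print(parse_batch_response("A[1]: wait\nA[2]: buy", n=len(queries)))
```

Because shared instructions are sent once per batch rather than once per agent, the token cost per agent falls as the batch grows; the parser returns `None` for any agent whose answer is missing so that it can be retried individually.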

6.2 Benchmark

Benchmarks have significantly advanced the development of AI in the past decade. Landmark benchmarks like ImageNet russakovsky2015imagenet , GLUE wang2018glue , and the benchmarks in graph learning hu2020open ; dwivedi2023benchmarking have been pivotal to the rapid innovation in the fields of computer vision, natural language processing and graph neural networks.

Recently, there has been a surge in benchmarks that assess the capabilities of LLM-driven agents, highlighting the growing interest in this emerging area. For example, researchers valmeekam2022large develop benchmarks to evaluate LLM’s capability in planning and reasoning about change, focusing on symbolic models and structured inputs compatible with such representations. Meanwhile, AgentBench develops a multi-dimensional benchmark with 8 distinct environments to assess the capabilities of LLM-driven agents in various multi-turn open-ended generation settings liu2023agentbench . MLAgentBench, on the other hand, designs a suite of ML tasks for benchmarking LLM-driven AI research agents, including tasks like image classification and sentiment classification huang2023benchmarking . Researchers also propose to evaluate LLM-driven agents with embodied tasks, using them as high-level planners in robotics setups or in textual environments, focusing on the interaction between planning and action, like ALFWorld shridhar2020alfworld and ComplexWorld basavatia2023complexworld . On top of textual environments, online reinforcement learning approaches are developed to align LLM agents with human preference and evaluate their performance carta2023grounding .

However, previous benchmarks mainly focus on the decision and planning capabilities of LLM-driven agents, and the assessment of LLM-driven agent simulation is still inadequate. On the one hand, there still exist challenges in evaluating the performance of agent simulations. Previous works often examine the statistical features of simulated behavior feng2020learning , such as the spatial and temporal distribution. Recent studies also recruit human evaluators to gather feedback on the believability of the simulation park2023generative . However, developing benchmarks for quantitative and qualitative evaluation of LLM-driven agent-based simulation remains a largely open problem and a promising future research direction. On the other hand, LLM-driven simulation might serve as a realistic environment that provides high-quality feedback to train other AI models. For example, previous studies explore the simulations of social segregation sert2020segregation , competing firms osoba2020policy , competitive games park2019multi , and coordination of different stakeholders bone2010simulation . Such simulations can serve as benchmarks to train and evaluate reinforcement learning models. A recent study wu2023plan proposes a PET framework to leverage LLM-driven agents as supervisors of low-level trainable models, which simplifies challenging control tasks by translating task descriptions into high-level sub-tasks and then tracking the accomplishment of these sub-tasks. Additionally, more research efforts should be dedicated to benchmarks of AI for social good cowls2021definition .
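One simple way to quantify how closely the statistical features of simulated behavior match observed data is a divergence between the two empirical distributions. The sketch below uses the Jensen-Shannon divergence (base 2, so values lie in [0, 1]) on histogram counts. It is an illustrative choice of metric rather than one prescribed by the works cited above, and the activity counts are made-up toy values.

```python
import math
from typing import List, Sequence

def normalize(counts: Sequence[float]) -> List[float]:
    """Turn non-negative histogram counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")
    return [c / total for c in counts]

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(counts_a: Sequence[float], counts_b: Sequence[float]) -> float:
    """Jensen-Shannon divergence between two histograms over the same bins."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must use the same bins")
    p, q = normalize(counts_a), normalize(counts_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

if __name__ == "__main__":
    # Toy trip counts per time-of-day bin (illustrative values only).
    observed = [5, 40, 25, 30]
    simulated = [10, 35, 30, 25]
    print("JS divergence:", round(js_divergence(observed, simulated), 4))
```

A value of 0 means the two histograms are identical after normalization, and a value of 1 means they place their mass on disjoint bins.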

6.3 Open Platform

Building open platforms for LLM-driven agents will play a pivotal role in this emerging research area, as such platforms could substantially reduce the barriers to LLM-driven ABS and foster a vibrant community, echoing the calls for open-source software weber2004success and open science national2018open . The recent advance of LLMs has led to the public releases of several powerful pretrained language models. For example, Bidirectional Encoder Representations from Transformers (BERT) was publicly released and has been highly influential in the past few years devlin2018bert . GPT2, a predecessor to the current ChatGPT family, was released by OpenAI with limited model sizes for open-source use radford2019language . Additionally, Meta AI recently released a collection of open foundation and fine-tuned chat models named LLaMa 2 touvron2023llama , which range in scale from 7 billion to 70 billion parameters. These open-source LLMs demonstrate powerful capabilities in various natural language tasks and can be further adapted for specific downstream tasks with efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) hu2021lora .

The recent proliferation of LLM-driven agents has also resulted in several open-source platforms. Voyager is an example of an open-source framework for embodied LLM-driven agents, capable of continuously acquiring diverse skills and making novel discoveries in Minecraft without human intervention xi2023rise . Researchers also develop open-source frameworks for real-world task-solving agents, such as XAgent xagent2023 , which is designed as a general-purpose framework for automatic task-solving. Moreover, ModelScope-Agent li2023modelscope is proposed as a general and customizable agent framework designed for real-world applications, which supports model training on multiple open-source LLMs and offers diversified and comprehensive APIs. On top of the textual embodied environment ALFWorld, researchers developed the BUTLER framework shridhar2020alfworld , which can operate across text and embodied environments with three main components, i.e., brain, vision, and body. This arrangement allows BUTLER to effectively bridge the gap between abstract language understanding and practical, embodied task execution in simulated virtual environments. However, these previous works mainly focus on task-solving LLM agents, while open platforms for LLM-driven ABS are still lacking. This gap can be largely attributed to the challenges of integrating LLM-driven agents with the complex environment of simulation. Urban Generative Intelligence (UGI) ugi is a recently proposed open platform that integrates embodied agents with the digital twins of cities, offering the opportunity to evaluate urban problems with large-scale urban agent simulations and solve them with multidisciplinary approaches. Despite this early attempt at urban system simulation, the development of an open platform for LLM-driven ABS is an emerging area that calls for more research attention.

6.4 Robustness of LLM-driven ABS

The robustness problems of LLM agent simulation can be classified into two main scenarios, adversarial attack and out-of-distribution generalization, which fundamentally stem from the robustness issues of the underlying language models wang2023robustness . The current methodologies to address out-of-distribution generalization problems primarily resort to classic machine learning techniques shen2021towards , such as unsupervised representation learning, supervised model learning, and optimization methods. As for adversarial attacks, various defense techniques have been proposed in recent studies. For example, researchers propose to certify LLM safety with an erase-and-check filter that detects adversarial prompts kumar2023certifying . Besides, moving target defense chen2023jailbreaker aims to select safe answers from the responses generated by different LLMs to enhance an LLM system’s robustness against jailbreaking attacks. Moreover, extensive benchmarks of adversarial prompts are formulated to evaluate LLMs zhu2023promptbench .
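The erase-and-check idea mentioned above can be illustrated for the case of an adversarial suffix: the filter is applied not only to the full prompt but also to every version obtained by erasing up to a fixed number of trailing tokens, and the prompt is rejected if any version is flagged. The sketch below is a minimal illustration under stated assumptions: tokens are whitespace-separated words, and `toy_filter` is a deliberately weak stand-in for a learned safety classifier, constructed so that a hypothetical trigger token (`!!ignore!!`) fools it.

```python
from typing import Callable, List

SafetyFilter = Callable[[List[str]], bool]  # returns True when the input is harmful

def toy_filter(tokens: List[str]) -> bool:
    """Toy stand-in for a safety classifier (assumption): it flags prompts that
    contain 'bomb', but the hypothetical trigger '!!ignore!!' disables it."""
    return "bomb" in tokens and "!!ignore!!" not in tokens

def erase_and_check(prompt: str, max_erase: int, safety_filter: SafetyFilter) -> bool:
    """Flag the prompt if the filter flags it after erasing 0..max_erase
    trailing tokens (at least one token is always kept)."""
    tokens = prompt.split()
    limit = min(max_erase, len(tokens) - 1)
    for erased in range(0, limit + 1):
        if safety_filter(tokens[: len(tokens) - erased]):
            return True
    return False

if __name__ == "__main__":
    attack = "how to build a bomb zq !!ignore!!"
    print("plain filter:", toy_filter(attack.split()))
    print("erase-and-check:", erase_and_check(attack, 2, toy_filter))
    print("benign prompt:", erase_and_check("how to bake bread", 2, toy_filter))
```

If the filter flags the harmful prompt without its suffix, then any appended suffix of at most `max_erase` tokens cannot hide it, because one of the checked versions is exactly the original prompt. The cost is `max_erase + 1` filter calls per prompt.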

As for LLM agents, they often have tool-use capability qin2023tool and engage in human-interactive scenarios, such as the conflict simulation actor that helps users learn conflict resolution through rehearsal shaikh2023rehearsal , which gives the robustness of LLM agents far-reaching consequences. Furthermore, in the context of multi-agent simulation, adversarial attacks might propagate among agents tian2023evil . More importantly, recent works show that simulations of multiple LLM agents exhibit human-like collective behaviors aher2023using ; zhou2023sotopia , such as social conformity and homophily, which could be exploited by adversaries as weaknesses in the societies of LLM agents. Improving the robustness of LLM agent simulation at both the individual and collective levels is an open problem.

6.5 Ethical Risks in LLM Agents

The advances of LLMs unleash an unprecedented capability for human-like text generation and reasoning, raising concerns about the potential ethical risks of misuse, such as jailbreaking zhuo2023exploring . For example, recent studies highlight the risks of generating malicious network payloads that could jeopardize cyber security at scale charan2023text , and emphasize the concerns of accuracy, recency, coherence, and transparency of LLM agents in medical practice thirunavukarasu2023large . To gauge LLM agents’ susceptibility to social bias and stereotype, researchers use semantic illusions and cognitive reflection tests hagendorff2023human , typically administered to human subjects, to quantify LLMs’ tendency to produce intuitive yet erroneous responses. They find that early models from the GPT family have an increasing tendency to generate intuitive errors as their size increases, while ChatGPT-3.5 and 4 show a pattern shift that radically eliminates these errors and achieves superhuman accuracy. They speculate that the pattern shift is driven by the employment of reinforcement learning from human feedback, a sophisticated technique only deployed in ChatGPT-3.5 and later models. These findings highlight the importance of embedding human preferences into language models, instead of relying solely on web corpora. In the context of LLM-driven agent simulations, researchers find that when certain personas are assigned to ChatGPT, it generates output with up to 6× higher toxicity, engaging in discriminatory stereotypes, harmful conversation, and offensive language deshpande2023toxicity . Besides, a recent work acerbi2023large shows that LLM agents exhibit human-like biases, preferring gender-stereotype-consistent, negative, and biologically counter-intuitive content. More importantly, such biases could be further amplified along the transmission chain in multi-agent settings. The experimental results from previous studies emphasize the importance of ethical considerations in LLM-driven agent-based simulations, especially against the backdrop of the rapid proliferation of LLM agents in various domains.

Extensive efforts have been made to mitigate the potential ethical risks of LLM agents. A primary focus is to fundamentally align language models with human values yi2023unpacking ; yao2023instructions . A recent survey classifies alignment goals into three distinct levels, i.e., human instructions, human preferences, and human values. Besides, Moral Foundations Theory is invoked to benchmark mainstream language models’ alignment with the foundational ethical values of care, fairness, loyalty, authority, and sanctity yi2023unpacking . Researchers also find that LLM agents are susceptible to flattened caricatures when specific personas are assigned to them cheng2023compost . The CoMPosT framework is proposed to evaluate the multidimensionality of simulated LLM agents and to provide a measure of caricature in LLM agent simulations. They find that even agents driven by the latest GPT-4 are susceptible to caricature in the simulation of political and marginalized demographic groups. Finally, to fundamentally address the potential ethical risks, many scholars advocate enhancing the interpretability of LLM agents, questioning the falsifiability of any moral principles learned by black-box LLM agents vijayaraghavan2023minimum . Therefore, they propose to benchmark and continuously improve LLM agents’ interpretability zhao2023explainability .

7 Conclusion

Agent-based modeling and simulation is one of the most important methods for modeling complex systems in various domains. Recent advances in large language models have reshaped the paradigm of agent-based modeling and simulation, providing a new perspective for constructing intelligent human-like agents rather than those driven by simple rules or limited-intelligence neural models. In this paper, we take the first step toward a survey of agent-based modeling and simulation with large language models. We systematically analyze why LLM agents are required for agent-based modeling and simulation and how to address the critical challenges. Afterward, we extensively summarize the existing works in four domains: cyber, physical, social, and hybrid, carefully describing how to design the simulation environment, how to construct the large language model-empowered agents, and what to observe and achieve based on agent-based simulation. Lastly, given the unresolved limitations of existing works and the novelty and rapid growth of this area, we discuss the open problems and point out important research directions, which we hope can inspire future research.

References