AI agent - Wikipedia

Autonomous artificial intelligence agent

In the context of generative artificial intelligence, AI agents (also referred to as compound AI systems or agentic AI) represent a category of intelligent agents capable of pursuing goals, utilizing tools, and executing actions with various levels of independence. In practice, they typically function within objectives, constraints, and toolkits established by humans.^[1]^[2]

Overview

AI agents are characterized by several essential traits, including goal-oriented behavior, natural language interfaces, the capability to leverage external tools, and the competence to handle multi-step tasks. Their operational flow is often powered by large language models (LLMs). These agent systems might also comprise memory units, planning mechanisms, tool connectors, and orchestration software designed to coordinate the agent's various components.^[2]^[3]

There is currently no universally accepted standard definition for AI agents.^[4]^[5]^[6] The National Institute of Standards and Technology (NIST) has identified agentic AI as an evolving field that requires standards to ensure secure operations, interoperability, and trustworthy interactions with external systems.^[1]

A frequent use case for AI agents is the automation of complex tasks. A typical example is arranging a travel itinerary in response to a user's prompt.^[7]^[8]^[9]

Major technology firms, including Google, Microsoft, and Amazon Web Services, have introduced platforms for deploying pre-configured AI agents.^[10] Various protocols have been suggested to standardize communication between agents, such as the Model Context Protocol and Gibberlink,^[11] among others. Some of these protocols also facilitate connections between agents and external software applications.^[12]

In late 2025, the Linux Foundation announced the formation of the Agentic AI Foundation (AAIF), which aims to ensure that agentic AI develops in a collaborative and transparent manner.^[13]^[14]

History

Research into AI agents can be traced back to the 1990s. Notably, Harvard professor Milind Tambe observed that the definition of an AI agent was unclear even during that period. Researcher Andrew Ng is credited with popularizing the term "agentic" among the general public in 2024.^[15]

Training and testing

Experts have attempted to construct world models^[16]^[17] and reinforcement learning environments^[18] for the purpose of training or assessing AI agents. For instance, video games like Minecraft^[19] and No Man's Sky,^[20] as well as digital replicas of corporate websites,^[21] have been utilized as training grounds for these agents.

Autonomous capabilities

The Financial Times has drawn an analogy between the autonomy of AI agents and the SAE levels for self-driving cars. Most current applications are compared to Level 2 or Level 3, while some reach Level 4 in very specific scenarios. Level 5 autonomy remains largely theoretical.^[22]

Cognitive architecture

See also: Large language model § Agency

Several internal design strategies can be employed for reasoning within an agent:^[23]

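Retrieval-augmented generation, in which relevant documents are retrieved from an external source and supplied to the model as additional context.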
ReAct (Reason + Act), an iterative cycle in which the agent alternates between reasoning steps and actions, receives observations from the environment or external tools, and integrates those observations into the next reasoning step.^[24]
Reflexion, a method that utilizes an LLM to generate feedback on the agent's action plan and saves this feedback into a memory bank.
A tool/agent registry, used to organize software functions or other agents accessible to the main agent.
One-shot model querying, a technique where the model is queried once to generate the complete action plan.
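
As an illustration, the ReAct cycle and a tool registry can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not an implementation from any cited source: `TOOLS`, `scripted_model`, and `react_loop` are hypothetical names, and the "model" is a hard-coded stand-in for an LLM call so that the example runs without external services.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: maps a tool name to a plain Python function.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}

# One history entry is (thought, action, observation).
History = List[Tuple[str, str, str]]


def scripted_model(goal: str, history: History) -> Tuple[str, str, str]:
    """Stand-in for an LLM call (assumption). Returns (thought, action, argument)."""
    if not history:
        return ("I need to compute the sum first.", "add", goal)
    last_observation = history[-1][2]
    return ("I have the result, so I can stop.", "finish", last_observation)


def react_loop(goal: str, model, tools, max_steps: int = 5) -> str:
    """Alternate between reasoning and acting until the model emits 'finish'."""
    history: History = []
    for _ in range(max_steps):
        # Reason: the model sees the goal plus all earlier observations.
        thought, action, argument = model(goal, history)
        if action == "finish":
            return argument
        # Act: look the tool up in the registry and run it.
        tool = tools.get(action)
        observation = tool(argument) if tool else f"unknown tool: {action}"
        # Observe: store the result so the next reasoning step can use it.
        history.append((thought, action, observation))
    raise RuntimeError("step limit reached without a final answer")


answer = react_loop("2+3", scripted_model, TOOLS)
print(answer)
```

The step limit is a design choice of this sketch: it bounds the loop when the model never reaches a final answer. One-shot model querying would correspond to a single model call that returns the whole plan, with no loop.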

Reference architecture

Ken Huang has proposed a reference architecture for AI agents. This architecture comprises seven interconnected layers, with each layer relying on the functionality of the ones below it:^[25]

Layer 1: Foundation models, which act as the primary AI engines driving agent capabilities.
Layer 2: Data operations, which handle the data infrastructure that agents need, including vector databases, data loaders, and retrieval-augmented generation (RAG).
Layer 3: Agent frameworks, the software and tools that streamline the creation and management of AI agents.
Layer 4: Deployment and infrastructure, which provides the technical base required to run AI agents.
Layer 5: Evaluation and observability, which focuses on analyzing the safety and performance of AI agents.
Layer 6: Security and compliance, a protective framework that keeps agents operating safely, securely, and within regulatory limits. Its security and compliance features are embedded throughout the entire stack.
Layer 7: Agent ecosystem, the interface through which AI agents interact with real-world software and users.

Orchestration patterns

To handle complex assignments, autonomous agents are frequently combined with other agents or specialized tools. These configurations, referred to as orchestration patterns or workflows, include the following:^[26]^[27]

Prompt chaining: A workflow where the result of one step acts as the input for the subsequent step.
Routing: The classification of an input to guide it to a specific downstream task or tool.
Parallelization: The concurrent execution of multiple tasks.
Sequential processing: A fixed, linear sequence of tasks flowing through a set pipeline.
Planner-critic: An iterative design where one agent creates a proposal and another evaluates it, providing feedback for improvement.
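
The patterns above can be illustrated with ordinary Python functions standing in for agents. This is a minimal sketch, assuming each "agent" is a function from text to text. The names `chain`, `route`, `parallel`, and `planner_critic`, along with the toy planner and critic, are hypothetical and do not come from any specific framework.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Step = Callable[[str], str]  # assumption: an agent or tool maps text to text


def chain(steps: List[Step], text: str) -> str:
    """Prompt chaining: each step's output is the next step's input."""
    for step in steps:
        text = step(text)
    return text


def route(text: str, classify: Callable[[str], str], handlers: Dict[str, Step]) -> str:
    """Routing: classify the input, then dispatch it to one handler."""
    return handlers[classify(text)](text)


def parallel(tasks: List[Step], text: str) -> List[str]:
    """Parallelization: run independent tasks concurrently on the same input."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, text) for task in tasks]
        return [future.result() for future in futures]


def planner_critic(plan, critique, task: str, max_rounds: int = 3) -> str:
    """Planner-critic: revise a proposal until the critic has no feedback."""
    proposal = plan(task, "")
    for _ in range(max_rounds):
        feedback = critique(proposal)
        if not feedback:  # empty feedback means the critic accepts
            return proposal
        proposal = plan(task, feedback)
    return proposal  # round limit reached; return the latest revision


# Toy stand-ins for model calls (assumptions for illustration only).
cleaned = chain([str.strip, str.lower], "  Book A Flight ")
routed = route(
    "2+2",
    lambda t: "math" if any(c.isdigit() for c in t) else "text",
    {"math": lambda t: "calculator", "text": lambda t: "writer"},
)
variants = parallel([str.upper, str.title], "ai agent")
final_plan = planner_critic(
    lambda task, fb: task + (" with budget" if fb else ""),
    lambda p: "" if "budget" in p else "add a budget",
    "trip plan",
)
print(cleaned, routed, variants, final_plan)
```

In this sketch a fixed sequential pipeline has the same shape as `chain`; what changes in practice is whether the steps are model prompts or deterministic processing stages.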

Multimodal AI agents

Besides large language models (LLMs), vision-language models (VLMs) and other multimodal foundation models can serve as the basis for agents. In September 2024, the Allen Institute for AI released an open-source vision-language model.^[28] Nvidia released a framework...

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]