AI Agent 为什么是AIGC最后的杀手锏?

Authors: Hu Xiaomeng, Chen Chuyi (Tencent Research Institute)

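AI Agents are undoubtedly the most exciting line of development for today's large models. They have been called "the next battle for large models," "the last killer product," and the "Agent-centric" force opening a new industrial-revolution era. On November 7, OpenAI's first developer conference (OpenAI DevDay) set off an AI Agent boom. OpenAI released GPTs, an early-stage AI Agent product, together with GPT Builder, a tool for creating them. Users only need to chat with GPT Builder and describe the functions they want, and it generates a custom GPT. Custom GPTs can be better suited to daily life, specific tasks, work, or the home. To support this, OpenAI also opened up a large number of new APIs (including vision, DALL·E 3 image generation, and speech), as well as the newly launched Assistants API, making it easier for developers to build their own custom GPTs. In a recent article, Bill Gates stated explicitly that within five years AI Agents will become mainstream and every user will have a personal AI Agent. Users will no longer need different apps for different needs; they will simply tell their Agent, in everyday language, what they want to do.[1]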


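So what exactly is an AI Agent? Why is it so important that the industry is paying such close attention, with some scholars even asserting that "if America's Agent Store develops well, the gap between Chinese and American large models will keep widening"?[2]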

What Is an AI Agent?


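OpenAI defines an AI Agent as a system that uses a large language model as its brain, has the ability to autonomously understand and perceive, plan, remember, and use tools, and can automatically carry out complex tasks.[4] The basic framework of an AI Agent is shown in the figure below: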

[Figure: Basic framework of an AI Agent]

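(3) Tool use. The tool-use module means the agent can draw on external resources or tools to carry out tasks, for example by learning to call external APIs to obtain extra information missing from the model weights, including current information, code-execution capability, and access to proprietary information sources, thereby compensating for the LLM's own weaknesses. For instance, an LLM's training data is not updated in real time, so the agent can use a tool to access the internet for the latest information, or use specialized software to analyze large amounts of data. Many digital and intelligent tools already exist on the market, and agents use them more handily and efficiently than humans do; by calling different APIs or tools, they complete complex tasks and produce high-quality results. This way of using tools is an important characteristic and advantage of agents.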
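As a rough illustration of the tool-use idea above, the following Python sketch shows how an agent runtime might dispatch a tool call that the model emits as JSON. The tool names (`today`, `word_count`), the JSON shape, and the `dispatch` helper are hypothetical assumptions made for illustration; they do not correspond to any specific framework or to OpenAI's API, and the model output is hard-coded instead of coming from a real LLM.

```python
import json
from datetime import date
from typing import Callable

# Hypothetical tool registry: the names and functions are illustrative only.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a plain Python function as a callable tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("today")
def today() -> str:
    # Stands in for "current information" that the model weights lack.
    return date.today().isoformat()


@tool("word_count")
def word_count(text: str) -> str:
    # Stands in for "specialized software" that processes data for the agent.
    return str(len(text.split()))


def dispatch(model_output: str) -> str:
    """Execute a tool call emitted as JSON, e.g. {"tool": "word_count", "args": {...}}."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        # Report the failure back to the model instead of crashing.
        return f"error: unknown tool {call['tool']!r}"
    return fn(**call.get("args", {}))


if __name__ == "__main__":
    # In a real agent this string would be produced by the LLM.
    print(dispatch('{"tool": "word_count", "args": {"text": "agents use tools"}}'))  # 3
```

The result string would normally be fed back to the model as new context, so the model can decide on its next step.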



AI Agents Will Bring Broader Human-Machine Integration





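In fact, Microsoft first introduced the concept of Copilot on GitHub in 2021. GitHub Copilot is an AI service that helps developers write code. In May 2023, powered by large models, Copilot received a comprehensive upgrade: Microsoft launched Dynamics 365 Copilot, Microsoft 365 Copilot, Power Platform Copilot, and others, and put forward the idea that "Copilot is a brand-new way of working." What holds for work holds for everyday life as well, which also needs a "Copilot": Li Zhifei, founder of Mobvoi (出门问问), believes the best job for large models is to be humanity's "Copilot."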





AI Agents Will Change the Rules of the Software Game


AI Agents are redefining software. Bill Gates believes AI Agents will upend the software industry entirely, changing both how we use software and how we write it.[9]

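AI Agents will shift the paradigm of software architecture from process-oriented to goal-oriented. Existing software (including apps) fixes its workflow through a series of predefined instructions, logic, rules, and heuristic algorithms so that its output matches the user's expectations; that is, the user reaches a goal by operating step by step according to the instruction logic. Such a process-oriented architecture is highly reliable and deterministic. However, this process-oriented architecture can only be applied to vertical domains and cannot be generalized to every domain, which is why balancing standardization against customization has become one of the hard problems facing the SaaS industry.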


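The AI Agent paradigm gradually shifts feature development from being human-led to being driven mainly by AI. With large models as the technical infrastructure and Agents as the core product form, the task layer of predefined instructions, logic, rules, and heuristic algorithms in traditional software evolves into plans that goal-oriented agents generate autonomously. As a result, whereas the old architecture could only solve tasks within a limited scope, the future architecture can solve tasks in an unbounded domain.[11] In the future software ecosystem, Agents will not only be the top-layer medium through which everyone interacts; the whole industry, including underlying technology, business models, middleware components, and even people's living habits and behaviors, will reorganize around Agents. This marks the beginning of the Agent-Centric era.[12]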
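A toy contrast between the process-oriented and goal-oriented styles described above might look like the Python below. It is a sketch under stated assumptions: the capability names and the keyword-based `plan` function are hypothetical, and `plan` merely stands in for the LLM that would generate a plan from a goal in a real agent, so it does not actually cover an unbounded domain.

```python
from typing import Callable

# Primitive capabilities an agent may combine (illustrative names).
CAPABILITIES: dict[str, Callable[[list], list]] = {
    "drop_negatives": lambda xs: [x for x in xs if x >= 0],
    "sort": lambda xs: sorted(xs),
    "top3": lambda xs: sorted(xs, reverse=True)[:3],
    "total": lambda xs: [sum(xs)],
}


def procedural_total(xs: list) -> list:
    """Process-oriented: the developer hard-codes the step order in advance."""
    return CAPABILITIES["total"](CAPABILITIES["drop_negatives"](xs))


def plan(goal: str) -> list[str]:
    """Hypothetical planner. A real agent would ask an LLM to turn the goal
    into steps; this keyword stub only lets the sketch run offline."""
    steps = []
    if "positive" in goal:
        steps.append("drop_negatives")
    if "largest" in goal:
        steps.append("top3")
    elif "sum" in goal or "total" in goal:
        steps.append("total")
    elif "sort" in goal:
        steps.append("sort")
    return steps


def goal_oriented(goal: str, xs: list) -> list:
    """Goal-oriented: the step sequence is generated at run time from the goal."""
    for step in plan(goal):
        xs = CAPABILITIES[step](xs)
    return xs


if __name__ == "__main__":
    print(procedural_total([3, -1, 4]))                                 # [7]
    print(goal_oriented("sum of positive values", [3, -1, 4]))          # [7]
    print(goal_oriented("three largest positive values", [5, -2, 9, 1, 7]))  # [9, 7, 5]
```

Both functions return `[7]` for the first input, but only the goal-oriented one can be pointed at a new goal without a new hand-written code path.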

Comparison of the RPA (Robotic Process Automation) and APA (Agentic Process Automation) paradigms[13]

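Take ChatDev, the intelligent software development platform that ModelBest (面壁智能) released as its first "large model + Agent" SaaS-grade product. The platform works like a software company staffed entirely by AI Agents, with Agent roles such as CEO, CTO, development manager, product manager, tester, and supervisor. The user only needs to give a clear requirement to the CEO Agent, which then organizes the entire software development process around that requirement. What is finally delivered to the user includes both the software product and the code produced throughout development, and the whole process is automated.[14] This will lower production costs for the software industry and improve its capacity for customization, ushering in the "3D printing" era of software.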

Outlook and Challenges for AI Agents

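AI Agents are an important force in turning artificial intelligence into infrastructure. Looking back at the history of technology, technologies ultimately end up as infrastructure: electricity, for example, has become like air, barely noticed yet indispensable, and the same is true of cloud computing. This process goes through three stages: innovation and development, in which a new technology is invented and begins to be applied; diffusion and application, in which the maturing technology is widely adopted across fields and has a profound impact on society and the economy; and infrastructure, in which the technology becomes so pervasive that it turns into infrastructure, an indispensable part of everyday life. Almost everyone agrees that AI will become infrastructure for the society of the future, and agents are driving this transformation. This is not only because Agent software is cheap to produce, but also because Agents can adapt to different tasks and environments and can learn and optimize their own performance, so they can be applied across a wide range of fields and become a foundation for every industry and for social activity.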



In terms of technical optimization, iteration, and implementation, AI Agent development also faces several bottlenecks:



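Overcoming the difficulties of multi-agent development is an important prerequisite for building a future Agent Society. Multi-agent collaboration can form an agent society, the highest form of technical-social system. An agent society is complex, dynamic, self-organizing, and adaptive, capable of collaboration, competition, and continual evolution. Within such a system, agents can carry out complex, flexible tasks according to their goals and changes in the environment, and engage in high-level, multidimensional interaction and collaboration with humans and other agents. An agent society would not only help humans explore and expand the physical and virtual worlds but also enhance and extend human capabilities and experience.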

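At the same time, these trends suggest that AI Agents may face challenges on many fronts, including security and privacy, ethics and accountability, and economic and employment impacts on society.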





AI Agents are undoubtedly the most exciting line of development for today's large models. They have been called "the next battle for large models," "the last killer product," and the "Agent-centric" force opening a new industrial-revolution era. OpenAI's first developer conference (OpenAI DevDay) set off an AI Agent boom: OpenAI released GPTs, an early-stage AI Agent product, and launched the corresponding creation tool, GPT Builder. Users can generate a custom GPT simply by chatting with GPT Builder and describing the functions they want, and the result can be better suited to daily life, specific tasks, work, or the home. To support this, OpenAI opened a large number of new APIs (vision, DALL·E 3 images, and speech) as well as the newly launched Assistants API, making it more convenient for developers to build their own GPTs. Bill Gates recently published an article stating plainly that within five years every user will have a personal AI Agent, to which they only need to say, in everyday language, what they want to do. Within a week of the release, more than … GPTs had already been created.

So what exactly is an AI Agent, and why is it so important that the industry pays it so much attention, with some scholars even asserting that the growth of the American Agent Store will keep widening the gap between Chinese and American large models?

What is an AI Agent? In computer science and artificial intelligence, "Agent" is generally translated into Chinese as 智能体. It is defined as a software or hardware entity that exhibits one or more intelligent characteristics within a given environment, such as autonomy, reactivity, sociality, pro-activeness, deliberation, and cognition. OpenAI defines an AI Agent as a system that uses a large language model as its brain, with the ability to autonomously understand and perceive, plan, remember, and use tools, so that it can carry out complex tasks automatically. Built on this LLM-driven basic framework, an AI Agent has four main modules: memory, planning, action, and tool use.
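To make the four-module framework above concrete, here is a minimal Python sketch of how memory, planning (with reflection), tool use, and action might fit together in one loop. Everything in it is an assumption for illustration: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for an external vector database, and the `" then "`-separated goal format replaces the LLM that would normally decompose a goal into sub-goals.

```python
import math
import re
from collections import Counter, deque
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Memory:
    def __init__(self, short_term_size: int = 5) -> None:
        # Short-term memory: only the most recent items, like a context window.
        self.short_term: deque = deque(maxlen=short_term_size)
        # Long-term memory: every item plus its vector (stands in for a vector database).
        self.long_term: list[tuple[Counter, str]] = []

    def remember(self, text: str) -> None:
        self.short_term.append(text)
        self.long_term.append((embed(text), text))

    def recall(self, query: str, k: int = 1) -> list[str]:
        """Return the k long-term memories most similar to the query."""
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


class Agent:
    def __init__(self, tools: dict[str, Callable[[str], str]]) -> None:
        self.memory = Memory()
        self.tools = tools  # tool-use module

    def plan(self, goal: str) -> list[tuple[str, str]]:
        """Planning: split a goal into (tool, argument) sub-goals.
        Hypothetical format 'tool: arg then tool: arg'; a real agent would ask an LLM."""
        steps = []
        for part in goal.split(" then "):
            name, _, arg = part.partition(":")
            steps.append((name.strip(), arg.strip()))
        return steps

    def act(self, name: str, arg: str) -> str:
        """Action: execute one sub-goal with a tool, or report failure."""
        if name not in self.tools:
            return f"failed: no tool {name!r}"
        return self.tools[name](arg)

    def reflect(self, name: str, result: str) -> None:
        """Reflection: turn a failure into a lesson stored in long-term memory."""
        if result.startswith("failed"):
            self.memory.remember(f"lesson: tool {name} is unavailable, avoid planning it")

    def run(self, goal: str) -> list[str]:
        results = []
        for name, arg in self.plan(goal):
            result = self.act(name, arg)
            self.memory.remember(f"{name}({arg}) -> {result}")
            self.reflect(name, result)
            results.append(result)
        return results


if __name__ == "__main__":
    agent = Agent({"upper": str.upper, "reverse": lambda s: s[::-1]})
    print(agent.run("upper: hello then reverse: abc then translate: hi"))
    # ['HELLO', 'cba', "failed: no tool 'translate'"]
    print(agent.memory.recall("lesson about unavailable tool"))
    # ['lesson: tool translate is unavailable, avoid planning it']
```

The failed `translate` step produces a lesson in long-term memory, and `recall` retrieves it by similarity, mirroring the plan, act, reflect, and remember cycle the text describes.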
(1) Memory. The memory module is responsible for storing information, including knowledge learned from past interactions and even temporary task information. For an agent, an effective memory mechanism ensures that it can draw on past experience and knowledge when facing new or complex situations. For example, a chatbot with memory can remember a user's preferences or earlier conversations and so provide a more personalized and coherent experience. Memory is divided into short-term and long-term memory. All in-context learning relies on short-term memory. Long-term memory gives the agent the ability to retain and recall effectively unlimited information over long periods, usually through an external vector database and fast retrieval; an example is the large body of data and knowledge accumulated in a particular industry. With long-term memory, a great deal of data can accumulate, making agents more capable and letting them go deeper into an industry.

(2) Planning. The planning module has two stages: planning in advance and reflecting afterward. The advance-planning stage involves predicting and deciding on future actions. For example, when performing a complex task, the agent breaks the big goal down into smaller, manageable sub-goals so that it can efficiently plan a series of steps or actions that reach the expected result. In the reflection stage, the agent can examine and correct shortcomings in its plan, reflect on its mistakes, and learn lessons; once these lessons are added to long-term memory, they help the agent avoid repeating mistakes and update its understanding of the world.

(3) Tool use. The tool-use module means the agent can use external resources or tools to perform tasks, for example by learning to call external APIs to obtain extra information missing from the model weights, including current information, code-execution capability, and access to proprietary information sources, so as to make up for the LLM's weaknesses. For instance, because an LLM's training data is not updated in real time, tools can be used to access the internet for the latest information, or specialized software can be used to analyze large amounts of data. There are now many digital and intelligent tools on the market, and agents use them more conveniently and efficiently than humans do. This way of using tools is also an important feature and advantage of agents.

(4) Action. The action module is the part through which the agent actually carries out decisions or responds. For different tasks, the agent system has a complete set of action strategies and, when making a decision, can choose which actions to perform, such as the familiar memory retrieval, reasoning, learning, and programming.

In general, these four modules work together so that agents can act and make decisions across a wider range of situations and perform complex tasks in a smarter and more efficient way.

AI Agents will bring broader human-machine integration. Personal intelligent assistants built on large models will not only augment everyone's abilities but also change the mode of human-machine collaboration, bringing broader human-machine integration and a generative intelligence revolution. So far there have been three modes of human-AI collaboration:

(1) Embedding mode. Users communicate with the AI in language and use prompts to set goals, and the AI then helps them accomplish those goals, for example an ordinary user entering prompts to create novels or music. In this mode the AI is equivalent to a tool that executes commands, while humans act as decision-makers and commanders.

(2) Copilot mode. Humans and AI take part in the workflow together as partners, each playing their own role. The AI is involved throughout, from offering suggestions to helping complete the process; in software development, for example, it can help programmers write code, detect errors, or optimize performance. Humans and AI complement each other's abilities in this process, and the AI is more like a knowledgeable partner than a simple tool. In fact, Microsoft first introduced the Copilot concept in 2021 with GitHub Copilot, a service that helps developers write code. Powered by large models, Copilot later received a comprehensive upgrade with new product launches, and Microsoft put forward the idea that Copilot is a brand-new way of working. Life needs a Copilot as much as work does: Li Zhifei, founder of Mobvoi, believes the best job for large models is to be humanity's Copilot.

(3) Agent mode. Humans set goals and provide the necessary resources (such as computing power), the AI then completes most of the work independently, and finally humans supervise the process and evaluate the results. This mode fully embodies the interactivity, autonomy, and adaptability of agents, bringing them close to independent actors, while humans mainly act as supervisors and evaluators.

Comparing the roles of humans and AI across these modes, and considering the functions of the agent's four main modules (memory, planning, action, and tool use), the agent mode is clearly more efficient than the embedding and copilot modes and may become the main mode of human-machine collaboration in the future. Every ordinary individual may become a "super-individual" with their own AI team and automated task workflows, building smarter, more automated collaboration with other super-individuals. Many people in the industry …