The Intelligence Revolution in Games: How Will AI and Games Co-Create the Future?

Posted: 2024-04-22 03:12:32

Author: Wang Shu, Postdoctoral Fellow, Tencent Research Institute

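As early as 2001, researchers argued that game AI holds great potential for achieving or creating human-level AI [1]. As a starting point for AI research, games offer complex and varied task scenarios, which give AI a basis for approaching human intelligence in breadth, depth, and flexibility.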

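Today, with the rapid development of generative AI and decision-making AI, games and AI are advancing in ever closer step. At GDC 2024 (Game Developers Conference 2024), the world's leading game-industry conference, AI was a focal point: 64 talks were devoted to AI, accounting for 8% of the program. In generative AI, 62% of game-industry respondents report using AI tools to produce game content [2]. In decision-making AI, Google DeepMind followed AlphaStar with SIMA (Scalable Instructable Multiworld Agent), a generalist game agent that can carry out more than 600 tasks across a range of 3D game worlds by following natural-language instructions.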

Technology Testbed:

General AI Agents in Practice in Game Environments

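Games give decision-making AI a clear yardstick. Evaluating an agent against the clear, quantifiable rules of a game addresses the shortage of research scenarios for AI and greatly speeds up technical iteration and testing. Today, most decision-making AI research teams, including OpenAI and DeepMind, use games as training environments, aiming to build generalist agents across different types of games and, on that basis, to work toward artificial general intelligence.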

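On March 13, 2024, the Google DeepMind team released an AI agent named SIMA (Scalable Instructable Multiworld Agent). It can understand a broad range of 3D game worlds and, like a human player, follow natural-language instructions to carry out more than 600 tasks in them. Its strong natural-language understanding and transfer-learning ability have led a number of researchers to describe SIMA's arrival as the "ChatGPT moment for agents."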

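In its technical report, DeepMind sets out SIMA's basic principles and technical approach, defining it as a scalable, instructable generalist game agent for multiple 3D virtual worlds. The team chose 9 currently popular 3D video games and 4 3D scenes built with the Unity engine as SIMA's training environments, and collected a large amount of human players' behavior and input data from these games to train the agent. During training, the agent continually observes the on-screen game images and pairs them with the players' in-game operations; it then learns to produce keyboard and mouse output that controls the in-game character [3].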

Figure 1. Overview of the SIMA agent project

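SIMA is an important milestone for DeepMind in artificial general intelligence (AGI) research. From the Go programs AlphaGo and AlphaZero, to AlphaStar built on StarCraft II, to today's SIMA built on large language models, DeepMind has consistently used game environments to test and study generalist agents. In DeepMind's view, the decision-making and action abilities that agents acquire in games may transfer to real-world settings, offering new ideas and new practice for incubating AGI.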

Before SIMA, the field already had several generalist game agent projects. Two representative ones are Gato, released by DeepMind, and MineDojo, released by NVIDIA.

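Gato, released by the DeepMind team in November 2022, can play Atari games and control a real robot arm to stack blocks. It uses a GPT-like large language model architecture, and its training data include images, text, robot-arm joint data, and other multimodal datasets [4]. A Microsoft study from March 2023 noted that large models fusing multimodal information, such as Gato, are very likely to give rise to early forms of intelligence [5].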

Figure 2. Gato, built by DeepMind

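Similar to Gato is MineDojo, an agent built on Minecraft jointly by NVIDIA, the California Institute of Technology (Caltech), Stanford, and other research institutions. MineDojo uses Minecraft gameplay videos (YouTube), the wiki (Wiki), and user-community posts (Reddit) as training material to train a generalist agent that can complete a variety of tasks in Minecraft from text prompts. MineDojo can complete simple programmatic tasks and also a range of creative tasks from brief descriptions, such as building a library from a description [6].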

Figure 3. MineDojo capability model

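Gato and MineDojo correspond to two different lines of thinking in AI research: solving a sufficiently large number of tasks, or solving one sufficiently complex task. Both have limitations. MineDojo is a specialized intelligence for a specific game: it can complete various tasks only within a single game and lacks transfer-learning ability. Gato has some transfer-learning ability, but its game environments are mainly 2D rather than 3D, which leaves a large gap from real-world scenes.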

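Training general AI agents in game environments has become an industry consensus. In a TED AI 2023 talk, NVIDIA senior scientist Jim Fan proposed the concept of the Foundation Agent. He argued that the next frontier of AI research is a "foundation agent" that generalizes across virtual and real worlds, masters a broad range of skills, and controls many bodies, and that training it likewise depends on game environments [7]. In China, Tencent has led the building of Kaiwu (开悟), an open research platform for multi-agent AI and complex decision-making. Drawing on the core strengths of Tencent AI Lab and Honor of Kings in algorithms, compute, and experimental scenarios, it offers academic researchers and algorithm developers a leading platform in China for applied exploration.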

New Capability Breakthrough:

SIMA Effectively Integrates Large Language Models

with AI Agent Training

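SIMA combines large language models with agent training, achieving a breakthrough in the decision-making and generalization abilities of AI agents. SIMA understands a variety of 3D game environments well and, like a human, carries out tasks in them by following natural-language instructions. It far exceeds other agents in decision-making efficiency and capability, and its decision-making ability approaches that of humans [8]. Demis Hassabis, co-founder and CEO of DeepMind, said in an interview that "the field that combines large language models, AI agent training, and game environments has enormous prospects, and DeepMind will keep increasing its research investment in it" [9]. Overall, compared with other agents, SIMA's features and breakthroughs lie mainly in the following areas: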

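First, SIMA is trained in game environments but focuses on the consistency between the agent's behavior and the instructions it receives. In the DeepMind team's view, "games are an important proving ground for artificial intelligence (AI) systems. Like the real world, games are rich learning environments with responsive, real-time settings and ever-changing goals." Like DeepMind's earlier game agents, SIMA learned from a large amount of human player behavior data during training. Unlike them, SIMA is not trained to beat human players or achieve high scores, but to follow natural-language instructions from humans across game environments and to act consistently with those instructions.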

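Second, SIMA combines large language models with agent training and uses a unified, human-friendly interface. "Learning language and learning environments reinforce each other: learning natural language improves an agent's grasp of general representations and abstract concepts and makes learning more efficient." Compared with earlier game-based agents, SIMA brings large language models into training; the whole process is language-first, and all training behavior is driven directly by natural language. In other words, SIMA needs neither access to a game's source code nor a bespoke API. It requires only two inputs: the images on screen and natural-language instructions from the user. From these it uses keyboard and mouse output to control the in-game character and carry out the instructions. For interaction, SIMA uses a unified, human-friendly interface through which people can give it natural-language instructions directly (see Figure 4 below).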

Figure 4. SIMA agent architecture
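
The input/output contract described above (screen image plus instruction in, keyboard and mouse events out) can be sketched in Python. This is an illustrative sketch only, not DeepMind's implementation: the names `Observation`, `Action`, `InstructableAgent`, and `KeywordStubAgent` are hypothetical, the key bindings are assumed, and the keyword-matching stub merely stands in for a learned policy.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Observation:
    """What the agent receives: one screen frame plus the user's instruction."""
    frame: bytes        # raw screen pixels (the encoding is a placeholder)
    instruction: str    # natural-language instruction, e.g. "open the map"


@dataclass
class Action:
    """What the agent emits: keyboard and mouse events only."""
    keys: List[str] = field(default_factory=list)  # keys pressed this step
    mouse_dx: int = 0                               # horizontal mouse movement
    mouse_dy: int = 0                               # vertical mouse movement
    click: bool = False                             # left mouse button


class InstructableAgent(Protocol):
    """Any agent that maps an observation to keyboard/mouse output."""

    def act(self, obs: Observation) -> Action: ...


class KeywordStubAgent:
    """Toy stand-in for a learned policy: maps keywords to fixed actions.

    The key bindings below are assumptions made for illustration.
    """

    def act(self, obs: Observation) -> Action:
        text = obs.instruction.lower()
        if "turn left" in text:
            return Action(mouse_dx=-50)
        if "open the map" in text:
            return Action(keys=["m"])
        if "climb" in text:
            return Action(keys=["w"])
        return Action()  # no-op when the instruction is not recognized


agent = KeywordStubAgent()
action = agent.act(Observation(frame=b"", instruction="Open the map"))
print(action.keys)
```

Note that nothing in this interface touches game internals: the agent sees only pixels and text and acts only through input devices, which is why the same contract can in principle be reused across different games.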

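Third, SIMA generalizes well and maintains a high level of capability across different virtual environments. According to data released by DeepMind so far, SIMA has been evaluated on 600 basic skills, covering navigation (for example, turn left), object interaction (climb a ladder), and menu use (open the map), and it outperformed comparable agents in multiple game environments. DeepMind researchers assessed SIMA's ability to follow instructions on nearly 1,500 specific in-game tasks, some of them judged by human evaluators. The results show that SIMA far outperformed comparable agents in every game environment tested (see Figure 5).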

Figure 5. Performance of multiple agents across different environments
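
A comparison of this kind comes down to a success rate per agent and per environment. The sketch below shows one way such a table could be aggregated from per-task outcomes. It is a hedged illustration: the function name `success_rates` and the record layout are assumptions, and the records are placeholders, not real SIMA results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per evaluated task: (agent, environment, task_succeeded)
Result = Tuple[str, str, bool]


def success_rates(results: Iterable[Result]) -> Dict[Tuple[str, str], float]:
    """Return the fraction of successful tasks per (agent, environment)."""
    passed: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for agent, env, ok in results:
        total[(agent, env)] += 1
        passed[(agent, env)] += int(ok)
    return {key: passed[key] / total[key] for key in total}


# Placeholder records for illustration only (not real evaluation data).
toy_results = [
    ("agent_x", "env_a", True),
    ("agent_x", "env_a", False),
    ("agent_x", "env_b", True),
    ("agent_y", "env_a", False),
]
rates = success_rates(toy_results)
for (agent, env), rate in sorted(rates.items()):
    print(f"{agent} in {env}: {rate:.0%}")
```

Whether a task counts as succeeded may come from an automatic check or from a human judge; the aggregation is the same either way.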

New Application Scenarios:

AI Powers Game Creation

and Boosts Content-Production Efficiency

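Games have become a testbed and incubator for general AI agents, continually driving the iteration of decision-making AI. At the same time, as generative AI technologies such as Stable Diffusion and the Transformer mature, AI has begun to feed back into content creation in games and the wider cultural industries. More and more practitioners can generate images, text, audio and video, NPCs, and other digital assets at lower cost, improving development efficiency and further lowering the barrier to producing interactive content.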

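At the application level, generative AI models have become capable assistants for game developers. According to the 2024 Unity Gaming Report, 71% of game studios said their development and operations efficiency improved after adopting AI. The gains show up both in empowering individual content creators and in reducing communication costs between people working on different stages of production.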

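On the production side, generative AI is already widely used for text generation, 2D art, code generation and checking, and level design. Before AI tools entered the art workflow, a game artist needed roughly a week to finish a high-quality illustration; with generative AI tools such as Stable Diffusion, that time can be cut to one day.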

Figure 6. Drawing an illustrated character with AIGC tools

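Generative AI also has great potential for reducing communication costs between different roles. In game production, especially when setting and selecting a game's art style, communication between designers and artists often consumes a great deal of time. Generative AI tools help designers turn ideas into visible results quickly, which greatly reduces that cost.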

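At the tooling level, as generative AI improves development efficiency, game companies of all kinds have begun building it into their content-production tools. In June 2023, the gaming chip maker NVIDIA released NVIDIA ACE for Games, an AI tool platform that lets game developers build and deploy customized speech, dialogue, and animation AI models in games, greatly improving content production efficiency. At GDC 2024, NVIDIA and Inworld jointly unveiled Covert Protocol, a new digital-human technology; NPCs built with it can interact with players in real time and generate gameplay in real time based on those interactions [10].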

Figure 7. The Covert Protocol technology demo released by NVIDIA

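Game engine companies Unity and Unreal have also released new products based on generative AI. In July 2023, Unity released two AI-based products, Sentis and Muse, which reportedly can raise the efficiency of traditional content creation tenfold. Unreal has integrated a number of AIGC tools into its engine, such as the digital-human tool MetaHuman Creator, to speed up the creation of high-quality characters and large-scale scenes.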

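Game studios are also embracing AI across the board, using it to enhance content-production tools and improve development efficiency. Tencent is one example: at GDC 2024, Tencent AI Lab unveiled GiiNEX, its self-developed AI engine for the full game life cycle. Built on Tencent's own generative and decision-making AI models, it targets AI-driven NPCs, scene production, and content generation, and offers AIGC capabilities for 3D graphics, animation, cities, music, and more. With GiiNEX, a city-modeling task that used to take 5 days can now be completed in 25 minutes, an efficiency gain of about a hundredfold [11].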

Figure 8. Architecture of Tencent's game AI engine GiiNEX
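
The "hundredfold" figure for city modeling can be checked with simple arithmetic. The article does not say whether "5 days" means working days or calendar days, so the sketch below computes both; the 8-hour working day is an assumption, and the helper name `speedup` is hypothetical.

```python
def speedup(days_before: float, minutes_after: float, hours_per_day: float) -> float:
    """Ratio of the old duration to the new duration, both in minutes."""
    minutes_before = days_before * hours_per_day * 60
    return minutes_before / minutes_after


# Assumption: a "day" is an 8-hour working day.
working_day_ratio = speedup(5, 25, hours_per_day=8)

# Alternative reading: a "day" is a full 24-hour calendar day.
calendar_day_ratio = speedup(5, 25, hours_per_day=24)

print(working_day_ratio, calendar_day_ratio)
```

Under the working-day assumption the ratio is 96, which matches the article's roughly hundredfold claim; under the calendar-day reading it would be 288.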

Conclusion

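At the time of the 1956 Dartmouth workshop, early computer scientists framed AI as "making a machine behave in ways that would be called intelligent if a human were so behaving" [12]. Since then, nearly all AI research has followed the path of simulating human intelligence, trying to build AI that can hear, see, speak, think, learn, and act, and to improve its ability to perceive and understand the real world and to make and carry out decisions.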

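To this day, AI research still follows the path and goal of simulating humans. If generative AI models such as ChatGPT and Sora have improved AI's ability to "perceive" and "understand" things, completing the first step toward AGI, then decision-making AI models, which let AI make suitable "choices" through machine learning in complex and varied game environments, give AI the ability to "act": to decide autonomously based on information about itself and its environment. This is a crucial step toward AGI.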

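AI research still has a long way to go before reaching AGI, but the combination of generative AI and decision-making AI clearly opens new possibilities, and games, as a testbed for training AI, play an increasingly important role in AGI research. Combining large language models with AI agents can already produce generalist game agents such as SIMA, which make effective decisions in a given environment, keep learning and adapting to unfamiliar environments, and complete a range of complex tasks from natural-language instructions, showing human-like intelligence. As the number of training environments grows, generalist game agents may come to understand and carry out more complex, higher-level language instructions, and people may be able to create AI systems that are more flexible, more adaptable, and closer to human intelligence. We also look forward to the day when generalist agents pass the test of the small world of games and move onto the broad stage of the real world, serving every sector of human society.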

Thanks to Cao Jianfeng, Liu Lin, Wang Peng, and others for their guidance during the writing of this article!
