Google Strikes Back: Project Astra Takes On GPT-4o Head-On, Veo Challenges Sora

2024-05-15 19:44:28

Source: Jiqizhixin (机器之心)

By the Jiqizhixin editorial team

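General-purpose AI, AI that people can genuinely use every day: these days, anything less is hardly worth holding a launch event for.

In the early hours of May 15, Google I/O, the annual developer conference often called the "Spring Festival Gala of the tech world", officially opened. How many times did the 110-minute main keynote mention artificial intelligence? Google counted for itself:

Yes, AI came up every single minute.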

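Competition in generative AI has recently reached a new peak, so it was no surprise that this year's I/O revolved entirely around artificial intelligence.

"A year ago on this stage, we first shared our plans for Gemini, a natively multimodal large model. It marked a new generation of I/O," said Google CEO Sundar Pichai. "Today, we want everyone to benefit from what Gemini can do. These breakthrough capabilities are coming to Search, Photos, productivity tools, Android, and more."

Twenty-four hours earlier, OpenAI had deliberately gotten out ahead by releasing GPT-4o, stunning the world with real-time voice, video, and text interaction. Today, Google showed Project Astra and Veo, which go directly up against OpenAI's currently leading GPT-4o and Sora.

This is a real-time capture of the Project Astra prototype:

We are witnessing business warfare at the highest level, waged in the plainest possible way.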

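Countering Sora: Google Releases the Video Generation Model Veo

In AI video generation, Google announced its video generation model, Veo. Veo can generate high-quality 1080p video in a wide range of styles, with clips running longer than one minute.

Drawing on a deep understanding of natural language and visual semantics, Veo makes advances in understanding video content, rendering high-definition imagery, and simulating physics. The videos it generates express the user's creative intent accurately and in detail.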

For example, given the following text prompt:

Many spotted jellyfish pulsating under water. Their bodies are transparent and glowing in deep ocean.

Or, to generate a video with a human subject, enter the prompt:

A lone cowboy rides his horse across an open plain at beautiful sunset, soft light, warm colors.

Or, for a close-up shot of a person, enter the prompt:

A woman sitting alone in a dimly lit cafe, a half-finished novel open in front of her. Film noir aesthetic, mysterious atmosphere. Black and white.

Google Introduces Veo, Imagen 3, and Trillium at Google I/O

At Google I/O, Google unveiled three additions to its AI lineup: the video generation model Veo, the text-to-image model Imagen 3, and the sixth-generation TPU, Trillium.

Veo: Revolutionizing Video Generation

Veo gives creators an unprecedented degree of creative control and understands film terminology, which helps it produce coherent, realistic video.

For example, to generate an aerial shot along the Hawaiian coastline on a sunny day, a user only needs to enter the prompt:

Drone shot along the Hawaii jungle coastline, sunny day

Veo can also generate video from an image combined with a text prompt, so that the output follows both the style of the reference image and the text description.

Interestingly, one of the Veo-generated videos in Google's demo features a llama, which readily brings to mind Meta's open-source Llama models.

Veo can produce videos of 60 seconds or longer, either from a single prompt or from a sequence of prompts that together tell a story. That length is crucial for applying video generation models to filmmaking.

Veo builds on Google's earlier work in visual content generation, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere.

Starting today, Google will provide a preview version of Veo to select creators for use in VideoFX, with plans to integrate some of Veo's features into products like YouTube Shorts.

Imagen 3: Advancements in Text-to-Image Generation

Continuing its series of model upgrades, Google unveiled Imagen 3, which renders finer detail and better lighting, reduces visual noise, and understands prompts significantly better.

To capture fine details from longer prompts, such as a specific camera angle or composition, Google added richer, more descriptive detail to the caption of each image in the training data.

For example, by adding details like "slightly blurred foreground" and "warm lighting" to the input prompt, Imagen 3 can generate images accordingly:

Google also addressed blurry text in generated images by improving image rendering, so that text in the output comes out sharp and stylized.

Imagen 3 will offer multiple versions optimized for different types of tasks to enhance usability.

Starting today, Google will provide a preview version of Imagen 3 to select creators for use in ImageFX, with users able to join the waitlist.

Trillium: The Next Generation TPU Chip

As generative AI continues to reshape human-technology interactions, Google unveils the sixth-generation TPU, Trillium, the most powerful and energy-efficient TPU to date, set to launch by the end of 2024.

TPUs, which are custom-built for AI workloads, underpin several of the models announced at Google I/O, including Gemini 1.5 Flash, Imagen 3, and Gemma 2, all of which are trained and served on TPUs.

Compared to TPU v5e, Trillium delivers a 4.7x increase in peak compute performance per chip, providing the compute, memory, and communication capabilities needed to train and fine-tune the most capable models.

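Beyond the 4.7x gain in performance, Trillium doubles the high-bandwidth memory (HBM) and the inter-chip interconnect (ICI) bandwidth. It is also equipped with the third-generation SparseCore, a specialized accelerator for the ultra-large embeddings common in advanced ranking and recommendation workloads.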

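According to Google, Trillium can train the next generation of AI models faster while reducing latency and cost. Google also describes Trillium as its most sustainable TPU to date, with energy efficiency more than 67% higher than its predecessor.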

Trillium can scale up to 256 TPUs in a single high-bandwidth, low-latency pod. Beyond this pod-level scalability, multislice technology and Titanium Intelligence Processing Units (IPUs) let Trillium TPUs scale to hundreds of pods, connecting tens of thousands of chips into a supercomputer interconnected by a multi-petabit-per-second datacenter network.

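Google began work on its first TPU, TPU v1, in 2013 and followed it with the first Cloud TPU in 2017. TPUs have since powered services such as real-time voice search, photo object recognition, and language translation, and they even provide the technology behind products from companies such as the autonomous-vehicle firm Nuro.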

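Trillium is also part of Google's AI Hypercomputer, a groundbreaking supercomputing architecture designed for cutting-edge AI workloads. Google is working with Hugging Face to optimize hardware for training and serving open-source models.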