A Spectre of GPT Is Haunting Gemini


By Luo Yihang

  1. Every time Google pulls out a big move in generative AI, you can sense a hidden, enormous emotional force behind it: forbearance, frustration, and the urge to fight back.

  2. At Google I/O in May, Google released PaLM 2, widely read as a serious challenge to GPT-4, then at the height of its momentum. At the end of the year, without warning, it unveiled the Gemini family of large language models (the mobile-scale Nano, the standard Pro, and the top-end Ultra), and this time the intent to take on GPT over key performance and benchmark numbers was even more direct.

  3. Google's official claim: of the 32 academic benchmarks widely used in large language model research and development, Gemini Ultra surpasses the state of the art currently represented by GPT-4 on 30. Gemini bills itself as "natively multimodal", meaning it was pretrained from the start on combined modalities of text, images, audio, video, and code, which may give it an edge in complex understanding and reasoning, particularly on math and physics problems.

  4. Google has spared no effort in stressing this advantage. OpenAI took an "incremental multimodal" route: first training on text corpora, then adding code, then images, video, and audio, and finally training those capabilities together. Gemini was trained on multimodal corpora from the very beginning and then tuned on multimodal data, a method Google presents as more "advanced" than the way OpenAI trained GPT.

  5. "Advanced" multimodal training should, in theory, deliver stronger performance, and the published academic benchmark results, in which Gemini Ultra sweeps past GPT-4 almost across the board, seem to bear that out. But academic benchmarks are themselves part of the theory; they do not really show how a model performs in actual use. Plenty of people sneer at certain Chinese large language models for obsessively grinding benchmark scores. We should apply the same standard: what Google has done is, in essence, no different from domestic models chasing scores to "beat GPT".

  6. Users testing Gemini Pro on X (the Bard chatbot currently runs only the Pro version) have already contributed plenty of complaints. It confused the 2023 and 2022 Oscar winners, and it failed to write a simple Python function that computes the intersection of two polygons. We also found that it miscounted the leaves in a photo and botched a simple acute-angle geometry problem. Even granting that Gemini Pro is positioned against GPT-3.5 rather than GPT-4, it still falls short.
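
For reference, here is how small the task Gemini Pro stumbled on actually is. A minimal sketch in Python using the shapely library; the coordinates are illustrative, since the original posts do not specify the test inputs:

```python
# Intersection of two polygons with shapely; example inputs are illustrative.
from shapely.geometry import Polygon

def polygon_intersection(coords_a, coords_b):
    """Return the intersection of two polygons as a shapely geometry."""
    return Polygon(coords_a).intersection(Polygon(coords_b))

if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    triangle = [(1, 1), (3, 1), (3, 3)]
    # Prints a triangle with vertices (1, 1), (2, 1), (2, 2).
    print(polygon_intersection(square, triangle))
```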

  7. The other glaring problem people have flagged is that Gemini's promotional video was "staged". In it, Gemini Ultra reacts almost instantly to a sequence of hand gestures and declares it is watching a game of rock-paper-scissors. But the accompanying documentation, never shown in the video, reveals at least two prompts: "What am I doing?" and "Hint: it's a game." Other demos needed even more prompting to produce their results, and that whole process was cut from the official video. Most viewers who did not look closely came away overestimating Gemini's comprehension and response speed, and that can only be called misleading.

  8. I still remember sitting in the audience at Google I/O in 2018 when a Googler on stage showed the Google Assistant booking a restaurant by voice. The hall thundered with applause and I clapped along, thinking it was marvelous. A month later word got out that the exchange had been arranged in advance. Google did not exactly fake it, but it so badly wants to display the matchless power of its AI, and is so eager to project AI optimism, that it routinely compresses the process behind a demo and thereby, in effect, exaggerates the result.

  9. Put bluntly, the video's exaggerations only show how badly Google needs Gemini to look stronger than GPT. It is anxious. And people are harsh on any large model that picks a fight with ChatGPT, above all a giant's "masterpiece". Of course, people are harshest on Google: one of OpenAI's motives in building the epoch-making GPT on the Transformer architecture that Google itself invented was to break free of Google's all-pervading grip on AI. Who wouldn't want to watch Google falter?

  10. In a sense, Google is OpenAI's only twin on this planet. Meta's Llama models are open source; given Musk's fondness for open source, Grok will most likely be opened up too; China's large language models have, to varying degrees, also gone open source. Only OpenAI and Google remain firmly closed, which instinctively binds Google's progress on large language models to OpenAI's.

  11. There is also a theatrical tension to it. Every time OpenAI makes a big move around GPT, public opinion drags Google out for the beating that laggards supposedly deserve. Then, almost without fail, a month or two after OpenAI's move, Google unveils a big move of its own to prove the old master still has it. Then it lies low for a few months, OpenAI strikes again, and Google gets dragged out again. Will the balance of power really change this way?

  12. In ecosystem building, though, Google is still a full step behind OpenAI. The world already holds millions of developer-built GPTs, while Google will not offer developers and enterprise customers a feedback-tuned Gemini Ultra to build their own applications on until early next year. By then the GPT Store will probably have formally launched. It has always puzzled me a little: didn't Google once seize half of Apple's kingdom by open-sourcing Android? How did it cede that role to Meta this time?

  13. I really do not mean to fault Google; I would much rather see Google prove itself. Those of us who first touched the internet in the late 1990s hold some subtle, particular affection for it. And Google must prove that its AI-first strategy can bear real fruit. But the spectre of GPT hovering over Google is a fact. Anyone else may try to shake that spectre off; Google alone cannot, for this is the rival it has no choice but to face.

  14. In truth, everything Google is doing around Gemini today should strike a chord with China's large language model developers: a spectre of GPT hovers over all of our heads, and it drives everyone to try, through one effort or another, to prove they do some things better than GPT.

  15. Google reached its GPT-crushing benchmark scores with a bag of small "tricks": it scored Gemini with elaborate chain-of-thought prompting and best-of-many answer selection, while GPT-4 was scored with plain 5-shot prompting and no chain of thought. Doesn't that methodology sound familiar? Don't China's large-model developers feel the pang of recognizing a fellow villager far from home?
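
To make the asymmetry concrete, here is a minimal sketch of the two evaluation setups: sampled chain-of-thought with majority voting on one side, a single few-shot query on the other. The `ask_model` function is a hypothetical stand-in for a model API, and Google's actual protocol (reported as CoT@32 with uncertainty routing) is more involved than this:

```python
# Contrast between the two evaluation setups described above.
# ask_model() is a hypothetical stand-in; wire it to a real model API.
from collections import Counter

def ask_model(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical model call returning a final answer string."""
    raise NotImplementedError("connect to an actual model API")

def five_shot(question: str, examples: list[str]) -> str:
    # Plain few-shot: one deterministic query, the answer taken as-is.
    prompt = "\n\n".join(examples) + "\n\n" + question
    return ask_model(prompt, temperature=0.0)

def cot_at_k(question: str, k: int = 32) -> str:
    # Chain of thought, sampled k times; keep the most common final answer.
    prompt = question + "\nLet's think step by step."
    samples = [ask_model(prompt) for _ in range(k)]
    return Counter(samples).most_common(1)[0][0]
```

One setup buys k sampled reasoning chains per question; the other gets a single pass. Comparing scores across the two says as much about the harness as about the models.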

  16. We like to measure the efforts of Zhipu, Baidu, and MiniMax against OpenAI. But flip the frame: isn't the large-model contest really Baidu, Zhipu, MiniMax, Google, Meta, Anthropic, Grok, and the rest jointly storming OpenAI's Bright Peak, like the siege in the wuxia novels? In that sense, China's large models and America's models other than OpenAI's stand in the same camp, in the same trench, huddling together and learning from one another. Having, like Shennong, tasted the hundred herbs of large models, we find that China's models are not worse than America's; they are simply not as good as ChatGPT. That is all.

  17. Another thing about Gemini's training worth noting is that it was done entirely on Google's own chip clusters, the TPUs. Google announced that TPU v4 and v5e carried out this large-scale training on AI-optimized infrastructure, with strong scalability and highly efficient inference. This may be the first reasonably strong large language model we have heard of that reached fruition without relying on Nvidia's compute or its hardware-software stack. Granted, Google's TPUs are made for its own consumption, but I now see the possibility, and the feasibility, of "replacing Nvidia" in the actual practice of large-model training. What that means for large language model training in China is self-evident.
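
The portability point is easiest to see in code. A minimal sketch in JAX, which compiles through XLA to whichever backend is present: TPU, GPU, or CPU. This is purely illustrative, since Google has not published Gemini's training stack beyond its technical report:

```python
# The same training step runs unchanged on TPU, GPU, or CPU:
# JAX compiles it through XLA for whatever backend is available.
import jax
import jax.numpy as jnp

def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

@jax.jit  # compiled for the local backend at first call
def sgd_step(w, x, y, lr=0.01):
    return w - lr * jax.grad(loss)(w, x, y)

if __name__ == "__main__":
    print("devices:", jax.devices())  # TPU, GPU, or CPU
    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (64, 8))
    y = x @ jnp.arange(8.0)
    w = jnp.zeros(8)
    for _ in range(500):
        w = sgd_step(w, x, y)
    print("recovered weights:", w)  # approaches [0, 1, ..., 7]
```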

  18. Gemini Nano is another highlight: the smallest model in the family, shipping first on Google's own Pixel 8 Pro phones. The "on-device large model" is a hot topic of late, though in truth it is closer to a small model. Chinese smartphone makers OPPO, vivo, and Xiaomi have all recently released on-device models of their own, and Lenovo has cut into effectively the same territory from the AI PC side. Google joining the camp should be read as a signal: this direction is worth the effort, and there is real work to be done.

  19. It is all rather uncanny. In this Gemini launch I see Google in the same predicament, making the same moves, as the Chinese large language model developers we know so well: the forbearance, the frustration, the urge to fight back; the occasional small tricks and small calculations to out-score OpenAI on a benchmark's key metrics; the step-by-step attempts to build an ecosystem of one's own; the bid to break free of Nvidia's compute; the push into on-device models built on strength in mobile. Facing OpenAI, everyone is the same.

  20. A spectre of GPT hovers over Gemini, and over every one of China's large language models.

