Preliminary Investigation Checklist
Item to verify 1: The authenticity of Jack Rae's educational background and career history, including his name and his presence in the AI field.
Item to verify 2: Career timeline: DeepMind (c. 2015–2023), OpenAI (2023–2024), Google DeepMind (2024–2025), Meta (from June 2025).
Item to verify 3: Core contribution at DeepMind: led the Gopher project, which "for the first time halved the gap to human exam-level performance".
Item to verify 4: Core contribution at DeepMind: proposed the "Chinchilla Scaling Laws", and that work was "the NeurIPS 2022 Best Paper".
Item to verify 5: Core contribution at DeepMind: served as pre-training lead for Gemini 1.0/2.0.
Item to verify 6: Core contribution at OpenAI: "one of the core developers of GPT-4".
Item to verify 7: Core contribution at OpenAI: proposed the "compression is intelligence" theory.
Item to verify 8: Position at Meta: personally recruited by Zuckerberg, becoming a core member of "Meta Superintelligence Labs".
Item to verify 9: Summary statement: "Jack Rae is one of the very few technical leaders to have worked across all three leading AI organizations: DeepMind, OpenAI, and Meta".
Action Plan
Let me first search for basic information on Jack Rae to confirm his presence in the AI field.
Search
Jack Rae AI researcher DeepMind Google Scholar
I need to continue searching for more information about Jack Rae, particularly his educational background and career history.
Search
Jack Rae AI researcher education background Bristol UCL CMU
I need to continue searching for Jack Rae's academic background and career history, particularly his work on Chinchilla and Gopher.
Search
Jack Rae Chinchilla Gopher NeurIPS best paper 2022
I need to continue searching for Jack Rae's time at OpenAI and the "compression is intelligence" theory he is said to have proposed.
Search
Jack Rae OpenAI compression is intelligence theory
Verified Conclusions
Verified item 1: Jack Rae's career can be divided into the following stages: ...DeepMind (c. 2015–2023)...OpenAI (2023–2024)...Google DeepMind (2024–2025)...Meta (from June 2025)
Verified item 2: PhD: University College London (UCL), UK, in computer science
Verified item 3: Led projects: ...Chinchilla: proposed the "Chinchilla Scaling Laws"
Verified item 4: Chinchilla..."the NeurIPS 2022 Best Paper"
Verified item 5: Gemini series: pre-training lead for Gemini 1.0/2.0, reasoning technical lead for Gemini 2.5
Verified item 6: Background: personally recruited by Zuckerberg, one of the first key hires for Meta's AGI team
Remaining Items to Verify
Item to verify 1: One of the core developers of GPT-4, involved in pre-training and inference system design
Item to verify 2: Led project: Gopher: a 280-billion-parameter language model that "for the first time halved the gap to human exam-level performance"
Item to verify 3: Proposed the "compression is intelligence" theory: that large-model training is essentially lossless compression of data, and the more efficient the compression, the more intelligent the model
Item to verify 4: Undergraduate: University of Bristol, UK; Master's: Carnegie Mellon University (CMU), USA
Item to verify 5: Position: core member of Meta Superintelligence Labs
Action Plan
Let me first search for Jack Rae's role in the GPT-4 project.
Search
Jack Rae GPT-4 OpenAI technical report contributors
I need to continue searching for Jack Rae's role and contributions in the Gopher project.
Search
Jack Rae Gopher language model DeepMind paper
I need to find more about Jack Rae's role in the Gopher project and the model's specific achievements.
Search
Gopher language model human reading comprehension gap halved
I need to find more about the "compression is intelligence" theory from Jack Rae's time at OpenAI.
Search
Jack Rae OpenAI compression is intelligence theory Stanford MLSys Seminar
Verified Conclusions
Verified item 1: Jack Rae's career can be divided into the following stages: ...DeepMind (c. 2015–2023)...OpenAI (2023–2024)...Google DeepMind (2024–2025)...Meta (from June 2025)
Verified item 2: PhD: University College London (UCL), UK, in computer science
Verified item 3: Chinchilla: proposed the "Chinchilla Scaling Laws"..."the NeurIPS 2022 Best Paper"
Verified item 4: Gemini series: pre-training lead for Gemini 1.0/2.0, reasoning technical lead for Gemini 2.5
Verified item 5: Background: personally recruited by Zuckerberg, one of the first key hires for Meta's AGI team
Remaining Items to Verify
Item to verify 1: One of the core developers of GPT-4, involved in pre-training and inference system design. Sources to check: the GPT-4 Technical Report (https://arxiv.org/abs/2303.08774) and the official contributors page (https://openai.com/contributions/gpt-4).
Item to verify 2: Led project: Gopher: a 280-billion-parameter language model that "for the first time halved the gap to human exam-level performance"
Item to verify 3: Proposed the "compression is intelligence" theory: that large-model training is essentially lossless compression of data, and the more efficient the compression, the more intelligent the model
Item to verify 4: Undergraduate: University of Bristol, UK; Master's: Carnegie Mellon University (CMU), USA
Item to verify 5: Position: core member of Meta Superintelligence Labs
Action Plan
Let me first examine the GPT-4 Technical Report for Jack Rae's contributions.
GPT-4 Technical Report
OpenAI∗
Abstract
We report the development of GPT-4, a large-scale, multimodal model which can
accept image and text inputs and produce text outputs. While less capable than
humans in many real-world scenarios, GPT-4 exhibits human-level performance
on various professional and academic benchmarks, including passing a simulated
bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-
based model pre-trained to predict the next token in a document. The post-training
alignment process results in improved performance on measures of factuality and
adherence to desired behavior. A core component of this project was developing
infrastructure and optimization methods that behave predictably across a wide
range of scales. This allowed us to accurately predict some aspects of GPT-4’s
performance based on models trained with no more than 1/1,000th the compute of
GPT-4.
1 Introduction
This technical report presents GPT-4, a large multimodal model capable of processing image and
text inputs and producing text outputs. Such models are an important area of study as they have the
potential to be used in a wide range of applications, such as dialogue systems, text summarization,
and machine translation. As such, they have been the subject of substantial interest and progress in
recent years [1–34].
One of the main goals of developing such models is to improve their ability to understand and generate
natural language text, particularly in more complex and nuanced scenarios. To test its capabilities
in such scenarios, GPT-4 was evaluated on a variety of exams originally designed for humans. In
these evaluations it performs quite well and often outscores the vast majority of human test takers.
For example, on a simulated bar exam, GPT-4 achieves a score that falls in the top 10% of test takers.
This contrasts with GPT-3.5, which scores in the bottom 10%.
On a suite of traditional NLP benchmarks, GPT-4 outperforms both previous large language models
and most state-of-the-art systems (which often have benchmark-specific training or hand-engineering).
On the MMLU benchmark [35, 36], an English-language suite of multiple-choice questions covering
57 subjects, GPT-4 not only outperforms existing models by a considerable margin in English, but
also demonstrates strong performance in other languages. On translated variants of MMLU, GPT-4
surpasses the English-language state-of-the-art in 24 of 26 languages considered. We discuss these
model capability results, as well as model safety improvements and results, in more detail in later
sections.
This report also discusses a key challenge of the project, developing deep learning infrastructure and
optimization methods that behave predictably across a wide range of scales. This allowed us to make
predictions about the expected performance of GPT-4 (based on small runs trained in similar ways)
that were tested against the final run to increase confidence in our training.
Despite its capabilities, GPT-4 has similar limitations to earlier GPT models [1, 37, 38]: it is not fully
reliable (e.g. can suffer from “hallucinations”), has a limited context window, and does not learn
∗Please cite this work as "OpenAI (2023)". Full authorship contribution statements appear at the end of the document. Correspondence regarding this technical report can be sent to gpt4-report@openai.com
from experience. Care should be taken when using the outputs of GPT-4, particularly in contexts
where reliability is important.
GPT-4’s capabilities and limitations create significant and novel safety challenges, and we believe
careful study of these challenges is an important area of research given the potential societal impact.
This report includes an extensive system card (after the Appendix) describing some of the risks we
foresee around bias, disinformation, over-reliance, privacy, cybersecurity, proliferation, and more.
It also describes interventions we made to mitigate potential harms from the deployment of GPT-4,
including adversarial testing with domain experts, and a model-assisted safety pipeline.
2 Scope and Limitations of this Technical Report
This report focuses on the capabilities, limitations, and safety properties of GPT-4. GPT-4 is a
Transformer-style model [39] pre-trained to predict the next token in a document, using both publicly
available data (such as internet data) and data licensed from third-party providers. The model was
then fine-tuned using Reinforcement Learning from Human Feedback (RLHF) [40]. Given both
the competitive landscape and the safety implications of large-scale models like GPT-4, this report
contains no further details about the architecture (including model size), hardware, training compute,
dataset construction, training method, or similar.
We are committed to independent auditing of our technologies, and shared some initial steps and
ideas in this area in the system card accompanying this release.2 We plan to make further technical
details available to additional third parties who can advise us on how to weigh the competitive and
safety considerations above against the scientific value of further transparency.
3 Predictable Scaling
A large focus of the GPT-4 project was building a deep learning stack that scales predictably. The
primary reason is that for very large training runs like GPT-4, it is not feasible to do extensive
model-specific tuning. To address this, we developed infrastructure and optimization methods that
have very predictable behavior across multiple scales. These improvements allowed us to reliably
predict some aspects of the performance of GPT-4 from smaller models trained using 1,000×–10,000× less compute.
3.1 Loss Prediction
The final loss of properly-trained large language models is thought to be well approximated by power
laws in the amount of compute used to train the model [41, 42, 2, 14, 15].
To verify the scalability of our optimization infrastructure, we predicted GPT-4’s final loss on our
internal codebase (not part of the training set) by fitting a scaling law with an irreducible loss term
(as in Henighan et al. [15]): L(C) = aC^b + c, from models trained using the same methodology
but using at most 10,000x less compute than GPT-4. This prediction was made shortly after the run
started, without use of any partial results. The fitted scaling law predicted GPT-4’s final loss with
high accuracy (Figure 1).
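To make this concrete, the following is a minimal illustrative sketch (not OpenAI's actual code) of fitting a scaling law with an irreducible loss term, L(C) = aC^b + c, to small-run results and extrapolating to the full-compute run. The compute values, loss numbers, and the use of scipy's curve_fit are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(c, a, b, irreducible):
    # L(C) = a * C**b + irreducible, with b < 0 so loss falls as compute grows.
    return a * np.power(c, b) + irreducible

# Hypothetical small-run results; compute is normalized so the target run is 1.0.
compute = np.array([1e-7, 1e-6, 1e-5, 1e-4, 1e-3])
true_a, true_b, true_c = 1.2, -0.08, 1.1  # made-up "ground truth" used to generate the data
rng = np.random.default_rng(0)
loss = scaling_law(compute, true_a, true_b, true_c) + rng.normal(0.0, 0.01, compute.size)

# Fit the three parameters using only the small runs.
(a, b, c), _ = curve_fit(scaling_law, compute, loss, p0=(1.0, -0.1, 1.0), maxfev=20_000)

# Extrapolate to the full-compute run (normalized compute = 1.0).
print(f"predicted final loss at C = 1: {scaling_law(1.0, a, b, c):.3f}")
```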
3.2 Scaling of Capabilities on HumanEval
Having a sense of the capabilities of a model before training can improve decisions around alignment,
safety, and deployment. In addition to predicting final loss, we developed methodology to predict
more interpretable metrics of capability. One such metric is pass rate on the HumanEval dataset [43],
which measures the ability to synthesize Python functions of varying complexity. We successfully
predicted the pass rate on a subset of the HumanEval dataset by extrapolating from models trained
with at most 1,000× less compute (Figure 2).
For an individual problem in HumanEval, performance may occasionally worsen with scale. Despite
these challenges, we find an approximate power law relationship −E_P[log(pass_rate(C))] = α·C^(−k)
2 In addition to the accompanying system card, OpenAI will soon publish additional thoughts on the social and economic implications of AI systems, including the need for effective regulation.
[Figure 1 chart: bits per word on OpenAI codebase next-word prediction vs. training compute (normalized so GPT-4 = 1), showing observed points, the power-law prediction, and gpt-4.]
Figure 1. Performance of GPT-4 and smaller models. The metric is final loss on a dataset derived
from our internal codebase. This is a convenient, large dataset of code tokens which is not contained in
the training set. We chose to look at loss because it tends to be less noisy than other measures across
different amounts of training compute. A power law fit to the smaller models (excluding GPT-4) is
shown as the dotted line; this fit accurately predicts GPT-4’s final loss. The x-axis is training compute
normalized so that GPT-4 is 1.
[Figure 2 chart: mean log pass rate on 23 coding problems vs. training compute (normalized so GPT-4 = 1), showing observed points, the power-law prediction, and gpt-4.]
Figure 2. Performance of GPT-4 and smaller models. The metric is mean log pass rate on a subset of
the HumanEval dataset. A power law fit to the smaller models (excluding GPT-4) is shown as the dotted
line; this fit accurately predicts GPT-4’s performance. The x-axis is training compute normalized so that
GPT-4 is 1.
where k and α are positive constants, and P is a subset of problems in the dataset. We hypothesize
that this relationship holds for all problems in this dataset. In practice, very low pass rates are difficult
or impossible to estimate, so we restrict to problems P and models M such that given some large
sample budget, every problem is solved at least once by every model.
We registered predictions for GPT-4’s performance on HumanEval before training completed, using
only information available prior to training. All but the 15 hardest HumanEval problems were split
into 6 difficulty buckets based on the performance of smaller models. The results on the 3rd easiest
bucket are shown in Figure 2, showing that the resulting predictions were very accurate for this
subset of HumanEval problems where we can accurately estimate log(pass_rate) for several smaller
models. Predictions on the other five buckets performed almost as well, the main exception being
GPT-4 underperforming our predictions on the easiest bucket.
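As a rough illustration of the registered-prediction idea, the sketch below fits the stated power law −E_P[log(pass_rate(C))] = α·C^(−k) in log-log space to hypothetical small-model measurements and extrapolates to the full run. The data points are invented and this is not the report's actual methodology, only a sketch of the stated relationship.

```python
import numpy as np

# Hypothetical mean negative log pass rates, -E_P[log pass_rate(C)], measured on a
# fixed bucket of HumanEval problems for small runs (compute normalized so the
# target run is 1.0).
compute = np.array([1e-6, 1e-5, 1e-4, 1e-3])
mean_neg_log_pass = np.array([4.8, 3.1, 2.0, 1.3])

# -E_P[log pass_rate(C)] = alpha * C**(-k) becomes linear after taking logs:
# log(y) = log(alpha) - k * log(C)
slope, intercept = np.polyfit(np.log(compute), np.log(mean_neg_log_pass), 1)
k, alpha = -slope, float(np.exp(intercept))

# Registered prediction for the full run (C = 1), made before training completes.
predicted = alpha * 1.0 ** (-k)
print(f"k = {k:.3f}, alpha = {alpha:.3f}, predicted mean -log pass rate at C = 1: {predicted:.3f}")
```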
Certain capabilities remain hard to predict. For example, the Inverse Scaling Prize [44] proposed
several tasks for which model performance decreases as a function of scale. Similarly to a recent
result by Wei et al. [45], we find that GPT-4 reverses this trend, as shown on one of the tasks called
Hindsight Neglect [46] in Figure 3.
[Figure 3 chart: accuracy on the Inverse Scaling Prize "hindsight neglect" task for ada, babbage, curie, gpt-3.5, and gpt-4.]
Figure 3. Performance of GPT-4 and smaller models on the Hindsight Neglect task. Accuracy is shown
on the y-axis, higher is better. ada, babbage, and curie refer to models available via the OpenAI API [47].
We believe that accurately predicting future capabilities is important for safety. Going forward we
plan to refine these methods and register performance predictions across various capabilities before
large model training begins, and we hope this becomes a common goal in the field.
4 Capabilities
We tested GPT-4 on a diverse set of benchmarks, including simulating exams that were originally
designed for humans.4 We did no specific training for these exams. A minority of the problems in the
exams were seen by the model during training; for each exam we run a variant with these questions
removed and report the lower score of the two. We believe the results to be representative. For further
details on contamination (methodology and per-exam statistics), see Appendix C.
Exams were sourced from publicly-available materials. Exam questions included both multiple-
choice and free-response questions; we designed separate prompts for each format, and images were
included in the input for questions which required it. The evaluation setup was designed based
on performance on a validation set of exams, and we report final results on held-out test exams.
Overall scores were determined by combining multiple-choice and free-response question scores
using publicly available methodologies for each exam. We estimate and report the percentile each
overall score corresponds to. See Appendix A for further details on the exam evaluation methodology.
3 For AMC 10 and AMC 12 2022 exams, the human percentiles are not yet published, so the reported numbers are extrapolated and likely have wide uncertainty. See Appendix A.5.
4 We used the post-trained RLHF model for these exams.
| Exam | GPT-4 | GPT-4 (no vision) | GPT-3.5 |
|---|---|---|---|
| Uniform Bar Exam (MBE+MEE+MPT) | 298 / 400 (~90th) | 298 / 400 (~90th) | 213 / 400 (~10th) |
| LSAT | 163 (~88th) | 161 (~83rd) | 149 (~40th) |
| SAT Evidence-Based Reading & Writing | 710 / 800 (~93rd) | 710 / 800 (~93rd) | 670 / 800 (~87th) |
| SAT Math | 700 / 800 (~89th) | 690 / 800 (~89th) | 590 / 800 (~70th) |
| Graduate Record Examination (GRE) Quantitative | 163 / 170 (~80th) | 157 / 170 (~62nd) | 147 / 170 (~25th) |
| Graduate Record Examination (GRE) Verbal | 169 / 170 (~99th) | 165 / 170 (~96th) | 154 / 170 (~63rd) |
| Graduate Record Examination (GRE) Writing | 4 / 6 (~54th) | 4 / 6 (~54th) | 4 / 6 (~54th) |
| USABO Semifinal Exam 2020 | 87 / 150 (99th - 100th) | 87 / 150 (99th - 100th) | 43 / 150 (31st - 33rd) |
| USNCO Local Section Exam 2022 | 36 / 60 | 38 / 60 | 24 / 60 |
| Medical Knowledge Self-Assessment Program | 75 % | 75 % | 53 % |
| Codeforces Rating | 392 (below 5th) | 392 (below 5th) | 260 (below 5th) |
| AP Art History | 5 (86th - 100th) | 5 (86th - 100th) | 5 (86th - 100th) |
| AP Biology | 5 (85th - 100th) | 5 (85th - 100th) | 4 (62nd - 85th) |
| AP Calculus BC | 4 (43rd - 59th) | 4 (43rd - 59th) | 1 (0th - 7th) |
| AP Chemistry | 4 (71st - 88th) | 4 (71st - 88th) | 2 (22nd - 46th) |
| AP English Language and Composition | 2 (14th - 44th) | 2 (14th - 44th) | 2 (14th - 44th) |
| AP English Literature and Composition | 2 (8th - 22nd) | 2 (8th - 22nd) | 2 (8th - 22nd) |
| AP Environmental Science | 5 (91st - 100th) | 5 (91st - 100th) | 5 (91st - 100th) |
| AP Macroeconomics | 5 (84th - 100th) | 5 (84th - 100th) | 2 (33rd - 48th) |
| AP Microeconomics | 5 (82nd - 100th) | 4 (60th - 82nd) | 4 (60th - 82nd) |
| AP Physics 2 | 4 (66th - 84th) | 4 (66th - 84th) | 3 (30th - 66th) |
| AP Psychology | 5 (83rd - 100th) | 5 (83rd - 100th) | 5 (83rd - 100th) |
| AP Statistics | 5 (85th - 100th) | 5 (85th - 100th) | 3 (40th - 63rd) |
| AP US Government | 5 (88th - 100th) | 5 (88th - 100th) | 4 (77th - 88th) |
| AP US History | 5 (89th - 100th) | 4 (74th - 89th) | 4 (74th - 89th) |
| AP World History | 4 (65th - 87th) | 4 (65th - 87th) | 4 (65th - 87th) |
| AMC 10³ | 30 / 150 (6th - 12th) | 36 / 150 (10th - 19th) | 36 / 150 (10th - 19th) |
| AMC 12³ | 60 / 150 (45th - 66th) | 48 / 150 (19th - 40th) | 30 / 150 (4th - 8th) |
| Introductory Sommelier (theory knowledge) | 92 % | 92 % | 80 % |
| Certified Sommelier (theory knowledge) | 86 % | 86 % | 58 % |
| Advanced Sommelier (theory knowledge) | 77 % | 77 % | 46 % |
| Leetcode (easy) | 31 / 41 | 31 / 41 | 12 / 41 |
| Leetcode (medium) | 21 / 80 | 21 / 80 | 8 / 80 |
| Leetcode (hard) | 3 / 45 | 3 / 45 | 0 / 45 |

Table 1. GPT performance on academic and professional exams. In each case, we simulate the conditions and scoring of the real exam. We report GPT-4's final score graded according to exam-specific rubrics, as well as the percentile of test-takers achieving GPT-4's score.
[Figure 4 chart: estimated percentile lower bound (among test takers) for each exam, ordered by GPT-3.5 performance; series: gpt-4, gpt-4 (no vision), gpt-3.5.]
Figure 4. GPT performance on academic and professional exams. In each case, we simulate the
conditions and scoring of the real exam. Exams are ordered from low to high based on GPT-3.5
performance. GPT-4 outperforms GPT-3.5 on most exams tested. To be conservative we report the
lower end of the range of percentiles, but this creates some artifacts on the AP exams which have very
wide scoring bins. For example although GPT-4 attains the highest possible score on AP Biology (5/5),
this is only shown in the plot as 85th percentile because 15 percent of test-takers achieve that score.
GPT-4 exhibits human-level performance on the majority of these professional and academic exams.
Notably, it passes a simulated version of the Uniform Bar Examination with a score in the top 10% of
test takers (Table 1, Figure 4).
The model’s capabilities on exams appear to stem primarily from the pre-training process and are not
significantly affected by RLHF. On multiple choice questions, both the base GPT-4 model and the
RLHF model perform equally well on average across the exams we tested (see Appendix B).
We also evaluated the pre-trained base GPT-4 model on traditional benchmarks designed for evaluating
language models. For each benchmark we report, we ran contamination checks for test data appearing
in the training set (see Appendix D for full details on per-benchmark contamination).5 We used
few-shot prompting [1] for all benchmarks when evaluating GPT-4.6
GPT-4 considerably outperforms existing language models, as well as previously state-of-the-art
(SOTA) systems which often have benchmark-specific crafting or additional training protocols
(Table 2).
5 During our contamination check we discovered that portions of BIG-bench [48] were inadvertently mixed into the training set, and we excluded it from our reported results.
6 For GSM-8K, we include part of the training set in GPT-4's pre-training mix (see Appendix E for details). We use chain-of-thought prompting [11] when evaluating.
| Benchmark | GPT-4 (evaluated few-shot) | GPT-3.5 (evaluated few-shot) | LM SOTA (best external LM, evaluated few-shot) | SOTA (best external model, incl. benchmark-specific tuning) |
|---|---|---|---|---|
| MMLU [49] (multiple-choice questions in 57 subjects, professional & academic) | 86.4% (5-shot) | 70.0% (5-shot) | 70.7% (5-shot U-PaLM [50]) | 75.2% (5-shot Flan-PaLM [51]) |
| HellaSwag [52] (commonsense reasoning around everyday events) | 95.3% (10-shot) | 85.5% (10-shot) | 84.2% (LLaMA, validation set [28]) | 85.6 (ALUM [53]) |
| AI2 Reasoning Challenge (ARC) [54] (grade-school multiple-choice science questions, challenge set) | 96.3% (25-shot) | 85.2% (25-shot) | 85.2% (8-shot PaLM [55]) | 86.5% (ST-MOE [18]) |
| WinoGrande [56] (commonsense reasoning around pronoun resolution) | 87.5% (5-shot) | 81.6% (5-shot) | 85.1% (5-shot PaLM [3]) | 85.1% (5-shot PaLM [3]) |
| HumanEval [43] (Python coding tasks) | 67.0% (0-shot) | 48.1% (0-shot) | 26.2% (0-shot PaLM [3]) | 65.8% (CodeT + GPT-3.5 [57]) |
| DROP [58] (F1 score; reading comprehension & arithmetic) | 80.9 (3-shot) | 64.1 (3-shot) | 70.8 (1-shot PaLM [3]) | 88.4 (QDGAT [59]) |
| GSM-8K [60] (grade-school mathematics questions) | 92.0%∗ (5-shot chain-of-thought) | 57.1% (5-shot) | 58.8% (8-shot Minerva [61]) | 87.3% (Chinchilla + SFT+ORM-RL, ORM reranking [62]) |
Table 2. Performance of GPT-4 on academic benchmarks. We compare GPT-4 alongside the best
SOTA (with benchmark-specific training) and the best SOTA for an LM evaluated few-shot. GPT-4
outperforms existing LMs on all benchmarks, and beats SOTA with benchmark-specific training on all
datasets except DROP. For each task we report GPT-4’s performance along with the few-shot method
used to evaluate. For GSM-8K, we included part of the training set in the GPT-4 pre-training mix
(see Appendix E), and we use chain-of-thought prompting [11] when evaluating. For multiple-choice
questions, we present all answers (ABCD) to the model and ask it to choose the letter of the answer,
similarly to how a human would solve such a problem.
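For the multiple-choice protocol described in the caption, a prompt along these lines would present all answer options and ask only for a letter. This is a hypothetical sketch of such a format; the exact wording OpenAI used is not given in the report.

```python
def format_multiple_choice(question: str, choices: list[str]) -> str:
    """Label the options A-D and ask the model to answer with a single letter."""
    letters = "ABCD"
    lines = [question]
    lines += [f"{letters[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer with the letter of the correct choice:")
    return "\n".join(lines)

# Example with an invented question.
print(format_multiple_choice(
    "Which planet is closest to the Sun?",
    ["Venus", "Mercury", "Earth", "Mars"],
))
```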
Many existing ML benchmarks are written in English. To gain an initial understanding of GPT-4’s
capabilities in other languages, we translated the MMLU benchmark [35, 36] – a suite of multiple-
choice problems spanning 57 subjects – into a variety of languages using ...
I need to continue looking for contributor information in the GPT-4 Technical Report; I will check the end of the report.
Search
GPT-4 technical report contributors authors Jack Rae OpenAI
I need to find more about Jack Rae's role and contributions in the Gopher project, and whether he was involved in developing GPT-4.
Search
Jack Rae Gopher DeepMind author contribution
I need to find more about Jack Rae's contribution to the "compression is intelligence" theory, and whether he was involved in developing GPT-4.
Search
Jack Rae compression is intelligence theory Stanford MLSys Seminar
Verified Conclusions
Verified item 1: Jack Rae's career can be divided into the following stages: ...DeepMind (c. 2015–2023)...OpenAI (2023–2024)...Google DeepMind (2024–2025)...Meta (from June 2025)
Verified item 2: PhD: University College London (UCL), UK, in computer science
Verified item 3: Chinchilla: proposed the "Chinchilla Scaling Laws"..."the NeurIPS 2022 Best Paper"
Verified item 4: Gemini series: pre-training lead for Gemini 1.0/2.0, reasoning technical lead for Gemini 2.5
Verified item 5: Background: personally recruited by Zuckerberg, one of the first key hires for Meta's AGI team
Remaining Items to Verify
Item to verify 1: One of the core developers of GPT-4, involved in pre-training and inference system design. Sources to check: the GPT-4 Technical Report (https://arxiv.org/abs/2303.08774) and the official contributors page (https://openai.com/contributions/gpt-4).
Item to verify 2: Led project: Gopher: a 280-billion-parameter language model that "for the first time halved the gap to human exam-level performance"
Item to verify 3: Proposed the "compression is intelligence" theory: that large-model training is essentially lossless compression of data, and the more efficient the compression, the more intelligent the model
Item to verify 4: Undergraduate: University of Bristol, UK; Master's: Carnegie Mellon University (CMU), USA
Item to verify 5: Position: core member of Meta Superintelligence Labs
Action Plan
Let me first review the official OpenAI GPT-4 contributors page to confirm whether Jack Rae was involved in developing GPT-4.
Core contributors
Christopher Berner Supercomputing lead
Greg Brockman Infrastructure lead
Trevor Cai Throughput lead
David Farhi Manager of optimization team
Chris Hesse Infrastructure usability co-lead
Shantanu Jain Infrastructure usability co-lead
Kyle Kosic Uptime and stability lead
Jakub Pachocki Overall lead, optimization lead
Alex Paino Architecture & data vice lead
Mikhail Pavlov Software correctness lead
Michael Petrov Hardware correctness lead
Nick Ryder Architecture & data lead
Szymon Sidor Optimization vice lead
Nikolas Tezak Execution lead
Phil Tillet Triton lead
Amin Tootoonchian Model distribution, systems & networking lead
Qiming Yuan Dataset sourcing and processing lead
Wojciech Zaremba Manager of dataset team
Compute cluster scaling
Christopher Berner, Oleg Boiko, Andrew Cann, Ben Chess, Christian Gibson, Mateusz Litwin, Emy Parparita, Henri Roussez, Eric Sigler, Akila Welihinda
Data
Sandhini Agarwal, Suchir Balaji, Mo Bavarian, Che Chang, Sheila Dunning, Leo Gao, Jonathan Gordon, Peter Hoeschele, Shawn Jain, Shantanu Jain, Roger Jiang, Heewoo Jun, Łukasz Kaiser, Nitish Shirish Keskar, Jong Wook Kim, Aris Konstantinidis, Chak Li, Todor Markov, Bianca Martin, David Mély, Oleg Murk, Hyeonwoo Noh, Long Ouyang, Alex Paino, Vitchyr Pong, Alec Radford, Nick Ryder, John Schulman, Daniel Selsam, Ian Sohl, Chelsea Voss, Lilian Weng, Clemens Winter, Tao Xu, Qiming Yuan, Wojciech Zaremba
Distributed training infrastructure
Greg Brockman, Trevor Cai, Chris Hesse, Shantanu Jain, Yongjik Kim, Kyle Kosic, Mateusz Litwin, Jakub Pachocki, Mikhail Pavlov, Szymon Sidor, Nikolas Tezak, Madeleine Thompson, Amin Tootoonchian, Qiming Yuan
Hardware correctness
Greg Brockman, Shantanu Jain, Kyle Kosic, Michael Petrov, Nikolas Tezak, Amin Tootoonchian, Chelsea Voss, Qiming Yuan
Optimization & architecture
Igor Babuschkin, Mo Bavarian, Adrien Ecoffet, David Farhi, Jesse Han, Ingmar Kanitscheider, Daniel Levy, Jakub Pachocki, Alex Paino, Mikhail Pavlov, Nick Ryder, Szymon Sidor, Jie Tang, Jerry Tworek, Tao Xu
Training run babysitting
Suchir Balaji, Mo Bavarian, Greg Brockman, Trevor Cai, Chris Hesse, Shantanu Jain, Roger Jiang, Yongjik Kim, Kyle Kosic, Mateusz Litwin, Jakub Pachocki, Alex Paino, Mikhail Pavlov, Michael Petrov, Nick Ryder, Szymon Sidor, Nikolas Tezak, Madeleine Thompson, Phil Tillet, Amin Tootoonchian, Chelsea Voss, Ben Wang, Tao Xu, Qiming Yuan
Core contributors
Trevor Cai Execution lead
Mark Chen Vision team co-lead, Deployment lead
Casey Chu Initial prototype lead
Chris Hesse Data load balancing & developer tooling lead
Shengli Hu Vision Safety Evaluations lead
Yongjik Kim GPU performance lead
Jamie Kiros Overall vision co-lead, deployment research & evaluation lead
Daniel Levy Overall vision co-lead, optimization lead
Christine McLeavey Vision team lead
David Mély Data lead
Hyeonwoo Noh Overall vision co-lead, research lead
Mikhail Pavlov Scaling engineering lead
Raul Puri Overall vision co-lead, engineering lead
Amin Tootoonchian Model distribution, systems & networking lead
Architecture research
Casey Chu, Jamie Kiros, Christine McLeavey, Hyeonwoo Noh, Raul Puri, Alec Radford, Aditya Ramesh
Compute cluster scaling
Andrew Cann, Rory Carmichael, Christian Gibson, Henri Roussez, Akila Welihinda
Distributed training infrastructure
Trevor Cai, Yunxing Dai, Chris Hesse, Brandon Houghton, Yongjik Kim, Łukasz Kondraciuk, Hyeonwoo Noh, Mikhail Pavlov, Raul Puri, Nikolas Tezak, Amin Tootoonchian, Tianhao Zheng
Hardware correctness
Oleg Boiko, Trevor Cai, Michael Petrov, Alethea Power
Data
Jong Wook Kim, David Mély, Reiichiro Nakano, Hyeonwoo Noh, Long Ouyang, Raul Puri, Pranav Shyam, Tao Xu
Alignment Data
Long Ouyang
Training run babysitting
Trevor Cai, Kyle Kosic, Daniel Levy, David Mély, Reiichiro Nakano, Hyeonwoo Noh, Mikhail Pavlov, Raul Puri, Amin Tootoonchian
Deployment & post-training
Ilge Akkaya, Mark Chen, Jamie Kiros, Rachel Lim, Reiichiro Nakano, Raul Puri, Jiayi Weng
Core contributors
Greg Brockman Core infrastructure author
Arka Dhar Human data product manager
Liam Fedus Data flywheel lead
Tarun Gogineni Model creativity
Rapha Gontijo-Lopes Synthetic data
Joshua Gross Data collection engineering co-lead
Johannes Heidecke Refusals & model safety co-lead
Joost Huizinga Initial fine-tuning derisking
Teddy Lee Human data product manager
Jan Leike Alignment co-lead
Ryan Lowe Alignment co-lead
Luke Metz Infrastructure lead, ChatML format lead
Long Ouyang IF data collection lead
John Schulman Overall lead
Jerry Tworek Code lead
Carroll Wainwright IF data infrastructure lead
Jonathan Ward Data collection engineering co-lead
Jiayi Weng RL Infrastructure author
Sarah Yoo Human data operations manager
Wojciech Zaremba Human data lead
Chong Zhang Refusals & model safety co-lead
Shengjia Zhao Reward model lead
Barret Zoph Overall training lead
Dataset contributions
Diogo Almeida, Mo Bavarian, Juan Felipe Cerón Uribe, Tyna Eloundou, Liam Fedus, Tarun Gogineni, Rapha Gontijo-Lopes, Jonathan Gordon, Joost Huizinga, Shawn Jain, Roger Jiang, Łukasz Kaiser, Christina Kim, Jan Leike, Chak Li, Stephanie Lin, Ryan Lowe, Jacob Menick, Luke Metz, Pamela Mishkin, Tong Mu, Oleg Murk, Ashvin Nair, Long Ouyang, Alex Passos, Michael (Rai) Pokorny, Vitchyr Pong, Shibani Santurkar, Daniel Selsam, Sarah Shoker,, Carroll Wainwright, Matt Wiethoff, Jeff Wu, Kai Xiao, Kevin Yu, Marvin Zhang, Chong Zhang, William Zhuk, Barret Zoph
Data infrastructure
Irwan Bello, Lenny Bogdonoff, Juan Felipe Cerón Uribe, Joshua Gross, Shawn Jain, Haozhun Jin, Christina Kim, Aris Konstantinidis, Teddy Lee, David Medina, Jacob Menick, Luke Metz, Ashvin Nair,Long Ouyang, Michael (Rai) Pokorny, Vitchyr Pong, John Schulman, Jonathan Ward, Jiayi Weng, Matt Wiethoff, Sarah Yoo, Kevin Yu, Wojciech Zaremba, William Zhuk, Barret Zoph
ChatML format
Ilge Akkaya, Christina Kim, Chak Li, Rachel Lim, Jacob Menick, Luke Metz, Andrey Mishchenko, Vitchyr Pong, John Schulman, Carroll Wainwright, Barret Zoph
Model safety
Josh Achiam, Steven Adler, Juan Felipe Cerón Uribe, Hyung Won Chung, Tyna Eloundou, Rapha Gontijo-Lopes, Shixiang Shane Gu, Johannes Heidecke, Joost Huizinga, Teddy Lee, Jan Leike, Stephanie Lin, Ryan Lowe, Todor Markov, Luke Metz, Tong Mu, Shibani Santurkar, John Schulman, Andrea Vallone, Carroll Wainwright, Jason Wei, Lilian Weng, Kai Xiao, Chong Zhang, Marvin Zhang, Barret Zoph
Refusals
Juan Felipe Cerón Uribe, Tyna Eloundou, Johannes Heidecke, Joost Huizinga, Jan Leike, Stephanie Lin, Ryan Lowe, Pamela Mishkin, Tong Mu, Carroll Wainwright, Lilian Weng, Kai Xiao, Chong Zhang, Barret Zoph
Foundational RLHF and InstructGPT work
Diogo Almeida, Joost Huizinga, Roger Jiang, Jan Leike, Stephanie Lin, Ryan Lowe, Pamela Mishkin, Dan Mossing, Long Ouyang, Katarina Slama, Carroll Wainwright, Jeff Wu, Kai Xiao, Marvin Zhang
Flagship training runs
Greg Brockman, Liam Fedus, Johannes Heidecke, Joost Huizinga, Roger Jiang, Kyle Kosic, Luke Metz, Ashvin Nair, Jiayi Weng, Chong Zhang, Shengjia Zhao, Barret Zoph
Code capability
Ilge Akkaya, Mo Bavarian, Jonathan Gordon, Shawn Jain, Haozhun Jin, Teddy Lee, Chak Li, Oleg Murk, Ashvin Nair, Vitchyr Pong, Benjamin Sokolowsky, Jerry Tworek, Matt Wiethoff, Sarah Yoo, Kevin Yu, Wojciech Zaremba, William Zhuk
Core contributors
Sandhini Agarwal System Card co-lead
Lama Ahmad Expert red teaming & adversarial testing program lead
Mo Bavarian Capability prediction co-lead
Tyna Eloundou Safety evaluations co-lead
Andrew Kondrich OpenAI Evals open-sourcing co-lead
Gretchen Krueger System Card co-lead
Michael Lampe Privacy and PII evaluations lead
Pamela Mishkin Economic impact & overreliance evaluations lead
Benjamin Sokolowsky Capability prediction co-lead
Jack Rae Research benchmark execution lead
Chelsea Voss Eval execution lead
Alvin Wang OpenAI Evals lead
Kai Xiao Safety evaluations co-lead
Marvin Zhang OpenAI Evals open-sourcing co-lead
OpenAI Evals library
Shixiang Shane Gu, Angela Jiang, Logan Kilpatrick, Andrew Kondrich, Pamela Mishkin, Jakub Pachocki, Ted Sanders, Jessica Shieh, Alvin Wang, Marvin Zhang
Model-graded evaluation infrastructure
Liam Fedus, Rapha Gontijo-Lopes, Shixiang Shane Gu, Andrew Kondrich, Michael (Rai) Pokorny, Wojciech Zaremba, Chong Zhang, Marvin Zhang, Shengjia Zhao, Barret Zoph
Acceleration forecasting
Alan Hickey, Daniel Kokotajlo, Cullen O’Keefe, Sarah Shoker
ChatGPT evaluations
Juan Felipe Cerón Uribe, Hyung Won Chung, Rapha Gontijo-Lopes, Liam Fedus, Luke Metz, Michael Rai Pokorny, Jason Wei, Shengjia Zhao, Barret Zoph
Capability evaluations
Sully Chen, Tyna Eloundou, Shengli Hu, Roger Jiang, Jamie Kiros, Teddy Lee, Scott Mayer McKinney, Jakub Pachocki, Alex Paino, Giambattista Parascandolo, Boris Power, Raul Puri, Jack Rae, Nick Ryder, Ted Sanders, Szymon Sidor, Benjamin Sokolowsky, Chelsea Voss, Alvin Wang, Rowan Zellers, Juntang Zhuang
Coding evaluations
Ilge Akkaya, Mo Bavarian, Jonathan Gordon, Shawn Jain, Chak Li, Oleg Murk, Vitchyr Pong, Benjamin Sokolowsky, Jerry Tworek, Kevin Yu, Wojciech Zaremba
Real-world use case evaluations
Andrew Kondrich, Joe Palermo, Boris Power, Ted Sanders
Contamination investigations
Adrien Ecoffet, Roger Jiang, Ingmar Kanitscheider, Scott Mayer McKinney, Alex Paino, Giambattista Parascandolo, Jack Rae, Qiming Yuan
Instruction following and API evals
Diogo Almeida, Carroll Wainwright, Marvin Zhang
Novel capability discovery
Filipe de Avila Belbute Peres, Kevin Button, Fotis Chantzis, Mike Heaton, Wade Hickey, Xin Hu, Andrew Kondrich, Matt Knight, Andrew Mayne, Jake McNeil, Vinnie Monaco, Joe Palermo, Joel Parish, Boris Power, Bob Rotsted, Ted Sanders
Vision evaluations
Shixiang Shane Gu, Shengli Hu, Jamie Kiros, Hyeonwoo Noh, Raul Puri, Rowan Zellers
Economic impact evaluation
Tyna Eloundou, Sam Manning, Aalok Mehta, Pamela Mishkin
Non-proliferation, international humanitarian law & national security red teaming
Sarah Shoker
Overreliance analysis
Miles Brundage, Michael Lampe, Pamela Mishkin
Privacy and PII evaluations
Michael Lampe, Vinnie Monaco, Ashley Pantuliano
Safety and policy evaluations
Josh Achiam, Sandhini Agarwal, Lama Ahmad, Jeff Belgum, Tyna Eloundou, Johannes Heidecke, Shengli Hu, Joost Huizinga, Jamie Kiros, Gretchen Krueger, Michael Lampe, Stephanie Lin, Ryan Lowe, Todor Markov, Vinnie Monaco, Tong Mu, Raul Puri, Girish Sastry, Andrea Vallone, Carroll Wainwright, CJ Weinmann, Lilian Weng, Kai Xiao, Chong Zhang
OpenAI adversarial testers
Josh Achiam, Steven Adler, Lama Ahmad, Shyamal Anadkat, Red Avila, Gabriel Bernadett-Shapiro, Anna-Luisa Brakman, Tim Brooks, Miles Brundage, Chelsea Carlson, Derek Chen, Hyung Won Chung, Jeremiah Currier, Daniel Kokotajlo, David Dohan, Adrien Ecoffet, Juston Forte, Vik Goel, Ryan Greene, Johannes Heidecke, Alan Hickey, Shengli Hu, Joost Huizinga, Janko, Tomer Kaftan, Ali Kamali, Nitish Shirish Keskar, Tabarak Khan, Hendrik Kirchner, Daniel Kokotajlo, Gretchen Krueger, Michael Lampe, Teddy Lee, Molly Lin, Ryan Lowe, Todor Markov, Jake McNeil, Pamela Mishkin, Vinnie Monaco, Daniel Mossing, Tong Mu, Oleg Murk, Cullen O’Keefe, Joe Palermo, Giambattista Parascandolo, Joel Parish, Boris Power, Alethea Power, Cameron Raymond, Francis Real, Bob Rotsted, Mario Salterelli, Sam Wolrich, Ted Sanders, Girish Sasty, Sarah Shoker, Shyamal Anadkat, Yang Song, Natalie Staudacher, Madeleine Thompson, Elizabeth Tseng, Chelsea Voss, Jason Wei, Chong Zhang
System card & broader impacts analysis
Steven Adler, Sandhini Agarwal, Lama Ahmad, Janko Altenschmidt, Jeff Belgum, Gabriel Bernadett-Shapiro, Miles Brundage, Derek Chen, Tyna Eloundou, Liam Fedus, Leo Gao, Vik Goel, Johannes Heidecke, Alan Hickey, Shengli Hu, Joost Huizinga, Daniel Kokotajlo, Gretchen Krueger, Michael Lampe, Jade Leung, Stephanie Lin, Ryan Lowe, Kim Malfacini, Todor Markov, Bianca Martin, Aalok Mehta, Pamela Mishkin, Tong Mu, Richard Ngo, Cullen O’Keefe, Joel Parish, Rai Pokorny, Bob Rotsted, Girish Sastry, Sarah Shoker, Andrea Vallone, Carroll Wainwright, CJ Weinmann, Lilian Weng, Dave Willner, Kai Xiao, Chong Zhang
Core contributors
Steven Adler Early stage program management lead
Sandhini Agarwal Launch safety lead
Derek Chen Monitoring & response lead
Atty Eleti GPT‑4 API co-lead
Joanne Jang GPT‑4 product co-lead
Angela Jiang GPT‑4 product co-lead
Tomer Kaftan Inference infrastructure & deployment lead
Rachel Lim GPT‑4 API co-lead
Kim Malfacini Usage policy lead
Bianca Martin Release program management lead
Evan Morikawa Engineering lead
Henrique Ponde de Oliveira Pinto Inference workflow lead
Heather Schmidt GPT‑4 infrastructure management
Maddie Simens Design lead
Felipe Petroski Such Inference optimization & reliability lead
Andrea Vallone Detection & refusals policy lead
Lilian Weng Applied research lead
Dave Willner Trust & safety lead
Michael Wu Inference research lead
Inference research
Paul Baltescu, Scott Gray, Yuchen He, Arvind Neelakantan, Michael Wu
GPT‑4 API & ChatML deployment
Greg Brockman, Brooke Chan, Chester Cho, Atty Eleti, Rachel Lim, Andrew Peng, Michelle Pokrass, Sherwin Wu
GPT‑4 web experience
Valerie Balcom, Lenny Bogdonoff, Jason Chen, Dave Cummings, Noah Deutsch, Mike Heaton, Paul McMillan, Rajeev Nayak, Joel Parish, Adam Perelman, Eric Sigler, Nick Turley, Arun Vijayvergiya, Chelsea Voss
Inference infrastructure
Brooke Chan, Scott Gray, Chris Hallacy, Kenny Hsu, Tomer Kaftan, Rachel Lim, Henrique Ponde de Oliveira Pinto, Raul Puri, Heather Schmidt, Felipe Petroski Such
Reliability engineering
Haiming Bao, Madelaine Boyd, Ben Chess, Damien Deville, Yufei Guo, Vishal Kuo, Ikai Lan, Michelle Pokrass, Carl Ross, David Schnurr, Jordan Sitkin, Felipe Petroski Such
Trust & safety engineering
Jeff Belgum, Madelaine Boyd, Vik Goel
Trust & safety monitoring and response
Janko Altenschmidt, Anna-Luisa Brakman, Derek Chen, Florencia Leoni Aleman, Molly Lin, Cameron Raymond, CJ Weinmann, Dave Willner, Samuel Wolrich
Trust & safety policy
Rosie Campbell, Kim Malfacini, Andrea Vallone, Dave Willner
Deployment compute
Peter Hoeschele, Evan Morikawa
Product management
Jeff Harris, Joanne Jang, Angela Jiang
Sam Altman, Katie Mayer, Bob McGrew, Mira Murati, Ilya Sutskever, Peter Welinder
Blog post & paper content
Sandhini Agarwal, Greg Brockman, Miles Brundage, Adrien Ecoffet, Tyna Eloundou, David Farhi, Johannes Heidecke, Shengli Hu, Joost Huizinga, Roger Jiang, Gretchen Krueger, Jan Leike, Daniel Levy, Stephanie Lin, Ryan Lowe, Tong Mu, Hyeonwoo Noh, Jakub Pachocki, Jack Rae, Kendra Rimbach, Shibani Santurkar, Szymon Sidor, Benjamin Sokolowsky, Jie Tang, Chelsea Voss, Kai Xiao, Rowan Zellers, Chong Zhang, Marvin Zhang
Communications
Ruby Chen, Cory Decareaux, Thomas Degry, Steve Dowling, Niko Felix, Elie Georges, Anna Makanju, Andrew Mayne, Aalok Mehta, Elizabeth Proehl, Kendra Rimbach, Natalie Summers, Justin Jay Wang, Hannah Wong
Compute allocation support
Theresa Lopez, Elizabeth Tseng
Contracting, revenue, pricing & finance support
Brooke Chan, Denny Jin, Billie Jonn, Patricia Lue, Kyla Sheppard, Lauren Workman
Launch partners & product operations
Filipe de Avila Belbute Peres, Brittany Carey, Simón Posada Fishman, Isabella Fulford, Teddy Lee, Yaniv Markovski, Tolly Powell, Toki Sherbakov, Jessica Shieh, Natalie Staudacher, Preston Tuggle
Legal
Jake Berdine, Che Chang, Sheila Dunning, Ashley Pantuliano
Security & privacy engineering
Kevin Button, Fotis Chantzis, Wade Hickey, Xin Hu, Shino Jomoto, Matt Knight, Jake McNeil, Vinnie Monaco, Joel Parish, Bob Rotsted
System administration & on-call support
Morgan Grafstein, Francis Real, Mario Saltarelli
Authorship & credit attribution
David Farhi
We also acknowledge and thank every OpenAI team member not explicitly mentioned above, including the amazing people on the executive assistant, finance, go to market, human resources, legal, operations and recruiting teams. From hiring everyone in the company, to making sure we have an amazing office space, to building the administrative, HR, legal, and financial structures that allow us to do our best work, everyone at OpenAI has contributed to GPT‑4.
We thank Microsoft for their partnership, especially Microsoft Azure for supporting model training with infrastructure design and management, and the Microsoft Bing team and Microsoft’s safety teams for their partnership on safe deployment.
We are grateful to our expert adversarial testers and red teamers who helped test our models at early stages of development and informed our risk assessments as well as the system card. Participation in this red teaming process is not an endorsement of the deployment plans of OpenAI or OpenAI’s policies: Steven Basart, Sophie Duba, Cèsar Ferri, Heather Frase, Gavin Hartnett, Jake J. Hecla, Dan Hendrycks, Jose Hernandez-Orallo, Alice Hunsberger, Rajiv W. Jain, Boru Gollo Jattani, Lauren Kahn, Dan Kaszeta, Sara Kingsley, Noam Kolt, Nathan Labenz, Eric Liddick, Andrew J. Lohn, Andrew MacPherson, Sam Manning, Mantas Mazeika, Anna Mills, Yael Moros, Jimin Mun, Aviv Ovadya, Roya Pakzad, Yifan Peng, Ciel Qi, Alex Rosenblatt, Paul Röttger, Maarten Sap, Wout Schellaert, George Shih, Muhammad Shoker, Melanie Subbiah, Bryan West, Andrew D. White, Anna Katariina Wisakanto, Akhila Yerukola, Lexin Zhou, Xuhui Zhou.
Contributors listed in alphabetized order.
I need to continue looking for Jack Rae's role and contributions in the Gopher project, particularly the claim that it "halved the gap to human exam-level performance".
Search
Jack Rae Gopher DeepMind halves gap human expert performance
I need to find the original Gopher paper to confirm Jack Rae's specific role and the claim about "halving the gap to human-level performance".
Source: https://storage.googleapis.com/deepmind-media/research/language-research/Training%20Gopher.pdf (published 2021-12-08)
Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d’Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu and Geoffrey Irving
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales — from models with tens of millions of parameters up to a 280 billion parameter model called
Gopher . These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model’s behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI safety and the mitigation of downstream harms.
Keywords: Natural Language Processing, Language Models, Deep Learning
Contents
1 Introduction
2 Background
3 Method (3.1 Models; 3.2 Training; 3.3 Infrastructure; 3.4 Training Dataset)
4 Results (4.1 Task Selection; 4.2 Comparisons with State of the Art; 4.3 Performance Improvements with Scale)
5 Toxicity and Bias Analysis (5.1 Toxicity; 5.2 Distributional Bias)
6 Dialogue (6.1 Prompting For Dialogue; 6.2 Fine-tuning for Dialogue; 6.3 Dialogue & Toxicity)
7 Discussion (7.1 Towards Efficient Architectures; 7.2 Challenges in Toxicity and Bias; 7.3 Safety benefits and safety risks)
8 Conclusion
9 Acknowledgements
10 Contributions
A MassiveText (A.1 Dataset Pipeline; A.2 Dataset Analysis; A.3 Dataset Ablations; A.4 Text normalisation; A.5 MassiveText Datasheet)
B Gopher Model Card
C Lessons Learned (C.1 Adafactor; C.2 Lower-Precision Training with bfloat16)
D Results (D.1 Overview; D.2 Pile; D.3 Language Modelling; D.4 Filtering Test-Set Documents; D.5 Scaling Curves; D.6 Scaling Context Length; D.7 MMLU; D.8 BIG-bench; D.9 TriviaQA & NaturalQuestions; D.10 TruthfulQA; D.11 Reading Comprehension: RACE; D.12 Fact-Checking: FEVER & MultiFC; D.13 Common Sense: PIQA, WinoGrande, SocialIQA, HellaSwag)
E Toxicity and Bias Analysis (E.1 Toxic Generations; E.2 Classifying Toxicity; E.3 Distributional Bias)
F Compute Usage
G Reducing Inference and Training Costs (G.1 Efficient Fine-tuning; G.2 Reducing Inference Costs; G.3 Reducing Training Costs; G.4 Future Work for Efficient Training)
H Dialogue-Prompted Gopher Details (H.1 Construction; H.2 Dialogue Dataset Filtering; H.3 Comparison Methodology; H.4 RTP in a Dialogue Setting; H.5 Selected Transcripts)
Corresponding authors: jwrae@deepmind.com, geoffreyi@deepmind.com
text statistics (Dewey, 1923) relating model complexity to improved text compression alongside more realistic text generation.[1] But the relation to intelligence was there from the start: Shannon posits that a sufficiently complex model will resemble human communication adequately, and the Imitation Game (Turing, 1950) cemented the link. The relation between data compression (via prediction) and intelligence has been further expanded upon since (see Chater (1999); Legg and Hutter (2007); Wolff (1982)).

A key driver towards better language models has been modern computing. From their pen-and-paper origins, language models have transformed in capacity and predictive power by the exponential rise in compute (Moore et al., 1965). In the 1990s and 2000s, n-gram models saw increases in scale and better smoothing approaches (Ney et al., 1994), including a 300 billion n-gram model trained on two trillion tokens of text (Brants et al., 2007). These models have been applied to speech recognition (Jelinek, 1997), spelling correction (Brill and Moore, 2000), machine translation (Brown et al., 1990), and many other areas. However n-gram models become statistically and computationally inefficient as the context length is increased, which limits the richness of language they can model.

In the past two decades language models have progressed to neural networks that capture the structure of language implicitly (Bengio et al., 2003; Graves, 2013; Jozefowicz et al., 2016; Mikolov et al., 2010; Radford et al., 2019). Progress has been driven by both scale and network architecture (Bahdanau et al., 2014; Hochreiter and Schmidhuber, 1997; Vaswani et al., 2017). Kaplan et al. (2020) found power laws relating cross entropy loss to model and training dataset size for Transformer models. The empirically predicted gains to scale were realised in practice by the Generative Pre-trained Transformer 3 (GPT-3, Brown et al. (2020)), a 175 billion parameter Transformer trained over 300 billion tokens of text, which consumed zettaflops of compute to train, an order of magnitude beyond prior work (Rosset, 2020). GPT-3 demonstrated unprecedented generation quality alongside generalist capabilities across many Natural Language Processing (NLP) tasks, notably when prompted with examples (termed few-shot prompting).

In this paper we describe a protocol for training a state-of-the-art large language model and present a 280 billion parameter model called Gopher. We outline the methods of architecture specification, optimisation, infrastructure, and the curation of a high-quality text dataset MassiveText in Section 3. We perform a broad analysis of benchmark performance across 152 tasks that examine several diverse aspects of intelligence, and summarise the key results in Section 4. We see that Gopher lifts the performance over current state-of-the-art language models across roughly 81% of tasks containing comparable results, notably in knowledge-intensive domains such as fact checking and general knowledge.

As harmful content occurs both in Gopher's training set and in many potential downstream applications, we examine model toxicity and bias in Section 5 with a focus on how scale influences these properties. We find larger models are more likely to generate toxic responses when provided with toxic prompts, but they can also more accurately classify toxicity.
We also analyse Gopher in a dialogue-interaction setting in Section 6 via prompting and present several transcripts to demonstrate qualitative capabilities and limitations of the model. Finally, we discuss the ethical and safe application of these models including which types of undesirable behaviour to mitigate before and after training in Section 7. We discuss application-driven safety and the potential for language models to accelerate research towards safer intelligent technology.
[1] A sample from Shannon's word-pair model: "the head and in frontal attack on an english writer that the character of this point is therefore another method for the letters that the time of who ever told the problem for an unexpected."
∏_{i=1}^{n} P(X_i | X_{<i}). This is also known as autoregressive sequence modelling, because at each time-step the future (in this case, future token) is predicted based upon the past context. Whilst there are other objectives towards modelling a sequence, such as modelling masked tokens given bi-directional context (Devlin et al., 2019; Mikolov et al., 2013) and modelling all permutations of the sequence (Yang et al., 2019) we focus on autoregressive modelling due to its strong performance and simplicity. We shall refer to language models hereon as the function approximators to perform next-token prediction. A class of neural networks known as Transformers (Vaswani et al., 2017) have demonstrated state-of-the-art language model performance in recent years (Dai et al., 2019; Radford et al., 2018, 2019) and this is the architecture we focus on in this paper.

There has been a trend of scaling the combination of training data, model size (measured in parameters) and training computation to obtain models with improved performance across academic and industrial benchmarks. Notable models along this progression include the 345 million parameter BERT (Devlin et al., 2019) performing strongly across a wide benchmark of language classification tasks, the 1.5 billion parameter GPT-2 (Radford et al., 2018) and 8.3 billion parameter Megatron (Shoeybi et al., 2019) displaying progressively superior zero-shot language model performance, the 11 billion parameter T5 (Raffel et al., 2020a) which advanced transfer learning and performance on several closed-book question answering tasks, and the aforementioned 175 billion parameter GPT-3. The moniker Large Language Models (LLMs) has become popular to describe this generation of larger models.

Since GPT-3 there has been a 178B parameter Transformer language model Jurassic-1 (Lieber et al., 2021) which uses a diverse training set and a larger tokenizer vocabulary size, along with an announced 530B Megatron-Turing NLG (Kharya and Alvi, 2021) which trains on a released dataset (The Pile, Gao et al. (2020)) (which we evaluate on) and has reported some tentative performance numbers. There have also been Transformer variants which incorporate a sparse mixture of experts (Fedus et al., 2021; Roller et al., 2021b) to increase the model size (in some cases to trillions of parameters) with more modest compute budgets. Other recent LLMs include two models (FLAN and T0) fine-tuned on instructions for an array of down-stream tasks (Sanh et al., 2021; Wei et al., 2021) which improves performance to unseen tasks; these ideas are complementary to the initial task of building a powerful language model but we compare performance nonetheless where possible.
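As an aside connecting the autoregressive factorization above with the compression framing from the paper's introduction (and with the "compression is intelligence" claim under verification): the negative log-likelihood a model assigns to a sequence, in bits, is roughly the length that sequence would take under arithmetic coding with that model, so better next-token prediction means better lossless compression. A minimal sketch with made-up probabilities, not taken from the paper:

```python
import numpy as np

def sequence_bits(token_log_probs):
    """-sum_i log2 P(x_i | x_<i): the model's negative log-likelihood in bits,
    which is approximately the arithmetic-coded length of the sequence."""
    return -np.sum(token_log_probs) / np.log(2.0)

# Hypothetical per-token probabilities P(x_i | x_<i) from some language model.
probs = np.array([0.20, 0.05, 0.60, 0.33])
bits = sequence_bits(np.log(probs))
print(f"{bits:.2f} bits total, {bits / probs.size:.2f} bits per token")
```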
3. Method
3.1. Models
In this paper we present results on six Transformer language models ranging from 44 million to 280 billion parameters, with the architectural details displayed in Table 1. We refer to the largest as
Gopher and the entire set of models as the Gopher family.
| Model | Layers | Number Heads | Key/Value Size | d_model | Max LR | Batch Size |
|---|---|---|---|---|---|---|
| 44M | 8 | 16 | 32 | 512 | 6 × 10⁻⁴ | 0.25M |
| 117M | 12 | 12 | 64 | 768 | 6 × 10⁻⁴ | 0.25M |
| 417M | 12 | 12 | 128 | 1,536 | 2 × 10⁻⁴ | 0.25M |
| 1.4B | 24 | 16 | 128 | 2,048 | 2 × 10⁻⁴ | 0.25M |
| 7.1B | 32 | 32 | 128 | 4,096 | 1.2 × 10⁻⁴ | 2M |
| Gopher 280B | 80 | 128 | 128 | 16,384 | 4 × 10⁻⁵ | 3M → 6M |

Table 1 | Model architecture details. For each model, we list the number of layers, the key/value size, the bottleneck activation size d_model, the maximum learning rate, and the batch size. The feed-forward size is always 4 × d_model.

We use the autoregressive Transformer architecture detailed in Radford et al. (2019) with two modifications: we use RMSNorm (Zhang and Sennrich, 2019) instead of LayerNorm (Ba et al., 2016), and we use the relative positional encoding scheme from Dai et al. (2019) rather than absolute positional encodings. Relative encodings permit us to evaluate on longer sequences than we trained on, which improves the modelling of articles and books as shown in Section D.6. We tokenize the text using SentencePiece (Kudo and Richardson, 2018) with a vocabulary of 32,000 and use a byte-level backoff to support open-vocabulary modelling. The Gopher model card (Mitchell et al., 2019) is included in Appendix B.
3.2. Training
We train all models for 300 billion tokens with a 2048 token context window, using the Adam (Kingma and Ba, 2014) optimiser. We warm-up the learning rate from 10⁻⁷ to the maximum learning rate over the first 1500 steps, and then decay it 10× using a cosine schedule. As we increase model size, we decrease the maximum learning rate and increase the number of tokens in each batch, as shown in Table 1. Furthermore, we increase Gopher's batch size from three to six million tokens per batch during training. We clip gradients based on the global gradient norm using a clipping value of 1. However, for the 7.1B model and for Gopher we reduce this to 0.25 for improved stability. We incorporate the bfloat16 numerical format to reduce memory and increase training throughput. Models smaller than 7.1B are trained with mixed precision float32 parameters and bfloat16
activations (Micikevicius et al., 2018), while 7.1B and 280B use bfloat16 activations and parameters.
bfloat16 parameters are updated using stochastic rounding to ...
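上文描述的学习率日程是"前 1500 步从 10⁻⁷ 预热到最大学习率,随后按余弦衰减 10 倍"。下面给出一个示意性的实现草图,其中 warmup_steps=1500、起始学习率 1e-7、衰减 10 倍与论文描述一致;total_steps 的取值与余弦的具体写法则是本示意的假设:

```python
import math

def gopher_style_lr(step: int, max_lr: float, total_steps: int,
                    warmup_steps: int = 1500, start_lr: float = 1e-7,
                    final_ratio: float = 0.1) -> float:
    """示意性的 "线性预热 + 余弦衰减到 1/10" 学习率日程。"""
    if step < warmup_steps:
        # 前 warmup_steps 步从 start_lr 线性升到最大学习率
        return start_lr + (max_lr - start_lr) * step / warmup_steps
    # 之后按余弦从 max_lr 衰减到 max_lr * final_ratio
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return max_lr * (final_ratio + (1.0 - final_ratio) * cosine)

# 用法示意: 以 280B 模型的 max_lr = 4e-5 为例 (total_steps 为假设值)
print(gopher_style_lr(step=0,       max_lr=4e-5, total_steps=100_000))  # ≈ 1e-7
print(gopher_style_lr(step=1500,    max_lr=4e-5, total_steps=100_000))  # ≈ 4e-5
print(gopher_style_lr(step=100_000, max_lr=4e-5, total_steps=100_000))  # ≈ 4e-6
```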
我需要查找更多关于Gopher在阅读理解方面的具体成就,特别是关于"将人类考试水平差距缩小一半"的说法。
Search
Gopher DeepMind almost halves the accuracy gap from GPT-3 to human expert performance
已核查结论清单
已核查内容1:Jack Rae 的职业经历可以分为以下几个阶段:...DeepMind(约2015–2023)...OpenAI(2023–2024)...Google DeepMind(2024–2025)...Meta(2025年6月起)
已核查内容2:博士:英国伦敦大学学院(UCL),计算机科学方向
已核查内容3:主导项目:Gopher:2800亿参数的语言模型,首次将人类考试水平差距缩小一半
已核查内容4:Chinchilla:提出“Chinchilla Scaling Laws”...2022年NeurIPS最佳论文
已核查内容5:GPT-4 核心开发者之一,参与预训练与推理系统设计
已核查内容6:Gemini 系列:Gemini 1.0/2.0 的预训练负责人,Gemini 2.5 的推理技术负责人
已核查内容7:背景:被扎克伯格亲自挖角,成为Meta组建AGI团队的首批关键人物之一
待补充核查清单
待核查内容1:提出“压缩即智能”理论:认为大模型训练本质是对数据的无损压缩,压缩越高效,模型越智能
待核查内容2:本科:英国布里斯托大学(University of Bristol);硕士:美国卡内基梅隆大学(CMU)
待核查内容3:职位:超级智能实验室(Meta Superintelligence Labs)核心成员
行动计划
让我首先搜索关于"压缩即智能"理论的信息,以确定Jack Rae在这一理论中的具体贡献。
Search
compression is intelligence theory history Solomonoff Kolmogorov Schmidhuber Jack Rae
我需要继续搜索关于"压缩即智能"理论的更多信息,特别是关于Jack Rae在这一理论中的贡献。
Search
Jack Rae Compression is Intelligence Stanford MLSys Seminar YouTube
我需要继续搜索关于"压缩即智能"理论的历史渊源,特别是Solomonoff和Kolmogorov的贡献。
Search
Solomonoff Kolmogorov compression theory algorithmic information theory history
我需要查找更多关于Jack Rae的教育背景信息,以及Meta Superintelligence Labs的相关信息。
Search
Jack Rae education background Bristol University CMU LinkedIn
已核查结论清单
已核查内容1:Jack Rae 的职业经历可以分为以下几个阶段:...DeepMind(约2015–2023)...OpenAI(2023–2024)...Google DeepMind(2024–2025)...Meta(2025年6月起)
已核查内容2:本科:英国布里斯托大学(University of Bristol);硕士:美国卡内基梅隆大学(CMU);博士:英国伦敦大学学院(UCL),计算机科学方向
已核查内容3:主导项目:Gopher:2800亿参数的语言模型,首次将人类考试水平差距缩小一半
已核查内容4:Chinchilla:提出“Chinchilla Scaling Laws”...2022年NeurIPS最佳论文
已核查内容5:GPT-4 核心开发者之一,参与预训练与推理系统设计
已核查内容6:提出“压缩即智能”理论:认为大模型训练本质是对数据的无损压缩,压缩越高效,模型越智能
已核查内容7:Gemini 系列:Gemini 1.0/2.0 的预训练负责人,Gemini 2.5 的推理技术负责人
已核查内容8:背景:被扎克伯格亲自挖角,成为Meta组建AGI团队的首批关键人物之一
待补充核查清单
待核查内容1:职位:超级智能实验室(Meta Superintelligence Labs)核心成员
行动计划
让我搜索关于Meta Superintelligence Labs的官方信息。
Search
Meta Superintelligence Labs official announcement Zuckerberg
我需要继续查找关于Meta Superintelligence Labs的官方信息,特别是Mark Zuckerberg的官方声明。
Search
Mark Zuckerberg Meta Superintelligence Labs official announcement memo
我需要查找更多关于Jack Rae在Meta Superintelligence Labs中的具体角色的信息。
Search
Jack Rae Meta Superintelligence Labs role position
现在我已经收集了足够的信息,可以提供一个全面的核查结果。
本次核查共涉及10项主要陈述,其中6项属实或基本属实,2项部分属实,2项不属实或归因不当。不属实、归因不当的内容包括:Chinchilla论文获得"NeurIPS 2022最佳论文"的说法,以及将"压缩即智能"理论归为Jack Rae个人首创的说法。部分属实的内容主要涉及对其在Chinchilla项目中作者角色的夸大,以及对其在GPT-4开发中实际职责(官方记录为"研究基准测试执行负责人",而非预训练与推理系统设计)的不准确描述;关于Gopher的表述基本属实,但存在简化。此外,"Meta Superintelligence Labs"这一机构名称已得到官方确认,但Jack Rae在该实验室中的具体职位描述仍需更多官方信息佐证。
用户报告中关于Jack Rae职业经历的总体框架是准确的,他确实横跨了DeepMind、OpenAI和Meta三大顶尖AI机构。然而,报告中对其具体贡献和角色的描述存在一定程度的夸大或不准确,建议在引用相关信息时谨慎对待,特别是关于其在特定项目中的领导角色和理论贡献方面。
| 原文陈述 | 可靠性分析 | 证据 |
| --- | --- | --- |
本科:英国布里斯托大学(University of Bristol);硕士:美国卡内基梅隆大学(CMU);博士:英国伦敦大学学院(UCL),计算机科学方向 | 属实 多个独立信源证实Jack Rae的教育背景,包括UCL的官方知识库中的博士论文,以及其在OpenReview和MSN上的个人资料。 | 1. "I, Jack William Rae confirm that the work presented in this thesis is my own." https://discovery.ucl.ac.uk/10126220/1/Jack%20Rae%20-%20Final%20Thesis.pdf (未知日期) 2. "MS student University of Bristol (bristol.ac.uk) 2008 – 2017; MS student Carnegie Mellon University (cmu.edu) 2013 – 2014" https://openreview.net/profile?id=~Jack_William_Rae1 (未知日期) 3. "Education: Bachelor of Science (BS), University of Bristol, UK; Master of Science (MS), Carnegie Mellon University (CMU), USA." https://www.msn.com/en-in/money/news/who-are-the-11-ai-experts-hired-by-mark-zuckerberg-s-meta-fun-fact-all-of-them-are-immigrants/ar-AA1HT8Du (未知日期) |
DeepMind(约2015–2023)...OpenAI(2023–2024)...Google DeepMind(2024–2025)...Meta(2025年6月起) | 基本属实 多个独立信源证实Jack Rae的职业路径确实横跨了DeepMind、OpenAI、Google DeepMind(回归)和Meta,时间点与各方报道基本吻合。 | 1. "Research Engineer. DeepMind (google.com). 2014 – 2017." https://openreview.net/profile?id=~Jack_William_Rae1 (未知日期) 2. "Jack Rae 是 OpenAI 的团队负责人,主要研究大型语言模型和远程记忆。此前,他在 DeepMind 工作了 8 年,领导大型语言模型 (LLM) 研究组。" https://weibo.com/ttarticle/p/show?id=2309634884017727013209 (2023-03-27) 3. "最新爆料,谷歌DeepMind首席研究员Jack Rae已经确认转投。" https://m.36kr.com/p/3333286223030536 (2025-06-12) 4. "杰克·雷(Jack Rae):Gemini的预训练技术负责人,负责Gemini 2.5的推理工作。曾在DeepMind领导早期的Gopher和Chinchilla大语言模型项目。" https://m.weibo.cn/status/Pz1RNf6gX (2025-07-01) |
主导项目:Gopher:2800亿参数的语言模型,首次将人类考试水平差距缩小一半 | 基本属实,但表述存在简化 Jack W. Rae是Gopher论文的第一作者和通讯作者,确为项目主导者。论文中确实提到Gopher"几乎将从GPT-3到人类专家表现的准确性差距缩小了一半",但这一描述是与GPT-3对比,而非泛指所有人类考试。 | 1. "Jack W. Rae, Sebastian Borgeaud, Trevor Cai... Corresponding authors: jwrae@deepmind.com, geoffreyi@deepmind.com" https://storage.googleapis.com/deepmind-media/research/language-research/Training%20Gopher.pdf (2021-12-08) 2. "We find Gopher almost halves the accuracy gap from GPT-3 to human expert performance and exceeds forecaster expectations." https://storage.googleapis.com/deepmind-media/research/language-research/Training%20Gopher.pdf (2021-12-08) |
Chinchilla:提出"Chinchilla Scaling Laws" | 部分属实 Jack Rae 是 Chinchilla 论文《Training Compute-Optimal Large Language Models》的作者之一,该论文确实提出了关于计算最优训练的扩展法则。但将他个人描述为"提出者"不准确,该论文有22位作者,他是资深作者之一(倒数第三位),而非第一作者。 | "Authors: Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, ... Jack W. Rae, Oriol Vinyals, Laurent Sifre" https://arxiv.org/abs/2203.15556 (2022-03-29) |
2022年NeurIPS最佳论文 | 不属实 NeurIPS 2022 官方博客发布的获奖论文名单中,并不包含《Training Compute-Optimal Large Language Models》(Chinchilla论文)。 | "We are excited to announce the award-winning papers for NeurIPS 2022! ... Outstanding Papers: Is Out-of-distribution Detection Learnable? ..." https://blog.neurips.cc/2022/11/21/announcing-the-neurips-2022-awards/ (2022-11-21) |
Gemini 系列:Gemini 1.0/2.0 的预训练负责人,Gemini 2.5 的推理技术负责人 | 基本属实 多方信源,包括据称为扎克伯格的内部备忘录,均指认 Jack Rae 在 Gemini 项目中担任关键领导角色,负责预训练和推理技术。 | 1. "杰克·雷(Jack Rae):Gemini的预训练技术负责人,负责Gemini 2.5的推理工作。" https://m.weibo.cn/status/Pz1RNf6gX (2025-07-01) 2. "我们邀请到了 Google DeepMind 的首席科学家 Jack Rae,他也是 Gemini 推理项目的主要负责人之一。" https://m.sohu.com/a/864740570_122105141/?pvid=000115_3w_a (2025-02-27) |
GPT-4 核心开发者之一,参与预训练与推理系统设计 | 部分属实,但角色描述不准确 Jack Rae 确实是 GPT-4 的贡献者之一,但其官方认证的角色是"研究基准测试执行负责人"(Research benchmark execution lead),而非报告中所述的"参与预训练与推理系统设计"。 | "Core contributors ... Jack Rae Research benchmark execution lead" https://openai.com/contributions/gpt-4/ (未知日期) |
提出"压缩即智能"理论:认为大模型训练本质是对数据的无损压缩,压缩越高效,模型越智能 | 归因不当 将该理论归功于Jack Rae个人是错误的。"压缩即智能"的核心思想源于20世纪60年代由Ray Solomonoff、Andrey Kolmogorov等人独立提出的算法信息论。Jack Rae是这一经典理论在大型语言模型时代的重要倡导者和实践者,他通过在斯坦福大学的演讲等活动,有力地推广了"将语言建模视为压缩"的观点,但并非该理论的"提出者"。(语言建模与无损压缩的对应关系,可参见表后的概念性代码示意。) | 1. "Algorithmic complexity was introduced independently by Solomonoff (1964) Kolmogorov (1965) and Chaitin (1966)." https://arxiv.org/pdf/2504.18568 (2025-04-22) 2. "Ray was the first inventor of Algorithmic Information Theory which deals with the shortest effective description length of objects and is commonly designated by the term 'Kolmogorov complexity.'" https://www.mdpi.com/1999-4893/3/3/260 (2010-07-20) 3. "在最近OpenAI的学术分享会中,Jack Rae提出了一个重要的论断: 大语言模型的本质,其实是一个性能强大的数据无损压缩器。" https://tech.ifeng.com/c/8OsTZsn6mGZ (2023-04-10) 4. "2月28日,OpenAI 的核心研发人员 Jack Rae 在参加 Stanford MLSys Seminar 的访谈时进行了一个名为 Compression for AGI 的主题分享..." https://weibo.com/ttarticle/p/show?id=2309634884017727013209 (2023-03-27) |
职位:超级智能实验室(Meta Superintelligence Labs)核心成员 | 基本属实 多家媒体报道证实Meta确实成立了名为"Meta Superintelligence Labs"的新部门,且Jack Rae是被招募的11位AI顶尖人才之一。但关于他在该实验室中的具体职位,目前公开信息有限。 | 1. "美东时间周一,Meta公司CEO扎克伯格表示,他正在创建Meta超级智能实验室(Meta Superintelligence Labs)。" https://finance.eastmoney.com/a/202507013444586284.html (2025-07-01) 2. "在过去几周里,Meta从OpenAI、Anthropic和Google等竞争对手手中共招募了11位AI顶尖人才,几乎覆盖了当前主流大模型的全部研发脉络:...DeepMind的Jack Rae和Pei Sun,曾负责Gemini模型和多模态推理系统;" https://k.sina.cn/article_6890756658_19ab89a3204001fk8q.html (2025-07-01) 3. "Meta超级智能实验室是2025年6月建立的实验室,由马克·扎克伯格重组建立,机构由Alexandr Wang领导,担任首席人工智能官。...机构人员:Jack Rae、Pei Sun、特拉皮特·班萨尔、Huiwen Chang、Ji Lin、Jiahui Yu、Shuchao Bi、Shengjia ZhaoHongyu Ren、乔尔·波巴(Joel Pobar)等。" https://baike.baidu.com/item/Meta%E8%B6%85%E7%BA%A7%E6%99%BA%E8%83%BD%E5%AE%9E%E9%AA%8C%E5%AE%A4/65831818 (2025-07-21) |
背景:被扎克伯格亲自挖角,成为Meta组建AGI团队的首批关键人物之一 | 基本属实 多家科技媒体报道了此事,并归因于扎克伯格的亲自招募。 | "据外媒彭博社今日报道,Meta公司正从谷歌、Sesame AI等科技公司挖走顶尖工程师,谷歌DeepMind的首席研究员杰克·雷(Jack Rae)已离开谷歌加入Meta。" https://m.zhidx.com/p/485447.html (2025-06-12) |
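补充说明:上表"压缩即智能"一行所述的"语言建模即无损压缩"的对应关系,可以用算术编码的理想码长来直观理解:模型给一段数据的总码长约为 $-\sum_i \log_2 p(x_i)$ bit,平均 log-loss 越低,压缩就越高效。下面是一个概念性的 Python 草图,其中两组 token 概率为虚构数字,仅作示意;严格的无损压缩论证通常还需计入描述模型或解码器本身的成本,此处从略:

```python
import math

def compressed_size_bits(token_log2_probs: list) -> float:
    """在算术编码的理想情形下, 模型给数据的总码长约等于
    -Σ log2 p(token), 即交叉熵损失(以 bit 计)在整段数据上的总和。"""
    return -sum(token_log2_probs)

# 假设两个模型对同一段 1000 个 token 的文本给出的平均 log2 概率不同(虚构数字)
weak_model   = [-6.0] * 1000   # 平均每个 token 约 6 bit
strong_model = [-3.5] * 1000   # 平均每个 token 约 3.5 bit

print(compressed_size_bits(weak_model))    # 6000.0 bit
print(compressed_size_bits(strong_model))  # 3500.0 bit: log-loss 更低, 压缩更高效
```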