Kaiming He: New Breakthroughs in Transformers and Generative Models

2025/07/24

Kaiming He's new papers

Sources

Major paper! Kaiming He, Yann LeCun and colleagues rework the Transformer; accepted to CVPR 2025

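The latest joint paper by Kaiming He ("inventor of residual neural networks," associate professor at MIT) and Turing Award winner Yann LeCun ("father of convolutional networks") proposes "Transformers without normalization." It has been accepted to CVPR 2025. The finding could further improve AI model performance and give developers new ideas. Redefining "normalization": what is a Transformer without normalization? Simply put, normalization layers are ubiquitous in modern neural networks and have long been considered essential. Their main role is to make the distribution of each layer's inputs more stable and to reduce internal covariate shift (Internal Cova...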

Tencent

1 step beats 250 steps! Kaiming He's new work unveils MeanFlow, and a one-line formula cuts FID by nearly 70%

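In a May 2025 paper, Kaiming He and a joint team from CMU and MIT proposed MeanFlow, a new one-step generation framework. The paper is titled "Mean Flows for One-step Generative Modeling." Judging from the paper's experimental curves, a single step produces better image quality than established diffusion models run for 250 steps (classic diffusion models from around 2021, such as ADM). The paper rekindles hope for "one-step generation," a field that had stalled. The authors replace the "instantaneous velocity" view of conventional Flow Matching with...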

NetEase

Led by Kaiming He and LeCun, "Dynamic Tanh" reshapes AI's low-level design in a disruptive overhaul of the Transformer architecture!

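However, a team led by Meta AI scientist Zhuang Liu, together with leading researchers such as Kaiming He and Yann LeCun, has broken this iron rule with a CVPR 2025 paper: a single, old-fashioned tanh function can replace all normalization layers! Even more striking, the new method, Dynamic Tanh (DyT), not only performs better but also trains 30% faster, and it takes only 9 lines of code to implement. I. Why did normalization layers become a "necessity"? 1. Deep learning's "stabilizer": since batch normalization (BatchNorm) was introduced in 2015, normalization layers have gradually become standard in neural networks. Their core logic is to standardize each layer's output...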

Tencent Cloud

8 web pages

Thinking process

Hmm, the user is asking about new papers by Kaiming He. Let me analyze the search results carefully.

From the web search outputs, I can see several recent papers by Kaiming He:

The first one is about "Dynamic Tanh" (DyT) for Transformer architecture 1. This paper proposes replacing normalization layers with a scaled tanh function, showing comparable or better performance across various tasks. It was accepted to CVPR 2025.
Another significant paper is about MeanFlow for one-step generative modeling 2. This work introduces a new perspective using average velocity instead of instantaneous velocity, achieving impressive FID scores on ImageNet.
There's also mention of his lab at MIT with promising students working on generative models and other CV topics 7.

The user seems interested in AI research, possibly a researcher or student in the field. They might be looking for cutting-edge developments to incorporate into their own work. The concise query suggests they value efficiency.

I should focus on summarizing these key contributions clearly while highlighting their significance. The DyT and MeanFlow papers appear most impactful - one challenges normalization assumptions while the other advances generative modeling. The MIT lab news shows he's actively mentoring too.

I'll organize this chronologically from most recent publications, emphasizing how they push AI boundaries. The technical depth suggests the user would appreciate specific details about the innovations.

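Kaiming He's recent papers focus on two main directions: optimizing the Transformer architecture and advancing generative models. The key results are: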

I. A disruptive Transformer improvement: Dynamic Tanh (DyT) [1, 5]

Core innovation: replace conventional normalization layers (such as LayerNorm) with a minimalist dynamic hyperbolic tangent function (DyT).
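DyT formula: $\mathrm{DyT}(x) = \gamma \odot \tanh(\alpha x) + \beta$, where $\alpha$ is a learnable scalar and $\gamma$, $\beta$ are learnable per-channel scale and shift parameters (as in LayerNorm's affine transform).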
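Key points:
- The authors observe that the input-output mapping of LayerNorm in trained Transformers naturally takes an S-shaped curve (similar to tanh). DyT reproduces this normalizing effect through the learnable scaling parameter $\alpha$.
- No mean or variance needs to be computed, which improves efficiency (faster than RMSNorm on H100 GPUs).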
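The contrast between the two layers can be sketched in a few lines of pure Python on a single token vector. This assumes the DyT form $\gamma \odot \tanh(\alpha x) + \beta$ with a scalar $\alpha$ and per-channel $\gamma$, $\beta$. The function names, the input token and the value of $\alpha$ are hypothetical, and this is an illustration, not the authors' implementation. LayerNorm needs a mean and a variance reduction across channels, while DyT is purely elementwise.

```python
import math
from typing import List


def layer_norm(x: List[float], gamma: List[float], beta: List[float], eps: float = 1e-5) -> List[float]:
    """LayerNorm over one token: requires a mean and a variance across all channels."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    return [g * (xi - mean) / math.sqrt(var + eps) + b for xi, g, b in zip(x, gamma, beta)]


def dyt(x: List[float], alpha: float, gamma: List[float], beta: List[float]) -> List[float]:
    """Dynamic Tanh (sketch): elementwise squashing, no cross-channel statistics."""
    return [g * math.tanh(alpha * xi) + b for xi, g, b in zip(x, gamma, beta)]


# Hypothetical token with one large "outlier" activation.
x = [0.3, -1.2, 0.8, 25.0]
gamma = [1.0] * len(x)  # per-channel scale (LayerNorm-style affine)
beta = [0.0] * len(x)   # per-channel shift
alpha = 0.5             # learnable scalar in DyT; 0.5 is only an example value here

print("LayerNorm:", [round(val, 3) for val in layer_norm(x, gamma, beta)])
print("DyT:      ", [round(val, 3) for val in dyt(x, alpha, gamma, beta)])
```

With $\gamma = 1$ and $\beta = 0$, DyT is roughly linear for small inputs ($\tanh(\alpha x) \approx \alpha x$) and squashes the outlier toward 1. This matches the S-shaped LayerNorm mapping described in the key points.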
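Empirical results:
Across seven broad task categories spanning vision (ViT), language (LLaMA) and generative models (DiT), performance matches or exceeds that of standard normalized Transformers, mostly without hyperparameter tuning.
Significance: it challenges the decade-old belief that normalization layers are indispensable in neural networks.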

II. A new paradigm for generative models: MeanFlow one-step generation [2]

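Core idea: replace the "instantaneous velocity" view of conventional Flow Matching with an average-velocity view. This directly addresses the main drawback of diffusion models (such as DDPM), which need hundreds of sampling steps.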
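Key formula: the identity $\bar{v}(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt}\bar{v}(z_t, r, t)$ turns the average velocity into a supervisable training target. Here $v$ is the instantaneous velocity and $\bar{v}$ is the average velocity over $[r, t]$. The identity follows from differentiating $(t - r)\,\bar{v}(z_t, r, t) = \int_r^t v(z_\tau, \tau)\,d\tau$ with respect to $t$.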
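The identity, and the reason an average velocity enables one-step sampling, can be checked numerically on a toy 1D trajectory. This assumes $z(\tau) = \sin\tau$, so the instantaneous velocity is $\cos\tau$. The trajectory and the helper names are hypothetical and chosen only for illustration; this is not the paper's training setup. A single jump $z_r = z_t - (t - r)\,\bar{v}$ lands exactly on the trajectory, while one Euler step with the instantaneous velocity does not.

```python
import math


# Toy trajectory (illustrative assumption): z(tau) = sin(tau), so v(tau) = cos(tau).
def z(tau: float) -> float:
    return math.sin(tau)


def v(tau: float) -> float:
    """Instantaneous velocity dz/dtau along the toy trajectory."""
    return math.cos(tau)


def v_bar(r: float, t: float) -> float:
    """Average velocity over [r, t]: displacement divided by elapsed time."""
    return (z(t) - z(r)) / (t - r)


def dv_bar_dt(r: float, t: float, h: float = 1e-5) -> float:
    """Derivative of v_bar with respect to t (r fixed), via central differences."""
    return (v_bar(r, t + h) - v_bar(r, t - h)) / (2 * h)


def identity_rhs(r: float, t: float) -> float:
    """Right-hand side of the identity: v(t) - (t - r) * d(v_bar)/dt."""
    return v(t) - (t - r) * dv_bar_dt(r, t)


r, t = 0.2, 1.0
print(f"v_bar        = {v_bar(r, t):.6f}")
print(f"identity RHS = {identity_rhs(r, t):.6f}")  # should agree with v_bar

# One step from time t back to time r.
z_t = z(t)
one_step_mean = z_t - (t - r) * v_bar(r, t)  # jump using the average velocity
one_step_euler = z_t - (t - r) * v(t)        # single Euler step, instantaneous velocity
print(f"true z(r)      = {z(r):.6f}")
print(f"average-vel.   = {one_step_mean:.6f}")
print(f"Euler (inst.)  = {one_step_euler:.6f}")
```

In MeanFlow itself, $\bar{v}$ is a neural network (written $u_\theta$ in the paper). Its total derivative expands to $v\,\partial_z \bar{v} + \partial_t \bar{v}$ and is computed with a Jacobian-vector product. In this single-trajectory toy, differentiating in $t$ along the path already gives that total derivative.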
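Results:
- With a single forward pass, it reaches FID 3.43 on ImageNet 256×256, a nearly 70% relative improvement over the previous best one-step result (Shortcut-XL, FID 10.60).
- Training requires only about 20% more compute, and inference saves 249 forward passes compared with a 250-step sampler. No curriculum learning or distillation is needed.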
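A quick arithmetic check of the headline numbers, using only the figures quoted in the results (FID 3.43 vs. 10.60, 1 step vs. 250 steps). The variable names are illustrative.

```python
# Figures quoted in the MeanFlow results.
fid_meanflow_1step = 3.43
fid_shortcut_1step = 10.60

# Relative FID reduction (lower FID is better).
relative_gain = (fid_shortcut_1step - fid_meanflow_1step) / fid_shortcut_1step
print(f"relative FID reduction: {relative_gain:.1%}")

# Forward passes saved versus a 250-step sampler.
steps_baseline, steps_meanflow = 250, 1
print("forward passes saved:", steps_baseline - steps_meanflow)
```

The reduction comes out to about 67.6%, which is consistent with "nearly 70%."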

III. Other updates [4, 7]

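MIT lab: Kaiming He's MIT group has recruited several top members, such as Mingyang Deng (邓明扬), who won gold medals at both the IMO and the IOI. The group focuses on generative models and representation learning.
Research directions: the group continues to explore autoregressive image generation (for example, the Fluid model), multimodal pretraining and related topics.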

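These works reflect the group's style of driving high-level breakthroughs through low-level innovation: using simple mathematics to redefine what AI architectures can achieve.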