InterSyn Explained

TL;DR

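InterSyn addresses dynamic motion synthesis in the wild: real motion is not an isolated skeleton sequence, but something that continuously interacts with the scene, objects, and other dynamic elements. Generating human motion first and enforcing environmental constraints in post-processing tends to produce unstable contacts, mistimed movement, and contextual inconsistency. The key to InterSyn is interleaved learning: motion and context alternately supply constraints during training, so the model learns both how motion responds to the environment and how the environment limits motion.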

1. Problem: Real dynamic scenes are not static backgrounds

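Traditional motion synthesis often assumes that scene conditions are fixed, or treats context as a one-time input. In real scenes, however, people move, objects may change, and interaction relationships update over time. This dynamism makes it hard for one-way conditional generation to stay plausible over long horizons.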

2. Core idea: Interleaved learning of motion and context

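The intuition behind interleaved learning is that motion generation and context understanding should not be split into two unrelated stages. Letting the two kinds of signal alternate during training can reduce cases where the motion looks right on its own but is wrong once placed in the scene.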

3. Key Insights: Motion generation needs the rhythm of the environment

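Good motion is determined not only by the skeleton trajectory, but also by when and where it happens. InterSyn's insight is to treat context as a dynamic constraint rather than a static label. For virtual humans, robots, and world models, this coupled learning is closer to real needs than improving skeleton quality alone.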

English Summary

InterSyn targets dynamic motion synthesis in the wild, where human motion must remain coherent with changing contexts rather than a fixed background.

Problem

Generating motion first and applying context constraints afterward often leads to unstable contacts, mismatched timing, or physically implausible interactions.

Core Idea

Use interleaved learning so that motion and context constrain each other during training. The model learns not only how humans move, but also how motion should respond to dynamic surroundings.
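
To make the idea of alternating constraints concrete, here is a minimal toy sketch. It is not InterSyn's training procedure: it optimizes a single 1D height trajectory directly instead of training a model, and every name in it (`motion_loss_grad`, `context_loss_grad`, `interleaved_fit`, the moving `floor`) is a hypothetical stand-in chosen for illustration. Even steps apply a motion objective (track a reference and stay smooth); odd steps apply a context objective (stay above a floor whose height changes over time).

```python
def motion_loss_grad(x, ref):
    """Gradient of a tracking + smoothness loss with respect to trajectory x."""
    n = len(x)
    # Tracking term: sum_t (x[t] - ref[t])^2
    g = [2.0 * (x[t] - ref[t]) for t in range(n)]
    # Smoothness term: sum_t (x[t+1] - x[t])^2
    for t in range(n - 1):
        d = x[t + 1] - x[t]
        g[t] -= 2.0 * d
        g[t + 1] += 2.0 * d
    return g


def context_loss_grad(x, floor):
    """Gradient of a squared-hinge penalty for dropping below a moving floor."""
    return [
        -2.0 * (floor[t] - x[t]) if x[t] < floor[t] else 0.0
        for t in range(len(x))
    ]


def context_violation(x, floor):
    """Total squared penetration below the time-varying floor."""
    return sum(max(0.0, floor[t] - x[t]) ** 2 for t in range(len(x)))


def interleaved_fit(ref, floor, steps=200, lr=0.05):
    """Alternate gradient steps: motion objective on even steps, context on odd."""
    x = list(ref)
    for step in range(steps):
        if step % 2 == 0:
            g = motion_loss_grad(x, ref)
        else:
            g = context_loss_grad(x, floor)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


# Assumed toy data: a flat reference motion and a floor that rises mid-sequence.
ref = [0.0] * 10
floor = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0]
fitted = interleaved_fit(ref, floor)
```

In this toy setting the fitted trajectory settles into a compromise between the reference motion and the moving floor, because neither objective gets the last word. A real system would apply the same alternation to model parameters and to far richer motion and context representations.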

Practical Takeaways

Motion quality should be evaluated together with contextual consistency. For embodied agents and world models, dynamic constraints are part of the motion, not an external cleanup step.
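
One way to act on this takeaway is to report a motion-quality score and a context-consistency score side by side, not just the former. The sketch below is an assumption-laden illustration, not a metric from the paper: `mean_jerk` and `violation_rate` are hypothetical helpers, the trajectory is a 1D height signal, and the context is a per-frame floor height.

```python
def mean_jerk(traj, dt=1.0):
    """Mean absolute third finite difference of the trajectory (lower is smoother)."""
    jerks = [
        abs(traj[t + 3] - 3 * traj[t + 2] + 3 * traj[t + 1] - traj[t]) / dt ** 3
        for t in range(len(traj) - 3)
    ]
    return sum(jerks) / len(jerks)


def violation_rate(traj, floor, tol=1e-6):
    """Fraction of frames where the trajectory is below the per-frame floor."""
    bad = sum(1 for h, f in zip(traj, floor) if h < f - tol)
    return bad / len(traj)


def evaluate(traj, floor):
    """Report motion smoothness and contextual consistency together."""
    return {
        "mean_jerk": mean_jerk(traj),
        "violation_rate": violation_rate(traj, floor),
    }


# Assumed toy data: a perfectly smooth ramp that clips a raised floor at frame 2.
traj = [0.0, 1.0, 2.0, 3.0, 4.0]
floor = [0.0, 0.0, 2.5, 0.0, 0.0]
report = evaluate(traj, floor)
```

Here the ramp scores perfectly on smoothness yet still violates the context in one frame out of five, which is the kind of failure a motion-only metric would miss.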

Links

Paper Project arXiv

InterSyn: Interleaved Motion Synthesis for Real-World Dynamic Scenes