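LaxMotion's central claim is that 3D human motion generation does not necessarily need strong supervision at the level of every frame, every joint, and every coordinate. Human motion is inherently ambiguous: a single 2D observation or text intent can correspond to many plausible 3D motions, and overly fine supervision can lock the model into one solution, weakening generative diversity and generalization. The paper relaxes supervision granularity so that the model learns within the range of structurally consistent, plausible motion instead of being forced to align with every local detail.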
1. Problem: strong supervision is not always stronger
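In 3D motion, precise coordinate labels are expensive to obtain and not necessarily unique. If the training objective requires the generated result to match a single ground truth exactly, the model may learn the narrow distribution of the annotated set rather than the true feasible range of the motion space.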
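To make the non-uniqueness concrete, here is a minimal sketch, not taken from the paper. It assumes a toy orthographic camera that simply drops the depth axis, and a hypothetical 3-joint pose. Two different 3D poses produce the same 2D observation, so a per-coordinate loss against either one would penalize the other even though both explain the evidence.

```python
import numpy as np

def project_orthographic(joints_3d):
    """Toy camera (an assumption for illustration): keep x, y and drop depth z."""
    return joints_3d[..., :2]

# Hypothetical 3-joint pose, one row per joint: (x, y, z).
pose_a = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 0.5, 0.1],
    [0.3, 0.9, 0.2],
])

# Mirror the depth axis: a different 3D pose with identical x, y coordinates.
pose_b = pose_a * np.array([1.0, 1.0, -1.0])

# Both poses yield the same 2D observation under this camera.
same_2d = np.allclose(project_orthographic(pose_a), project_orthographic(pose_b))

# A per-coordinate error still treats them as different answers.
coord_error = float(np.mean((pose_a - pose_b) ** 2))

print("same 2D observation:", same_2d)
print("per-coordinate error between the two poses:", coord_error)
```

Real cameras are perspective rather than orthographic, but the same depth ambiguity applies in a less trivial form.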
2. Core idea: relax supervision from "points" to "structure"
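LaxMotion focuses on supervision granularity: which parts must be aligned exactly, and which need only be consistent in semantics, pose, or dynamic structure. This looser supervision lets the model keep a reasonable degree of freedom while still being constrained by the 2D observation and the motion prior.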
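The contrast between point-level and structure-level supervision can be sketched as two loss functions. This is an illustrative assumption, not the paper's actual objective: `strict_loss` compares every coordinate to one ground truth, while the hypothetical `relaxed_loss` only asks for agreement with the 2D observation (again using a toy orthographic projection) and with reference bone lengths. The skeleton `PARENTS` and the weight `w_bone` are invented for the example.

```python
import numpy as np

# Hypothetical 3-joint chain: joint 0 is the root, 1 hangs off 0, 2 hangs off 1.
PARENTS = np.array([-1, 0, 1])

def bone_lengths(motion, parents=PARENTS):
    """Per-frame bone lengths for a motion of shape (frames, joints, 3)."""
    child = np.arange(len(parents))[parents >= 0]
    return np.linalg.norm(motion[:, child] - motion[:, parents[child]], axis=-1)

def strict_loss(pred, gt):
    """Point-level supervision: every coordinate must match one ground truth."""
    return float(np.mean((pred - gt) ** 2))

def relaxed_loss(pred, obs_2d, ref_bones, w_bone=1.0):
    """Structure-level supervision (illustrative): 2D agreement plus bone lengths."""
    reproj = np.mean((pred[..., :2] - obs_2d) ** 2)  # toy orthographic projection
    bone = np.mean((bone_lengths(pred) - ref_bones) ** 2)
    return float(reproj + w_bone * bone)

rng = np.random.default_rng(0)
gt = rng.normal(size=(4, 3, 3))          # 4 frames, 3 joints
alt = gt * np.array([1.0, 1.0, -1.0])    # depth-mirrored alternative motion

obs_2d = gt[..., :2]
ref_bones = bone_lengths(gt)

strict_alt = strict_loss(alt, gt)
relaxed_alt = relaxed_loss(alt, obs_2d, ref_bones)

print("strict loss on alternative: ", strict_alt)
print("relaxed loss on alternative:", relaxed_alt)
```

In this toy setup the mirrored motion is heavily penalized by the strict loss but accepted by the relaxed one, which is the kind of freedom the text describes. A real system would also need a motion prior to rule out alternatives that match the cues but are not plausible.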
3. Key insights: generation tasks need to accommodate multiple solutions
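For recognition, finer labels are usually more useful; for generation, overly fine labels can mistake one possible answer for the only answer. LaxMotion's insight is to treat supervision granularity as a modeling choice: supervision that is too coarse lets the model drift, supervision that is too fine makes it rigid, and a suitable degree of "laxness" comes closer to the true motion distribution.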
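One simple way to picture granularity as a tunable choice is a tolerance band around the target. The sketch below is a hypothetical illustration, not the paper's formulation: `banded_loss` ignores deviations smaller than `eps`, so `eps = 0` recovers strict per-coordinate supervision and a very large `eps` imposes no constraint at all.

```python
import numpy as np

def banded_loss(pred, target, eps):
    """Penalize only the part of each deviation that exceeds the tolerance eps."""
    excess = np.maximum(np.abs(pred - target) - eps, 0.0)
    return float(np.mean(excess ** 2))

# Hypothetical 1D example: five predicted coordinates against a zero target.
target = np.zeros(5)
pred = np.array([0.05, -0.1, 0.2, 0.0, -0.3])

# From strict (eps = 0) to effectively unconstrained (eps = 1).
tolerances = [0.0, 0.15, 1.0]
losses = [banded_loss(pred, target, eps) for eps in tolerances]

for eps, loss in zip(tolerances, losses):
    print(f"eps={eps}: loss={loss}")
```

With `eps = 0` every small deviation is punished (the rigid regime); with `eps = 1` nothing is (the drifting regime); an intermediate value only penalizes the larger deviations.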
English Summary
LaxMotion revisits how much supervision is actually needed for 3D human motion generation. The paper argues that overly fine-grained supervision can be counterproductive because human motion is inherently multi-modal: one input admits many valid outputs.
Problem
A single 2D view or intent can correspond to many plausible 3D motions. Forcing the model to match one exact trajectory may reduce diversity and generalization, especially when precise 3D labels are expensive or ambiguous.
Core Idea
Relax supervision from exact coordinates toward structural and motion-level consistency. This keeps the model constrained by meaningful cues while preserving freedom to generate plausible alternatives.
Practical Takeaways
For generative motion tasks, supervision should match the uncertainty of the problem. The right granularity can improve learning more than simply adding stronger labels.