Generative Video Motion Editing with 3D Point Tracks

1Adobe Research    2Adobe    3University of Maryland College Park

Our Edit-by-Track framework enables precise video motion editing via 3D point tracks. Explore our applications below:

I. Joint Camera & Object Motion Editing

II. Shape Deformation

III. Object Removal & Duplication

IV. Handling Partial Track Inputs



We introduce a novel method for editing both camera and object motions in a given video, a task that remains challenging for existing approaches. Explore the sections below for more details:

Baseline Comparisons

Our Edit-by-Track Framework

Given a video, we first estimate camera poses and 3D tracks using off-the-shelf models. Users then edit the estimated poses and 3D tracks to specify the desired camera and object motions.
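One common way to specify a new object motion from estimated 3D tracks is to apply a rigid transform to the tracks belonging to the object of interest. A minimal sketch of such an edit (the function name, array shapes, and the assumption of a per-object track mask are ours, not from the paper):

```python
import numpy as np

def edit_object_tracks(tracks, object_mask, rotation, translation):
    """Apply a rigid transform to the tracks belonging to one object.

    tracks:      (N, 3) 3D track positions at one frame (world coords).
    object_mask: (N,) boolean mask selecting the object's tracks.
    rotation:    (3, 3) rotation matrix.
    translation: (3,) translation vector.
    Returns a new (N, 3) array; background tracks are left unchanged.
    """
    edited = tracks.copy()
    edited[object_mask] = tracks[object_mask] @ rotation.T + translation
    return edited
```

In practice the edit could be any user-specified trajectory change per frame; a rigid transform is just the simplest case.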

We project both the original (source) and edited (target) 3D tracks into 2D screen coordinates using their respective camera parameters, aligning them with the video frames. These projected 3D tracks provide sparse correspondences, guiding our model to transfer visual context from the source video onto the target motion.
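The projection step above is standard pinhole-camera geometry: transform each 3D track point into the camera frame with the extrinsics, then apply the intrinsics and divide by depth. A self-contained sketch (shapes and function name are illustrative assumptions):

```python
import numpy as np

def project_tracks(tracks_3d, extrinsic, intrinsic):
    """Project 3D point tracks into 2D screen coordinates for one frame.

    tracks_3d: (N, 3) 3D points in world coordinates.
    extrinsic: (4, 4) world-to-camera matrix.
    intrinsic: (3, 3) camera intrinsics K.
    Returns (N, 2) pixel coordinates.
    """
    n = tracks_3d.shape[0]
    homo = np.hstack([tracks_3d, np.ones((n, 1))])  # homogeneous coords, (N, 4)
    cam = (extrinsic @ homo.T).T[:, :3]             # points in the camera frame
    uv = (intrinsic @ cam.T).T                      # perspective projection
    return uv[:, :2] / uv[:, 2:3]                   # divide by depth
```

The source tracks are projected with the source cameras and the edited tracks with the edited (target) cameras, so the two 2D track sets stay aligned with their respective frames.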

Our model builds on a pretrained text-to-video generation model, further fine-tuned with LoRAs and an additional 3D track conditioner for precise motion control. To preserve the original visual context, we encode the input source video into source video tokens and concatenate them with noisy target video tokens. The 3D track conditioner transforms the projected 3D tracks into paired track tokens, which are added to the corresponding video tokens to guide the motion editing (see our paper for details).
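The additive conditioning described above can be sketched as a simple tensor operation: a (hypothetical) learned projection maps per-track features to track tokens of the same dimensionality as the video tokens, which are then added element-wise to their paired tokens. All shapes and the linear conditioner below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, D = 4, 16, 8  # frames, tracks per frame, token dimension (toy sizes)

video_tokens = rng.normal(size=(T, N, D))  # noisy target video tokens
track_feats = rng.normal(size=(T, N, D))   # features of the projected 2D tracks

# Hypothetical linear conditioner: map track features to track tokens,
# then add them element-wise to the paired video tokens.
W = rng.normal(size=(D, D)) / np.sqrt(D)
track_tokens = track_feats @ W
conditioned = video_tokens + track_tokens  # same shape as video_tokens
```

Because the conditioning is additive per token, tracks only influence the tokens they are paired with, which is what gives the model its localized, sparse-correspondence control.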

Training Data

3D Control: Depth Order and Occlusion Handling

Model Analysis

Failure Cases

References

- Motion-controlled image-to-video (I2V) generation methods
- Camera-controlled video-to-video (V2V) methods
- Video motion editing

Societal Impact

We recognize that powerful video editing tools, including ours, may raise ethical considerations depending on context. While this work aims to augment human creativity and professional workflows, such capabilities could be misused. We encourage responsible use aligned with community guidelines and transparency about any edits applied.

Acknowledgements

We are grateful for the valuable feedback and insightful discussions provided by Yihong Sun, Linyi Jin, Yiran Xu, Quynh Phung, Dekel Galor, Chun-Hao Paul Huang, Tianyu (Steve) Wang, Ilya Chugunov, Jiawen Chen, Marc Levoy, Wei-Chiu Ma, Ting-Hsuan Liao, Hadi Alzayer, Yi-Ting Chen, Vinayak Gupta, Yu-Hsiang Huang, and Shu-Jung Han.

BibTeX


@article{lee2025editbytrack,
  author    = {Lee, Yao-Chih and Zhang, Zhoutong and Huang, Jiahui and Wang, Jui-Hsien and Lee, Joon-Young and Huang, Jia-Bin and Shechtman, Eli and Li, Zhengqi},
  title     = {Generative Video Motion Editing with 3D Point Tracks},
  journal   = {arXiv preprint arXiv:2512.02015},
  year      = {2025},
}