Versatile Editing of Video Content, Actions, and Dynamics without Training

1 Google DeepMind | 2 Technion - Israel Institute of Technology | 3 The Weizmann Institute of Science
* Work done during an internship at Google DeepMind.

Abstract

Controlled video generation has seen drastic improvements in recent years. However, editing actions and dynamic events, or inserting content that should affect the behavior of other objects in real-world videos, remains a major challenge. Existing trained models struggle with complex edits, likely due to the difficulty of collecting relevant training data. Similarly, existing training-free methods are inherently restricted to structure- and motion-preserving edits and do not support modifying motion or interactions. Here, we introduce DynaEdit, a training-free editing method that unlocks versatile video editing capabilities with pretrained text-to-video flow models. Our method relies on the recently introduced inversion-free approach, which does not intervene in the model internals and is thus model-agnostic. We show that naively adapting this approach to general unconstrained editing results in severe low-frequency misalignment and high-frequency jitter. We explain the sources of these phenomena and introduce novel mechanisms for overcoming them. Through extensive experiments, we show that DynaEdit achieves state-of-the-art results on complex text-based video editing tasks, including modifying actions, inserting objects that interact with the scene, and introducing global effects.

Method

Current inversion-free approaches struggle to perform general non-structure-preserving edits. In particular, when their hyperparameters are tuned to allow significant modifications, they generate videos whose low frequencies are unnecessarily misaligned with the source video and whose high frequencies suffer from jitter. We introduce two novel mechanisms to counter these phenomena: Similarity Guided Aggregation (SGA) and Annealed Noise Correlation (ANC). SGA improves low-frequency global alignment to the source video, so that edited dynamics stay consistent with the scene layout. ANC suppresses high-frequency jitter and stabilizes local appearance across frames.
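The page does not include pseudocode, so the following is a minimal NumPy sketch of what the two mechanisms could look like. It assumes SGA operates on flattened feature vectors via similarity-weighted averaging of source features, and that ANC blends a noise map shared across frames with independent per-frame noise, with a correlation weight `rho` that would be annealed over the denoising steps. The function names, signatures, and the cosine-similarity/softmax formulation are illustrative assumptions, not the DynaEdit implementation.

```python
import numpy as np


def anc_noise(num_frames, shape, rho, rng):
    """Annealed Noise Correlation (sketch): blend a noise map shared
    across frames with per-frame noise. Higher rho means more temporal
    correlation, which suppresses frame-to-frame jitter."""
    shared = rng.standard_normal(shape)
    frames = rng.standard_normal((num_frames, *shape))
    # Combining in sqrt-variance keeps each frame's noise unit-variance.
    return np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * frames


def sga_aggregate(edit_feats, src_feats, tau=0.1):
    """Similarity Guided Aggregation (sketch): replace each edited
    feature with a similarity-weighted average of source features,
    pulling the low-frequency layout back toward the source video."""
    # Cosine similarity between edited and source feature vectors.
    e = edit_feats / np.linalg.norm(edit_feats, axis=-1, keepdims=True)
    s = src_feats / np.linalg.norm(src_feats, axis=-1, keepdims=True)
    sim = e @ s.T                         # (N_edit, N_src)
    w = np.exp(sim / tau)
    w /= w.sum(axis=-1, keepdims=True)    # softmax over source positions
    return w @ src_feats
```

In this sketch, `rho` would follow an annealing schedule across sampling steps (e.g. starting near 1 for strong correlation and decaying), and `sga_aggregate` would be applied only to coarse, low-frequency feature maps so that the edit itself is not suppressed.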

[Ablation comparison videos: Input Video · FlowEdit [1] (no SGA or ANC) · + SGA · + ANC · DynaEdit (with SGA and ANC)]

BibTeX

@inproceedings{yourkey2026,
  title={Your Paper Title},
  author={First Author and Second Author and Third Author},
  booktitle={Conference Name},
  year={2026}
}

Competing Methods References