Controlled video generation has seen dramatic improvements in recent years. However, editing actions and dynamic events, or inserting content that should affect the behavior of other objects in real-world videos, remains a major challenge. Existing trained models struggle with complex edits, likely due to the difficulty of collecting relevant training data. Similarly, existing training-free methods are inherently restricted to structure- and motion-preserving edits and do not support modifications of motion or interactions. Here, we introduce DynaEdit, a training-free editing method that unlocks versatile video editing capabilities with pretrained text-to-video flow models. Our method builds on the recently introduced inversion-free approach, which does not intervene in the model's internals and is thus model-agnostic. We show that naively adapting this approach to general, unconstrained editing results in severe low-frequency misalignment and high-frequency jitter. We explain the sources of these phenomena and introduce novel mechanisms for overcoming them. Through extensive experiments, we show that DynaEdit achieves state-of-the-art results on complex text-based video editing tasks, including modifying actions, inserting objects that interact with the scene, and introducing global effects.
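To make the inversion-free idea concrete, below is a minimal FlowEdit-style sketch of editing with a pretrained flow model. Rather than inverting the source latent to noise, it re-noises the source at each timestep and integrates only the difference between the target- and source-conditioned velocities. Everything here is illustrative: the `velocity` callable and its signature, the time convention (t=1 is noise, t=0 is data), and the conditioning tensors are assumptions, and the sketch deliberately omits DynaEdit's mechanisms for handling low-frequency misalignment and high-frequency jitter.

```python
import torch

def inversion_free_edit(velocity, z_src, src_cond, tgt_cond, num_steps=50):
    """Sketch of FlowEdit-style inversion-free editing.

    `velocity` is assumed to be a pretrained flow-matching model
    v(z_t, t, cond) that predicts the velocity field; this is a
    hypothetical interface, not DynaEdit's actual API.
    """
    z_tgt = z_src.clone()
    ts = torch.linspace(1.0, 0.0, num_steps + 1)  # integrate from noise to data
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        dt = t_next - t  # negative step (t decreases)
        noise = torch.randn_like(z_src)
        # Re-noise the source directly to level t -- no ODE inversion needed.
        z_t_src = (1 - t) * z_src + t * noise
        # Carry the accumulated source-to-target offset onto the noisy latent.
        z_t_tgt = z_t_src + (z_tgt - z_src)
        # The edit direction is the difference of conditional velocities.
        v_delta = velocity(z_t_tgt, t, tgt_cond) - velocity(z_t_src, t, src_cond)
        z_tgt = z_tgt + dt * v_delta
    return z_tgt

# Toy usage with a dummy conditional velocity field (illustration only):
v = lambda z, t, c: z - c  # stands in for a pretrained text-to-video flow model
edited = inversion_free_edit(v, torch.zeros(1, 4, 8, 8), torch.zeros(1), torch.ones(1))
```

Because the update depends only on calls to the pretrained velocity model, this scheme stays model-agnostic, which is what the abstract means by not intervening in the model's internals.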
Our method allows for dynamic object insertion, where a new object is added to the scene and interacts with the existing elements.
Our method allows for swapping objects within the scene, such that the scene adapts to and interacts with the new object.
Our method enables the modification of actions performed by entities in the scene.
Our method can apply global effects to the entire scene, such as lighting or atmospheric changes.
@inproceedings{yourkey2026,
  title={Your Paper Title},
  author={First Author and Second Author and Third Author},
  booktitle={Conference Name},
  year={2026}
}