UniEdit-Flow for image inversion and editing. Our approach is a highly accurate, efficient, model-agnostic, training- and tuning-free sampling strategy for flow models that tackles image inversion and editing. Cluttered scenes are difficult to invert and reconstruct, and various methods fail on them. Our Uni-Inv achieves exact reconstruction even in such complex situations (1st row). Furthermore, existing flow editing methods tend to retain undesirable effects, whereas our region-aware, sampling-based Uni-Edit delivers excellent performance in both editing and background preservation (2nd row).
Flow matching models have emerged as a strong alternative to diffusion models, but existing inversion and editing methods designed for diffusion are often ineffective or inapplicable to them.
The straight-line, non-crossing trajectories of flow models pose challenges for diffusion-based approaches but also open avenues for novel solutions.
In this work, we introduce a predictor-corrector-based framework for inversion and editing in flow models.
First, we propose Uni-Inv, an effective inversion method designed for accurate reconstruction.
Building on this, we extend the concept of delayed injection to flow models and introduce Uni-Edit, a region-aware, robust image editing approach.
Our methodology is tuning-free, model-agnostic, efficient, and effective, enabling diverse edits while ensuring strong preservation of edit-irrelevant regions.
A long short haired cat with blue eyes looking up at something.
Two origami birds sitting on a branch.
A clown in pixel art style with colorful hair.
A young rider wearing full protective gear, including a black helmet and motocross-style outfit, is navigating a BMX bike motorcycle over a series of sandy dirt bumps on a track enclosed by a fence...
A koala cat with thick gray fur is captured mid-motion as it reaches out with its front paws to climb or move between tree branches, surrounded by lush green leaves and dappled sunlight in a forested area.
Per-step error of the velocities and samples of vanilla inversions.
We first synthesize an image \(\boldsymbol{Z}_0\), then perform vanilla inversion to obtain the inverted noise \(\boldsymbol{Z}_1\), using a per-step velocity of either \(\boldsymbol{v}_\theta(\boldsymbol{\widehat{Z}}_{t_{i-1}}, t_{i-1})\) (\(\blacklozenge\)) or \(\boldsymbol{v}_\theta(\boldsymbol{\widehat{Z}}_{t_{i-1}}, t_{i})\) (\(\blacksquare\)).
We plot the per-step local error of the samples (\(\Delta \boldsymbol{Z}\)) and the velocities (\(\Delta \boldsymbol{v}\)).
The right panel visualizes the resulting \(\boldsymbol{Z}_1\); border colors correspond to the different conditions (black for the initial noise).
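The two vanilla inversion variants compared in this figure can be sketched on a toy problem. The following is only an illustration under stated assumptions, not the setup used for the figure: `toy_velocity` is a hypothetical analytic stand-in for \(\boldsymbol{v}_\theta\), sampling is plain Euler integration from \(t=1\) (noise) to \(t=0\) (image), and all function names are invented for this example. It measures how far the inverted noise drifts from the initial noise for each velocity choice.

```python
import numpy as np

def toy_velocity(z, t):
    # Hypothetical stand-in for v_theta(z, t); a real flow model would be a network.
    return -(1.0 + t) * z

def euler_sample(z1, ts):
    """Euler sampling from noise (t=1) down to the image (t=0)."""
    zs = [None] * len(ts)
    zs[-1] = z1
    for i in range(len(ts) - 1, 0, -1):
        h = ts[i] - ts[i - 1]
        zs[i - 1] = zs[i] - h * toy_velocity(zs[i], ts[i])
    return zs

def vanilla_invert(z0, ts, use_next_time):
    """Vanilla inversion from t=0 up to t=1.

    The velocity is always evaluated at the current sample Z_{t_{i-1}},
    either at time t_{i-1} (use_next_time=False) or at time t_i (True).
    """
    zs = [z0]
    for i in range(1, len(ts)):
        h = ts[i] - ts[i - 1]
        t_eval = ts[i] if use_next_time else ts[i - 1]
        zs.append(zs[-1] + h * toy_velocity(zs[-1], t_eval))
    return zs

def roundtrip_error(n_steps, use_next_time, seed=0):
    """Relative distance between the inverted noise and the initial noise."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(16)
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    sampled = euler_sample(z1, ts)
    inverted = vanilla_invert(sampled[0], ts, use_next_time)
    return np.linalg.norm(inverted[-1] - z1) / np.linalg.norm(z1)

if __name__ == "__main__":
    for n in (5, 50, 500):
        print(n, roundtrip_error(n, False), roundtrip_error(n, True))
```

Neither variant evaluates the velocity at the sample the sampler actually used at that step, so in this toy setting both accumulate a reconstruction error that shrinks only as the number of steps grows.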
Demonstration of various sampling-based image editing methods (dog \(\xrightarrow{}\) lion).
Directly using \(\boldsymbol{c}^T\) as the condition leads to excessive editing.
Delayed injection, which is widely used in diffusion-based methods, inevitably yields incomplete edits when applied to deterministic models.
Our Uni-Edit mitigates the components acquired in early steps that are not conducive to editing, ultimately achieving satisfying results.
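Delayed injection can be viewed as a condition schedule over the denoising steps. The sketch below assumes it means conditioning the first `delay` steps on a source condition and switching to \(\boldsymbol{c}^T\) afterwards; that reading, the names `c_src`, `c_tgt`, and `delay`, and the analytic `toy_velocity` are all assumptions made for illustration, and this is not the Uni-Edit procedure.

```python
import numpy as np

def toy_velocity(z, t, c):
    # Hypothetical conditional velocity: integrating from t=1 to t=0 pulls z toward c.
    # (t is unused in this toy field; a real model would depend on it.)
    return z - c

def delayed_injection_sample(z1, c_src, c_tgt, n_steps, delay):
    """Euler sampling from t=1 to t=0 with a delayed switch of the condition.

    Steps 0 .. delay-1 use the source condition, the remaining steps use the target.
    """
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    z = np.asarray(z1, dtype=float)
    for i in range(n_steps):
        c = c_src if i < delay else c_tgt
        h = ts[i] - ts[i + 1]
        z = z - h * toy_velocity(z, ts[i], c)
    return z

if __name__ == "__main__":
    z1 = np.ones(4)
    c_src, c_tgt = np.zeros(4), np.full(4, 2.0)
    for delay in (0, 5, 10):
        print(delay, delayed_injection_sample(z1, c_src, c_tgt, 10, delay))
```

With `delay=0` the target condition is used throughout, and with `delay=n_steps` the sample simply follows the source condition. In this toy setting, intermediate values land between the two.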
Visualization of Uni-Edit process.
The guidance mask of each denoising step is shown at the upper right of the image.
At the lower left of the figure, we also demonstrate the "Sphinx" phenomenon that existing latent fusion approaches may cause.
@misc{jiao2025unieditflowunleashinginversionediting,
title={UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models},
author={Guanlong Jiao and Biqing Huang and Kuan-Chieh Wang and Renjie Liao},
year={2025},
eprint={2504.13109},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.13109},
}