UniEdit-Flow
Unleashing Inversion and Editing
in the Era of Flow Models

Guanlong Jiao1,3  Biqing Huang1  Kuan-Chieh Wang2  Renjie Liao3 

1Tsinghua University  2Snap Inc.  3The University of British Columbia 



UniEdit-Flow for image inversion and editing. Our approach is a highly accurate and efficient, model-agnostic, training- and tuning-free sampling strategy for flow models that tackles image inversion and editing. Cluttered scenes are difficult to invert and reconstruct, causing failures across various methods; our Uni-Inv achieves exact reconstruction even in such complex situations (1st row). Furthermore, existing flow editing methods often introduce undesirable artifacts, whereas our region-aware, sampling-based Uni-Edit shows excellent performance in both editing and background preservation (2nd row).


Overview

Flow matching models have emerged as a strong alternative to diffusion models, but existing inversion and editing methods designed for diffusion are often ineffective or inapplicable to them. The straight-line, non-crossing trajectories of flow models pose challenges for diffusion-based approaches but also open avenues for novel solutions.
In this work, we introduce a predictor-corrector-based framework for inversion and editing in flow models. First, we propose Uni-Inv, an effective inversion method designed for accurate reconstruction. Building on this, we extend the concept of delayed injection to flow models and introduce Uni-Edit, a region-aware, robust image editing approach. Our methodology is tuning-free, model-agnostic, efficient, and effective, enabling diverse edits while ensuring strong preservation of edit-irrelevant regions.



An overview of our proposed Uni-Inv and Uni-Edit (bird → red bird). (a) shows that vanilla flow inversion achieves neither exact image inversion nor controllable editing. (b) demonstrates our proposed Uni-Inv and Uni-Edit, which perform inversion and editing efficiently and effectively.

Feature: Text-driven Image / Video Editing

FLUX 🎨
Stable Diffusion 3 🎨
Stable Diffusion XL 🎨

A long → short haired cat with blue eyes looking up at something.

Two origami birds sitting on a branch.

A clown in pixel art style with colorful hair.

Wan 🎥 Flow-based Video Generation Model

A young rider wearing full protective gear, including a black helmet and motocross-style outfit, is navigating a BMX bike → motorcycle over a series of sandy dirt bumps on a track enclosed by a fence...

A koala → cat with thick gray fur is captured mid-motion as it reaches out with its front paws to climb or move between tree branches, surrounded by lush green leaves and dappled sunlight in a forested area.

Method


Uni-Inv

Uni-Inv is motivated by the need for an accurate inversion that maps ODE solutions back to their initial value for a given deterministic sampler. Intuitively, Uni-Inv introduces a correction procedure before each inversion step. It first transitions to the higher-noise timestep and estimates the velocity there by simulating a denoising step. It then returns to the original lower-noise timestep and performs the inversion using this latest "denoising-like" velocity.
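As a concrete illustration, this predictor-corrector inversion can be sketched against a plain Euler sampler. The analytic velocity field below stands in for the learned \(\boldsymbol{v}_\theta\), and the step and correction counts are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def velocity(z, t):
    """Toy analytic velocity field standing in for a learned v_theta."""
    return np.tanh(z) + 0.5 * t

def euler_denoise(z1, ts):
    """Vanilla Euler sampling from noise (t=1) down to data (t=0)."""
    z = z1
    for t_hi, t_lo in zip(ts[:-1], ts[1:]):
        z = z + (t_lo - t_hi) * velocity(z, t_hi)
    return z

def uni_inv(z0, ts, n_corr=3):
    """Invert the Euler sampler with a per-step predictor-corrector."""
    z = z0
    asc = ts[::-1]  # ascend from t=0 back to t=1
    for t_lo, t_hi in zip(asc[:-1], asc[1:]):
        v = velocity(z, t_lo)              # predictor: stale low-noise velocity
        for _ in range(n_corr):            # corrector loop
            z_hi = z + (t_hi - t_lo) * v   # tentative move to the high-noise step
            v = velocity(z_hi, t_hi)       # re-estimate: "denoising-like" velocity
        z = z + (t_hi - t_lo) * v          # inversion step with corrected velocity
    return z
```

With `n_corr=0` this degenerates to vanilla inversion, whose round-trip error is visibly larger; a few corrector iterations make the inversion step the (fixed-point) inverse of the Euler denoising step.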

Per-step error of the velocities and samples of vanilla inversions. We first synthesize an image \(\boldsymbol{Z}_0\), then apply vanilla inversion to obtain inverted noises \(\boldsymbol{Z}_1\) using the per-step velocities \(\boldsymbol{v}_\theta(\boldsymbol{\widehat{Z}}_{t_{i-1}}, t_{i-1})\) (\(\blacklozenge\)) and \(\boldsymbol{v}_\theta(\boldsymbol{\widehat{Z}}_{t_{i-1}}, t_{i})\) (\(\blacksquare\)), respectively. We plot the per-step local error of the samples (\(\Delta \boldsymbol{Z}\)) and velocities (\(\Delta \boldsymbol{v}\)). The right side visualizes the various \(\boldsymbol{Z}_1\); their border colors correspond to the different conditions (black for the initial noise).

Uni-Edit

Delayed Injection

Previous works have introduced delayed injection, a simple yet effective technique that helps maintain image consistency during editing. Due to the non-linear and intersecting sampling trajectories of diffusion models, modifying conditions midway allows for trajectory transitions, facilitating more flexible and localized image editing. However, flow models exhibit straight-line and non-intersecting trajectories, which fundamentally hinder the effectiveness of delayed injection, particularly in image editing.

Delayed injection, which retains the source condition during the early denoising steps and introduces the edit condition at a middle timestep (illustrated in the bottom part), is a widely used technique in diffusion-based editing (top row). However, when applied to flow models (second row), it is largely ineffective: flow-based editing shows only a mild tendency toward the target edit and fails to produce sufficiently strong effects.
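Inside a flow-model sampler, delayed injection amounts to a single condition switch in the denoising loop. A minimal sketch (the switch threshold and the toy conditional velocity used in the check are hypothetical, not from the paper):

```python
import numpy as np

def delayed_injection_sample(z1, ts, velocity, c_src, c_tgt, switch=0.5):
    """Euler denoising from noise (t=1) to data (t=0), keeping the source
    condition early and swapping in the edit condition once t <= switch."""
    z = z1
    for t_hi, t_lo in zip(ts[:-1], ts[1:]):
        c = c_src if t_hi > switch else c_tgt  # the delayed condition switch
        z = z + (t_lo - t_hi) * velocity(z, t_hi, c)
    return z
```

With a real flow model, `velocity` would be the conditional network \(\boldsymbol{v}_\theta(\boldsymbol{Z}_t, t, \boldsymbol{c})\); the straight, non-intersecting trajectories are exactly why this switch alone produces only weak edits.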

Uni-Edit

Instead of simply injecting editing conditions earlier, Uni-Edit does so while mitigating excessive modifications using information from the current latent state. We propose additional correction steps during the editing procedure based solely on the current latent \(\boldsymbol{\widetilde{Z}}_{t_i}\). We further leverage the velocity difference \(\boldsymbol{v}_i^{-}\) to construct a mask \(\boldsymbol{m}_i = \texttt{MASK}(\boldsymbol{v}_i^{-})\), which serves as regional guidance for correction and velocity prediction, thereby improving the controllability of Uni-Edit.
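The region-aware guidance can be sketched as follows; the paper's exact \(\texttt{MASK}\) operator may differ, and the sigmoid gate below is an illustrative stand-in:

```python
import numpy as np

def soft_mask(v_diff, sharpness=10.0):
    """Illustrative MASK operator: normalize the per-location magnitude of the
    velocity difference and squash it to [0, 1] with a sigmoid gate."""
    mag = np.abs(v_diff)
    mag = mag / (mag.max() + 1e-8)
    return 1.0 / (1.0 + np.exp(-sharpness * (mag - 0.5)))

def region_aware_velocity(v_src, v_tgt):
    """Blend source- and target-conditioned velocities with the mask, so edits
    concentrate where the two conditions disagree and the rest is preserved."""
    m = soft_mask(v_tgt - v_src)
    return m * v_tgt + (1.0 - m) * v_src, m
```

The mask fires only where the source- and target-conditioned velocities diverge, which is what keeps edit-irrelevant regions intact.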

Demonstration of various sampling-based image editing methods (dog \(\xrightarrow{}\) lion). Directly using \(\boldsymbol{c}^T\) as the condition leads to excessive editing. Delayed injection, widely used in diffusion-based methods, inevitably yields immature results with deterministic models. Our Uni-Edit suppresses components obtained in the early steps that are not conducive to editing, ultimately achieving satisfying results.

Editing Procedure

Early steps focus on broader areas with stronger editing intensity to eliminate original concepts, while later steps refine details, reducing the influence of \(\boldsymbol{m}_i\).
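This coarse-to-fine behavior can be mimicked by decaying the mask's weight across denoising steps; the linear decay below is a hypothetical choice for illustration only:

```python
def mask_influence(step, n_steps):
    """Hypothetical linearly decaying weight: strong regional guidance early
    (to remove the original concept), weak late (to let details refine)."""
    return 1.0 - step / max(n_steps - 1, 1)

def scheduled_mask(m, step, n_steps):
    """Relax the per-step mask m_i toward an all-pass mask as sampling proceeds,
    so m_i's influence shrinks at later, detail-refining steps."""
    w = mask_influence(step, n_steps)
    return w * m + (1.0 - w)  # at w=0 the mask no longer restricts any region
```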

Visualization of Uni-Edit process. The guidance mask of each denoising step is shown at the upper right of the image. We also demonstrate the "Sphinx" phenomenon that existing latent fusion approaches may cause at the lower left of the figure.

BibTeX

@misc{jiao2025unieditflowunleashinginversionediting,
    title={UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models}, 
    author={Guanlong Jiao and Biqing Huang and Kuan-Chieh Wang and Renjie Liao},
    year={2025},
    eprint={2504.13109},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2504.13109}, 
}