
SP-FM (MixFlow)

Shortest-Path Flow Matching with Mixture-Conditioned Bases for Out-of-Distribution Generalization to Unseen Conditions.




Figure: (A) Vanilla CFM; (B) SP-FM.

Overview

SP-FM (MixFlow) is a conditional flow-matching framework for descriptor-controlled generation. Instead of relying on a single Gaussian base distribution, SP-FM learns a mixture base and a descriptor-conditioned flow jointly, trained via shortest-path flow matching. This joint modeling is designed to extrapolate smoothly to unseen conditions and improve out-of-distribution generalization across tasks.
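To make the idea concrete, here is a minimal NumPy sketch of how one batch of training pairs could be built. It assumes a straight-line interpolation path between a base sample and a data sample, as in standard conditional flow matching, and an isotropic Gaussian mixture base. The helper names (`sample_mixture_base`, `straight_path_pair`) and the fixed parameters `weights`, `means`, and `stds` are hypothetical and do not come from the SP-FM code base. In SP-FM the mixture base is learned per condition rather than fixed, and the exact objective in the paper may differ.

```python
import numpy as np

def sample_mixture_base(rng, weights, means, stds, n):
    """Draw n points from an isotropic Gaussian mixture (hypothetical base)."""
    # Pick a component index for every sample according to the mixture weights.
    comps = rng.choice(len(weights), size=n, p=weights)
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comps] + stds[comps, None] * noise

def straight_path_pair(x0, x1, t):
    """Return the point x_t on the straight path and its target velocity."""
    t = t[:, None]                  # broadcast time over feature dimensions
    x_t = (1.0 - t) * x0 + t * x1   # linear interpolation between base and data
    v_target = x1 - x0              # velocity is constant along a straight path
    return x_t, v_target

rng = np.random.default_rng(0)

# Assumed, fixed mixture parameters for illustration only.
weights = np.array([0.5, 0.5])
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
stds = np.array([0.1, 0.1])

x0 = sample_mixture_base(rng, weights, means, stds, n=4)
x1 = rng.standard_normal((4, 2)) + np.array([0.0, 3.0])  # stand-in for data samples
t = rng.uniform(size=4)
x_t, v_target = straight_path_pair(x0, x1, t)
```

A velocity network would then be regressed onto `v_target` at the inputs `(x_t, t, descriptor)`. The network and the learned mixture parameters are omitted here.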

Publication

This repository accompanies the arXiv manuscript:

  • Title: Shortest-Path Flow Matching with Mixture-Conditioned Bases for OOD Generalization to Unseen Conditions
  • Authors: Andrea Rubbi, Amir Akbarnejad, Mohammad Vali Sanian, Aryan Yazdan Parast, Hesam Asadollahzadeh, Arian Amani, Naveed Akhtar, Sarah Cooper, Andrew Bassett, Lassi Paavolainen, Pietro Liò, Sattar Vakili, Mo Lotfollahi
  • arXiv: 2601.11827v2 [cs.LG] (11 Feb 2026)

    Paper link: https://arxiv.org/html/2601.11827v2

Datasets

Synthetic Data

We construct a synthetic benchmark of letter populations, where each condition corresponds to a letter and a specific rotation. Each descriptor encodes the letter identity and rotation, and SP-FM learns a mixture base distribution per condition. This setup allows us to test extrapolation to unseen letters and rotation angles.
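As a rough illustration of what one condition might look like, the sketch below rotates a 2D point cloud and builds a descriptor from a letter and an angle. The encoding (one-hot letter identity plus the sine and cosine of the angle) and the function names are assumptions made for this example. The benchmark's actual descriptor format is not specified here, and a plain one-hot identity would not by itself carry information about unseen letters.

```python
import numpy as np

def rotate_points(points, angle_deg):
    """Rotate an (n, 2) point cloud counter-clockwise by angle_deg degrees."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # Points are stored as rows, so multiply by the transposed rotation matrix.
    return points @ rot.T

def make_descriptor(letter, angle_deg, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Hypothetical descriptor: one-hot letter plus (sin, cos) of the angle."""
    one_hot = np.zeros(len(alphabet))
    one_hot[alphabet.index(letter)] = 1.0
    theta = np.deg2rad(angle_deg)
    return np.concatenate([one_hot, [np.sin(theta), np.cos(theta)]])

# Two toy points standing in for samples from a letter-shaped population.
points = np.array([[1.0, 0.0], [0.0, 1.0]])
rotated = rotate_points(points, 90.0)
descriptor = make_descriptor("C", 90.0)
```

Encoding the angle as a sine and cosine pair keeps 0 and 360 degrees identical, which a raw angle value would not.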

Morphological Perturbations

We evaluate SP-FM on high-content imaging data in feature space. Cells (from BBBC021 and RxRx1) are embedded with a vision backbone, and the model is trained to generate unseen phenotypic responses from compound descriptors alone.

Perturbation Datasets

For transcriptomic perturbations, we use chemical and CRISPR-based single-cell datasets (Norman, Combosciplex, Replogle, and iAstrocytes). Each condition is represented by a perturbation embedding obtained from a pretrained model, and SP-FM is trained to model the distribution of perturbed cells.

Gene embeddings from GPT-4 were sourced from the GenePert/GenePT data deposit.

URL: https://zenodo.org/records/10833191