Run FLUX Models Locally on Your Mac!
2024-11-07
MFLUX is essentially a port of the FLUX models using Apple’s MLX framework.
I’ve been playing around with it on my Mac, so let me walk you through what I’ve learned.
No more fluff, let’s see how good it is!
Running FLUX models locally
What’s MFLUX All About?
MFLUX is basically a line-by-line port of the FLUX implementation from the Huggingface Diffusers library into Apple’s MLX.
It’s designed to be super minimal and explicit.
There are no convoluted config files (except for the tokenizers), and the network architectures are hardcoded.
The idea is to keep the codebase tiny and focused solely on expressing these models, which means fewer abstractions to deal with.
Even though MFLUX prioritizes readability over generality and performance, it’s surprisingly fast — and even faster when you use quantization.
It’s like having a lean, mean, image-generating machine right on your Mac.
Let me walk you through the repository and explain what I’ve found along the way!
Installing MFLUX
Getting MFLUX up and running is pretty straightforward.
If you have uv installed (an extremely fast Python package and project manager written in Rust), you can just run:
uv tool install --upgrade mflux
This will give you the mflux-generate command-line tool and everything you need to get started.
If you’re more of a virtual environment person, you can do it the classic way:
mkdir -p mflux && cd mflux && python3 -m venv .venv && source .venv/bin/activate
pip install -U mflux
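Either way, a quick sanity check that the install worked is to print the CLI’s help text, which also lists every available flag:
mflux-generate --help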
OK, ready to test!
Generating Your First Image
Alright, let’s get to the fun part — generating images!
With MFLUX installed, you can use the mflux-generate command to create images based on your prompts.
Here’s a simple example using the schnell model with quantization set to 8-bit and just 2 inference steps:
mflux-generate --model schnell --prompt "A serene landscape with mountains and a lake at sunrise" --steps 2 --seed 2 -q 8
Note: The first time you run this, it’ll download the model weights, which are about 34GB. Yeah, it’s hefty, but totally worth it.
Let’s see what we’ve got:
Beautiful!
And if you want to use the more powerful dev model, you can run:
mflux-generate --model dev --prompt "A futuristic cityscape with flying cars and neon lights" --steps 25 --seed 2 -q 8
Heads Up: The FLUX.1-dev model might require access permissions. If you run into issues, check out this troubleshooting guide.
By the way, I’m running this on a 2023 M3 Pro with 36GB of RAM.
If you want to save on space and maybe speed things up, you can check out the quantization section below.
Also, by default, the models are cached in your home directory under ~/.cache/huggingface. If you want to change this, you can set the HF_HOME environment variable. More details are in the Hugging Face documentation.
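For example, here’s a small sketch (the external-drive path is just a placeholder) for checking how much space the cached weights take up and pointing future downloads somewhere else:
# See how much space the cached weights currently use (default location)
du -sh ~/.cache/huggingface
# Point future downloads at an external drive instead; add this to your shell profile to make it stick
export HF_HOME=/Volumes/External/huggingface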
Command-Line Arguments
There are a bunch of options you can use with mflux-generate. Here are some of the most useful ones:
- --prompt: The text description of the image you want to generate.
- --model or -m: Choose between schnell or dev.
- --steps: Number of inference steps (more steps usually mean better quality).
- --seed: Set a seed for random number generation if you want reproducible results.
- --height and --width: Set the dimensions of your output image.
- --quantize or -q: Use quantization to speed things up and save memory (options are 4 or 8).
- --lora-paths: Path(s) to LoRA weights if you're using adapters.
- --metadata: Exports a .json file with metadata about your image.
There’s more, but these are the ones I’ve found most handy. For a full list, you can refer to the documentation or use --help.
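To see how these options fit together, here’s a hypothetical invocation that combines most of them (the prompt and values are just placeholders):
mflux-generate \
  --model dev \
  --prompt "A watercolor painting of a lighthouse at dusk" \
  --steps 20 \
  --seed 42 \
  --height 768 \
  --width 1024 \
  -q 8 \
  --metadata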
Running via Python Script
If you prefer working in Python, you can generate images as follows:
from mflux import Flux1, Config

# Load the model
flux = Flux1.from_alias(
    alias="schnell",  # "schnell" or "dev"
    quantize=8,       # 4 or 8
)

# Generate an image
image = flux.generate_image(
    seed=2,
    prompt="A futuristic cityscape with flying cars and neon lights",
    config=Config(
        num_inference_steps=2,
        height=1024,
        width=1024,
    ),
)

image.save(path="image.png")
How Fast Is It?
I was curious about the performance, so I did some digging and testing.
The image generation speed varies depending on your Mac’s hardware.
For instance, for a 2-step schnell image at 1024x1024:
- On an M3 Max, it takes around 20 seconds.
- On an M2 Ultra, it’s under 15 seconds.
- My 2023 M3 Pro (36GB) takes about 80 seconds at full precision.
If you want to test your own machine, you can run:
time mflux-generate \
--prompt "A futuristic cityscape with flying cars and neon lights" \
--model schnell \
--steps 2 \
--seed 2 \
--height 1024 \
--width 1024
Keep in mind that the first run might be slower due to model loading and other overheads.
Matching Diffusers Implementation
One of the things I love about MFLUX is that it can produce images identical to the Huggingface Diffusers implementation if you use the same initial latent array and parameters.
Comparison from https://github.com/filipstrand/mflux
Let’s Talk Quantization
Quantization is a nifty way to speed up the model and reduce memory usage by compressing the weights. MFLUX supports 4-bit and 8-bit quantization.
Here’s how you can use it:
mflux-generate \
--model schnell \
--steps 2 \
--seed 2 \
--quantize 8 \
--height 1920 \
--width 1024 \
--prompt "Your prompt here"
In my experience, using 8-bit quantization almost halves the image generation time on my M3 Pro, and the quality difference is negligible.
Model Sizes with Quantization
Here’s how the model sizes stack up:
- 4-bit: ~9.85GB
- 8-bit: ~18.16GB
- 16-bit (Original): ~33.73GB
Saving Quantized Models
If you want to save a quantized model to disk (so you don’t have to quantize at runtime every time), you can do:
mflux-save \
--path "/Your/Desired/Path/schnell_8bit" \
--model schnell \
--quantize 8
Loading Quantized Models
To use your saved quantized model:
mflux-generate \
--path "/Your/Desired/Path/schnell_8bit" \
--model schnell \
--steps 2 \
--seed 2 \
--prompt "Your prompt here"
You don’t need to specify the -q flag when loading from a saved quantized model.
Image-to-Image Generation
MFLUX also supports image-to-image generation. You can provide an initial image and let the model generate variations based on it. Here’s how:
mflux-generate \
--prompt "Your prompt here" \
--init-image-path "path/to/your/image.png" \
--init-image-strength 0.3 \
--model dev \
--steps 20 \
--seed 43 \
--guidance 4.0 \
--quantize 8 \
--height 1024 \
--width 1024
The --init-image-strength controls how much the initial image influences the output. Values between 0.0 and 1.0 are accepted, where higher values mean more influence.
Using LoRA Adapters
LoRA adapters let you fine-tune the model with additional weights. You can use them like this:
mflux-generate --prompt "Your prompt" --model dev --steps 20 --seed 43 -q 8 --lora-paths "path/to/lora.safetensors"
You can even combine multiple LoRAs:
mflux-generate \
--prompt "Your prompt" \
--model dev \
--steps 20 \
--seed 43 \
--lora-paths lora1.safetensors lora2.safetensors \
--lora-scales 1.0 0.5 \
-q 8
Just make sure the LoRA weights are compatible with the model you’re using.
ControlNet Integration
ControlNet allows for even finer control by using a reference image to guide the generation.
You can use it like this:
mflux-generate-controlnet \
--prompt "Your prompt" \
--model dev \
--steps 20 \
--seed 1727047657 \
-q 8 \
--lora-paths "path/to/lora.safetensors" \
--controlnet-image-path "path/to/reference.png" \
--controlnet-strength 0.5 \
--controlnet-save-canny
This is especially powerful when combined with LoRA adapters.
Note: ControlNet will download additional weights (~3.58GB) the first time you use it. Also, it’s currently optimized for the dev model, but it can work with schnell.
Current Limitations
MFLUX is still a work in progress, so here are some things to be aware of:
- It generates images one at a time (no batching yet; see the loop sketch after this list for a simple workaround).
- Negative prompts aren’t supported yet.
- LoRA weights are only supported for the transformer part of the network.
- Some LoRA adapters might not work.
- ControlNet currently only supports the canny version.
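Since batch generation isn’t built in yet, a simple workaround is a shell loop over seeds. This is just a sketch; it assumes mflux-generate accepts an --output flag for naming each file, so double-check with mflux-generate --help:
for seed in 1 2 3 4; do
  mflux-generate \
    --model schnell \
    --prompt "A serene landscape with mountains and a lake at sunrise" \
    --steps 2 \
    --seed "$seed" \
    -q 8 \
    --output "landscape_${seed}.png"  # --output is assumed here; verify the flag name
done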
Workflow Tips
Here are some things I’ve found helpful:
- To hide model fetching progress bars, you can run export HF_HUB_DISABLE_PROGRESS_BARS=1.
- Use config files to save complex job parameters instead of typing them out every time.
- Set up shell aliases for common commands. For example:
- alias mflux-dev='mflux-generate --model dev'
- alias mflux-schnell='mflux-generate --model schnell --metadata'
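With those aliases in place, a typical run shrinks to something like this (the prompt is just an example):
mflux-schnell --prompt "A cat reading a newspaper" --steps 2 --seed 7 -q 8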
What’s Next?
There’s still a lot on the roadmap for MFLUX:
- LoRA fine-tuning support.
- Frontend support using Gradio or Streamlit.
- Potential integration with tools like ComfyUI.
- Support for PuLID and depth-based ControlNet.
Feel free to reach out if you have any questions or run into any issues.
Happy generating!
Bonus Content: Building with AI
And don’t forget to have a look at some of the practitioner resources we published recently.
Thank you for stopping by and being an integral part of our community.
Happy building!