
flux-dev-lora

Text-to-image and image-to-image generation with custom LoRA weights from HuggingFace

run with your agent
# install belt
$ curl -fsSL https://cli.inference.sh | sh
# view schema & details
$ belt app get pruna/flux-dev-lora
# run
$ belt app run pruna/flux-dev-lora

There's a moment in every image generation project where the base model stops being enough. You've got FLUX Dev producing clean, coherent images from text prompts, but you need something more specific. A particular illustration style your brand uses. A character that needs to appear consistently across dozens of scenes. An aesthetic that sits somewhere between photorealism and painterly abstraction, and no amount of prompt engineering gets you there.

That's the gap LoRA adapters fill. And FLUX Dev LoRA on inference.sh is essentially FLUX Dev with a style knob bolted on - one that can dial into thousands of community-trained fine-tunes without you ever provisioning a GPU or managing model weights.

how lora adapters actually work

LoRA stands for Low-Rank Adaptation, and the concept is elegant in its efficiency. Instead of fine-tuning all the billions of parameters in a diffusion model, a LoRA trains a small set of additional weights that modify specific layers. The result is a file - usually between 10MB and 200MB - that encodes a style, subject, or concept as a lightweight overlay on the base model.

When you pass a LoRA URL to FLUX Dev LoRA, the system loads those weights at inference time and applies them to the generation process. The base model still handles all the heavy lifting: physical coherence, spatial reasoning, composition, lighting fundamentals. The LoRA steers the output toward whatever visual territory it was trained on. Think of it less like swapping engines and more like adding a lens filter - except the filter can fundamentally reshape how the model interprets your prompt.
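The math behind the overlay is simple enough to sketch. This is an illustrative toy in numpy, not the actual inference.sh loading code: a LoRA stores two small matrices whose product is a low-rank update to a frozen base weight, and the scale knob multiplies that update.

```python
import numpy as np

# Toy sketch of low-rank adaptation: W' = W + scale * (B @ A),
# where the adapter rank r is much smaller than the layer width d.
d, r = 1024, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))         # frozen base-model weight
A = rng.normal(size=(r, d)) * 0.01  # trained down-projection
B = rng.normal(size=(d, r)) * 0.01  # trained up-projection

scale = 0.8                         # the lora_scale knob
W_adapted = W + scale * (B @ A)     # applied at load time; W itself is untouched

# The adapter stores 2*d*r values instead of d*d for a full fine-tune:
print(W.size, A.size + B.size)      # 1048576 16384
```

The update `B @ A` has rank at most `r`, which is why a file of a few dozen megabytes can reshape a multi-billion-parameter model's output.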

The practical implication is that you get access to an enormous ecosystem of community-created adapters. HuggingFace hosts thousands of them. Civitai has thousands more. People have trained LoRAs on everything from specific anime characters to 1970s Kodachrome film grain to architectural rendering styles. Any of these work with FLUX Dev LoRA as long as they're compatible with the FLUX.1 Dev architecture.

what makes this different from just running flux dev

The base FLUX Dev model is excellent for general-purpose generation. It handles natural language prompts well, produces physically plausible images, and delivers consistent quality across a wide range of subjects. But it has a particular look - a default aesthetic that emerges from its training data. You can push against that with careful prompting, but you're always working within the model's natural tendencies.

LoRA support changes the equation. Instead of fighting the model to produce a specific style, you load a small adapter that reshapes its output distribution toward exactly what you want. The difference between prompt engineering a style and loading a LoRA trained on that style is often the difference between "sort of close" and "nailed it."

I find this particularly valuable for character consistency. One of the hardest problems in image generation is producing the same character across multiple scenes. Prompt descriptions of a character's appearance are inherently imprecise - the model interprets them slightly differently each time. A character-trained LoRA encodes the actual visual features, producing far more consistent results across different poses, lighting conditions, and compositions.

The scale parameter gives you fine control over how strongly the LoRA influences the output. At full strength (1.0), the adapter dominates the aesthetic. Dial it back to 0.3 or 0.4, and you get a subtle stylistic influence while preserving more of the base model's natural output. This is useful when you want a hint of a style without going all-in.
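In the request itself, the scale is a single field. A sketch of the payload shape, matching the schema later on this page; the prompt and LoRA name are placeholders:

```python
# Sketch of a request payload; prompt and lora values are placeholders.
def make_request(lora_scale: float) -> dict:
    return {
        "app": "pruna/flux-dev-lora",
        "input": {
            "prompt": "a lighthouse at dusk, heavy fog",
            "lora": "owner/model-name",  # any FLUX.1 Dev-compatible LoRA
            "lora_scale": lora_scale,    # 1.0 = full strength
        },
    }

full = make_request(1.0)     # adapter dominates the aesthetic
subtle = make_request(0.35)  # hint of the style, mostly base model
# result = client.run(full)  # with an authenticated inferencesh client
```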

combining multiple loras

One of the more interesting capabilities is loading multiple LoRAs simultaneously. You might combine a style LoRA (say, a specific illustration technique) with a subject LoRA (a trained character) to get that character rendered in that style. Or layer a lighting LoRA with a texture LoRA for more nuanced control over the final image.

This works because each LoRA modifies different aspects of the generation process. They're additive in nature, so their effects stack. The practical advice here is to reduce individual scale weights when combining. Two LoRAs each at 1.0 will often produce over-saturated or incoherent results. Dropping them to 0.4-0.6 each tends to give cleaner combinations where both influences are visible but neither dominates.
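In request terms, the second adapter goes in the extra_lora slot with its own scale. A hedged sketch with both weights pulled down from 1.0; the LoRA names are placeholders:

```python
# Sketch: two adapters at reduced weights; URLs are placeholders.
request = {
    "app": "pruna/flux-dev-lora",
    "input": {
        "prompt": "the character walking through a rainy market",
        "lora": "owner/character-lora",           # subject adapter
        "lora_scale": 0.6,
        "extra_lora": "owner/illustration-lora",  # style adapter
        "extra_lora_scale": 0.5,                  # both reduced from 1.0
    },
}
# result = client.run(request)
```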

There are limits to this, naturally. Combining LoRAs that target the same aspects of the model - two competing style LoRAs, for instance - can produce muddy or contradictory results. It's a tool that rewards experimentation.

image-to-image with style transfer

Beyond text-to-image generation, FLUX Dev LoRA supports image-to-image workflows. You provide a source image along with your prompt and LoRA configuration, and the model transforms that image according to both the text description and the loaded adapter's style.

The strength parameter controls how aggressively the model departs from your input. At low values (0.2-0.4), you get subtle stylistic adjustments - the composition and major elements stay intact while the rendering style shifts. At higher values (0.7-0.9), the model takes significant creative liberties, using your input more as a compositional suggestion than a strict reference.

This opens up interesting workflows. You can sketch a rough composition in any drawing tool, then use image-to-image with a polished style LoRA to turn that sketch into a finished piece. Or take existing photography and re-render it in a specific artistic style while maintaining the original framing and subject placement.
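The sketch-to-finished workflow maps onto the schema's image and prompt_strength fields. A hedged example; the file path and LoRA name are placeholders:

```python
# Sketch: image-to-image with a style LoRA; path and URL are placeholders.
request = {
    "app": "pruna/flux-dev-lora",
    "input": {
        "prompt": "finished ink-and-watercolor illustration",
        "image": "/path/to/rough-sketch.png",  # local path; the SDK uploads it
        "lora": "owner/watercolor-lora",
        "lora_scale": 0.8,
        "prompt_strength": 0.45,  # keep composition, change the rendering
    },
}
# result = client.run(request)
```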

the economics of not hosting your own model

Running your own FLUX inference server with LoRA support means provisioning GPU instances, managing model loading pipelines, handling the LoRA weight swapping, and paying for idle time between requests. For teams generating images sporadically - even hundreds per day - the per-image API cost is almost certainly cheaper than maintaining dedicated infrastructure.

The per-megapixel pricing model means you're not overpaying for small images or getting surprised by costs on larger ones. Most teams don't have the volume to justify self-hosting, and even those that do often prefer the operational simplicity of an API.

The other hidden cost with self-hosting is LoRA management. Loading different LoRAs requires either keeping them all in memory (expensive) or swapping them in and out (slow). The API handles this transparently - you just pass a URL and it works.

where the loras come from

The FLUX LoRA ecosystem has grown rapidly since the model's release in August 2024. Training a custom LoRA typically requires 10-50 reference images and a few hours of GPU time. The community has produced adapters for an enormous range of purposes.

HuggingFace is the most straightforward source - you can link directly to .safetensors files hosted there. Civitai has a larger collection but tends toward more niche artistic styles. Several smaller communities have emerged around specific use cases like product photography, architectural visualization, and game asset creation.

You can also train your own. Tools like kohya_ss and ai-toolkit make the process accessible even without deep ML expertise. Train on your brand's photography style, your product line, or your specific artistic vision, host the resulting .safetensors file anywhere with a public URL, and it works with the API immediately.

The important constraint: the LoRA must be trained for the FLUX.1 Dev architecture specifically. LoRAs trained for Stable Diffusion 1.5, SDXL, or other architectures won't work here. The community usually tags these clearly, but it's worth verifying before assuming compatibility.

tuning generation parameters

Beyond LoRA selection, FLUX Dev LoRA exposes the full set of diffusion parameters for fine-tuning your results. FLUX Dev is a guidance-distilled model, so the guidance parameter works differently from traditional diffusion CFG. The default of 3.5 is well-calibrated - lower values around 2.0-3.0 give the model creative latitude, while higher values of 4.0-6.0 enforce stricter prompt adherence at the cost of some naturalness. I generally start at 3.5 and adjust from there based on whether the output feels too rigid or too loose.

Inference steps determine how many denoising passes the model makes. The practical range for FLUX Dev is 20-30 steps. Below 20, quality drops noticeably. Above 30, you're paying for generation time with diminishing returns. For production work, 25-28 is the sweet spot.

Seed values make outputs reproducible. Set a specific seed and you'll get the same image every time (given identical parameters). This is invaluable for iterative refinement - change one parameter at a time while holding the seed constant to see exactly what effect each adjustment has.
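The hold-the-seed workflow is easy to script. A sketch that sweeps guidance while pinning everything else (prompt and LoRA name are placeholders):

```python
# Sketch: hold the seed fixed and sweep one parameter at a time.
base_input = {
    "prompt": "portrait of an old sailor, window light",
    "lora": "owner/film-grain-lora",
    "seed": 42,  # identical seed + identical params => identical image
}

requests = [
    {"app": "pruna/flux-dev-lora", "input": {**base_input, "guidance": g}}
    for g in (2.5, 3.5, 4.5)
]
# for r in requests:
#     client.run(r)  # compare the three outputs side by side
```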

Output dimensions are controlled through the aspect_ratio and megapixels parameters rather than arbitrary width and height values. Pick the preset ratio that best matches your use case. Just be aware that larger images cost more, and extreme aspect ratios can sometimes produce less coherent results since the model was primarily trained on standard proportions.

honest tradeoffs

FLUX Dev LoRA is not the answer to every image generation problem. The LoRA dependency means your output quality is bounded by the quality of the LoRA you're loading. A poorly trained adapter will produce poor results regardless of how well you tune other parameters. There's no quality guarantee when loading community-created LoRAs - some are excellent, many are mediocre, and testing is part of the workflow.

Generation speed is slightly slower than base FLUX Dev since the model needs to load and apply the LoRA weights. For single images this is negligible. For batch generation at scale, it's worth considering.

The model also doesn't support the most recent architectural advances like native video generation or multi-image coherence. It does one thing well: generating individual images with style customization. If you need video, inpainting, or multi-view consistency, those are different tools.

And while combining multiple LoRAs is powerful, it's not always predictable. Two LoRAs that work beautifully independently might produce garbage when combined. The only way to know is to test, and that testing costs time and (small amounts of) money.

faq

where can I find flux-compatible loras?

HuggingFace and Civitai are the two largest repositories. On HuggingFace, search for "flux lora" or "flux.1 dev lora" and look for .safetensors files with direct download links. Civitai has a dedicated FLUX section with community ratings and sample images. You can also train your own using kohya_ss or ai-toolkit with as few as 10-20 reference images and a few hours of GPU compute. The key requirement is FLUX.1 Dev architecture compatibility - adapters trained for other model families will not work.

how do I choose the right lora scale value?

Start at 0.8 for a single LoRA and evaluate the output. If the style is too dominant or the image looks over-processed, reduce to 0.5-0.6. If the LoRA influence is too subtle, increase toward 1.0. When combining multiple LoRAs, keep individual scales between 0.3-0.6 to prevent visual conflicts. The scale is linear - 0.5 gives roughly half the stylistic influence of 1.0. Some LoRAs are trained at higher effective strengths and work better at lower scale values, so experimentation is necessary.

what resolution should I generate at?

For web use and social media, 768x1024 or 1024x768 offers good quality at reasonable cost. For print or high-detail work, 1024x1024 or 1344x768 provides more detail. Going above 1.5 megapixels increases cost without proportional quality improvement for most use cases, since the model's training resolution caps the effective detail regardless of output size. Match your output dimensions to your actual display context rather than defaulting to maximum resolution.
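Note that the schema on this page expresses size as aspect_ratio plus megapixels rather than raw pixels; a 768x1024 portrait corresponds roughly to a 3:4 ratio at the default 1-megapixel setting. A sketch (prompt is a placeholder):

```python
# Sketch: size is requested as aspect_ratio + megapixels, not width/height.
request = {
    "app": "pruna/flux-dev-lora",
    "input": {
        "prompt": "product shot of a ceramic mug on linen",
        "aspect_ratio": "3:4",  # portrait, roughly 768x1024 at 1 MP
        "megapixels": "1",      # or "0.25" for cheap draft iterations
        "output_format": "png",
    },
}
```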

api reference

about

text-to-image and image-to-image generation with custom lora weights from huggingface

1. calling the api

install the client

the client provides a convenient way to interact with the api.

bash
pip install inferencesh

setup your api key

set INFERENCE_API_KEY as an environment variable. get your key from settings → api keys.

bash
export INFERENCE_API_KEY="inf_your_key"

run and get result

submit a request and wait for the final result. best for batch processing or when you don't need progress updates.

python
from inferencesh import inference

client = inference()

result = client.run({
    "app": "pruna/flux-dev-lora",
    "input": {}
})

print(result["output"])

stream live updates

get real-time progress updates as the task runs. ideal for showing progress bars, partial results, or long-running tasks.

python
from inferencesh import inference

client = inference()

# stream=True yields updates as they arrive
for update in client.run({
    "app": "pruna/flux-dev-lora",
    "input": {}
}, stream=True):
    if update.get("progress"):
        print(f"progress: {update['progress']}%")
    if update.get("output"):
        print(f"output: {update['output']}")

2. authentication

the api uses api keys for authentication. see the authentication docs for detailed setup instructions.

3. files

file inputs are automatically handled by the sdk. you can pass local paths, urls, or base64 data.

automatic upload

the python sdk automatically detects local file paths and uploads them. urls are passed through as-is.

python
# local file paths are automatically uploaded
result = client.run({
    "app": "pruna/flux-dev-lora",
    "input": {
        "image": "/path/to/local/image.png",  # detected & uploaded
        "audio": "https://example.com/audio.mp3",  # url passed through
    }
})

manual upload

you can also upload files manually and use the returned url.

python
# upload and get a hosted URL
file = client.files.upload("/path/to/file.png")
print(file.uri)  # https://cloud.inference.sh/...

4. webhooks

get notified when a task completes by providing a webhook url. when the task reaches a terminal state (completed, failed, or cancelled), a POST request is sent to your url with the task result.

python
result = client.run({
    "app": "pruna/flux-dev-lora",
    "input": {},
    "webhook": "https://your-server.com/webhook"
}, wait=False)

webhook payload

your endpoint receives a JSON POST with the task result:

json
{
  "id": "task_abc123",
  "status": 9,
  "output": { ... },
  "error": "",
  "session_id": null,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:05Z"
}
id (string) - task id
status (number) - terminal status (9=completed, 10=failed, 11=cancelled)
output (object) - task output (when completed)
error (string) - error message (when failed)
session_id (string) - session id (if using sessions)
created_at (string) - iso timestamp
updated_at (string) - iso timestamp

5. schema

input

prompt (string, required)

text description of the desired output.

lora (string)

huggingface lora url (e.g., 'owner/model-name').

lora_scale (number)

lora application strength.

default: 1, min: -1, max: 3

extra_lora (string)

second lora url for combining styles.

extra_lora_scale (number)

second lora strength.

default: 1, min: -1, max: 3

image (string, file)

input image for img2img mode.

prompt_strength (number)

how much to transform input image.

default: 0.8, min: 0, max: 1

num_outputs (integer)

number of images to generate.

default: 1, min: 1, max: 4

num_inference_steps (integer)

number of denoising steps.

default: 28, min: 1, max: 50

guidance (number)

guidance scale.

default: 3, min: 0, max: 10

seed (integer)

random seed.

aspect_ratio (string)

aspect ratio.

default: "1:1"
options: "1:1", "16:9", "21:9", "3:2", "2:3", "4:5", "5:4", "3:4", "4:3", "9:16", "9:21"

megapixels (string)

output resolution in megapixels.

default: "1"
options: "1", "0.25"

speed_mode (string)

speed optimization.

default: "Juiced 🧃"
options: "Base Model (compiled)", "Lightly Juiced 🍊", "Juiced 🧃", "Extra Juiced 🔥"

output_format (string)

output format.

default: "jpg"
options: "jpg", "png", "webp"

output_quality (integer)

quality for jpg/webp.

default: 80, min: 1, max: 100

output

images (array, required)

generated image files.

output_meta (object)

structured metadata about inputs/outputs for pricing calculation.
