flux-dev-lora

Text-to-image and image-to-image generation with FLUX.1 [dev] LoRA support. Custom style adaptation and fine-tuned model variations from Black Forest Labs.

run with your agent
# install belt
$ curl -fsSL https://cli.inference.sh | sh
# view schema & details
$ belt app get falai/flux-dev-lora
# run
$ belt app run falai/flux-dev-lora

There's a moment in every image generation project where the base model stops being enough. You've got FLUX Dev producing clean, coherent images from text prompts, but you need something more specific. A particular illustration style your brand uses. A character that needs to appear consistently across dozens of scenes. An aesthetic that sits somewhere between photorealism and painterly abstraction, and no amount of prompt engineering gets you there.

That's the gap LoRA adapters fill. And FLUX Dev LoRA on inference.sh is essentially FLUX Dev with a style knob bolted on - one that can dial into thousands of community-trained fine-tunes without you ever provisioning a GPU or managing model weights.

how lora adapters actually work

LoRA stands for Low-Rank Adaptation, and the concept is elegant in its efficiency. Instead of fine-tuning all of a diffusion model's billions of parameters, LoRA training learns a small set of additional weights that modify specific layers. The result is a file - usually between 10MB and 200MB - that encodes a style, subject, or concept as a lightweight overlay on the base model.

When you pass a LoRA URL to FLUX Dev LoRA, the system loads those weights at inference time and applies them to the generation process. The base model still handles all the heavy lifting: physical coherence, spatial reasoning, composition, lighting fundamentals. The LoRA steers the output toward whatever visual territory it was trained on. Think of it less like swapping engines and more like adding a lens filter - except the filter can fundamentally reshape how the model interprets your prompt.
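Concretely, loading an adapter is just another field on the run call. Here's a minimal sketch using the inference.sh Python client - the {"path": ..., "scale": ...} entry shape is an assumption based on common FLUX LoRA conventions, so check the schema below for the exact structure:

python
from inferencesh import inference

client = inference()

# placeholder URL - any publicly hosted FLUX.1 Dev .safetensors adapter
lora_url = "https://huggingface.co/<user>/<repo>/resolve/main/lora.safetensors"

result = client.run({
    "app": "falai/flux-dev-lora",
    "input": {
        "prompt": "a lighthouse on a sea cliff at golden hour",
        # entry shape assumed: {"path": <url>, "scale": <number>}
        "loras": [{"path": lora_url, "scale": 1.0}],
    }
})

print(result["output"])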

The practical implication is that you get access to an enormous ecosystem of community-created adapters. HuggingFace hosts thousands of them. Civitai has thousands more. People have trained LoRAs on everything from specific anime characters to 1970s Kodachrome film grain to architectural rendering styles. Any of these work with FLUX Dev LoRA as long as they're compatible with the FLUX.1 Dev architecture.

what makes this different from just running flux dev

The base FLUX Dev model is excellent for general-purpose generation. It handles natural language prompts well, produces physically plausible images, and delivers consistent quality across a wide range of subjects. But it has a particular look - a default aesthetic that emerges from its training data. You can push against that with careful prompting, but you're always working within the model's natural tendencies.

LoRA support changes the equation. Instead of fighting the model to produce a specific style, you load a small adapter that reshapes its output distribution toward exactly what you want. The difference between prompt engineering a style and loading a LoRA trained on that style is often the difference between "sort of close" and "nailed it."

I find this particularly valuable for character consistency. One of the hardest problems in image generation is producing the same character across multiple scenes. Prompt descriptions of a character's appearance are inherently imprecise - the model interprets them slightly differently each time. A character-trained LoRA encodes the actual visual features, producing far more consistent results across different poses, lighting conditions, and compositions.

The scale parameter gives you fine control over how strongly the LoRA influences the output. At full strength (1.0), the adapter dominates the aesthetic. Dial it back to 0.3 or 0.4, and you get a subtle stylistic influence while preserving more of the base model's natural output. This is useful when you want a hint of a style without going all-in.
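A practical way to find the right value is to sweep the scale while holding the seed fixed, so the only thing changing between outputs is the adapter's influence. A sketch, reusing the client and the assumed entry shape from the snippet above:

python
# assumes `client` and `lora_url` from the earlier snippet
for scale in (1.0, 0.7, 0.4):
    result = client.run({
        "app": "falai/flux-dev-lora",
        "input": {
            "prompt": "portrait of a fisherman mending nets at dawn",
            "seed": 42,  # fixed seed so only the lora scale varies
            "loras": [{"path": lora_url, "scale": scale}],
        }
    })
    print(scale, result["output"])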

combining multiple loras

One of the more interesting capabilities is loading multiple LoRAs simultaneously. You might combine a style LoRA (say, a specific illustration technique) with a subject LoRA (a trained character) to get that character rendered in that style. Or layer a lighting LoRA with a texture LoRA for more nuanced control over the final image.

This works because each LoRA modifies different aspects of the generation process. They're additive in nature, so their effects stack. The practical advice here is to reduce individual scale weights when combining. Two LoRAs each at 1.0 will often produce over-saturated or incoherent results. Dropping them to 0.4-0.6 each tends to give cleaner combinations where both influences are visible but neither dominates.
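A sketch of what a combination looks like in practice - both URLs are placeholders, and the entry shape is the same assumption as above:

python
# assumes `client` from the earlier snippet; both URLs are placeholders
style_lora = "https://huggingface.co/<user>/<style-repo>/resolve/main/lora.safetensors"
character_lora = "https://huggingface.co/<user>/<character-repo>/resolve/main/lora.safetensors"

result = client.run({
    "app": "falai/flux-dev-lora",
    "input": {
        "prompt": "the character reading by a rain-streaked cafe window",
        "loras": [
            {"path": style_lora, "scale": 0.5},      # illustration technique
            {"path": character_lora, "scale": 0.6},  # trained subject
        ],
    }
})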

There are limits to this, naturally. Combining LoRAs that target the same aspects of the model - two competing style LoRAs, for instance - can produce muddy or contradictory results. It's a tool that rewards experimentation.

image-to-image with style transfer

Beyond text-to-image generation, FLUX Dev LoRA supports image-to-image workflows. You provide a source image along with your prompt and LoRA configuration, and the model transforms that image according to both the text description and the loaded adapter's style.

The strength parameter controls how aggressively the model departs from your input. At low values (0.2-0.4), you get subtle stylistic adjustments - the composition and major elements stay intact while the rendering style shifts. At higher values (0.7-0.9), the model takes significant creative liberties, using your input more as a compositional suggestion than a strict reference.

This opens up interesting workflows. You can sketch a rough composition in any drawing tool, then use image-to-image with a polished style LoRA to turn that sketch into a finished piece. Or take existing photography and re-render it in a specific artistic style while maintaining the original framing and subject placement.
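A sketch of that sketch-to-finished workflow, with the same assumed loras shape (the SDK uploads local file paths automatically, as covered in the files section of the api reference):

python
# assumes `client` and `style_lora` from the earlier snippets
result = client.run({
    "app": "falai/flux-dev-lora",
    "input": {
        "prompt": "finished watercolor illustration, soft washes, clean linework",
        "image": "/path/to/rough-sketch.png",  # local file, uploaded by the sdk
        "strength": 0.55,  # low enough to keep the composition intact
        "loras": [{"path": style_lora, "scale": 0.8}],
    }
})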

the economics of not hosting your own model

Running your own FLUX inference server with LoRA support means provisioning GPU instances, managing model loading pipelines, handling the LoRA weight swapping, and paying for idle time between requests. For teams generating images sporadically - even hundreds per day - the per-image API cost is almost certainly cheaper than maintaining dedicated infrastructure.

The per-megapixel pricing model means you're not overpaying for small images or getting surprised by costs on larger ones. Most teams don't have the volume to justify self-hosting, and even those that do often prefer the operational simplicity of an API.

The other hidden cost with self-hosting is LoRA management. Loading different LoRAs requires either keeping them all in memory (expensive) or swapping them in and out (slow). The API handles this transparently - you just pass a URL and it works.

where the loras come from

The FLUX LoRA ecosystem has grown rapidly since the model's release in August 2024. Training a custom LoRA typically requires 10-50 reference images and a few hours of GPU time. The community has produced adapters for an enormous range of purposes.

HuggingFace is the most straightforward source - you can link directly to .safetensors files hosted there. Civitai has a larger collection but tends toward more niche artistic styles. Several smaller communities have emerged around specific use cases like product photography, architectural visualization, and game asset creation.

You can also train your own. Tools like kohya_ss and ai-toolkit make the process accessible even without deep ML expertise. Train on your brand's photography style, your product line, or your specific artistic vision, host the resulting .safetensors file anywhere with a public URL, and it works with the API immediately.

The important constraint: the LoRA must be trained for the FLUX.1 Dev architecture specifically. LoRAs trained for Stable Diffusion 1.5, SDXL, or other architectures won't work here. The community usually tags these clearly, but it's worth verifying before assuming compatibility.

tuning generation parameters

Beyond LoRA selection, FLUX Dev LoRA exposes the full set of diffusion parameters for fine-tuning your results. FLUX Dev is a guidance-distilled model, so the guidance parameter works differently from traditional diffusion CFG. The default of 3.5 is well-calibrated - lower values around 2.0-3.0 give the model creative latitude, while higher values of 4.0-6.0 enforce stricter prompt adherence at the cost of some naturalness. I generally start at 3.5 and adjust from there based on whether the output feels too rigid or too loose.

Inference steps determine how many denoising passes the model makes. The practical range for FLUX Dev is 20-30 steps. Below 20, quality drops noticeably. Above 30, you're paying for generation time with diminishing returns. For production work, 25-28 is the sweet spot.

Seed values make outputs reproducible. Set a specific seed and you'll get the same image every time (given identical parameters). This is invaluable for iterative refinement - change one parameter at a time while holding the seed constant to see exactly what effect each adjustment has.

Output dimensions are fully flexible. You're not locked to preset aspect ratios. Set width and height to whatever your use case requires. Just be aware that larger images cost more, and extremely unusual aspect ratios can sometimes produce less coherent results since the model was primarily trained on standard proportions.
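Putting those parameters together, a sketch of an iteration-friendly call - the parameter names match the schema below:

python
# assumes `client` from the earlier snippet
result = client.run({
    "app": "falai/flux-dev-lora",
    "input": {
        "prompt": "extreme close-up of a single tiger eye, sharp focus",
        "guidance_scale": 3.5,      # 2.0-3.0 looser, 4.0-6.0 stricter
        "num_inference_steps": 28,  # 20-30 is the practical range
        "seed": 1234,               # hold constant while tuning one knob
        "width": 768,
        "height": 1024,
    }
})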

honest tradeoffs

FLUX Dev LoRA is not the answer to every image generation problem. The LoRA dependency means your output quality is bounded by the quality of the LoRA you're loading. A poorly trained adapter will produce poor results regardless of how well you tune other parameters. There's no quality guarantee when loading community-created LoRAs - some are excellent, many are mediocre, and testing is part of the workflow.

Generation is slightly slower than with base FLUX Dev, since the model needs to load and apply the LoRA weights. For single images this is negligible. For batch generation at scale, it's worth considering.

The model also doesn't support the most recent architectural advances like native video generation or multi-image coherence. It does one thing well: generating individual images with style customization. If you need video, inpainting, or multi-view consistency, those are different tools.

And while combining multiple LoRAs is powerful, it's not always predictable. Two LoRAs that work beautifully independently might produce garbage when combined. The only way to know is to test, and that testing costs time and (small amounts of) money.

faq

where can I find flux-compatible loras?

HuggingFace and Civitai are the two largest repositories. On HuggingFace, search for "flux lora" or "flux.1 dev lora" and look for .safetensors files with direct download links. Civitai has a dedicated FLUX section with community ratings and sample images. You can also train your own using kohya_ss or ai-toolkit with as few as 10-20 reference images and a few hours of GPU compute. The key requirement is FLUX.1 Dev architecture compatibility - adapters trained for other model families will not work.

how do I choose the right lora scale value?

Start at 0.8 for a single LoRA and evaluate the output. If the style is too dominant or the image looks over-processed, reduce to 0.5-0.6. If the LoRA influence is too subtle, increase toward 1.0. When combining multiple LoRAs, keep individual scales in the 0.3-0.6 range to prevent visual conflicts. The scale is roughly linear - 0.5 gives about half the stylistic influence of 1.0. Some LoRAs are trained at higher effective strengths and work better at lower scale values, so experimentation is necessary.

what resolution should I generate at?

For web use and social media, 768x1024 or 1024x768 offers good quality at reasonable cost. For print or high-detail work, 1024x1024 or 1344x768 provides more detail. Going above 1.5 megapixels increases cost without proportional quality improvement for most use cases, since the model's training resolution caps the effective detail regardless of output size. Match your output dimensions to your actual display context rather than defaulting to maximum resolution.

api reference

about

text-to-image and image-to-image generation with flux.1 [dev] lora support. custom style adaptation and fine-tuned model variations from black forest labs.

1. calling the api

install the client

the client provides a convenient way to interact with the api.

bash
pip install inferencesh

setup your api key

set INFERENCE_API_KEY as an environment variable. get your key from settings → api keys.

bash
export INFERENCE_API_KEY="inf_your_key"

run and get result

submit a request and wait for the final result. best for batch processing or when you don't need progress updates.

python
from inferencesh import inference

client = inference()

result = client.run({
    "app": "falai/flux-dev-lora",
    "input": {}
})

print(result["output"])

stream live updates

get real-time progress updates as the task runs. ideal for showing progress bars, partial results, or long-running tasks.

python
from inferencesh import inference

client = inference()

# stream=True yields updates as they arrive
for update in client.run({
    "app": "falai/flux-dev-lora",
    "input": {}
}, stream=True):
    if update.get("progress"):
        print(f"progress: {update['progress']}%")
    if update.get("output"):
        print(f"output: {update['output']}")

2. authentication

the api uses api keys for authentication. see the authentication docs for detailed setup instructions.

3. files

file inputs are automatically handled by the sdk. you can pass local paths, urls, or base64 data.

automatic upload

the python sdk automatically detects local file paths and uploads them. urls are passed through as-is.

python
# local file paths are automatically uploaded
result = client.run({
    "app": "falai/flux-dev-lora",
    "input": {
        "image": "/path/to/local/image.png",  # detected & uploaded
        "audio": "https://example.com/audio.mp3",  # url passed through
    }
})

manual upload

you can also upload files manually and use the returned url.

python
# upload and get a hosted URL
file = client.files.upload("/path/to/file.png")
print(file.uri)  # https://cloud.inference.sh/...

4. webhooks

get notified when a task completes by providing a webhook url. when the task reaches a terminal state (completed, failed, or cancelled), a POST request is sent to your url with the task result.

python
result = client.run({
    "app": "falai/flux-dev-lora",
    "input": {},
    "webhook": "https://your-server.com/webhook"
}, wait=False)

webhook payload

your endpoint receives a JSON POST with the task result:

json
{
  "id": "task_abc123",
  "status": 9,
  "output": { ... },
  "error": "",
  "session_id": null,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:05Z"
}
id          string  task id
status      number  terminal status (9=completed, 10=failed, 11=cancelled)
output      object  task output (when completed)
error       string  error message (when failed)
session_id  string  session id (if using sessions)
created_at  string  iso timestamp
updated_at  string  iso timestamp
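
a minimal sketch of a receiving endpoint - flask is an assumption here, any framework that accepts a json POST works the same way:

python
# minimal webhook receiver sketch; flask is an assumption, not a requirement
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_webhook():
    task = request.get_json()
    if task["status"] == 9:      # completed
        print("output:", task["output"])
    elif task["status"] == 10:   # failed
        print("error:", task["error"])
    return "", 200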

5. schema

input

prompt (string, required)

the prompt to generate an image from.

example: "Extreme close-up of a single tiger eye, direct frontal view. Detailed iris and pupil. Sharp focus on eye texture and color."

height (integer)

the height in pixels of the generated image.

default: 1024, min: 512, max: 2048

width (integer)

the width in pixels of the generated image.

default: 1024, min: 512, max: 2048

image (string, file)

optional input image for image-to-image mode. when provided, the model will transform this image based on the prompt.

strength (number)

how much to transform the input image (image-to-image only). 1.0 = full remake, 0.0 = preserve original.

default: 0.85, min: 0.01, max: 1

loras (array)

the loras to use for the image generation. you can use any number of loras and they will be merged together to generate the final image.

num_inference_steps (integer)

the number of inference steps to perform.

default: 28, min: 1, max: 50

guidance_scale (number)

the cfg (classifier-free guidance) scale controls how closely the model follows your prompt - higher values mean stricter adherence.

default: 3.5, min: 0, max: 35

seed (integer)

the same seed and the same prompt given to the same version of the model will output the same image every time.

num_images (integer)

the number of images to generate.

default: 1, min: 1, max: 4

enable_safety_checker (boolean)

if set to true, the safety checker will be enabled.

default: true

output_format (string)

the format of the generated image.

default: "jpeg", options: "png", "jpeg"

output

images (array, required)

the generated image(s).

output_meta (object)

structured metadata about inputs/outputs for pricing calculation.

