Qwen Image 2 Pro Generation on inference.sh

Alibaba's Qwen-Image-2.0 Pro is the enhanced version of its Qwen-Image-2.0 image generation model, tuned for professional text rendering, fine-grained realism, and stronger semantic adherence. If you need AI-generated images with readable typography, infographic layouts, or photorealistic scenes with precise detail control, this is the model to reach for. It is available at app.inference.sh/apps/alibaba/qwen-image-2-pro for $0.075 per image.

what it does

Qwen Image 2 Pro generates images from text prompts with exceptional fidelity to complex instructions. It also supports image editing through reference images: pass existing visuals alongside editing instructions, and the model transforms them while preserving the parts of the image you did not ask it to change.

The standout capability is text rendering. Where most image generators produce garbled or approximate text, Qwen Image 2 Pro renders clean, readable typography directly within generated images. This extends to full document layouts - infographics, presentation slides, posters, and comics with proper text hierarchy and alignment.

The Pro variant builds on the base Qwen Image 2 model with enhanced photorealism, better fine detail rendering (skin texture, fabric weave, material surfaces), and tighter adherence to prompt semantics. When your prompt says "three red apples and two green pears arranged in a line," the Pro model delivers exactly that count and arrangement.

key features

Professional text rendering - Generate images containing readable text: signs, labels, titles, body copy, calligraphy, and full document layouts. The model understands typographic hierarchy and alignment.

Fine-grained photorealism - Skin pores, fabric threads, water droplets, metal reflections - the Pro model renders micro-details that make images feel photographed rather than generated.

Negative prompts - Explicitly exclude unwanted elements from generations. Steer away from artifacts, styles, or content you do not want.

Prompt extension - Enable automatic prompt rewriting for more diverse and creative interpretations of your input. Useful for exploration when you want the model to elaborate on a brief concept.

Reference image editing - Pass up to 3 reference images for editing workflows. The model interprets your text instructions in the context of the provided images.

Seed control - Lock the random seed for reproducible generations. Reusing the same seed with the same prompt and settings reproduces the same output, which makes iteration predictable.

Flexible dimensions - Output from 512 to 2048 pixels on both axes, with any width/height combination within that range.
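Seed control only helps if the seed behind each image is recorded. A minimal sketch, assuming you build request inputs as plain dicts like the ones in the API examples: the hypothetical helper `reproducible_payload` always writes an explicit seed into the payload, drawing one at random from the documented 0 to 2147483647 range when none is given, so any generation can be logged and replayed.

```python
import random

MAX_SEED = 2_147_483_647  # 2**31 - 1, the documented upper bound for seed


def reproducible_payload(prompt, seed=None, **options):
    """Hypothetical helper: return a payload that always carries an explicit seed."""
    if seed is None:
        # Draw once and store it in the payload so this exact request can be replayed.
        seed = random.randint(0, MAX_SEED)  # randint includes both endpoints
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")
    return {"prompt": prompt, "seed": seed, **options}


first = reproducible_payload("Luxury watch product shot", width=1024, height=1024)
# Replaying with the recorded seed and identical options produces an identical request.
replay = reproducible_payload(first["prompt"], seed=first["seed"], width=1024, height=1024)
print(first == replay)  # True
```

Storing the whole payload, not just the seed, matters: changing the width, height, or negative prompt alongside the same seed generally produces a different image.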

use cases

Marketing materials with text - Social media graphics, banner ads, and promotional images that include headlines, taglines, or calls to action rendered directly in the image. No post-processing text overlay needed.

Infographic generation - Data visualizations, process diagrams, and informational graphics with proper labeling, legends, and explanatory text. Describe the information hierarchy and the model composes it visually.

Product photography - E-commerce listings, catalog shots, and packaging mockups with photorealistic material rendering. Fabric, leather, glass, metal, and food all render with commercial photography quality.

Editorial illustration - Magazine-quality portraits, lifestyle scenes, and narrative imagery with the fine detail and natural lighting that editorial work demands.

Comic and storyboard creation - Multi-panel layouts with dialogue bubbles, captions, and narrative text. The model handles panel composition alongside text placement.

Brand asset creation - Consistent visual content with embedded brand names, slogans, and messaging rendered in specified styles.

how to run

belt CLI

Basic text-to-image:

```bash
belt app run alibaba/qwen-image-2-pro --prompt "Professional real estate photography, modern kitchen with marble countertops, natural light from floor-to-ceiling windows, warm afternoon glow, architectural digest style"
```

With text rendering:

```bash
# Single quotes stop the shell from expanding $4, $5 and $6 as positional parameters
belt app run alibaba/qwen-image-2-pro --prompt 'Minimalist coffee shop menu board, black background, chalk-style white text listing: Espresso $4, Latte $5.50, Cappuccino $5, Cold Brew $6. Hand-drawn coffee cup illustration in corner. Rustic aesthetic.' --width 1024 --height 1536
```

Image editing with reference:

```bash
belt app run alibaba/qwen-image-2-pro --prompt "Change the season to autumn. Add warm orange and red foliage to the trees, scatter fallen leaves on the ground, make the lighting warmer and more golden." --reference_images '["https://example.com/park-scene.jpg"]'
```

With negative prompt and seed:

```bash
belt app run alibaba/qwen-image-2-pro --prompt "Photorealistic portrait of an elderly craftsman in his workshop, dramatic side lighting, shallow depth of field" --negative_prompt "blurry, low quality, cartoon, anime, oversaturated" --seed 42 --width 1536 --height 2048
```

API

```python
from inference import Client

client = Client()
result = client.run("alibaba/qwen-image-2-pro", {
    "prompt": "Corporate infographic titled 'Q4 Revenue Growth' with a bar chart showing monthly increases from October through December, clean modern design, blue and white color scheme, sans-serif typography, data labels on each bar",
    "width": 2048,
    "height": 1536,
    "num_images": 1
})
```

Batch generation with prompt extension:

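```python
from inference import Client

client = Client()
result = client.run("alibaba/qwen-image-2-pro", {
    "prompt": "Luxury watch product shot, dramatic black background",
    "num_images": 4,        # four images in one call, billed as 4 x $0.075
    "prompt_extend": True,  # let the model elaborate on the brief prompt
    "width": 2048,
    "height": 2048
})
```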

input parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | yes | Text description of the desired image or editing instruction |
| width | integer | no | Output width in pixels, 512-2048. Total pixels cannot exceed 2048x2048 |
| height | integer | no | Output height in pixels, 512-2048. Total pixels cannot exceed 2048x2048 |
| negative_prompt | string | no | Content to avoid in the generation (e.g., "blurry, low quality, text artifacts") |
| num_images | integer | no | Number of images to generate, 1-6. Default is 1 |
| reference_images | array | no | Reference images for editing, 1-3 image URLs |
| seed | integer | no | Random seed for reproducibility, 0 to 2147483647 |
| prompt_extend | boolean | no | Enable automatic prompt rewriting for creative diversity |
| watermark | boolean | no | Add "Qwen-Image" watermark to bottom-right corner |
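The limits in the parameter table can be checked locally before a request is sent. A sketch, assuming the app enforces exactly the ranges listed above (the server remains the authority); `validate_request` and `ALLOWED_KEYS` are hypothetical names, not part of the inference.sh SDK.

```python
# Parameter names taken from the table above.
ALLOWED_KEYS = {"prompt", "width", "height", "negative_prompt", "num_images",
                "reference_images", "seed", "prompt_extend", "watermark"}
MAX_SEED = 2**31 - 1  # 2147483647


def validate_request(params):
    """Return a list of problems; an empty list means the payload fits the documented limits."""
    problems = [f"unknown parameter: {key}" for key in params if key not in ALLOWED_KEYS]
    if not params.get("prompt"):
        problems.append("prompt is required")
    for key in ("width", "height"):
        if key in params and not 512 <= params[key] <= 2048:
            problems.append(f"{key} must be between 512 and 2048")
    if "num_images" in params and not 1 <= params["num_images"] <= 6:
        problems.append("num_images must be between 1 and 6")
    if "reference_images" in params and not 1 <= len(params["reference_images"]) <= 3:
        problems.append("reference_images must contain 1 to 3 URLs")
    if "seed" in params and not 0 <= params["seed"] <= MAX_SEED:
        problems.append(f"seed must be between 0 and {MAX_SEED}")
    return problems


print(validate_request({"prompt": "Poster", "width": 4096, "num_images": 8}))
# ['width must be between 512 and 2048', 'num_images must be between 1 and 6']
```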

output

The app returns an images array containing generated images in PNG format. Each element is a file reference that can be downloaded or passed to other apps. The output_meta field contains processing metadata.

pricing

$0.075 per generated image, flat rate regardless of resolution or whether reference images are used. Generating 6 images in a single batch costs $0.45.

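This positions Qwen Image 2 Pro in the mid-range. At around 1 megapixel it is more expensive than FLUX Dev LoRA ($0.035/megapixel), but because FLUX Dev LoRA is billed per megapixel, a full 2048x2048 image (roughly 4 megapixels) costs around $0.14-0.15 there, so the flat rate becomes the cheaper option at high resolution. It is also cheaper than Gemini 3 Pro ($0.15/image at equivalent resolution). The text rendering capability justifies the premium over diffusion-only models for workflows that need embedded typography.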
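Because one model is billed per image and the other per megapixel, which is cheaper depends on output size. The sketch below uses the rates quoted above and assumes a megapixel means 1,000,000 pixels; the platform may define or round it differently, so treat the numbers as estimates. The function names are hypothetical.

```python
QWEN_PER_IMAGE = 0.075      # USD, flat rate per generated image
FLUX_PER_MEGAPIXEL = 0.035  # USD, FLUX Dev LoRA rate quoted above


def qwen_cost(num_images=1):
    """Flat per-image pricing: resolution and reference images do not change the price."""
    return round(num_images * QWEN_PER_IMAGE, 4)


def flux_cost(width, height, num_images=1):
    """Per-megapixel pricing (assumption: 1 megapixel = 1,000,000 pixels)."""
    megapixels = width * height / 1_000_000
    return round(num_images * megapixels * FLUX_PER_MEGAPIXEL, 4)


# Break-even size: below this many megapixels, FLUX Dev LoRA is cheaper per image.
break_even_mp = QWEN_PER_IMAGE / FLUX_PER_MEGAPIXEL

print(qwen_cost(6))             # 0.45, the six-image batch above
print(flux_cost(1024, 1024))    # 0.0367
print(flux_cost(2048, 2048))    # 0.1468
print(round(break_even_mp, 2))  # 2.14
```

Under these assumptions the crossover sits a little above 2 megapixels, roughly a 1464x1464 square.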

when to use qwen image 2 pro vs alternatives

Choose Qwen Image 2 Pro when your images need readable text, infographic layouts, or document-style compositions. Also strong for photorealistic scenes requiring fine material detail.

Choose Gemini 3 Pro when you need image editing workflows, Google Search grounding, or 4K resolution output.

Choose FLUX Dev LoRA when you need custom style adaptation through LoRA weights or want the lowest per-pixel cost.

Choose Seedream 4.5 when you want simple, fast text-to-image at high resolution with minimal configuration.

FAQ

How good is the text rendering really?

Qwen Image 2 Pro renders text significantly better than diffusion-only models. Short to medium text strings (headlines, labels, signs) render cleanly and legibly in most cases. Full document layouts with body text are possible but may have occasional character-level errors on longer passages. For critical text, always verify the output.
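One way to verify rendered text is to run OCR on the generated image and compare the result with the strings you requested. This sketch assumes you already have OCR text from a separate tool such as Tesseract (the app returns images and processing metadata, not recognized text), and `check_rendered_text` is a hypothetical helper. It flags any expected string that does not appear verbatim, ignoring case and whitespace, and reports the closest similarity it found so near misses such as a dropped letter stand out. OCR can misread too, so a flag means "look at this image", not "the model failed".

```python
import difflib
import re


def _normalize(text):
    # Ignore case and collapse runs of whitespace, since OCR line breaks vary.
    return re.sub(r"\s+", " ", text).strip().lower()


def check_rendered_text(expected_strings, ocr_text):
    """Return (expected, best_similarity) for each string not found verbatim."""
    haystack = _normalize(ocr_text)
    problems = []
    for expected in expected_strings:
        needle = _normalize(expected)
        if needle in haystack:
            continue  # rendered exactly, modulo case and whitespace
        # Slide a window of the same length over the OCR text and keep the best
        # difflib similarity ratio (0.0 to 1.0) as a hint for the reviewer.
        width = len(needle)
        best = max(
            difflib.SequenceMatcher(None, needle, haystack[i:i + width]).ratio()
            for i in range(max(1, len(haystack) - width + 1))
        )
        problems.append((expected, round(best, 2)))
    return problems


ocr_text = "ESPRESSO $4\nLatte $5.50\nCappucino $5"  # illustrative OCR output
expected = ["Espresso $4", "Latte $5.50", "Cappuccino $5", "Cold Brew $6"]
print(check_rendered_text(expected, ocr_text))
```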

What does prompt_extend do?

When enabled, the model rewrites your prompt internally to add creative detail and diversity before generating. This is useful when you have a brief concept and want the model to elaborate, but should be disabled when you need precise control over exactly what appears in the output.

Can I control the exact font style?

Not directly through a font parameter, but you can describe the desired typography in your prompt: "bold sans-serif," "elegant serif script," "hand-drawn chalk lettering," "monospace terminal font." The model interprets these descriptions and renders appropriate styles.

What is the maximum resolution?

2048x2048 pixels is the maximum. You can set any width and height independently within the 512-2048 range, giving you full control over aspect ratio. Common configurations include 2048x1536 (4:3), 1536x2048 (3:4), and 2048x1152 (16:9).
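Since width and height are set independently, choosing dimensions for a target aspect ratio is simple arithmetic: fix the long side and scale the short side. A sketch with a hypothetical helper, `dims_for_aspect`; it reproduces the three configurations listed above and rejects ratios whose short side would fall below 512.

```python
from fractions import Fraction

MIN_SIDE, MAX_SIDE = 512, 2048  # documented per-axis limits


def dims_for_aspect(ratio_w, ratio_h, long_side=MAX_SIDE):
    """Return (width, height) for an aspect ratio, with the longer side set to long_side."""
    if not MIN_SIDE <= long_side <= MAX_SIDE:
        raise ValueError(f"long_side must be between {MIN_SIDE} and {MAX_SIDE}")
    ratio = Fraction(ratio_w, ratio_h)  # exact arithmetic, no float drift
    if ratio >= 1:  # landscape or square: width is the long side
        width, height = long_side, round(long_side / ratio)
    else:           # portrait: height is the long side
        width, height = round(long_side * ratio), long_side
    if min(width, height) < MIN_SIDE:
        raise ValueError(f"{ratio_w}:{ratio_h} at {long_side}px puts a side below {MIN_SIDE}")
    return width, height


print(dims_for_aspect(4, 3))   # (2048, 1536)
print(dims_for_aspect(3, 4))   # (1536, 2048)
print(dims_for_aspect(16, 9))  # (2048, 1152)
```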

How does reference image editing work?

Pass 1-3 images in the reference_images array alongside a prompt describing the desired edit. The model interprets your instructions in the context of the reference images. You can request style changes, element additions or removals, season or lighting changes, and compositional modifications.
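In the Python API the reference images are presumably passed as a plain list inside the input dict, while the belt CLI example passes the same list as a quoted JSON string. The hypothetical helpers `edit_payload` and `reference_images_flag` below build both forms from one list and enforce the documented limit of 1-3 images; they only assemble inputs and do not call the service.

```python
import json
import shlex


def edit_payload(instruction, reference_urls, **options):
    """Hypothetical helper: input dict for an edit request, e.g. to pass to client.run."""
    urls = list(reference_urls)
    if not 1 <= len(urls) <= 3:
        raise ValueError("reference_images takes 1 to 3 image URLs")
    return {"prompt": instruction, "reference_images": urls, **options}


def reference_images_flag(urls):
    """Hypothetical helper: render the list as the CLI example does, a shell-quoted JSON array."""
    return "--reference_images " + shlex.quote(json.dumps(list(urls)))


urls = ["https://example.com/park-scene.jpg"]
payload = edit_payload("Change the season to autumn.", urls, seed=42)
print(payload["reference_images"])  # ['https://example.com/park-scene.jpg']
print(reference_images_flag(urls))  # --reference_images '["https://example.com/park-scene.jpg"]'
```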
