xAI's Grok Imagine Pro brings the image generation capabilities behind Grok to the inference.sh platform. It handles both text-to-image generation and image editing through a clean, minimal interface. The model produces high-quality outputs with strong prompt adherence and supports batch generation of up to 10 images per request. Available at app.inference.sh/apps/xai/grok-imagine-image-pro.
what it does
Grok Imagine Pro generates images from text prompts and edits existing images based on natural language instructions. The model is built by xAI - the same team behind the Grok language model - and reflects their approach to AI: capable outputs with minimal friction.
The interface is deliberately simple. You provide a prompt, optionally an input image for editing, choose an aspect ratio, and specify how many images you want. No complex parameter tuning, no guidance scales to adjust, no inference step counts to optimize. The model handles those decisions internally, which makes it fast to integrate and predictable to use.
For image editing, pass a source image alongside your prompt. The model understands what to change and what to preserve based on your text instructions. This works for style transfer, element modification, background replacement, and creative transformations.
key features
Simple interface - Four parameters in total, including the prompt. No diffusion-specific knobs to tune. The model makes quality decisions internally.
Batch generation - Generate up to 10 images per request. Useful for exploring variations, producing asset sets, or A/B testing creative directions.
Image editing - Pass an input image with editing instructions. The model handles object removal, style changes, element additions, and compositional modifications.
Multiple aspect ratios - Support for standard ratios including 1:1, 16:9, 9:16, 4:3, 3:4, and others. Choose the format that fits your output context.
Low input cost - Input images for editing cost only $0.002 each, making iterative editing workflows economical.
use cases
Rapid prototyping - The minimal parameter set means faster iteration. Describe what you want, get results, refine. No time spent tuning guidance scales or step counts.
Batch creative exploration - Generate 10 variations of a concept in a single call. Compare directions, pick winners, and iterate on the best results.
Image editing at scale - The low input image cost ($0.002) makes bulk editing workflows practical. Process product catalogs, batch-edit photos, or apply consistent transformations across image sets.
Social media content - Quick generation of post imagery, story graphics, and promotional visuals across multiple aspect ratios for different platforms.
Concept art and ideation - Fast visual exploration of ideas without parameter overhead. Describe scenes, characters, or environments and see them immediately.
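For the multi-format social media case above, one approach is to prepare one input per target format from a single prompt. The sketch below is illustrative only: the placement names (`square_post`, `story`, `landscape_banner`) and the helper `payloads_for` are hypothetical, and the mapping of placements to ratios is an assumption rather than something the platform defines. Only the `prompt` and `aspect_ratio` keys come from the app's documented parameters.

```python
# Hypothetical mapping of placements to aspect ratios (an assumption for
# illustration; adjust it to the platforms you actually publish on).
PLACEMENTS = {
    "square_post": "1:1",
    "story": "9:16",
    "landscape_banner": "16:9",
}


def payloads_for(prompt, placements=PLACEMENTS):
    """Build one input dict per placement, all sharing the same prompt."""
    return {
        name: {"prompt": prompt, "aspect_ratio": ratio}
        for name, ratio in placements.items()
    }


if __name__ == "__main__":
    for name, payload in payloads_for("Product launch teaser, bold colors").items():
        print(name, payload)
```

Each resulting dict can be passed as the input for a separate run of the app.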
how to run
belt CLI
Basic text-to-image:

    belt app run xai/grok-imagine-image-pro --prompt "Cyberpunk street market at night, neon signs in Japanese and English, steam rising from food stalls, reflective wet pavement, cinematic wide angle"

With aspect ratio for social media:

    belt app run xai/grok-imagine-image-pro --prompt "Flat lay photography of artisan coffee setup, pour-over equipment, whole beans, linen texture background, morning light" --aspect_ratio "4:3"

Batch generation for exploration:

    belt app run xai/grok-imagine-image-pro --prompt "Abstract geometric logo mark for a technology company, minimal, single color on white background, vector style" --n 10 --aspect_ratio "1:1"

Image editing:

    belt app run xai/grok-imagine-image-pro --prompt "Transform into an oil painting in the style of the Dutch Golden Age, preserve the composition and lighting" --image "https://example.com/photograph.jpg"

API

    from inference import Client

    client = Client()
    result = client.run("xai/grok-imagine-image-pro", {
        "prompt": "Architectural visualization, modern glass house cantilevered over a cliff edge, Pacific coast, dramatic sunset, photorealistic rendering",
        "aspect_ratio": "16:9",
        "n": 4
    })

Image editing:

    result = client.run("xai/grok-imagine-image-pro", {
        "prompt": "Replace the sky with a dramatic thunderstorm, add lightning in the background, make the overall mood darker and more intense",
        "image": "https://example.com/landscape.jpg",
        "aspect_ratio": "16:9"
    })

input parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | yes | Text description of the desired image or editing instruction |
| aspect_ratio | string | no | Output aspect ratio. Options include 1:1, 16:9, 9:16, 4:3, 3:4 |
| image | string | no | Input image URL for editing workflows |
| n | integer | no | Number of images to generate, 1-10. Default is 1 |
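The constraints in the table can be checked locally before a request is sent. The following is a minimal sketch, not part of any SDK: `build_input` is a hypothetical helper, and the aspect ratio check only verifies the `W:H` shape because the full list of supported ratios is not given here.

```python
import re

MAX_IMAGES = 10  # upper bound for n, as stated in the parameter table
_RATIO = re.compile(r"^\d+:\d+$")  # shape check only, e.g. "16:9"


def build_input(prompt, aspect_ratio=None, image=None, n=1):
    """Return an input dict for the app, raising ValueError on bad values."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be non-empty")
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= MAX_IMAGES:
        raise ValueError(f"n must be an integer from 1 to {MAX_IMAGES}")

    payload = {"prompt": prompt.strip(), "n": n}
    if aspect_ratio is not None:
        if not isinstance(aspect_ratio, str) or not _RATIO.match(aspect_ratio):
            raise ValueError("aspect_ratio must look like '16:9'")
        payload["aspect_ratio"] = aspect_ratio
    if image is not None:
        payload["image"] = image  # URL of the source image for editing
    return payload


if __name__ == "__main__":
    print(build_input("A red bicycle against a white wall", aspect_ratio="4:3", n=2))
```

Catching an out-of-range `n` or an empty prompt on the client side avoids spending a request on input that would be rejected.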
output
The app returns an images array containing the generated image files. Each element is a downloadable file reference. The output_meta field provides structured metadata about the generation.
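As a rough illustration of consuming this output, the sketch below collects file references from a result. The exact shape of each element is an assumption: it treats the result as a dict with an `images` list whose items are either URL strings or dicts carrying a `url` key. The helper `image_refs` is hypothetical; check a real response before relying on these field names.

```python
def image_refs(result):
    """Collect file references from a result shaped like {"images": [...]}.

    Assumed element shapes (not confirmed by the app's documentation):
    a plain URL string, or a dict with a "url" key.
    """
    refs = []
    for item in result.get("images", []):
        if isinstance(item, str):
            refs.append(item)
        elif isinstance(item, dict) and "url" in item:
            refs.append(item["url"])
    return refs


if __name__ == "__main__":
    # Hand-written sample standing in for a real response.
    sample = {
        "images": ["https://example.com/a.png", {"url": "https://example.com/b.png"}],
        "output_meta": {},
    }
    for ref in image_refs(sample):
        print(ref)
```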
pricing
- Output images: $0.07 per image
- Input images (for editing): $0.002 per image
Generating 10 images costs $0.70. An editing workflow processing 100 input images at one output each costs $7.20 total ($0.20 for inputs + $7.00 for outputs).
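The arithmetic above can be reproduced with a small estimator. This is a sketch using the two prices listed in this section; `estimate_cost` is a hypothetical helper, and `Decimal` is used so the sums are exact rather than subject to floating-point rounding.

```python
from decimal import Decimal

OUTPUT_PRICE = Decimal("0.07")   # USD per generated image
INPUT_PRICE = Decimal("0.002")   # USD per input image used for editing


def estimate_cost(outputs, inputs=0):
    """Return the estimated cost in USD for a number of output and input images."""
    if outputs < 0 or inputs < 0:
        raise ValueError("image counts must be non-negative")
    return outputs * OUTPUT_PRICE + inputs * INPUT_PRICE


if __name__ == "__main__":
    print("10 generated images:", estimate_cost(10))
    print("100 edits, one output each:", estimate_cost(100, inputs=100))
```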
Grok Imagine Pro sits in the mid-range for pricing - comparable to Qwen Image 2 Pro ($0.075) and roughly half the cost of Gemini 3 Pro at 1K/2K ($0.15). The batch limit of 10 images per request is the highest among comparable models on the platform.
when to use grok imagine pro vs alternatives
Choose Grok Imagine Pro when you want a simple interface without diffusion parameter tuning, need batch generation of up to 10 images per request, or want low-cost image editing workflows.
Choose Gemini 3 Pro when you need Google Search grounding, 4K resolution, or more sophisticated image editing with multiple reference images.
Choose FLUX Dev LoRA when you need custom style adaptation through LoRA weights or fine-grained control over the generation process.
Choose Qwen Image 2 Pro when your primary need is text rendering within images or infographic-style outputs.
Choose Seedream 4.5 when you want 4K resolution at the lowest cost per image ($0.04).
FAQ
How does the image quality compare to other models?
Grok Imagine Pro produces competitive results for general image generation. Its strength is consistency: because no diffusion parameters are exposed, you get reliably good outputs without tuning settings. For specialized needs such as text rendering or ultra-high resolution, dedicated models may perform better.
Can I control the generation style more precisely?
The model does not expose guidance scale, inference steps, or seed parameters. Style control comes entirely through your prompt. Be specific about artistic style, medium, lighting, camera settings, and mood in your text description. The model responds well to detailed stylistic direction in the prompt itself.
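Since style control lives entirely in the prompt, it can help to assemble prompts from named parts so each stylistic dimension is stated explicitly. The helper below is a hypothetical sketch, not a platform feature; the field names simply mirror the dimensions listed above.

```python
def compose_prompt(subject, style=None, medium=None, lighting=None,
                   camera=None, mood=None):
    """Join the subject and any provided style descriptors into one prompt."""
    parts = [subject, style, medium, lighting, camera, mood]
    # Skip descriptors that were not provided or are blank.
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(compose_prompt(
        "Lighthouse on a rocky cliff",
        style="impressionist",
        medium="oil on canvas",
        lighting="golden hour backlight",
        mood="calm",
    ))
```

Keeping the descriptors separate makes it easy to vary one dimension, such as lighting, while holding the rest of the prompt fixed.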
What is the maximum batch size?
The maximum is 10 images per request, the highest batch limit among image generators on inference.sh. This is useful for generating variations, exploring creative directions, or producing asset sets efficiently.
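When more than 10 images are needed, the work has to be split across several requests. A minimal sketch of that split follows; `plan_batches` is a hypothetical helper that only computes the `n` value for each request and does not call the platform.

```python
MAX_PER_REQUEST = 10  # batch limit per request


def plan_batches(total, max_per_request=MAX_PER_REQUEST):
    """Split a total image count into per-request n values, largest first."""
    if total < 1:
        raise ValueError("total must be at least 1")
    if not 1 <= max_per_request <= MAX_PER_REQUEST:
        raise ValueError(f"max_per_request must be from 1 to {MAX_PER_REQUEST}")
    full, rest = divmod(total, max_per_request)
    return [max_per_request] * full + ([rest] if rest else [])


if __name__ == "__main__":
    # Each value would be used as the n parameter of one request.
    print(plan_batches(25))
```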
Does it support negative prompts?
No. The simplified interface does not include a negative prompt parameter. If you need to steer away from specific artifacts or styles, describe what you want positively in the prompt rather than what you want to avoid.
How does image editing work?
Pass an image URL in the image parameter alongside a text prompt describing the desired edit. The model interprets your instructions and modifies the image accordingly. It handles style transfers, element changes, background replacement, and creative transformations. The input image costs $0.002 in addition to the $0.07 per output image.