Google's Gemini 3.1 Flash Image Preview, also known as Nano Banana 2, is the most popular image generation model on inference.sh by a wide margin. With over 53,000 tasks completed and 138 paying users relying on it daily, it has become the default choice for developers who need fast, high-quality image generation with advanced editing capabilities. The model combines Google's world knowledge with production-ready image output, and it runs as a serverless app on inference.sh with no GPU provisioning required.
What sets this model apart from competitors is the combination of speed, text rendering accuracy, Google Search grounding, and multi-image editing support. You can generate fresh images from text prompts, edit existing images with natural language instructions, and scale resolution from 512px previews up to native 4K output — all through a single API endpoint.
## what it does
Gemini 3.1 Flash Image Preview generates and edits images from text prompts. It accepts one or more input images alongside your prompt, enabling workflows like background replacement, style transfer, object removal, and composite generation. The model understands spatial relationships, text rendering, and real-world objects with high fidelity.
The "Flash" designation means this runs on Google's fastest inference architecture. You get near-Pro quality results at significantly lower latency and cost than the full Gemini Pro Image model. For most production workloads, the quality difference is negligible while the speed advantage is substantial.
## key features
Text rendering — The model generates readable, accurate text within images. Marketing copy, product labels, signage, and UI mockups render legibly without the garbled artifacts that plague other generators.
Google Search grounding — Enable real-time web search to ground generations in current information. Generate images of real products, current events, or specific locations with factual accuracy rather than hallucinated details.
Multi-image input — Pass multiple reference images for editing, composition, and style matching. The model extracts visual elements from each input and combines them according to your prompt instructions.
Resolution scaling — Output at 512px for quick previews, 1K for standard use, 2K for high-quality assets, or 4K for print and production work. Pricing scales with resolution.
Safety controls — Configurable safety tolerance from strict blocking to more permissive generation, letting you tune the content filter for your specific use case.
Batch generation — Generate multiple images per request with the num_images parameter, useful for exploring variations or producing asset sets.
## use cases
Marketing asset generation — Generate social media images, ad creatives, and promotional materials with accurate text overlays. Iterate quickly on concepts without waiting for design resources.
Product visualization — Create product mockups, lifestyle shots, and e-commerce imagery from reference photos. Edit backgrounds, lighting, and context without reshooting.
Content creation pipelines — Automate thumbnail generation, blog illustrations, and documentation images. The API-first approach integrates directly into publishing workflows.
Image editing at scale — Batch process images for background removal, style changes, or object editing using natural language instructions instead of manual Photoshop work.
Educational and informational content — Generate diagrams, infographics, and explanatory visuals with accurate text and factual grounding from Google Search.
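The editing-at-scale workflow above can be sketched as a loop that applies one natural-language edit to a folder of images via the belt CLI. This is an illustrative sketch: the prompt, the `./inputs` folder, and the `build_edit_command` helper are assumptions, not part of the official tooling.

```python
import json
import subprocess
from pathlib import Path

APP = "google/gemini-3-1-flash-image-preview"

def build_edit_command(image_path: str, prompt: str) -> list:
    """Build a belt CLI invocation for a single image edit."""
    payload = {"prompt": prompt, "images": [image_path]}
    return ["belt", "app", "run", APP, "--input", json.dumps(payload)]

if __name__ == "__main__":
    # Apply the same edit to every PNG in a folder (hypothetical path).
    for img in sorted(Path("./inputs").glob("*.png")):
        cmd = build_edit_command(str(img), "Replace the background with plain white")
        subprocess.run(cmd, check=True)
```

Because each image is a separate request, this parallelizes naturally; the retry_count parameter (see the table below) helps absorb rate limits in high-volume runs.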
## how to run

### belt CLI
Basic text-to-image generation:
```shell
belt app run google/gemini-3-1-flash-image-preview --input '{"prompt": "A minimalist product photo of a ceramic coffee mug on a wooden table, soft morning light, clean white background"}'
```

Generate at 4K resolution with a specific aspect ratio:
```shell
belt app run google/gemini-3-1-flash-image-preview --input '{"prompt": "Wide landscape photograph of autumn trees reflected in a still lake, golden hour lighting", "aspect_ratio": "16:9", "resolution": "4K"}'
```

Image editing with a reference image:
```shell
belt app run google/gemini-3-1-flash-image-preview --input '{"prompt": "Replace the background with a tropical beach at sunset", "images": ["./product-photo.png"]}'
```

Generate with Google Search grounding for accuracy:
```shell
belt app run google/gemini-3-1-flash-image-preview --input '{"prompt": "The Sagrada Familia cathedral in Barcelona, photographed from the park across the street, realistic", "enable_google_search": true}'
```

### API
```shell
curl -X POST https://api.inference.sh/v1/apps/google/gemini-3-1-flash-image-preview/run \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A retro travel poster for Tokyo with bold typography reading TOKYO in art deco style",
    "aspect_ratio": "2:3",
    "resolution": "2K",
    "num_images": 2,
    "output_format": "png"
  }'
```

### input parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | yes | The text prompt describing what to generate or how to edit the input images. Be specific about composition, style, lighting, and subject matter. |
| aspect_ratio | enum | no | Output aspect ratio. Supports standard ratios (1:1, 4:3, 3:4, 16:9, 9:16) and extended options including auto-detection from input images. |
| resolution | enum | no | Output resolution tier: 1K, 2K, or 4K. Higher resolution costs more per image. Default is 1K. |
| num_images | integer | no | Number of images to generate per request. Default is 1. |
| images | array | no | Input images for editing workflows. Pass file paths or URLs to reference images the model should modify or use as context. |
| output_format | enum | no | Output file format for generated images. Options include PNG and JPEG. |
| enable_google_search | boolean | no | Enable Google Search grounding for factually accurate generations. Adds $0.015 per request. |
| safety_tolerance | enum | no | Safety filter strictness. Options: BLOCK_NONE, BLOCK_FEW, BLOCK_SOME, BLOCK_MOST. Adjust based on your content requirements. |
| retry_count | integer | no | Number of automatic retries on rate limit (429) errors. Useful for high-volume batch processing. |
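The same request shown in the curl example can be assembled from Python's standard library. A minimal sketch: the endpoint, headers, and payload fields mirror the curl call above, the API key is read from the INFERENCE_API_KEY environment variable, and the `build_request` helper is illustrative rather than an official SDK.

```python
import json
import os
import urllib.request

API_URL = "https://api.inference.sh/v1/apps/google/gemini-3-1-flash-image-preview/run"

def build_request(payload: dict) -> urllib.request.Request:
    """Assemble the authenticated POST request for a run."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('INFERENCE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request({
    "prompt": "A retro travel poster for Tokyo with bold typography reading TOKYO in art deco style",
    "aspect_ratio": "2:3",
    "resolution": "2K",
})
# To actually send it (requires a valid API key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["images"])
```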
## output
The app returns a JSON object with:
images — Array of generated image file URLs. Each image is hosted on inference.sh cloud storage and accessible via the returned URL.

description — Optional text response from the model, particularly useful when the prompt asks a question or requests information alongside the image.

output_meta — Structured metadata including resolution details, generation parameters used, and billing information.
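A minimal response-handling sketch, assuming the JSON shape described above; the sample URL and metadata values are illustrative, not real output.

```python
def image_urls(response: dict) -> list:
    """Pull the generated image URLs out of a run response."""
    return list(response.get("images", []))

# Hypothetical response matching the documented fields.
sample = {
    "images": ["https://example.inference.sh/outputs/img-0.png"],
    "description": None,
    "output_meta": {"resolution": "1K", "num_images": 1},
}
for url in image_urls(sample):
    print(url)  # download or pass along each hosted image
```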
## pricing
| Resolution | Price per image |
|---|---|
| 512px | $0.06 |
| 1K | $0.08 |
| 2K | $0.12 |
| 4K | $0.16 |
Google Search grounding adds $0.015 per request when enabled. Multi-image generation multiplies the per-image cost by the number of images requested.
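Based on the table above, a rough per-request cost estimator. This is a sketch using the prices listed at the time of writing, not an official billing calculation.

```python
# Per-image prices from the pricing table; search grounding is a flat per-request fee.
PRICE_PER_IMAGE = {"512px": 0.06, "1K": 0.08, "2K": 0.12, "4K": 0.16}
SEARCH_FEE = 0.015

def estimate_cost(resolution: str = "1K", num_images: int = 1,
                  enable_google_search: bool = False) -> float:
    """Estimate the USD cost of one run request."""
    cost = PRICE_PER_IMAGE[resolution] * num_images
    if enable_google_search:
        cost += SEARCH_FEE
    return round(cost, 4)

print(estimate_cost("2K", num_images=2, enable_google_search=True))
```

Two 2K images with search grounding, for example, come to 2 × $0.12 + $0.015 = $0.255.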
## when to use this vs alternatives
Choose Gemini Flash Image when you need fast iteration, accurate text rendering, Google Search grounding, or image editing capabilities. It excels at production workflows where speed and consistency matter more than artistic style exploration.
Choose GPT Image 2 when you need OpenAI's specific aesthetic style, mask-based inpainting precision, or when your pipeline is already built around OpenAI APIs.
Choose FLUX Dev when cost is the primary concern ($0.005/image vs $0.08+) and you need pure text-to-image generation without editing features.
Choose Qwen Image 2 when you need extremely complex infographic generation or long-prompt document-style outputs with dense text layouts.
## FAQ

### How many images can I generate in a single request?
You can generate multiple images per request using the num_images parameter. Each image is billed separately at the resolution-based rate.
### Can I use this for image editing or only generation?
Both. Pass one or more images in the images array alongside your prompt to edit existing images. The model supports background replacement, object removal, style transfer, and composite generation from multiple reference images.
### What is Google Search grounding and when should I enable it?
Google Search grounding lets the model query current web information during generation. Enable it when you need images of real places, products, people, or events to be factually accurate rather than hallucinated from training data. It adds $0.015 per request.
### What aspect ratios are supported?
The model supports standard ratios including 1:1, 4:3, 3:4, 16:9, 9:16, 2:3, 3:2, and additional extended options. An "auto" mode can detect the appropriate ratio from input images during editing workflows.
### How does the safety filter work?
The safety_tolerance parameter controls how strictly the content filter is applied. BLOCK_MOST is the most restrictive, while BLOCK_NONE allows the widest range of content. The default strikes a balance suitable for most commercial applications.