Image Generation

Type @image-generation followed by a description to generate images directly in the conversation. The result renders inline — click to expand in a lightbox, or download to your filesystem.

Backends and Models

QARK connects to four image generation providers. Configure your default in Settings → Image Generation, or override it per conversation. Each provider requires its own API key, which you add in Settings → Providers.

OpenAI

Model	Description	Quality Tiers	Price Range
GPT Image 1.5	Fastest OpenAI model — 4× faster generation, improved text rendering and edit consistency	Low / Medium / High	$0.009–$0.200
GPT Image 1	High quality output across multiple quality tiers	Low / Medium / High	$0.011–$0.250
GPT Image 1 Mini	Budget tier — 50–70% cheaper than GPT Image 1 for high-volume use	Low / Medium / High	$0.005–$0.078
DALL-E 3 (deprecated)	Scheduled to be sunset on May 12, 2026; use GPT Image 1 or GPT Image 1.5 instead	Standard / HD	$0.040–$0.120

Supported sizes: 1024×1024, 1024×1536, 1536×1024. DALL-E 3 uses 1024×1792 and 1792×1024 in place of the two non-square sizes.
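If you have a target aspect ratio in mind, a small helper can pick the nearest of the listed GPT Image sizes. This is only an illustrative sketch: the function name `closest_size` and the idea of matching by ratio are assumptions made here, not part of QARK or the OpenAI API.

```python
import math

# Sizes listed above for the GPT Image models, as (width, height).
GPT_IMAGE_SIZES = [(1024, 1024), (1024, 1536), (1536, 1024)]


def closest_size(target_ratio, sizes=GPT_IMAGE_SIZES):
    """Return the (width, height) whose aspect ratio is nearest to target_ratio.

    Ratios are compared on a log scale so that 2:1 and 1:2 are treated
    as equally far from 1:1.
    """
    return min(
        sizes,
        key=lambda wh: abs(math.log((wh[0] / wh[1]) / target_ratio)),
    )


# Hypothetical usage: a 16:9 request maps to the landscape size.
print(closest_size(16 / 9))
print(closest_size(9 / 16))
```

The log-scale comparison is a design choice for symmetry between portrait and landscape; a plain absolute difference would also work for these three sizes.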

Google Gemini

Model	Description	Price per Image
Imagen 4	High-quality output with strong prompt adherence and rich detail	$0.040
Imagen 4 Ultra	Enhanced realism, fine-grained details, superior text rendering	$0.060
Imagen 4 Fast	Speed-optimized variant for high-throughput use at reduced cost	$0.020
Nano Banana (Gemini 2.5 Flash Image)	Fast, efficient generation for high-volume, low-latency tasks	$0.039
Nano Banana 2 (Gemini 3.1 Flash Image)	Pro-level visual quality at Flash speed with advanced contextual understanding	$0.045–$0.151
Nano Banana Pro (Gemini 3 Pro Image)	Professional asset production with high-fidelity text rendering	$0.134–$0.240

Nano Banana 2 and Pro support resolutions up to 4096×4096. Imagen models output at 1024×1024. All Gemini models support aspect ratios including 1:1, 3:4, 4:3, 9:16, 16:9.

xAI

Model	Description	Price per Image
Grok Imagine Image	Diverse styles from ultra-realistic to anime, oil paintings, pencil sketches	$0.020
Grok Imagine Image Pro	Higher quality output with more detailed rendering	$0.070

Both support 1K and 2K resolution with aspect ratios 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3.

OpenRouter

OpenRouter routes to multiple image models from different providers through a single API key:

Model	Description	Price per Image
Nano Banana via OpenRouter	Gemini 2.5 Flash Image through OpenRouter	$0.039
Nano Banana 2 via OpenRouter	Gemini 3.1 Flash Image through OpenRouter	$0.045–$0.151
Nano Banana Pro via OpenRouter	Gemini 3 Pro Image through OpenRouter	$0.134–$0.240
Seedream 4.5 (ByteDance)	Excellent editing consistency, portrait refinement, small-text rendering	$0.040
FLUX.2 Klein 4B	Fastest and cheapest Flux model for high-throughput use	$0.014
FLUX.2 Flex	Excels at complex text, typography, and fine details	$0.060
FLUX.2 Pro	Frontier-level visual quality with strong prompt adherence	$0.030
FLUX.2 Max	Top-tier Flux model — highest image quality and prompt understanding	$0.070
Riverflow V2 Pro (Sourceful)	Top-tier control, perfect text rendering, integrated reasoning	$0.150–$0.330
Riverflow V2 Fast (Sourceful)	Fastest Sourceful model for latency-critical workflows	$0.020–$0.040
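To compare providers for a batch job, the per-image prices in the tables above can be turned into a rough cost range. The sketch below copies a few prices from those tables; the `batch_cost` helper is hypothetical, and it treats each range's endpoints simply as lower and upper bounds, since the tables do not say which quality tier or resolution maps to which price.

```python
# (low, high) USD per image, copied from the tables above.
# Flat-priced models use the same value for both bounds.
PRICE_RANGE = {
    "GPT Image 1 Mini": (0.005, 0.078),
    "Imagen 4 Fast": (0.020, 0.020),
    "FLUX.2 Klein 4B": (0.014, 0.014),
    "Nano Banana Pro": (0.134, 0.240),
}


def batch_cost(model, count):
    """Return (lowest, highest) estimated USD cost for `count` images."""
    low, high = PRICE_RANGE[model]
    return round(low * count, 2), round(high * count, 2)


# Hypothetical usage: estimate a 200-image batch for each model.
for name in PRICE_RANGE:
    lowest, highest = batch_cost(name, 200)
    print(f"{name}: ${lowest:.2f} to ${highest:.2f}")
```

Actual charges depend on the provider's current pricing and on the quality and size settings used, so treat the result as an estimate only.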

Configure Defaults and Overrides

Default backend: Set in Settings → Image Generation. All new conversations use this backend unless you override it.

Per-conversation override: Change the active backend within a conversation to use a different provider for specific tasks without altering your global default.
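The relationship between the two settings is a simple fallback: a conversation uses its own override if one is set, and the global default otherwise. The following sketch illustrates that rule only; the `Conversation` class, the `backend_override` field, and `active_backend` are hypothetical names, not QARK's internal data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    # None means "no override": fall back to the global default.
    backend_override: Optional[str] = None


def active_backend(conversation: Conversation, default_backend: str) -> str:
    """Return the override if the conversation has one, else the default."""
    return conversation.backend_override or default_backend


# Hypothetical usage: the global default stays untouched by an override.
default = "OpenAI"
print(active_backend(Conversation(), default))
print(active_backend(Conversation(backend_override="xAI"), default))
```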

[Figure: an image generation result rendered inline in a conversation, with lightbox preview, download button, and generation metadata]

View Generated Images

Generated images render inline in the conversation with:

Lightbox preview — click the image to expand it to full resolution in an overlay
Download button — save the generated image to your local filesystem
Generation metadata — visible below the image: backend used, model, resolution, and generation time

Write Effective Prompts

Specify Visual Details

Instead of “a cat,” describe the scene: “A tabby cat sitting on a windowsill at golden hour, watercolor style, warm tones, soft lighting.”

Declare the Style

Name the artistic style explicitly: photograph, oil painting, pixel art, technical diagram, isometric illustration, charcoal sketch, 3D render.

Set the Composition

Describe framing and perspective: close-up, wide shot, bird’s-eye view, centered subject, rule of thirds, negative space.

State What to Exclude

If the model supports negative prompting, specify what you don’t want: “No text overlays, no watermarks, no borders.”
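The four tips above (subject details, style, composition, exclusions) can be assembled into one prompt string. This is a minimal sketch of one way to do that; `build_prompt` and its parameters are hypothetical and simply produce text you could type after @image-generation.

```python
def build_prompt(subject, style=None, composition=None, exclude=()):
    """Join a detailed subject, a style, a composition, and exclusions."""
    parts = [subject]
    if style:
        parts.append(f"{style} style")
    if composition:
        parts.append(composition)
    prompt = ", ".join(parts) + "."
    if exclude:
        # Exclusions go in a separate sentence: "No x, no y."
        negatives = ", ".join(f"no {item}" for item in exclude)
        prompt += " " + negatives[0].upper() + negatives[1:] + "."
    return prompt


# Hypothetical usage, reusing the example subject from above.
print(build_prompt(
    "A tabby cat sitting on a windowsill at golden hour",
    style="watercolor",
    composition="close-up",
    exclude=["text overlays", "watermarks"],
))
```

Whether the exclusion sentence has any effect depends on the model, as noted above; for models without negative prompting it can be left out by passing no `exclude` items.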

Iterate on Results

Generate a first version, then refine. Reference the previous result in your follow-up: “Same composition but change the background to a mountain landscape” or “Make the lighting more dramatic.”

Combine with Other Tools

Web search + image generation — search for visual references, then generate variations
Document search + image generation — extract descriptions from documents and generate corresponding visuals
Thinking + image generation — enable thinking to have the agent refine your prompt before generating, improving first-attempt quality