
Image Generation

Type @image-generation followed by a description to generate images directly in the conversation. The result renders inline — click to expand in a lightbox, or download to your filesystem.

QARK connects to four image generation providers: OpenAI, Google Gemini, xAI, and OpenRouter. Set your default in Settings → Image Generation, or override it per conversation. Each provider requires its own API key in Settings → Providers.

OpenAI

| Model | Description | Quality tiers | Price range (per image) |
|---|---|---|---|
| GPT Image 1.5 | Fastest OpenAI model: 4× faster generation, improved text rendering and edit consistency | Low / Medium / High | $0.009–$0.200 |
| GPT Image 1 | High-quality output across multiple quality tiers | Low / Medium / High | $0.011–$0.250 |
| GPT Image 1 Mini | Budget tier: 50–70% cheaper than GPT Image 1 for high-volume use | Low / Medium / High | $0.005–$0.078 |
| DALL-E 3 (deprecated) | Being sunset May 12, 2026. Use GPT Image 1 or 1.5 instead | Standard / HD | $0.040–$0.120 |

Supported sizes for the GPT Image models: 1024×1024, 1024×1536, and 1536×1024. DALL-E 3 uses 1024×1024, 1024×1792, and 1792×1024.
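If you script prompts or want to catch an unsupported size before sending a request, a lookup table of the sizes listed above is enough. The sketch below is a hypothetical client-side check, not part of QARK: the dictionary keys are the display names from the table, not real model identifiers. DALL-E 3's set includes 1024×1024, the square size OpenAI's API accepts for that model.

```python
# Hypothetical size check built from the sizes listed for each OpenAI model.
# Keys are display names from the table above, not QARK or OpenAI model IDs.
GPT_IMAGE_SIZES = {"1024x1024", "1024x1536", "1536x1024"}
DALLE3_SIZES = {"1024x1024", "1024x1792", "1792x1024"}

SUPPORTED_SIZES = {
    "GPT Image 1.5": GPT_IMAGE_SIZES,
    "GPT Image 1": GPT_IMAGE_SIZES,
    "GPT Image 1 Mini": GPT_IMAGE_SIZES,
    "DALL-E 3": DALLE3_SIZES,
}


def check_size(model: str, width: int, height: int) -> str:
    """Return the "WxH" size string if the model supports it, else raise ValueError."""
    if model not in SUPPORTED_SIZES:
        raise ValueError(f"unknown model: {model}")
    size = f"{width}x{height}"
    allowed = SUPPORTED_SIZES[model]
    if size not in allowed:
        raise ValueError(f"{model} does not support {size}; choose one of {sorted(allowed)}")
    return size


print(check_size("GPT Image 1", 1536, 1024))  # 1536x1024
```

A portrait 1024×1792 request passes for DALL-E 3 but raises for the GPT Image models, which top out at 1024×1536.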

Google Gemini

| Model | Description | Price per image |
|---|---|---|
| Imagen 4 | High-quality output with strong prompt adherence and rich detail | $0.040 |
| Imagen 4 Ultra | Enhanced realism, fine-grained details, superior text rendering | $0.060 |
| Imagen 4 Fast | Speed-optimized variant for high-throughput use at reduced cost | $0.020 |
| Nano Banana (Gemini 2.5 Flash Image) | Fast, efficient generation for high-volume, low-latency tasks | $0.039 |
| Nano Banana 2 (Gemini 3.1 Flash Image) | Pro-level visual quality at Flash speed with advanced contextual understanding | $0.045–$0.151 |
| Nano Banana Pro (Gemini 3 Pro Image) | Professional asset production with high-fidelity text rendering | $0.134–$0.240 |

Nano Banana 2 and Pro support resolutions up to 4096×4096. Imagen models output at 1024×1024. All Gemini models support aspect ratios including 1:1, 3:4, 4:3, 9:16, 16:9.

xAI

| Model | Description | Price per image |
|---|---|---|
| Grok Imagine Image | Diverse styles from ultra-realistic to anime, oil paintings, pencil sketches | $0.020 |
| Grok Imagine Image Pro | Higher-quality output with more detailed rendering | $0.070 |

Both support 1K and 2K resolution with aspect ratios 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3.
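Gemini and Grok accept aspect ratios rather than arbitrary pixel sizes. When you start from a target size such as 1920×1080, you need to pick the nearest listed ratio. The sketch below is a hypothetical helper, not a QARK feature. It copies the ratio lists from this page; the Gemini list is described as "including", so treat it as a subset. The helper compares ratios in log space so that landscape and portrait mismatches are penalized symmetrically.

```python
import math
from fractions import Fraction

# Ratio lists copied from this page. The Gemini list may not be exhaustive.
GEMINI_RATIOS = ["1:1", "3:4", "4:3", "9:16", "16:9"]
GROK_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"]


def ratio_value(ratio: str) -> Fraction:
    """Parse "W:H" into an exact fraction W/H."""
    w, h = ratio.split(":")
    return Fraction(int(w), int(h))


def closest_ratio(width: int, height: int, supported: list[str]) -> str:
    """Return the supported ratio nearest to width/height, measured in log space."""
    target = Fraction(width, height)
    return min(supported, key=lambda r: abs(math.log(ratio_value(r) / target)))


print(closest_ratio(1536, 1024, GEMINI_RATIOS))  # 4:3
print(closest_ratio(1536, 1024, GROK_RATIOS))    # 3:2
```

The same 3:2 request lands on 4:3 for Gemini (from the listed set) but matches exactly on Grok, which lists 3:2.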

OpenRouter

OpenRouter routes to multiple image models from different providers through a single API key:

| Model | Description | Price per image |
|---|---|---|
| Nano Banana via OpenRouter | Gemini 2.5 Flash Image through OpenRouter | $0.039 |
| Nano Banana 2 via OpenRouter | Gemini 3.1 Flash Image through OpenRouter | $0.045–$0.151 |
| Nano Banana Pro via OpenRouter | Gemini 3 Pro Image through OpenRouter | $0.134–$0.240 |
| Seedream 4.5 (ByteDance) | Excellent editing consistency, portrait refinement, small-text rendering | $0.040 |
| FLUX.2 Klein 4B | Fastest and cheapest Flux model for high-throughput use | $0.014 |
| FLUX.2 Flex | Excels at complex text, typography, and fine details | $0.060 |
| FLUX.2 Pro | Frontier-level visual quality with strong prompt adherence | $0.030 |
| FLUX.2 Max | Top-tier Flux model with the highest image quality and prompt understanding | $0.070 |
| Riverflow V2 Pro (Sourceful) | Top-tier control, perfect text rendering, integrated reasoning | $0.150–$0.330 |
| Riverflow V2 Fast (Sourceful) | Fastest Sourceful model for latency-critical workflows | $0.020–$0.040 |
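For high-volume work, multiplying the per-image prices above by your image count gives a rough budget. The sketch below is a hypothetical estimator and does not reflect any QARK feature. It stores each price as a (low, high) pair copied from the tables, with a fixed price stored as (p, p). Where a range is listed, the real charge depends on the tier or resolution you choose, and each provider bills independently. It uses `Decimal` so that cent amounts do not pick up floating-point error.

```python
from decimal import Decimal

# Per-image prices in USD copied from the tables above, as (low, high).
# A fixed price is stored as (p, p); ranges depend on quality tier or resolution.
PRICES = {
    "GPT Image 1 Mini": (Decimal("0.005"), Decimal("0.078")),
    "Imagen 4 Fast": (Decimal("0.020"), Decimal("0.020")),
    "FLUX.2 Klein 4B": (Decimal("0.014"), Decimal("0.014")),
    "Riverflow V2 Pro (Sourceful)": (Decimal("0.150"), Decimal("0.330")),
}


def batch_cost(model: str, count: int) -> tuple[Decimal, Decimal]:
    """Return the (lowest, highest) total cost of generating `count` images."""
    if count < 0:
        raise ValueError("count must be non-negative")
    low, high = PRICES[model]
    return low * count, high * count


low, high = batch_cost("Riverflow V2 Pro (Sourceful)", 100)
print(f"${low:.2f}–${high:.2f}")  # $15.00–$33.00
```

For comparison, the same 100 images on FLUX.2 Klein 4B come to a flat $1.40.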

Default backend: Set in Settings → Image Generation. All new conversations use this backend unless overridden.

Per-conversation override: Change the active backend within a conversation to use a different provider for specific tasks without altering your global default.
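The precedence rule is simple: a conversation's override wins, and otherwise the global default applies. Changing the override never modifies the global setting. The sketch below models that rule with hypothetical class and function names, since QARK exposes this only through its settings UI and not through an API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlobalSettings:
    default_backend: str  # chosen in Settings → Image Generation


@dataclass
class Conversation:
    backend_override: Optional[str] = None  # None means "use the global default"


def active_backend(settings: GlobalSettings, conversation: Conversation) -> str:
    """Return the backend this conversation uses; reading it never mutates settings."""
    if conversation.backend_override is not None:
        return conversation.backend_override
    return settings.default_backend


settings = GlobalSettings(default_backend="OpenAI")
print(active_backend(settings, Conversation()))                       # OpenAI
print(active_backend(settings, Conversation(backend_override="xAI")))  # xAI
```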

[Figure: an image generation result shown inline in a conversation, with lightbox preview, download button, and generation metadata]

Generated images render inline in the conversation with:

  • Lightbox preview — click the image to expand it to full resolution in an overlay
  • Download button — save the generated image to your local filesystem
  • Generation metadata — visible below the image: backend used, model, resolution, and generation time

Be specific. Instead of “a cat,” describe the scene: “A tabby cat sitting on a windowsill at golden hour, watercolor style, warm tones, soft lighting.”

Name the artistic style explicitly: photograph, oil painting, pixel art, technical diagram, isometric illustration, charcoal sketch, 3D render.

Describe framing and perspective: close-up, wide shot, bird’s-eye view, centered subject, rule of thirds, negative space.

If the model supports negative prompting, specify what you don’t want: “No text overlays, no watermarks, no borders.”

Generate a first version, then refine. Reference the previous result in your follow-up: “Same composition but change the background to a mountain landscape” or “Make the lighting more dramatic.”
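In the conversation you simply type these refinements in plain language. If you assemble prompts programmatically, though, keeping the parts from the tips above (subject, style, details, composition, exclusions) as separate fields makes iteration a one-field change. The `PromptSpec` class below is a hypothetical illustration of that idea, not a QARK API. The exclusion clause is only useful with models that honor negative prompting.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PromptSpec:
    subject: str                   # the scene, described specifically
    style: str = ""                # e.g. "watercolor style", "isometric illustration"
    details: tuple[str, ...] = ()  # lighting, palette, mood
    composition: str = ""          # framing and perspective, e.g. "wide shot"
    avoid: tuple[str, ...] = ()    # exclusions, for models that support negative prompting

    def to_prompt(self) -> str:
        """Join the non-empty parts with commas, then append a "No X, no Y." clause."""
        parts = [self.subject, self.style, *self.details, self.composition]
        text = ", ".join(p for p in parts if p)
        if self.avoid:
            negatives = ", ".join(f"no {item}" for item in self.avoid)
            text += ". " + negatives[0].upper() + negatives[1:] + "."
        return text


cat = PromptSpec(
    subject="A tabby cat sitting on a windowsill at golden hour",
    style="watercolor style",
    details=("warm tones", "soft lighting"),
    avoid=("text overlays", "watermarks", "borders"),
)
# Refine one field and keep everything else: "Make the lighting more dramatic."
dramatic = replace(cat, details=("warm tones", "dramatic lighting"))
print(dramatic.to_prompt())
```

Because the dataclass is frozen, `replace` returns a new spec, and the first version stays available for comparison.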

  • Web search + image generation — search for visual references, then generate variations
  • Document search + image generation — extract descriptions from documents and generate corresponding visuals
  • Thinking + image generation — enable thinking to have the agent refine your prompt before generating, improving first-attempt quality