
Embedding & Reranking

When you attach documents to a conversation and use @document-search, QARK runs a RAG (Retrieval-Augmented Generation) pipeline behind the scenes. Two specialized model types make this work: embedding models convert your documents into searchable vectors, and reranking models re-score results for higher accuracy.

You don’t need to understand the internals to use document search — attach files, ask questions, get cited answers. But if you want to tune retrieval quality, this page covers the providers and models available.

  1. Attach documents — drag files into a conversation or use the Artifacts tab. Supported: PDF, DOCX, XLSX, PPTX, Markdown, HTML, TXT, EPUB, source code.
  2. QARK indexes automatically — documents are chunked, embedded, and stored in a local vector database. You see a progress indicator.
  3. Ask questions with @document-search — QARK converts your query into a vector, finds the most relevant chunks, optionally reranks them, and passes them to the chat model as context.
  4. Get cited answers — responses include inline citations with source document, page number, and relevance score.
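The four steps above can be sketched as a toy retrieval loop. Everything here is illustrative stand-in code, not QARK's internals: the character-frequency `embed` replaces a real embedding model, and the chunk list replaces the indexed documents.

```python
import math

def embed(text):
    # Toy stand-in for a real embedding model: a 26-d character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-2: chunk documents and store (source, page, text, vector) rows.
chunks = [
    ("report.pdf", 3, "Quarterly revenue grew twelve percent"),
    ("report.pdf", 7, "Headcount was flat year over year"),
    ("notes.md", 1, "The embedding model converts text to vectors"),
]
index = [(src, page, text, embed(text)) for src, page, text in chunks]

# Step 3: embed the query and rank chunks by similarity.
query_vec = embed("how do embeddings work")
ranked = sorted(index, key=lambda row: cosine(query_vec, row[3]), reverse=True)

# Step 4: the top chunks, with source and page for citations,
# become the chat model's context.
top_chunks = ranked[:2]
```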

QARK picks search strategies automatically — semantic search, HyDE (hypothetical document embeddings), or step-back queries — depending on the question type. You can override this per conversation in the Info panel.
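A rough sketch of what such strategy dispatch could look like. The heuristics below are illustrative guesses, not QARK's actual selection rules:

```python
def pick_strategy(question: str) -> str:
    """Choose a retrieval strategy based on the question's shape.

    - "hyde": embed a hypothetical answer instead of the question itself;
      useful for terse, underspecified queries.
    - "step-back": retrieve for a broader version of a conceptual question
      as well as the question itself.
    - "semantic": plain vector search over the query.
    """
    q = question.lower().strip()
    if len(q.split()) <= 4:
        return "hyde"          # terse queries benefit from expansion
    if q.startswith(("why", "how does", "explain")):
        return "step-back"     # broaden conceptual questions first
    return "semantic"
```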

Voyage AI

API key: dash.voyageai.com → API Keys
Base URL: `https://api.voyageai.com/v1`

| Model | Dimensions | Max tokens | Use case |
| --- | --- | --- | --- |
| voyage-3 | 1024 | 32K | Flagship, best overall quality |
| voyage-3-lite | 512 | 32K | Cost-efficient, strong retrieval |
| voyage-code-3 | 1024 | 32K | Code and technical docs |
| voyage-finance-2 | 1024 | 32K | Financial documents, SEC filings |
| voyage-multilingual-2 | 1024 | 32K | 30+ languages, cross-lingual search |
| voyage-large-2 | 1536 | 16K | High-dimensional legacy model |

Voyage AI consistently ranks at the top of retrieval benchmarks (MTEB). The same API key works for both embedding and reranking.
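With that base URL, an embedding request needs nothing beyond the standard library. This sketch assumes the endpoint follows the OpenAI-style `/embeddings` request/response shape implied by the `/v1` base URL; verify the exact field names against Voyage's API reference:

```python
import json
import urllib.request

VOYAGE_BASE_URL = "https://api.voyageai.com/v1"

def build_request(texts, model="voyage-3"):
    # OpenAI-style embeddings payload; field names are an assumption.
    return {"model": model, "input": texts}

def embed(texts, api_key, model="voyage-3"):
    req = urllib.request.Request(
        f"{VOYAGE_BASE_URL}/embeddings",
        data=json.dumps(build_request(texts, model)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # One embedding vector per input text, in order.
    return [item["embedding"] for item in body["data"]]
```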

Jina AI

API key: jina.ai → Dashboard
Base URL: `https://api.jina.ai/v1`

| Model | Dimensions | Max tokens | Use case |
| --- | --- | --- | --- |
| jina-embeddings-v3 | 1024 | 8K | Multilingual (89+ languages), Matryoshka dims |
| jina-clip-v2 | 1024 | 8K | Multimodal — text + image in same embedding space |
| jina-embeddings-v2-base-en | 768 | 8K | English-only, cost-effective |

Jina CLIP v2 is notable — it embeds both text and images into a shared vector space, enabling cross-modal retrieval. The same API key works for both embedding and reranking.

These providers offer embedding models through their existing QARK integrations — no separate API key needed if you already have the provider connected.

| Provider | Notable models | Pricing |
| --- | --- | --- |
| OpenAI | text-embedding-3-large (3072d), text-embedding-3-small (1536d) | $0.02–0.13/M |
| Gemini | gemini-embedding-001 (3072d, 40+ languages), gemini-embedding-2-preview (3072d, multimodal — text, images, video, audio, PDFs) | Free tier / $0.15–0.20/M |
| Together | BGE Large EN v1.5 (1024d), Multilingual E5 Large (1024d, 100+ languages) | $0.008/M |
| OpenRouter | 20+ models — OpenAI, Mistral Codestral Embed, Qwen3 Embedding (4096d), NVIDIA Nemotron VL, Sentence Transformers, and more | $0.01–0.15/M |
| Ollama | nomic-embed-text, mxbai-embed-large, snowflake-arctic-embed — pull any embedding model locally | Free (local) |
| LM Studio | Any GGUF embedding model loaded in the local server | Free (local) |

Gemini embedding models have a free tier (rate-limited to 1,500 requests/day) and support 3072 dimensions. On the paid tier, pricing is $0.15/M tokens for embedding-001 and $0.20/M for the multimodal embedding-2-preview. The preview model embeds images, video, audio, and PDFs directly — not just text.
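At those per-million-token rates, cost is simple arithmetic:

```python
def embedding_cost(tokens, price_per_million):
    """Dollar cost of embedding `tokens` tokens at a $/M-token rate."""
    return tokens / 1_000_000 * price_per_million

# 10M tokens through gemini-embedding-001 at $0.15/M:
cost = embedding_cost(10_000_000, 0.15)
print(f"${cost:.2f}")  # → $1.50
```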


Reranking

Reranking is optional but significantly improves retrieval accuracy. QARK over-fetches 3x the candidates from vector search, then uses a cross-encoder reranker to re-score and keep only the most relevant chunks.
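The over-fetch-then-rerank flow reduces to a few lines. Both scoring functions here are placeholders for a real embedding model and cross-encoder:

```python
def rerank_retrieve(query, index, k, vector_score, cross_encoder_score):
    """Fetch 3x candidates by vector similarity, keep the cross-encoder's top k."""
    # Stage 1: cheap vector search, over-fetching 3x the final count.
    candidates = sorted(
        index, key=lambda c: vector_score(query, c), reverse=True
    )[: 3 * k]
    # Stage 2: the cross-encoder scores each (query, chunk) pair jointly,
    # which is slower but more accurate than comparing precomputed vectors.
    reranked = sorted(
        candidates, key=lambda c: cross_encoder_score(query, c), reverse=True
    )
    return reranked[:k]
```

The point of the 3x over-fetch is to give the reranker a wide enough pool that chunks the vector search under-scored can still surface.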

Voyage AI

| Model | Max tokens | Use case |
| --- | --- | --- |
| rerank-2 | 8K | Highest accuracy reranking |
| rerank-2-lite | 8K | Cost-efficient, strong quality |

Jina AI

| Model | Max tokens | Use case |
| --- | --- | --- |
| jina-reranker-v2-base-multilingual | 1K | Multilingual (100+ languages) |
| jina-reranker-v1-base-en | 512 | English-only, fast |

Reranking uses the same API key as the corresponding embedding provider — no additional setup if you already have Voyage AI or Jina connected.
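A cloud rerank call might look like the following sketch. The `/rerank` path and the `query`/`documents`/`top_k` field names are based on Voyage's documented rerank API, but treat them as assumptions and verify against the current reference:

```python
import json
import urllib.request

def build_rerank_request(query, documents, model="rerank-2", top_k=5):
    # Voyage-style rerank payload; field names are an assumption to verify.
    return {"query": query, "documents": documents,
            "model": model, "top_k": top_k}

def rerank(query, documents, api_key, model="rerank-2", top_k=5):
    req = urllib.request.Request(
        "https://api.voyageai.com/v1/rerank",
        data=json.dumps(build_rerank_request(query, documents, model, top_k)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)["data"]
    # Each result carries the original document index and a relevance score.
    return [(r["index"], r["relevance_score"]) for r in results]
```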

You can run reranking models locally through Ollama or LM Studio — zero cost, no API key, data stays on your machine. QARK doesn’t auto-detect local reranking models, so you need to select one manually in the RAG settings.

With Ollama, pull a reranking-capable model:

```shell
ollama pull bge-reranker-v2-m3
```

Other options: bge-reranker-large, jina-reranker-v2-base-multilingual (if available in the Ollama library).

With LM Studio, load any GGUF reranking model into the local server.

Once running, select the model in Settings → Tools & MCP → RAG → Reranker. Local rerankers are slower than cloud alternatives but free and fully private.

If you don’t need local reranking, you can skip it entirely — vector search without reranking still produces good results. Or use Voyage AI / Jina AI free tiers for cloud reranking at minimal cost.


[Screenshot: RAG settings panel showing embedding provider/model selection, reranking toggle, and reranker model dropdown]
  1. Open Settings → Tools & MCP → RAG.
  2. Select an embedding provider and model.
  3. Optionally enable reranking and select a reranker model.
  4. Save. New documents use the selected models. Existing documents need re-indexing if you change the embedding model (QARK prompts you).

You can also override RAG settings per conversation in the Info panel → Config tab — different embedding model, different reranker, different threshold.

| Priority | Embedding | Reranking |
| --- | --- | --- |
| Best accuracy | Voyage voyage-3 or OpenAI text-embedding-3-large | Voyage rerank-2 |
| Lowest cost | Gemini embedding (free tier) or Voyage voyage-3-lite | Voyage rerank-2-lite |
| Multilingual | Jina embeddings-v3 or Voyage multilingual-2 | Jina reranker-v2-multilingual |
| Code / technical docs | Voyage voyage-code-3 | Any reranker |
| Multimodal (images + text) | Gemini embedding-2-preview or Jina CLIP v2 | Any reranker |
| Offline / air-gapped | Ollama with nomic-embed-text | Local reranker via Ollama (manual selection) |

Different models produce vectors in different dimensional spaces — they’re not interchangeable. If you switch embedding models after indexing documents, QARK prompts you to re-index. Re-indexing processes all documents through the new model and replaces the stored vectors.
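A minimal dimension check, of the kind any vector store applies, makes the incompatibility concrete:

```python
def cosine_similarity(a, b):
    if len(a) != len(b):
        # A 1024-d voyage-3 vector and a 3072-d text-embedding-3-large
        # vector live in different spaces; comparing them is meaningless.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# Same model, same space: comparison is well-defined.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```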