
RAG Pipeline

QARK’s RAG (Retrieval-Augmented Generation) pipeline turns your documents into a searchable knowledge base. Drag in files and they are parsed, chunked, and embedded into a local vector index. Ask questions with @document-search and get answers with inline citations pointing to exact sources, pages, and relevance scores.

Everything processes locally through your configured providers. No cloud upload.


Not every document needs a full vector pipeline. QARK routes based on size:

| Document size | Route | What happens |
| --- | --- | --- |
| Below threshold | Direct injection | Full text inserted into the prompt context. No chunking, no embedding. Status: direct |
| Above threshold | Full RAG pipeline | Parsed → chunked → embedded → indexed → searched at query time |

The threshold is a percentage of the model’s context window (default: 30%). A 200-page PDF always goes through RAG. A 2-paragraph markdown file gets injected directly.

Configure in per-conversation RAG settings or globally in Settings → RAG → Direct Injection Threshold.
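
The routing rule can be sketched as follows (function and parameter names are illustrative, not QARK's actual API):

```python
def choose_route(doc_tokens: int, context_window: int, threshold_pct: float = 30.0) -> str:
    """Route a document: inject directly if it fits within the threshold
    share of the model's context window, otherwise run the full RAG pipeline."""
    budget = context_window * threshold_pct / 100
    return "direct" if doc_tokens <= budget else "rag"
```

With a 128k-token context window, the default 30% budget is 38,400 tokens: a short markdown note is injected directly, while a 200-page PDF (roughly 100k tokens) goes through RAG.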


| Format | Extensions | Notes |
| --- | --- | --- |
| PDF | .pdf | Text extraction + optional image extraction |
| Word | .docx | Preserves heading structure |
| Excel | .xlsx | Sheet-aware parsing |
| PowerPoint | .pptx | Slide-by-slide extraction |
| Markdown | .md, .mdx | Heading-based chunking |
| HTML | .html, .htm | Tag-aware parsing |
| Plain text | .txt, .log, .csv | Line-based chunking |
| EPUB | .epub | Chapter-aware extraction |

Documents can be added in three ways:

  1. Drag and drop — drag files or folders onto the conversation. Multiple files accepted simultaneously.
  2. File picker — click the attachment button in the composer.
  3. Recursive folder scanning — drop a folder. QARK scans recursively, preserving directory structure in the document panel.

Each document progresses through visible stages. The document panel shows real-time status per file:

| Stage | Description |
| --- | --- |
| pending | Queued for processing |
| parsing | Extracting text from the file format |
| chunking | Splitting into semantically meaningful segments |
| embedding | Generating vector embeddings for each chunk |
| indexing | Storing vectors in the local search index |
| searching | Being queried (visible during response generation) |
| ready | Indexed and available for retrieval |
| error | Processing failed — hover for details |
| skipped | Unsupported file type or empty file |
| direct | Small enough for direct injection — no RAG processing |
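
The happy-path ordering of these stages can be sketched as a simple state list (a hypothetical model, not QARK's internal state machine; note that searching is a transient query-time state rather than part of ingestion):

```python
INGESTION_STAGES = ["pending", "parsing", "chunking", "embedding", "indexing", "ready"]
TERMINAL_STATES = {"ready", "error", "skipped", "direct"}

def advance(stage: str) -> str:
    """Move a document one step along the happy path; terminal states
    (success, failure, skipped, and direct-injection) do not advance."""
    if stage in TERMINAL_STATES:
        return stage
    return INGESTION_STAGES[INGESTION_STAGES.index(stage) + 1]
```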

QARK selects a retrieval strategy per query. Four are available:

Semantic search: standard vector similarity. Your query is embedded and compared against chunk vectors using cosine similarity. Best when your query language closely matches the document language.
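
Cosine similarity itself is simple; here is a minimal version over raw Python lists (real systems use the embedding provider's vectors and an optimized index):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 means
    identical direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```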

HyDE: generates a hypothetical answer first, embeds that, then searches for similar chunks. This closes the vocabulary gap: your query “What’s the refund policy?” matches chunks describing the policy without using the word “refund.”

Step-back: generates a broader, more abstract version of your query, then searches with both the original and abstracted queries. “Why did Q3 revenue drop in EMEA?” becomes “What factors influenced EMEA regional revenue trends?”

Auto: QARK analyzes the query and picks the best strategy automatically. The selected strategy appears in the response metadata.
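
As a sketch of the HyDE flow under stated assumptions (generate, embed, and index are hypothetical stand-ins for the configured RAG generation model, embedding model, and vector index; none of these names come from QARK):

```python
def hyde_search(query, index, generate, embed, k: int = 5):
    """HyDE: embed a hypothetical *answer* instead of the query itself,
    so retrieval matches answer-shaped chunks even when vocabulary differs."""
    hypothetical = generate(f"Write a short passage that answers: {query}")
    return index.search(embed(hypothetical), k=k)
```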


For source code files, QARK uses syntax-aware chunking:

  • Functions, methods, and classes kept as whole units
  • Code blocks in Markdown never split mid-block
  • Import statements and module headers stay attached to the first chunk

Retrieved code chunks are syntactically complete, not arbitrary text fragments cut at a character limit.
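
A minimal sketch of the fence-aware rule for Markdown (paragraph splitting only; the real chunker also respects headings, token budgets, and code syntax):

```python
def chunk_markdown(text: str) -> list[str]:
    """Split Markdown into chunks on blank lines, but never inside a
    fenced code block, so retrieved code stays syntactically whole."""
    chunks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a code fence
        if line.strip() == "" and not in_fence:
            if current:
                chunks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```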


Reranking is optional but improves retrieval accuracy. The process:

  1. Over-fetch — vector search retrieves 3x the final chunk count as candidates
  2. Re-score — a cross-encoder reranking model (Voyage AI or Jina AI) scores each candidate against the query with full attention
  3. Select — top-scoring chunks after reranking become the final context

Configure the reranker in per-conversation RAG settings or globally. See Embedding & Reranking for available models.


Retrieved chunks rarely exist in isolation. QARK fetches ±1 adjacent chunks from the same document, providing surrounding context. If chunk #14 scores highest, the model also sees chunks #13 and #15.
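
A minimal sketch of that neighbor expansion (illustrative names, clamped at document boundaries):

```python
def expand_with_neighbors(chunks: list, hit_index: int, window: int = 1) -> list:
    """Return the matched chunk plus ±window adjacent chunks from the
    same document, without running past either end of the document."""
    lo = max(0, hit_index - window)
    hi = min(len(chunks), hit_index + window + 1)
    return chunks[lo:hi]
```

If chunk #14 scores highest in a 20-chunk document, this returns chunks #13 through #15; a hit on chunk #0 returns only #0 and #1.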


Every claim from your documents includes an inline citation badge (e.g., [1], [2], [3]).

| Field | Content |
| --- | --- |
| Source document | Original file name |
| Page number | For paginated formats (PDF, DOCX, PPTX) |
| Section | Heading or chapter title when available |
| Chunk ID | Internal reference to the exact chunk |
| Relevance score | Similarity/reranking score (0–1) |

Hover a badge for full citation details. Click to scroll to the referenced chunk in the document panel.
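
The citation fields map naturally onto a small record type (field names here are illustrative, not QARK's actual schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Citation:
    source_document: str           # original file name
    chunk_id: str                  # internal reference to the exact chunk
    relevance: float               # similarity/reranking score in [0, 1]
    page: Optional[int] = None     # paginated formats (PDF, DOCX, PPTX) only
    section: Optional[str] = None  # heading or chapter title when available
```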

Screenshot: RAG response with inline citation badges and relevance scores

For PDFs with diagrams, charts, or embedded images:

  • Enable in per-conversation RAG settings under Image Extraction
  • Configure which vision-capable model handles image description
  • Extracted images are converted to text descriptions and indexed as additional chunks
  • Useful for technical manuals, slide decks, and research papers with figures

Each conversation maintains independent RAG settings:

| Setting | Options |
| --- | --- |
| Embedding provider & model | Any configured provider with embedding models |
| RAG generation model | Model for HyDE/step-back query generation |
| Reranker provider & model | Voyage AI or Jina AI reranking models |
| Direct injection threshold | 1%–100% of context window (default: 30%) |
| Search strategy | Semantic, HyDE, Step-back, Auto |
| Image extraction model | Any vision-capable model |

Access from the Info Panel → Config tab or the RAG config section in the conversation sidebar.

Screenshot: RAG settings panel with embedding and reranking configuration
Example workflow:

  1. Drag 5 research papers onto the conversation.
  2. Watch each file progress through parsing → chunking → embedding → indexing → ready.
  3. Ask: “What are the main differences in methodology between papers 2 and 4?”
  4. Receive a structured comparison with inline citation badges linking to specific pages and sections.
  5. Hover any citation to verify. Click to jump to the exact chunk.