Skills and Documents

When a conversation has attached documents, QARK normally handles them through its document search pipeline — either via RAG (semantic search with embeddings) or File Tools mode (plain text files searchable with Unix commands). Skills can interact with this pipeline in two ways: they can work alongside it, or they can take over document handling entirely.

Default Behavior: Skills Alongside Document Search

By default, skills and document search coexist. When a skill is active and documents are attached:

  • The document search pipeline runs normally (RAG or File Tools, depending on configuration)
  • The agent can use both skill instructions and document search results
  • The activate_skill tool output includes a <document_paths> section listing all attached documents with their absolute paths

This works well for most skills — the skill provides procedural knowledge while the document pipeline provides content retrieval.

Some skills need direct control over how documents are processed. A PDF form-filling skill might need raw file access. A data analysis skill might want to parse spreadsheets with its own scripts rather than relying on chunked text. For these cases, skills can declare:

metadata:
  skip-doc-search: true

When this flag is set and the skill is active, the entire document search pipeline is bypassed:

  1. RAG mode: the document_search tool is removed from the enabled tools. No embeddings are generated, no chunks are created.
  2. File Tools mode: the unix_command tool is not auto-enabled for document workspace access. No workspace directory is created.
  3. Direct injection: documents are not injected into the system prompt.

Instead, the skill receives raw document information through the activate_skill tool:

<document_paths>
  <document id="abc123" filename="report.pdf" chars="45000" type="application/pdf">
    <path>/absolute/path/to/report.pdf</path>
    <extracted>/absolute/path/to/report.extracted.txt</extracted>
  </document>
  <document id="def456" filename="data.xlsx" chars="12000" type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet">
    <path>/absolute/path/to/data.xlsx</path>
    <extracted>/absolute/path/to/data.extracted.txt</extracted>
  </document>
</document_paths>

Each document entry includes:

| Field | Description |
| --- | --- |
| id | Unique document identifier |
| filename | Original filename |
| chars | Character count of extracted text |
| type | MIME type |
| <path> | Absolute path to the original file |
| <extracted> | Absolute path to the extracted text sidecar (.extracted.txt) |

The skill’s instructions can then direct the agent to process these files using its own scripts or tools — reading the extracted text, parsing the original binary file, or passing paths to custom processing scripts via skill_execute.
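
As a minimal sketch of how a skill script might consume this block, the Python below parses the XML into plain records using only the standard library. The `AttachedDocument` class, the `parse_document_paths` function, and the idea that the script receives the block as a string are all assumptions made for illustration; the source does not specify how the XML reaches a script.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttachedDocument:
    """One <document> entry (hypothetical record type, not a QARK API)."""
    doc_id: str
    filename: str
    chars: int
    mime_type: str
    path: str
    extracted: Optional[str]  # None when no <extracted> element is present


def parse_document_paths(xml_text: str) -> List[AttachedDocument]:
    """Parse a <document_paths> block into a list of records."""
    root = ET.fromstring(xml_text)
    docs = []
    for node in root.findall("document"):
        docs.append(
            AttachedDocument(
                doc_id=node.get("id", ""),
                filename=node.get("filename", ""),
                chars=int(node.get("chars", "0")),
                mime_type=node.get("type", ""),
                path=(node.findtext("path") or "").strip(),
                # Empty or missing <extracted> is normalized to None.
                extracted=(node.findtext("extracted") or "").strip() or None,
            )
        )
    return docs


# Sample input copied from the example above (one document only).
SAMPLE = """<document_paths>
<document id="abc123" filename="report.pdf" chars="45000" type="application/pdf">
<path>/absolute/path/to/report.pdf</path>
<extracted>/absolute/path/to/report.extracted.txt</extracted>
</document>
</document_paths>"""

if __name__ == "__main__":
    for doc in parse_document_paths(SAMPLE):
        print(doc.filename, doc.mime_type, doc.path, doc.extracted)
```

Keeping the parsed entries as typed records lets the rest of a script branch on `mime_type` or on whether `extracted` is set without touching the XML again.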

Three metadata flags control how QARK handles documents when a skill is active. All three are set under the metadata: key in the skill’s YAML frontmatter.

skip-doc-search

Type: boolean
Default: false
Effect: Bypasses the entire document search pipeline. RAG, File Tools, and direct injection are all disabled. The skill receives raw document paths instead.

When true, this flag changes two things:

  1. At upload time: document chunking and embedding are skipped entirely. The raw file is saved to disk but no vector index entries are created — saving processing time and embedding costs.
  2. At message time: the document_search tool is removed from enabled tools. No search results are injected into context. Instead, the activate_skill tool output includes a <document_paths> block with absolute paths to the original files and their extracted text sidecars.

extract-text

Type: boolean
Default: true
Effect: Controls whether .extracted.txt sidecar files are created during document upload.

When skip-doc-search is true, QARK still needs to decide whether to extract text from binary formats (PDF, DOCX, XLSX, PPTX, EPUB). The extract-text flag controls this:

  • true (default): QARK parses the document and writes a plain-text sidecar file alongside the original. The sidecar path appears in the <extracted> element of the document paths XML. This gives the skill’s scripts easy access to the text content without needing to parse the binary format themselves.
  • false: No sidecar is created. The skill receives only the path to the original binary file. Use this when the skill has its own parser or only needs the raw binary (e.g., a PDF form-filling skill that writes directly to the PDF).
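
Because a sidecar may or may not exist depending on this flag, a skill script probably should not assume one is present. The sketch below shows one way to handle both cases. The function name `read_sidecar` is hypothetical, and UTF-8 encoding of the sidecar is an assumption not stated in the source.

```python
from pathlib import Path
from typing import Optional


def read_sidecar(extracted: Optional[str]) -> Optional[str]:
    """Return the sidecar text, or None if no sidecar is available.

    A None result tells the caller to parse the original file itself,
    which is the situation described for extract-text: false.
    """
    if not extracted:
        return None
    sidecar = Path(extracted)
    if not sidecar.is_file():
        return None
    # Assumption: sidecars are UTF-8; undecodable bytes are replaced.
    return sidecar.read_text(encoding="utf-8", errors="replace")
```

A caller would try `read_sidecar(...)` first and fall back to its own parser for the original file only when the result is `None`.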

extract-images

Type: boolean
Default: false
Effect: Controls whether a vision model is used to extract text from image files in the document set.

When true, image files (PNG, JPG, etc.) attached to the conversation are sent to a vision-capable model to extract any text content. The extracted text is written to a .extracted.txt sidecar. This is useful for skills that need to process screenshots, scanned documents, or images containing text — but it incurs additional API cost for the vision model call.

When false (default), images are stored as-is. The skill receives the image path but no text extraction is performed.

---
name: contract-analyzer
description: Analyze legal contracts for key clauses, obligations, and risks.
metadata:
  skip-doc-search: true   # Skill handles documents with its own scripts
  extract-text: true      # Create text sidecars for easy access
  extract-images: true    # Extract text from scanned contract pages
---

With this configuration:

  • The user attaches a scanned contract PDF and several page images
  • QARK extracts text from the PDF into a .extracted.txt sidecar
  • QARK sends each image to a vision model to extract text into sidecars
  • No RAG chunking or embedding occurs
  • The skill’s activate_skill output includes paths to all originals and their sidecars
  • The skill’s scripts can process the extracted text with full control

The File Tools document search mode and skill-owned document handling are independent systems:

  • File Tools mode active, no skill override: documents are extracted to the workspace directory and searchable via unix_command. This is the standard File Tools behavior.
  • Skill with skip-doc-search: true active: File Tools mode is bypassed. The skill receives raw document paths and handles everything itself.
  • File Tools mode active, skill without skip-doc-search: both coexist — the workspace is created, and the skill also receives document paths in its activate_skill output.

The skip-doc-search flag always takes precedence. When a skill declares it, the skill is fully responsible for document interaction — neither RAG nor File Tools will interfere.
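
The precedence rules above can be modeled as a small decision function. This is an illustrative sketch of the documented behavior, not QARK's implementation: the mode labels `"rag"` and `"file_tools"`, the `DocumentHandling` record, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DocumentHandling:
    """Outcome of the decision (hypothetical record, for illustration)."""
    enabled_tools: FrozenSet[str]  # document tools available to the agent
    workspace_created: bool        # File Tools workspace directory
    skill_gets_paths: bool         # <document_paths> in activate_skill output


def resolve_document_handling(
    mode: str, skill_active: bool, skip_doc_search: bool
) -> DocumentHandling:
    """Model which document pipeline applies for a given configuration."""
    if mode not in ("rag", "file_tools"):
        raise ValueError(f"unknown mode: {mode!r}")

    # skip-doc-search takes precedence over both RAG and File Tools.
    if skill_active and skip_doc_search:
        return DocumentHandling(frozenset(), False, True)

    # Otherwise the configured pipeline runs normally; an active skill
    # still receives document paths alongside it.
    tool = "document_search" if mode == "rag" else "unix_command"
    return DocumentHandling(
        enabled_tools=frozenset({tool}),
        workspace_created=(mode == "file_tools"),
        skill_gets_paths=skill_active,
    )


if __name__ == "__main__":
    print(resolve_document_handling("file_tools", True, True))
    print(resolve_document_handling("file_tools", True, False))
```

The first branch encodes the override case; everything else falls through to the standard pipeline for the configured mode.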

| Scenario | Approach |
| --- | --- |
| Skill adds expertise but documents are searched normally | Default (no flag) |
| Skill needs raw file access (form filling, binary parsing) | skip-doc-search: true |
| Skill processes documents with its own scripts | skip-doc-search: true + scripts in scripts/ |
| Skill works alongside document context | Default (use both skill instructions and search results) |