Skills and Documents

When a conversation has attached documents, QARK normally handles them through its document search pipeline — either via RAG (semantic search with embeddings) or File Tools mode (plain text files searchable with Unix commands). Skills can interact with this pipeline in two ways: they can work alongside it, or they can take over document handling entirely.

Default Behavior — Skills Alongside Document Search

By default, skills and document search coexist. When a skill is active and documents are attached:

The document search pipeline runs normally (RAG or File Tools, depending on configuration)
The agent can use both skill instructions and document search results
The activate_skill tool output includes a <document_paths> section listing all attached documents with their absolute paths

This works well for most skills — the skill provides procedural knowledge while the document pipeline provides content retrieval.

Taking Over — The `skip-doc-search` Flag

Some skills need direct control over how documents are processed. A PDF form-filling skill might need raw file access. A data analysis skill might want to parse spreadsheets with its own scripts rather than relying on chunked text. For these cases, skills can declare:

metadata:
  skip-doc-search: true

When this flag is set and the skill is active, the entire document search pipeline is bypassed:

RAG mode: the document_search tool is removed from the enabled tools. No embeddings are generated, no chunks are created.
File Tools mode: the unix_command tool is not auto-enabled for document workspace access. No workspace directory is created.
Direct injection: documents are not injected into the system prompt.

Instead, the skill receives raw document information through the activate_skill tool output:

<document_paths>
  <document id="abc123" filename="report.pdf" chars="45000" type="application/pdf">
    <path>/absolute/path/to/report.pdf</path>
    <extracted>/absolute/path/to/report.extracted.txt</extracted>
  </document>
  <document id="def456" filename="data.xlsx" chars="12000" type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet">
    <path>/absolute/path/to/data.xlsx</path>
    <extracted>/absolute/path/to/data.extracted.txt</extracted>
  </document>
</document_paths>

Each document entry includes:

Field	Description
`id`	Unique document identifier
`filename`	Original filename
`chars`	Character count of extracted text
`type`	MIME type
`<path>`	Absolute path to the original file
`<extracted>`	Absolute path to the extracted text sidecar (`.extracted.txt`)

The skill’s instructions can then direct the agent to process these files using its own scripts or tools — reading the extracted text, parsing the original binary file, or passing paths to custom processing scripts via skill_execute.
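As an illustration of how a skill script might consume this block, here is a minimal Python sketch that parses a <document_paths> payload into typed records. The `AttachedDocument` class and `parse_document_paths` function are hypothetical names, not part of QARK, and the sketch assumes the block arrives as well-formed XML with the attributes and child elements listed in the table above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttachedDocument:
    """One <document> entry (hypothetical helper type, not a QARK API)."""
    doc_id: str
    filename: str
    chars: int
    mime_type: str
    path: str
    extracted: Optional[str]  # None when no <extracted> element is present


def parse_document_paths(xml_text: str) -> List[AttachedDocument]:
    """Parse a <document_paths> block into AttachedDocument records."""
    root = ET.fromstring(xml_text)
    docs = []
    for node in root.findall("document"):
        extracted = node.findtext("extracted")
        docs.append(
            AttachedDocument(
                doc_id=node.get("id", ""),
                filename=node.get("filename", ""),
                chars=int(node.get("chars", "0")),
                mime_type=node.get("type", ""),
                path=(node.findtext("path") or "").strip(),
                extracted=extracted.strip() if extracted else None,
            )
        )
    return docs


SAMPLE = """<document_paths>
  <document id="abc123" filename="report.pdf" chars="45000" type="application/pdf">
    <path>/absolute/path/to/report.pdf</path>
    <extracted>/absolute/path/to/report.extracted.txt</extracted>
  </document>
</document_paths>"""

docs = parse_document_paths(SAMPLE)
```

Treating a missing <extracted> element as `None` lets the same parser handle entries that have no text sidecar.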

Document Processing Metadata Flags

Three metadata flags control how QARK handles documents when a skill is active. All three are set under the metadata: key in the skill’s YAML frontmatter.

`skip-doc-search`


Type	boolean
Default	`false`
Effect	Bypasses the entire document search pipeline — RAG, File Tools, and direct injection are all disabled. The skill receives raw document paths instead.

When true, this flag changes two things:

At upload time: document chunking and embedding are skipped entirely. The raw file is saved to disk but no vector index entries are created — saving processing time and embedding costs.
At message time: the document_search tool is removed from enabled tools, and no search results are injected into context. Instead, the activate_skill tool output includes a <document_paths> block with absolute paths to the original files and, when extract-text is enabled, their extracted text sidecars.

`extract-text`


Type	boolean
Default	`true`
Effect	Controls whether `.extracted.txt` sidecar files are created during document upload.

When skip-doc-search is true, QARK still needs to decide whether to extract text from binary formats (PDF, DOCX, XLSX, PPTX, EPUB). The extract-text flag controls this:

true (default): QARK parses the document and writes a plain-text sidecar file alongside the original. The sidecar path appears in the <extracted> element of the document paths XML. This gives the skill’s scripts easy access to the text content without needing to parse the binary format themselves.
false: No sidecar is created. The skill receives only the path to the original binary file. Use this when the skill has its own parser or only needs the raw binary (e.g., a PDF form-filling skill that writes directly to the PDF).
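A skill script that should work under either setting can prefer the sidecar when one exists and fall back to the original file otherwise. The sketch below is one way to do that; `load_document` is a hypothetical helper, and reading the sidecar as UTF-8 is an assumption, since the encoding of `.extracted.txt` files is not stated here.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple, Union


def load_document(original: str, extracted: Optional[str]) -> Tuple[str, Union[str, bytes]]:
    """Return ("text", sidecar text) if a sidecar exists, else ("binary", raw bytes)."""
    if extracted and Path(extracted).is_file():
        # Assumption: sidecars are UTF-8; undecodable bytes are replaced.
        text = Path(extracted).read_text(encoding="utf-8", errors="replace")
        return "text", text
    # No sidecar (for example extract-text: false): hand back the original bytes.
    return "binary", Path(original).read_bytes()


# Small self-contained demonstration using temporary placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "report.pdf"
    original.write_bytes(b"%PDF-1.7 placeholder")
    sidecar = Path(tmp) / "report.extracted.txt"
    sidecar.write_text("Quarterly report", encoding="utf-8")

    with_sidecar = load_document(str(original), str(sidecar))
    without_sidecar = load_document(str(original), None)
```

Returning a tag with the payload keeps the caller explicit about whether it received extracted text or raw bytes that still need a parser.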

`extract-images`


Type	boolean
Default	`false`
Effect	Controls whether a vision model is used to extract text from image files in the document set.

When true, image files (PNG, JPG, etc.) attached to the conversation are sent to a vision-capable model to extract any text content. The extracted text is written to a .extracted.txt sidecar. This is useful for skills that need to process screenshots, scanned documents, or images containing text — but it incurs additional API cost for the vision model call.

When false (default), images are stored as-is. The skill receives the image path but no text extraction is performed.

Example

---
name: contract-analyzer
description: Analyze legal contracts for key clauses, obligations, and risks.
metadata:
  skip-doc-search: true      # Skill handles documents with its own scripts
  extract-text: true         # Create text sidecars for easy access
  extract-images: true       # Extract text from scanned contract pages
---

With this configuration:

The user attaches a scanned contract PDF and several page images
QARK extracts text from the PDF into a .extracted.txt sidecar
QARK sends each image to a vision model to extract text into sidecars
No RAG chunking or embedding occurs
The skill’s activate_skill output includes paths to all originals and their sidecars
The skill’s scripts can process the extracted text with full control
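The documented defaults for the three flags can be summarized in a short Python sketch. It assumes the frontmatter has already been parsed into a dictionary by some YAML loader; `DocumentFlags` and `flags_from_metadata` are hypothetical names used only for illustration and do not describe QARK's internals.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class DocumentFlags:
    """Document-processing flags with the defaults documented above."""
    skip_doc_search: bool = False
    extract_text: bool = True
    extract_images: bool = False


def flags_from_metadata(metadata: Mapping[str, Any]) -> DocumentFlags:
    """Read the three flags from a parsed `metadata:` mapping, applying defaults."""
    return DocumentFlags(
        skip_doc_search=bool(metadata.get("skip-doc-search", False)),
        extract_text=bool(metadata.get("extract-text", True)),
        extract_images=bool(metadata.get("extract-images", False)),
    )


# The contract-analyzer example sets all three flags explicitly.
contract_analyzer = flags_from_metadata(
    {"skip-doc-search": True, "extract-text": True, "extract-images": True}
)

# A skill with no metadata flags gets the documented defaults.
no_flags = flags_from_metadata({})
```

Note that a skill declaring none of the flags still ends up with text extraction enabled, since `extract-text` defaults to `true`.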

Interaction with File Tools Mode

The File Tools document search mode and skill-owned document handling are separate mechanisms. How they combine depends on whether the active skill sets skip-doc-search:

File Tools mode active, no skill override: documents are extracted to the workspace directory and searchable via unix_command. This is the standard File Tools behavior.
Skill with skip-doc-search: true active: File Tools mode is bypassed. The skill receives raw document paths and handles everything itself.
File Tools mode active, skill without skip-doc-search: both coexist — the workspace is created, and the skill also receives document paths in its activate_skill output.

The skip-doc-search flag always takes precedence. When a skill declares it, the skill is fully responsible for document interaction — neither RAG nor File Tools will interfere.
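The precedence rule can be expressed as a small decision function. This is only a sketch of the behavior described in this section, not QARK's implementation: the `SearchMode` enum and `document_tools` function are hypothetical, the tool names `document_search` and `unix_command` are taken from the text, and direct injection is not modeled.

```python
from enum import Enum
from typing import FrozenSet


class SearchMode(Enum):
    """Configured document search mode (hypothetical enum for illustration)."""
    RAG = "rag"
    FILE_TOOLS = "file_tools"


def document_tools(mode: SearchMode, skip_doc_search: bool) -> FrozenSet[str]:
    """Return the document tools that stay enabled for the active skill."""
    if skip_doc_search:
        # The flag takes precedence: neither pipeline contributes a tool.
        return frozenset()
    if mode is SearchMode.RAG:
        return frozenset({"document_search"})
    return frozenset({"unix_command"})


rag_default = document_tools(SearchMode.RAG, skip_doc_search=False)
file_tools_default = document_tools(SearchMode.FILE_TOOLS, skip_doc_search=False)
skill_owned = document_tools(SearchMode.FILE_TOOLS, skip_doc_search=True)
```

Checking the flag before the mode mirrors the rule that a skill declaring skip-doc-search owns document handling regardless of which search mode is configured.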

When to Use Each Approach

Scenario	Approach
Skill adds expertise but documents are searched normally	Default (no flag)
Skill needs raw file access (form filling, binary parsing)	`skip-doc-search: true`
Skill processes documents with its own scripts	`skip-doc-search: true` + scripts in `scripts/`
Skill works alongside document context	Default — use both skill instructions and search results