The Agent Kernel

The word “kernel” is borrowed from operating systems. An OS kernel manages processes, memory, and hardware. QARK’s agent kernel manages agent execution: it resolves configuration, assembles a tool vector, applies a context strategy, streams the request to a provider, handles tool calls in a loop, and returns the final response.

You interact with a conversation UI. Under that UI, the kernel is doing real work:

  1. Resolving which agent configuration applies to this conversation.
  2. Building the exact tool vector — which tools this agent can call.
  3. Applying the context strategy to decide what message history reaches the model.
  4. Setting the model, temperature, and all LLM parameters.
  5. Streaming the request through a Rust backend (powered by rig-core).
  6. Handling any tool calls the model returns, looping back as needed.
  7. Delivering the streamed response to the React frontend.

Even a single-turn factual question goes through this entire pipeline. There is no “lightweight mode” — every prompt gets the full agent treatment.

When you press Send, the kernel executes this sequence:

User message
→ Resolve agent config (global defaults → agent config → conversation overrides)
→ Build tool vector (builtin + MCP + agent-tools, filtered by agent config)
→ Apply context strategy (auto_compact, last_n, etc.)
→ Stream request to provider (via rig-core Rust backend)
→ Model returns response (possibly with tool calls)
→ Execute tool calls → append results → re-stream (loop)
→ Final response delivered to frontend via event channel

Screenshot: Architecture diagram showing user → kernel → agent → tools → provider → response

When the model returns a tool call, the kernel executes the tool, appends the result to the conversation, and sends the updated history back to the model. This loop continues until the model responds without a tool call — or hits the turn limit.
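The loop above can be sketched in a few lines. This is an illustrative model only — the type and function names (`ModelReply`, `runTurnLoop`, `callModel`, `executeTool`) are hypothetical, not QARK's actual API, and the real kernel runs in Rust:

```typescript
// Illustrative sketch of the kernel's tool-call loop; names are
// hypothetical, not QARK's real API.

type ModelReply =
  | { kind: "text"; text: string }        // final answer: loop ends
  | { kind: "tool_call"; name: string };  // kernel executes a tool, then re-streams

function runTurnLoop(
  callModel: (history: string[]) => ModelReply,
  executeTool: (name: string) => string,
  maxTurns: number,
  history: string[],
): { answer: string; truncated: boolean } {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = callModel(history);
    if (reply.kind === "text") {
      return { answer: reply.text, truncated: false };
    }
    // Append the tool result and send the updated history back to the model.
    history.push(`tool:${reply.name} -> ${executeTool(reply.name)}`);
  }
  // Turn limit hit: stop the loop and flag the response as truncated.
  return { answer: "(truncated)", truncated: true };
}

// Simulated run: one tool call, then a final answer.
let turns = 0;
const { answer, truncated } = runTurnLoop(
  () =>
    ++turns === 1
      ? { kind: "tool_call", name: "clock" }
      : { kind: "text", text: "It is noon." },
  () => "12:00",
  10,
  ["user: what time is it?"],
);
console.log(answer, truncated); // → "It is noon. false"
```

The key property is that termination has exactly two paths: a text-only reply, or the turn limit.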

Turn limits vary by tool category:

Tool category                        Max turns per message
Default (no tools triggered)         10
Unix commands active                 20
Agent-tools or MCP tools active      50

These limits prevent runaway execution. If an agent hits the limit, the kernel stops the loop and returns whatever the model has produced so far, along with a truncation notice.
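The table above reduces to a small lookup. The category names below are paraphrased from the table, not QARK's internal identifiers:

```typescript
// Pick the per-message turn limit from the tool categories active on a
// message, per the table above. Category names are illustrative.

type ToolCategory = "unix" | "agent_tools" | "mcp";

function maxTurns(active: Set<ToolCategory>): number {
  if (active.has("agent_tools") || active.has("mcp")) return 50;
  if (active.has("unix")) return 20;
  return 10; // default: no tools triggered
}

console.log(maxTurns(new Set<ToolCategory>()));                // → 10
console.log(maxTurns(new Set<ToolCategory>(["unix"])));        // → 20
console.log(maxTurns(new Set<ToolCategory>(["unix", "mcp"]))); // → 50
```

Note the precedence: if agent-tools or MCP tools are active, the higher 50-turn ceiling applies even when Unix commands are also in play.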

QARK does not wait for a complete response before displaying text. The architecture is fully streaming:

  1. Rust backend (rig-core) opens a streaming connection to the provider’s API.
  2. Token chunks arrive and are pushed into an event channel.
  3. The React 19 frontend consumes events and renders tokens as they arrive.
  4. Tool call events trigger execution in the Rust layer; results flow back through the same channel.
  5. Token counts, cost estimates, and timing data update in real time as the stream progresses.

This means you see the first token within milliseconds of the provider starting generation — not after the full response completes.
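The consumer side of this flow can be modeled with an async generator. QARK's real channel is a Rust-to-React event bridge; this sketch only shows the shape — tokens are rendered as they arrive, not after the stream ends:

```typescript
// Model of the streaming path: the backend pushes token chunks into an
// event channel; the frontend consumes and renders them incrementally.
// Event shapes here are illustrative, not QARK's actual event schema.

type StreamEvent =
  | { kind: "token"; text: string }
  | { kind: "done" };

async function* providerStream(chunks: string[]): AsyncGenerator<StreamEvent> {
  for (const text of chunks) {
    yield { kind: "token", text }; // each chunk is available immediately
  }
  yield { kind: "done" };
}

// "Frontend": append each token as it arrives instead of waiting for "done".
async function consume(stream: AsyncGenerator<StreamEvent>): Promise<string> {
  let rendered = "";
  for await (const ev of stream) {
    if (ev.kind === "token") rendered += ev.text;
  }
  return rendered;
}

const rendered = await consume(providerStream(["Hel", "lo, ", "world"]));
console.log(rendered); // → "Hello, world"
```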

Every conversation starts with the default agent. This agent uses:

  • The global system prompt (configurable in Settings).
  • All available built-in tools.
  • The auto_compact context strategy.
  • The model assigned to the conversation (or the global default model).
  • Default temperature and LLM parameters.

You can create custom agents that override any of these. A custom agent might:

  • Use a specialized system prompt (“You are a senior Rust developer reviewing code for safety issues.”).
  • Restrict the tool vector to only @web-search and @document-search.
  • Set the context strategy to none for stateless operation.
  • Pin a specific model regardless of the conversation’s model picker.
  • Override temperature, top-p, or max tokens.

Custom agents are reusable across conversations. Assign one to a conversation and every prompt in that conversation dispatches through that agent’s configuration.
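Conceptually, a custom agent is a bundle of optional overrides. The field names and shape below are illustrative — QARK's actual schema may differ:

```typescript
// Hypothetical shape of an agent configuration. Every field is optional:
// anything unset falls through to the level above in the cascade.

interface AgentConfig {
  systemPrompt?: string;
  toolVector?: string[];        // e.g. restrict to ["@web-search", "@document-search"]
  contextStrategy?: "auto_compact" | "last_n" | "none";
  model?: string;               // pin a model regardless of the conversation picker
  temperature?: number;
  topP?: number;
  maxTokens?: number;
}

// The "Code Review" style agent from the examples above.
const codeReview: AgentConfig = {
  systemPrompt: "You are a senior Rust developer reviewing code for safety issues.",
  toolVector: ["@web-search", "@document-search"],
  temperature: 0.3,
};

console.log(codeReview.temperature); // → 0.3
```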

Agent configuration follows a three-level cascade. Each level overrides the one above it:

Global defaults (Settings → Agent Defaults)
↓ overridden by
Agent config (the specific agent assigned to this conversation)
↓ overridden by
Conversation overrides (per-conversation settings in the conversation panel)

Example: Your global default temperature is 0.7. You create a “Code Review” agent with temperature 0.3. You open a conversation using that agent, but set the conversation-level temperature to 0.1. The model receives temperature 0.1.

This cascade applies to every configurable parameter: system prompt, tool vector, context strategy, model, temperature, top-p, max tokens, and all other LLM parameters.
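Because unset fields simply fall through, the cascade behaves like a field-by-field, last-defined-wins merge. A minimal sketch, with illustrative names and the temperature example from above (0.7 → 0.3 → 0.1):

```typescript
// Three-level cascade: global defaults < agent config < conversation
// overrides. A level that omits a field inherits it from the level above.

interface Params {
  temperature?: number;
  model?: string;
  maxTokens?: number;
}

function resolve(globalDefaults: Params, agent: Params, conversation: Params): Params {
  // Later spreads win, field by field.
  return { ...globalDefaults, ...agent, ...conversation };
}

const effective = resolve(
  { temperature: 0.7, model: "default-model" },  // global defaults
  { temperature: 0.3 },                          // "Code Review" agent
  { temperature: 0.1 },                          // conversation override
);
console.log(effective.temperature); // → 0.1
console.log(effective.model);       // → "default-model" (inherited)
```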

  • Change a global default and it propagates to every conversation that does not have an agent or conversation-level override.
  • Change an agent config and it propagates to every conversation using that agent (unless that conversation has its own override).
  • Change a conversation override and it affects only that conversation.

This gives you broad control at the top and surgical control at the bottom — without duplicating configuration.