Message Rendering & Actions

Every message passes through a rendering pipeline that handles markdown, mathematics, syntax-highlighted code, citations, thinking blocks, and tool call results. Each message also carries an action bar and a live token display that updates during streaming.

QARK supports the full GFM (GitHub Flavored Markdown) specification:

  • Headings (h1–h6) with proper hierarchy
  • Bold, italic, strikethrough, and inline code
  • Ordered and unordered lists with nesting
  • Block quotes with multi-level nesting
  • Tables with alignment (left, center, right)
  • Task lists with checkboxes
  • Horizontal rules
  • Links and autolinks
  • Images with alt text

Line breaks, paragraph spacing, and nested structures render identically to how they appear on GitHub.

QARK renders mathematical notation using KaTeX, supporting both inline and block expressions:

  • Inline math — Wrap expressions in single dollar signs: $E = mc^2$ renders inline within the surrounding text
  • Block math — Wrap expressions in double dollar signs for centered, full-width display:
$$
\int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi}
$$

KaTeX covers a broad subset of LaTeX math commands including matrices, fractions, summations, integrals, Greek letters, and custom operators. Rendering happens client-side with no external service calls.
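
As a further illustration of the block syntax, the sketch below combines a summation, a fraction, and a matrix. The formulas are chosen only as examples and are not taken from QARK itself; the commands used (`\sum`, `\frac`, `\det`, and the `pmatrix` environment) are standard LaTeX that KaTeX supports.

```latex
$$
\sum_{k=1}^{n} k = \frac{n(n+1)}{2}
$$

$$
\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc
$$
```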

Code Blocks with Shiki Syntax Highlighting

Code blocks receive syntax highlighting through Shiki, the same engine used by VS Code:

  • Dual-theme support: github-light in light mode, github-dark in dark mode. The theme switches automatically with your system or application preference.
  • Language badge — A label in the top-right corner of each code block shows the detected or specified language (Python, TypeScript, Rust, SQL, and 100+ others).
  • Copy button — One click copies the entire code block contents to your clipboard.
  • Deferred highlighting — Syntax highlighting runs via requestAnimationFrame to avoid blocking the main thread. Code appears immediately as plain text, then highlights apply in the next animation frame. For long code blocks, this prevents interface freezes.
def fibonacci(n: int) -> list[int]:
    """Generate first n Fibonacci numbers."""
    sequence = [0, 1]
    for i in range(2, n):
        sequence.append(sequence[i - 1] + sequence[i - 2])
    return sequence[:n]

When RAG (retrieval-augmented generation) provides context from your documents, the model’s response includes inline citation badges:

  • Citations appear as numeric superscript badges (e.g., [1], [2], [3]) positioned inline with the text they support
  • Hover over a badge to see a tooltip with:
    • Source document name
    • Page number or chunk identifier
    • Relevance score (percentage match from the retrieval step)
  • Citations link the model’s claims back to your source material, making it possible to verify accuracy without searching manually

When chain-of-thought thinking is enabled, the model’s internal reasoning appears in dedicated thinking blocks:

  • Visually separated from the main response with distinct styling (muted background, different text treatment)
  • Expandable and collapsible — Click to toggle visibility. Collapsed by default to keep the focus on the final answer.
  • Thinking content renders with the same markdown and code highlighting as regular messages
  • Thinking tokens are tracked separately in the token/cost badge

When the model invokes tools, each call renders as a collapsible block with structured information:

  • Icon — Visual identifier for the tool type
  • Label — The tool name and a brief summary of what was called
  • Status indicator — Shows whether the call is pending, succeeded, or failed
  • Color coding by tool type:
| Color | Tool Category | Examples |
| --- | --- | --- |
| Purple | Thinking / reasoning | Chain-of-thought, planning |
| Blue | Web operations | Web search, browse |
| Cyan | Fetch / retrieval | URL fetch, API calls |
| Emerald | Document operations | File read, RAG query |

Expand a tool call block to see the full input parameters and output returned by the tool. Failed tool calls display the error message with enough context to diagnose the issue.

When an agent tool invokes a sub-agent, the resulting tool calls render with a depth indicator — a visual indent and connector line showing the nesting hierarchy. You can trace the full execution path: top-level agent → agent tool → sub-agent’s tool calls → results bubbling back up.

Images and videos generated by AI models or returned by tools render inline within the conversation:

  • Images display at a responsive width with aspect ratio preserved
  • Click any image to open a lightbox overlay with full-resolution viewing, zoom, and pan
  • Video content embeds with native playback controls
  • Generated images include metadata (model used, prompt, dimensions) in a caption area

[Figure: Rich message showing a syntax-highlighted Python code block, a rendered LaTeX equation, an inline citation badge with hover tooltip, and a collapsed tool call block with status indicator]

Hover over any message to reveal the action bar. The available actions differ by role:

Assistant messages:

| Icon | Action | Behavior |
| --- | --- | --- |
| Copy | Copy | Copies the raw markdown text to your clipboard. Icon changes to a checkmark for 2 seconds as confirmation. |
| Regenerate | Regenerate | Re-sends the conversation to the model and generates a new response. The original response is preserved as a sibling; use the branch navigator (X/Y indicator + arrows) to switch between versions. See Conversations for branch navigation. |
| Fork | Fork | Creates a new conversation containing all messages up to and including this response. The new conversation opens in a new tab. Useful for exploring a different direction without affecting the original thread. |
| Export | Export | Opens a submenu with three options: Markdown, HTML, PDF. Exports this single message with its attachments, tool calls, and citations. See Export. |
| Delete | Delete | Removes the message after a confirmation dialog. Hidden while the message is actively streaming. |

User messages:

| Icon | Action | Behavior |
| --- | --- | --- |
| Copy | Copy | Copies the user message text. |
| Fork | Fork | Creates a new conversation containing all messages before this one. The user message itself is not copied, so you can rewrite it in the new conversation. |
| Export | Export | Same three-format submenu as for assistant messages. |
| Delete | Delete | Removes the message after a confirmation dialog. |

User messages do not have a Regenerate action. Messages are immutable — there is no inline edit action. To change what you said, fork from the message and rewrite it.
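
The two Fork behaviors differ only in where the copied history ends. A minimal sketch, assuming a conversation is a flat list of messages with a `role` field; the `Message` type and `fork_history` function are hypothetical names used for illustration, not QARK's actual API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str  # "user" or "assistant" (hypothetical field)
    text: str


def fork_history(messages: list[Message], index: int) -> list[Message]:
    """Return the messages a fork taken at `index` would start with."""
    if messages[index].role == "assistant":
        # Assistant fork: everything up to and including this response.
        return messages[: index + 1]
    # User fork: everything before this message, so it can be rewritten.
    return messages[:index]


history = [
    Message("user", "What is KaTeX?"),
    Message("assistant", "A math rendering library."),
    Message("user", "Show an example."),
    Message("assistant", "$E = mc^2$"),
]

# Forking from the first assistant reply keeps both opening messages.
print([m.text for m in fork_history(history, 1)])
# Forking from the second user message keeps the same two messages,
# leaving the user free to write a different follow-up.
print([m.text for m in fork_history(history, 2)])
```

In this sketch, forking from the second user message and forking from the assistant reply just before it produce the same starting history, which matches the "fork and rewrite" workflow described above.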

After a response completes, the bottom of each assistant message displays a token count:

1,022 in · 767 out
  • in — input tokens sent to the model for this turn (conversation history + system prompt + tools)
  • out — output tokens the model generated

The badge appears on hover alongside the message actions. Cost in USD is not shown per-message — cumulative cost is displayed in the conversation’s Info Panel.

During response generation, the message header shows real-time metrics that update as tokens arrive:

1,022 in · ~234 out · 45%

Three values, updated via requestAnimationFrame batching:

| Value | Source | Updates |
| --- | --- | --- |
| Input tokens | Actual context tokens calculated at stream start | Fixed for the duration of the stream |
| ~Output tokens | Estimated from streamed content length (content.length / 4). Includes thinking tokens if chain-of-thought is active. Prefixed with ~ because it is an estimate until the stream completes. | Real-time as tokens arrive |
| Context usage % | (input tokens / model context window) × 100, rounded | Fixed for the duration of the stream |
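
The arithmetic behind the three values can be sketched as follows. This is an illustration under stated assumptions rather than QARK's code: the function names are hypothetical, the character-based estimate is assumed to round down, the percentage is assumed to round half up, and the token counts and window size in the example are made up.

```python
import math


def estimate_output_tokens(content: str) -> int:
    """Rough estimate of about four characters per token (content.length / 4)."""
    # Assumption: the result is rounded down; the source does not say how.
    return len(content) // 4


def context_usage_percent(input_tokens: int, context_window: int) -> int:
    """(input tokens / model context window) x 100, rounded half up."""
    return math.floor(input_tokens / context_window * 100 + 0.5)


def live_display(input_tokens: int, content: str, context_window: int) -> str:
    """Format the streaming header in the 'in · ~out · %' layout."""
    out = estimate_output_tokens(content)
    pct = context_usage_percent(input_tokens, context_window)
    return f"{input_tokens:,} in · ~{out:,} out · {pct}%"


# Hypothetical numbers: 90,000 input tokens, a 200,000-token window,
# and 936 characters streamed so far.
print(live_display(90_000, "x" * 936, 200_000))  # 90,000 in · ~234 out · 45%
```

Only the middle value changes while the stream is running, because it is the only one derived from the content received so far.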

The percentage value changes color as context fills up:

| Usage | Color | Meaning |
| --- | --- | --- |
| Below 60% | Normal (muted) | Plenty of headroom |
| 60% – 79% | Yellow | Approaching capacity |
| 80%+ | Red | Near or past compaction threshold |
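
The thresholds above map onto a simple comparison chain. A minimal sketch, where `usage_color` and the returned color labels are hypothetical names and the input is the already-rounded whole-number percentage:

```python
def usage_color(percent: int) -> str:
    """Map a rounded context usage percentage to a display color."""
    if percent >= 80:
        return "red"      # near or past the compaction threshold
    if percent >= 60:
        return "yellow"   # approaching capacity
    return "muted"        # plenty of headroom


for pct in (45, 60, 79, 80):
    print(pct, usage_color(pct))
```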

This gives you a live signal of how much of the model’s context window the current conversation is consuming — before the response even finishes.

While waiting for content to appear, the message shows contextual status text:

  • Thinking… — default status while the model processes
  • Compacting context… — context strategy is summarizing history (auto_compact triggered)
  • Fetching {hostname}… — URL prefetch is in progress

Once tokens start arriving, these status messages are replaced by the streaming content with an animated cursor.

When documents are being processed for the response, a progress indicator shows:

  • Current processing stage (parsing → chunking → embedding → indexing → searching)
  • Progress bar with percentage (for embedding and indexing stages)
  • Current file being processed
  • Completed / total document count

Rendering must keep pace with token streaming without consuming excessive CPU or dropping frames. QARK uses two key techniques:

During streaming, incoming tokens arrive faster than the DOM should update. QARK batches token arrivals and re-renders the message content at most every 80 milliseconds (approximately 12 renders per second). This interval balances perceived responsiveness against rendering cost — the content appears to flow smoothly without saturating the browser’s layout engine.
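
The batching idea can be sketched as a small throttle: tokens accumulate in a buffer, and the render callback runs only when at least 80 ms have passed since the previous render. This is a simplified model, not QARK's implementation. The `ThrottledRenderer` class is a hypothetical name, timestamps are passed in explicitly to keep the example deterministic, and an explicit `flush` stands in for the timer a real UI would use to render trailing tokens.

```python
class ThrottledRenderer:
    """Buffers streamed tokens and renders at most once per interval."""

    def __init__(self, render, interval_ms: float = 80.0):
        self._render = render            # callback receiving the full text so far
        self._interval_ms = interval_ms
        self._text = ""
        self._dirty = False              # True when unrendered tokens are buffered
        self._last_render_ms = None

    def on_token(self, token: str, now_ms: float) -> None:
        """Buffer a token and render only if the interval has elapsed."""
        self._text += token
        self._dirty = True
        due = (
            self._last_render_ms is None
            or now_ms - self._last_render_ms >= self._interval_ms
        )
        if due:
            self._render(self._text)
            self._dirty = False
            self._last_render_ms = now_ms

    def flush(self) -> None:
        """Render any remaining buffered text, for example when the stream ends."""
        if self._dirty:
            self._render(self._text)
            self._dirty = False


renders = []
renderer = ThrottledRenderer(renders.append)

# Ten tokens arriving 20 ms apart (0 ms, 20 ms, ..., 180 ms).
for i in range(10):
    renderer.on_token(f"t{i} ", now_ms=i * 20)
renderer.flush()

print(len(renders))  # 4 renders for 10 tokens: at 0, 80, and 160 ms, plus the flush
```

Each render receives the full accumulated text, so skipped tokens are never lost; they simply appear with the next batch.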

requestAnimationFrame Batching for Syntax Highlighting

Syntax highlighting is computationally expensive for large code blocks. Rather than highlighting on every render cycle, QARK defers Shiki processing to requestAnimationFrame callbacks. This means:

  1. Raw code text appears immediately during streaming
  2. Highlighting applies in the browser’s next animation frame
  3. If the code block is still growing (more tokens arriving), highlighting re-applies only after the stream settles

The result: code blocks never cause visible jank during streaming, even for 500+ line outputs.
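
The deferral logic can be modeled without a browser. In the sketch below, `on_frame` stands in for a requestAnimationFrame callback and `highlight` for the expensive Shiki call. The `DeferredHighlighter` class is a hypothetical name, and treating "settled" as "a frame in which the code did not grow" is an assumption made for illustration; the source does not define the exact condition.

```python
class DeferredHighlighter:
    """Shows raw code immediately and highlights once the block stops growing."""

    def __init__(self, highlight):
        self._highlight = highlight       # expensive function: code -> markup
        self._code = ""
        self._grew_since_frame = False
        self._highlighted_code = None
        self.output = ""                  # what the user currently sees

    def on_code(self, code: str) -> None:
        """Called on every render with the code received so far."""
        if code != self._code:
            self._code = code
            self._grew_since_frame = True
            self.output = code            # plain text until highlighting catches up

    def on_frame(self) -> None:
        """Called once per animation frame."""
        if self._grew_since_frame:
            # Still streaming: skip this frame and check again on the next one.
            self._grew_since_frame = False
            return
        if self._highlighted_code != self._code:
            self.output = self._highlight(self._code)
            self._highlighted_code = self._code


calls = []


def fake_highlight(code: str) -> str:
    """Stand-in for a real highlighter; records each invocation."""
    calls.append(code)
    return f"<hl>{code}</hl>"


h = DeferredHighlighter(fake_highlight)
h.on_code("def f():")
h.on_frame()                              # block grew: no highlighting yet
h.on_code("def f():\n    return 1")
h.on_frame()                              # block grew again: still deferred
h.on_frame()                              # settled: highlight once
h.on_frame()                              # nothing changed: no repeat work

print(len(calls))  # 1
```

The expensive call runs once for the final text rather than once per streamed chunk, which is the property the steps above rely on.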