
Budget & Usage

QARK logs every token across every provider. Set monthly spending limits per provider, monitor costs in real time during streaming, and drill down into per-model breakdowns across your usage history. The cost ledger is append-only — entries persist even when you delete conversations.


Configure a monthly USD spending limit for each provider individually:

  1. Open Settings → Budget & Usage.
  2. Each configured provider displays a budget input field.
  3. Enter a dollar amount (e.g., 50.00 for $50/month). The value commits on blur or when you press Enter.
  4. Toggle the Enforce switch to control whether QARK blocks requests when the budget is exceeded.
| Enforcement State | Behavior |
| --- | --- |
| Enabled | Streaming stops and new requests are blocked once the monthly limit is reached |
| Disabled | Costs are tracked and warnings are shown, but requests continue without interruption |

Limits are independent per provider. For example, you can set $100/month for your primary cloud provider, $20/month for a secondary one, and no limit for local models.


Every message displays a detailed cost breakdown:

| Metric | Description |
| --- | --- |
| Input tokens | Tokens sent to the model (prompt + context) |
| Output tokens | Tokens generated by the model |
| Thinking tokens | Tokens consumed by the reasoning/thinking step (tracked and priced separately for native-thinking models like Claude) |
| Cost (USD) | Calculated cost based on the model’s per-token pricing |

Compaction and compression operations (context window management) are tracked as separate cost entries with a purpose field distinguishing them from regular message costs.
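As a rough illustration of how a per-message cost could be derived from per-token pricing, here is a minimal sketch. The price table, the model name `example-model`, and the function `message_cost` are hypothetical; QARK's actual pricing data and calculation are not shown on this page.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens (not real QARK or provider pricing).
PRICING = {
    "example-model": {
        "input": Decimal("3.00"),
        "output": Decimal("15.00"),
        "thinking": Decimal("15.00"),
    },
}


def message_cost(model, tokens_in, tokens_out, tokens_thinking=0):
    """Return the USD cost of one message under the assumed price table."""
    price = PRICING[model]
    total = (
        tokens_in * price["input"]
        + tokens_out * price["output"]
        + tokens_thinking * price["thinking"]
    )
    # Prices are quoted per million tokens, so scale the sum down.
    return total / Decimal(1_000_000)


cost = message_cost("example-model", tokens_in=1000, tokens_out=500, tokens_thinking=200)
print(f"${cost}")
```

`Decimal` is used here so that many small amounts can be summed without floating-point drift; that is a design choice of the sketch, not a statement about QARK's internals.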


Every cost event is written to an append-only ledger. Each record contains:

| Field | Description |
| --- | --- |
| id | Unique record identifier |
| conversation_id | Which conversation generated this cost |
| message_id | Which message generated this cost |
| provider | Provider name (e.g., Anthropic, OpenAI, Google) |
| model | Specific model identifier |
| tokens_in | Input token count |
| tokens_out | Output token count |
| tokens_thinking | Thinking/reasoning token count |
| cost_usd | Calculated cost in USD |
| purpose | message, compaction, compression, or other operation type |
| timestamp | When the cost was incurred |
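To make the record shape concrete, the following sketch models the fields above as an immutable record in an in-memory append-only list. The field names come from the table; the classes `CostRecord` and `CostLedger`, the field types, and the in-memory storage are assumptions for illustration, not QARK's actual storage layer.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a record cannot be modified after creation
class CostRecord:
    id: str
    conversation_id: str
    message_id: str
    provider: str
    model: str
    tokens_in: int
    tokens_out: int
    tokens_thinking: int
    cost_usd: float
    purpose: str  # "message", "compaction", "compression", ...
    timestamp: datetime


class CostLedger:
    """Hypothetical append-only ledger: records are added and read, never edited or removed."""

    def __init__(self):
        self._records = []

    def append(self, **fields):
        # The ledger assigns the id and timestamp; callers supply the rest.
        record = CostRecord(
            id=str(uuid.uuid4()),
            timestamp=datetime.now(timezone.utc),
            **fields,
        )
        self._records.append(record)
        return record

    def records(self):
        # Return a tuple so callers cannot mutate the underlying list.
        return tuple(self._records)

    def total_for_conversation(self, conversation_id):
        return sum(r.cost_usd for r in self._records if r.conversation_id == conversation_id)
```

Because the ledger is keyed by `conversation_id` rather than owned by the conversation, deleting a conversation would leave its cost records in place, which matches the persistence behavior described above.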

QARK monitors spending against your configured limits during streaming:

  • Warning threshold (80%): When a provider’s monthly spending reaches 80% of its budget, a warning notification appears.
  • Budget exceeded (100%): If enforcement is enabled, check_provider_budget returns allowed=false and streaming stops immediately. A clear message explains which provider hit its limit and what the current spend is.

Warnings are emitted in real time — you’ll see them during an active stream if a response pushes spending past the threshold.
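The threshold logic described above can be sketched as follows. Only the name `check_provider_budget`, the `allowed` flag, and the 80% and 100% thresholds come from this page; the parameters, the `BudgetStatus` type, the message wording, and the treatment of a missing limit as unlimited are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

WARNING_RATIO = 0.80  # warning threshold stated above


@dataclass
class BudgetStatus:
    allowed: bool   # False means streaming should stop and new requests are blocked
    warning: bool   # True once spending reaches the warning threshold
    message: str


def check_provider_budget(provider: str, spent_usd: float,
                          limit_usd: Optional[float], enforce: bool) -> BudgetStatus:
    # Assumption: no configured limit means the provider is unlimited.
    if limit_usd is None:
        return BudgetStatus(allowed=True, warning=False, message="")

    if spent_usd >= limit_usd:
        message = (f"{provider} has reached its monthly budget: "
                   f"${spent_usd:.2f} spent of ${limit_usd:.2f}")
        # With enforcement disabled, the request is still allowed but a warning is shown.
        return BudgetStatus(allowed=not enforce, warning=True, message=message)

    if spent_usd >= WARNING_RATIO * limit_usd:
        message = (f"{provider} is at {spent_usd / limit_usd:.0%} of its monthly budget "
                   f"(${spent_usd:.2f} of ${limit_usd:.2f})")
        return BudgetStatus(allowed=True, warning=True, message=message)

    return BudgetStatus(allowed=True, warning=False, message="")
```

Calling a check like this after each streamed chunk, with the running spend, is one way the warning could surface mid-stream rather than only after the response completes.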


The Budget & Usage dashboard provides a visual overview of your spending patterns.

A summary card at the top shows:

  • Total spending this month across all providers
  • Stacked bar chart breaking down the top 3 provider spenders plus an “Others” bucket (8-color rotation palette)
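The "top 3 plus Others" grouping for the stacked bar can be sketched like this. The function name `top_spenders` and the choice to omit the "Others" segment when nothing remains are assumptions.

```python
def top_spenders(spend_by_provider, top_n=3):
    """Return (label, cost) segments: the top_n providers plus an "Others" bucket."""
    ranked = sorted(spend_by_provider.items(), key=lambda item: item[1], reverse=True)
    segments = ranked[:top_n]
    remainder = sum(cost for _, cost in ranked[top_n:])
    # Assumption: the "Others" segment is shown only when it has spending in it.
    if remainder > 0:
        segments.append(("Others", remainder))
    return segments


spend = {"A": 40.0, "B": 25.0, "C": 10.0, "D": 3.0, "E": 2.0}
print(top_spenders(spend))
```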

Below the current month, the last 3 months are displayed as expandable sections:

| Element | Description |
| --- | --- |
| Month header | Month name + total spend |
| Trending indicator | Percentage change vs. previous month (▲ up / ▼ down) |
| Provider breakdown | Mini progress bars per provider showing cost, message count, and percentage of the month’s total |

Click any month to expand its full provider-level breakdown.
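The trending indicator is a month-over-month percentage change. A minimal sketch of that arithmetic follows; the function name `trend_indicator`, the one-decimal formatting, and the handling of a zero-spend previous month are assumptions.

```python
def trend_indicator(current_usd, previous_usd):
    """Format the percentage change vs. the previous month, or None if undefined."""
    if previous_usd == 0:
        # Assumption: with no spending last month there is no meaningful percentage.
        return None
    change = (current_usd - previous_usd) / previous_usd * 100
    if change > 0:
        return f"▲ {change:.1f}%"
    if change < 0:
        return f"▼ {abs(change):.1f}%"
    return "0.0%"
```

For example, spending $60 after a $50 month is a 20% increase, and spending $45 after a $50 month is a 10% decrease.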

Below the timeline, each provider gets a dedicated card. Cards are sorted by spending (highest first), and the progress bar color reflects how close the provider is to its budget:

| Visual State | Condition |
| --- | --- |
| Primary color bar | Spending below 80% of budget |
| Yellow bar | Spending between 80%–99% of budget |
| Red bar | Spending at or above 100% of budget |

Each card displays: current spend, budget limit, remaining budget, and a progress bar.
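The card contents and bar color can be sketched as a small function. The thresholds come from the table above; the function name `provider_card`, the returned keys, and clamping the remaining budget at zero are assumptions. The sketch also treats anything from 80% up to but not including 100% as yellow, which covers values between 99% and 100% that the table does not spell out.

```python
def provider_card(spent_usd, limit_usd):
    """Return the values a provider card would display for a positive budget limit."""
    ratio = spent_usd / limit_usd
    if ratio >= 1.0:
        bar = "red"
    elif ratio >= 0.8:
        bar = "yellow"
    else:
        bar = "primary"
    return {
        "spent_usd": spent_usd,
        "limit_usd": limit_usd,
        # Assumption: remaining budget is shown as zero once the limit is exceeded.
        "remaining_usd": max(limit_usd - spent_usd, 0),
        "progress": min(ratio, 1.0),
        "bar": bar,
    }
```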

[Figure: Budget dashboard with stacked bar chart and provider cards]

[Figure: Provider card showing progress bar near budget limit]

Open the Usage History modal from the dashboard for a detailed tabular view:

| Column | Description |
| --- | --- |
| Month | Calendar month |
| Messages | Total message count |
| Input Tokens | Total input tokens consumed |
| Output Tokens | Total output tokens generated |
| Cost | Total USD spent |

Click any month row to expand a model-level breakdown sorted by cost (highest first). Each sub-row shows the model name, message count, token counts, and cost — so you can identify exactly which models drive your spending.

The table footer displays cumulative totals across your entire usage history: total messages, total tokens (input + output), and total cost.
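The model-level breakdown amounts to grouping one month's ledger records by model and sorting by cost. A minimal sketch, assuming records are dictionaries with the ledger field names and that only records with purpose `message` count toward the message total (both are assumptions, as is the function name `model_breakdown`):

```python
from collections import defaultdict


def model_breakdown(records):
    """Group ledger records by model and return (model, totals) pairs, highest cost first."""
    totals = defaultdict(
        lambda: {"messages": 0, "tokens_in": 0, "tokens_out": 0, "cost_usd": 0.0}
    )
    for record in records:
        row = totals[record["model"]]
        # Assumption: compaction and compression entries add cost but are not messages.
        if record["purpose"] == "message":
            row["messages"] += 1
        row["tokens_in"] += record["tokens_in"]
        row["tokens_out"] += record["tokens_out"]
        row["cost_usd"] += record["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1]["cost_usd"], reverse=True)
```

Summing the same fields across every row, without grouping, would give the cumulative totals shown in the table footer.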

[Figure: Usage history modal with expanded model breakdown]

Every conversation tracks its own cumulative cost. Open the Info panel for any conversation to see:

  • Total cost for that conversation
  • Token counts (input, output, thinking)
  • Number of messages
  • Provider/model usage breakdown

While a response is actively streaming, QARK displays an estimated token count calculated as:

estimated_tokens = content.length / 4

This estimate updates in real time and is replaced by the exact token count from the provider’s response metadata once streaming completes.
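The heuristic is roughly four characters per token. The sketch below applies it to a stream of text chunks; the function names and the choice to round up are assumptions, since the formula above does not specify rounding. Note that `len()` in Python counts Unicode code points, which can differ from a JavaScript-style `content.length` for characters outside the Basic Multilingual Plane.

```python
import math


def estimate_tokens(content: str) -> int:
    """Estimate the token count as characters / 4, rounded up (rounding is an assumption)."""
    return math.ceil(len(content) / 4)


def streaming_estimates(chunks):
    """Yield an updated estimate after each streamed chunk is appended."""
    content = ""
    for chunk in chunks:
        content += chunk
        yield estimate_tokens(content)


for estimate in streaming_estimates(["Hello", ", ", "world!"]):
    print(estimate)
```

Once the stream ends, the last estimate would be discarded in favor of the exact count reported by the provider.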


| Strategy | Impact |
| --- | --- |
| Use affordable models for drafts | Route first-pass writing, brainstorming, and iteration to cheaper models. Switch to premium models for final output. |
| Run local models for zero-cost tasks | Ollama, LM Studio, and other local providers incur no API cost. Use them for repetitive tasks, formatting, and exploration. |
| Set token_budget context strategy | Apply a hard token cap per conversation to prevent runaway context growth. Older messages are compacted when the cap is reached. |
| Monitor the dashboard weekly | Catch unexpected cost spikes early. The month-over-month trending indicator highlights changes before they compound. |