Troubleshooting

Turn debugger: per-conversation forensics

When a buyer reports "the bot asked for an email instead of opening a ticket" or "it just stopped", reproducing the moment is usually impossible — the visitor is gone, the context was unique. The turn debugger removes the need to reproduce: every widget turn silently records what the agent actually did, and the platform admin can replay the evidence per conversation.

Where

Super-admin → /admin/conversations → open a conversation. Under every assistant reply sits a collapsible trace card (error traces start expanded). Turns that never persisted a message — provider failures, human-takeover short-circuits — appear as standalone cards at the end of the thread, because failures are exactly when forensics matter.

What a trace shows

Section	Answers
Route	Fast-router decision: `knowledge (no_signal)` or `tool_loop (keyword · score 0.91 → open_ticket)`. "Why didn't it run the tool?" starts here.
History	How many prior messages were in the LLM context — stale or unexpected history is the classic "bot continued the wrong thread" cause.
Retrieval	Cache hit/miss, rerank skipped, low-confidence flag, and each chunk's URL + scores + snippet. "0 chunks passed the threshold" explains an "I'm not sure" answer instantly.
Tool loop	Hop by hop: which tools the model requested, the exact arguments, the (truncated) result or error, and any duplicate calls that were dropped. If no tool ran, it shows "skipped (fast router)" or "stopped without calling anything" instead.
Outcome	Latency, token estimate, blocks emitted, CTAs, lead-form offer, model — or the exact exception class + message on a failed turn.
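To make the five sections concrete, here is a minimal Python sketch of what a trace record might hold and one possible order for reading it. Everything here is an assumption made for illustration: the class names, the field names, and the first_section helper are hypothetical and do not describe the platform's actual schema or code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolHop:
    tool: str                      # tool the model requested
    arguments: dict                # exact arguments passed
    result: Optional[str] = None   # truncated result, if the call succeeded
    error: Optional[str] = None    # error text, if the call failed


@dataclass
class TurnTrace:
    kind: str                      # e.g. "llm" or "error"
    route: str                     # fast-router decision string
    history_count: int             # prior messages in the LLM context
    chunks_passed: int             # retrieved chunks that passed the threshold
    tool_hops: list = field(default_factory=list)
    latency_ms: Optional[int] = None
    exception: Optional[str] = None  # exception class + message on a failed turn


def first_section(trace: TurnTrace) -> str:
    """Suggest which section of the trace card to read first (hypothetical triage)."""
    if trace.exception is not None:
        return "Outcome"      # failed turn: the exception is the headline
    if trace.route.startswith("knowledge") and trace.chunks_passed == 0:
        return "Retrieval"    # nothing passed the threshold: explains "I'm not sure"
    if trace.route.startswith("tool_loop") and not trace.tool_hops:
        return "Tool loop"    # routed to tools, but nothing was called
    return "Route"            # otherwise start with the routing decision
```

For example, a knowledge-routed turn with zero passing chunks points straight at the Retrieval section, which matches the "I'm not sure" case described in the table.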

Trace kinds

llm (normal turn), error (provider/stream failure — starts expanded), curated (pinned answer short-circuit), workflow (scripted flow), human_shortcut (explicit "talk to a human" phrase), human_pending / takeover (operator owns the conversation; the bot stayed silent by design).
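The kinds above can be summarised as a small enumeration. The sketch below is illustrative Python, not the platform's code: the TraceKind class and the two helper functions are hypothetical names, and they encode only what this section states (error traces start expanded; human_pending and takeover mean the bot stayed silent by design).

```python
from enum import Enum


class TraceKind(Enum):
    LLM = "llm"                        # normal turn
    ERROR = "error"                    # provider/stream failure
    CURATED = "curated"                # pinned answer short-circuit
    WORKFLOW = "workflow"              # scripted flow
    HUMAN_SHORTCUT = "human_shortcut"  # explicit "talk to a human" phrase
    HUMAN_PENDING = "human_pending"    # operator owns the conversation
    TAKEOVER = "takeover"              # operator owns the conversation


def starts_expanded(kind: TraceKind) -> bool:
    """Only error traces open expanded; every other card starts collapsed."""
    return kind is TraceKind.ERROR


def bot_silent_by_design(kind: TraceKind) -> bool:
    """True when the missing reply is expected, not a bug."""
    return kind in {TraceKind.HUMAN_PENDING, TraceKind.TAKEOVER}
```

When triaging an "it just stopped" report, checking bot_silent_by_design first separates an operator takeover from a genuine failure.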

Cost & retention

Traces are assembled in memory during the turn and persisted by a queued job after the SSE stream closes, so the visitor never waits on them. If the database write fails (the classic case: new code is deployed but php artisan migrate has not run yet), the trace is stashed in a capped cache buffer and drained into the table automatically, with original timestamps intact, as soon as the next write succeeds. Traces therefore survive the outage instead of vanishing into a log line. Chunk snippets and tool results are aggressively truncated: traces are breadcrumbs, not a copy of the knowledge base. The daily turn-traces:prune scheduled command deletes traces older than the retention window.
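The buffer-and-drain behaviour can be sketched as follows. This is a language-neutral illustration in Python under stated assumptions, not the platform's implementation: the TraceWriter class, the store_write callable, the buffer_cap default, and the drop-oldest eviction policy are all hypothetical. The point it shows is that a failed write parks the trace, and the next successful write flushes the backlog without touching the recorded timestamps.

```python
from collections import deque


class TraceWriter:
    """Write traces to a store, buffering them while the store is unavailable."""

    def __init__(self, store_write, buffer_cap=500):
        self._store_write = store_write          # callable that may raise
        # Capped buffer; assumption: when full, the oldest entry is dropped.
        self._buffer = deque(maxlen=buffer_cap)

    @property
    def buffered(self) -> int:
        return len(self._buffer)

    def persist(self, trace: dict) -> bool:
        """Try to store a trace; on failure, stash it for a later drain."""
        try:
            self._store_write(trace)
        except Exception:
            self._buffer.append(trace)           # timestamps stay as recorded
            return False
        self._drain()                            # a write succeeded: flush backlog
        return True

    def _drain(self) -> None:
        """Flush buffered traces in order; stop at the first failure."""
        while self._buffer:
            try:
                self._store_write(self._buffer[0])
            except Exception:
                return                           # keep the rest for next time
            self._buffer.popleft()
```

Because each trace carries the timestamp set during its turn, draining late does not reorder the conversation when rows are sorted by that timestamp.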

TURN_TRACES_ENABLED=true        # kill switch (default on)
TURN_TRACE_RETENTION_DAYS=14    # prune window

php artisan turn-traces:prune            # manual sweep
php artisan turn-traces:prune --days=7   # override window once
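The retention window reduces to a cutoff timestamp. The Python sketch below illustrates that arithmetic only. The prune function, the created_at key, and the choice to keep a trace that sits exactly on the cutoff are assumptions, and the real command's boundary handling may differ.

```python
from datetime import datetime, timedelta, timezone


def prune(traces, retention_days=14, now=None):
    """Return the traces that survive a sweep with the given retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    # Keep anything recorded at or after the cutoff; older traces are deleted.
    return [t for t in traces if t["created_at"] >= cutoff]
```

Passing retention_days=7 mirrors the one-off --days=7 override: the cutoff moves forward and more traces fall outside the window.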