
Overview

WriterzRoom’s researcher agent runs every source batch through a deterministic quality scorer before passing results to the writer. When source quality falls below threshold, an adaptive retry loop, Agent-Supervised Tool Adaptation (ASTA), reformulates the search query and merges the best results from both attempts. This pipeline has two goals: prevent low-relevance sources from reaching the writer, and eliminate fabricated or misattributed citations from the final output.

Source Quality Scoring

Every source returned by Tavily or the pgvector corpus is scored on four axes before it enters the generation pipeline.
Dimension   Weight   What It Measures
---------   ------   ----------------
Relevance   45%      TF-IDF weighted term overlap between query and source snippet
Authority   20%      Domain tier: .gov, .edu, and peer-reviewed domains score highest
Recency     15%      Exponential decay from publication date with a configurable half-life
Density     20%      Snippet length and information-to-filler ratio
The composite score is the weighted sum of all four dimensions. A batch’s quality is measured as the mean composite of its top 3 sources. Thresholds by tier:
Tier       Min Composite to Accept
----       -----------------------
Standard   0.35
Premium    0.35 (higher bar applied via Voyage reranking)
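The scoring and acceptance logic above can be sketched as follows. This is a minimal illustration, assuming per-dimension scores normalized to [0, 1]; the function names and source representation are assumptions, not the actual WriterzRoom implementation:

```python
# Weights and the Standard-tier threshold are taken from the table above.
WEIGHTS = {"relevance": 0.45, "authority": 0.20, "recency": 0.15, "density": 0.20}
STANDARD_THRESHOLD = 0.35

def composite(scores: dict) -> float:
    """Weighted sum of the four per-dimension scores (each assumed in [0, 1])."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

def batch_quality(sources: list) -> float:
    """Batch quality = mean composite of the top 3 sources."""
    composites = sorted((composite(s) for s in sources), reverse=True)
    top3 = composites[:3]
    return sum(top3) / len(top3)

def accept(sources: list, threshold: float = STANDARD_THRESHOLD) -> bool:
    """A batch is accepted when its top-3 mean composite meets the tier threshold."""
    return batch_quality(sources) >= threshold
```

A batch of uniformly mediocre sources (all dimensions at 0.3) would compute to a 0.3 mean composite and fail the 0.35 Standard bar, triggering ASTA.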

ASTA β€” Adaptive Query Reformulation

When the top-3 mean composite falls below threshold, ASTA fires a single reformulation call using Claude Haiku before accepting the results. Retry flow:
Attempt 1 → score batch → composite ≥ threshold → accept
                        → composite < threshold → reformulate → Attempt 2
                                                              → merge (RRF)
                                                              → re-score
                                                              → accept best-of
The reformulator generates two alternative queries:
  • Broader: removes specificity that may have narrowed results too much
  • Reframed: approaches the same topic from a different angle
Both queries run in parallel. Results are fused with the original attempt via Reciprocal Rank Fusion, meaning the retry can only improve or maintain quality; it never replaces good results with worse ones. ASTA uses claude-haiku-4-5 for reformulation. The call is capped at 200 tokens and runs in under 2 seconds.
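The RRF merge step can be sketched as below. This assumes the conventional RRF constant k = 60; the constant used in the actual pipeline is not stated here:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document.

    Documents that rank well in any attempt (original or reformulated) stay
    near the top of the fused ordering, which is why the retry cannot
    displace already-good results.
    """
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Fusing the original attempt with the broader and reframed attempts is then one call over three ranked ID lists.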

Citation Audit

After the writer generates content, the editor runs a deterministic citation audit before any other validation pass. Three checks run in sequence:
  1. Bounds check: any [N] marker where N exceeds the verified source count is stripped. If the researcher returned 3 sources, [4] and above count as fabrications and are removed.
  2. Misattribution check: for each surviving [N], the editor checks whether the 300-character window around the inline citation shares content terms with the source’s snippet. Zero overlap flags the marker as misattributed and strips it.
  3. Orphan pruning: after stripping, any reference list entry whose index number no longer appears inline is removed from the references section.
No LLM is involved in the citation audit; it is fully deterministic, regex-based enforcement.
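The bounds and misattribution checks could look roughly like the sketch below (orphan pruning omitted for brevity). The 300-character window and [N] marker format come from the description above; the term tokenization and helper names are simplified assumptions:

```python
import re

CITE = re.compile(r"\[(\d+)\]")          # inline [N] citation markers
WORD = re.compile(r"[a-z]{4,}")          # crude content-term tokenizer (assumption)

def audit_citations(text: str, snippets: list[str]) -> str:
    """Strip fabricated and misattributed [N] markers from generated text."""
    n = len(snippets)

    def check(match: re.Match) -> str:
        idx = int(match.group(1))
        if idx < 1 or idx > n:
            return ""                    # bounds check: marker exceeds source count
        # 300-character window centered on the citation marker
        start = max(0, match.start() - 150)
        window = text[start:match.start() + 150].lower()
        source_terms = set(WORD.findall(snippets[idx - 1].lower()))
        if not source_terms & set(WORD.findall(window)):
            return ""                    # misattribution: zero content-term overlap
        return match.group(0)            # citation survives both checks

    return CITE.sub(check, text)
```

Because `re.sub` with a replacement function is deterministic, the same draft and source set always audit to the same output.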

Stat Validation

Statistics and expert quotes extracted from research results are cross-validated against source snippets before the writer sees them. A claim must pass a three-component match to be included:
  • Number match (35%): normalized numeric overlap between claim and source
  • Context match (25%): content-term overlap between claim context and source text
  • Proximity match (40%): content-term overlap in an 80-character window around the matched number in the source
Claims that fail the 0.50 composite threshold are dropped at the retrieval boundary and never reach the writer prompt.
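A minimal sketch of the three-component score, using the weights and the 0.50 threshold stated above. The normalization and tokenization helpers are simplified assumptions; the real matching may differ:

```python
import re

def _terms(text: str) -> set[str]:
    """Crude content-term tokenizer (assumption)."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))

def _numbers(text: str) -> set[str]:
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def _overlap(a: set, b: set) -> float:
    return len(a & b) / len(a) if a else 0.0

def stat_score(claim: str, source: str, window: int = 80) -> float:
    nums = _numbers(claim)
    number = _overlap(nums, _numbers(source))            # number match (35%)
    context = _overlap(_terms(claim), _terms(source))    # context match (25%)
    # proximity match (40%): terms in an 80-char window around the matched number
    proximity = 0.0
    for n in nums:
        pos = source.find(n)
        if pos >= 0:
            nearby = source[max(0, pos - window // 2): pos + window // 2]
            proximity = max(proximity, _overlap(_terms(claim), _terms(nearby)))
    return 0.35 * number + 0.25 * context + 0.40 * proximity

def passes(claim: str, source: str) -> bool:
    return stat_score(claim, source) >= 0.50
```

A claim whose number appears nowhere in the source scores zero on both the number and proximity components, so it cannot reach 0.50 and is dropped before the writer prompt.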

End-to-End Quality Chain

Tavily + pgvector
      ↓
  RRF fusion
      ↓
Source quality scoring
      ↓  (below threshold)
ASTA reformulation + retry
      ↓
Voyage reranking
      ↓
Stat validation (drops unverified claims)
      ↓
Writer (citation contract in Zone A prompt)
      ↓
Editor citation audit (bounds + misattribution + orphan pruning)
      ↓
Final content
Each layer is independent and additive: a failure at any stage does not silently degrade output; it either triggers a retry or strips the problematic content before publication.
Last modified on April 4, 2026