Crawlio Docs

Evidence Mode

The problem

Method mode gives your AI the right methods. Your AI still produces unverifiable output.

Your AI can call analyze_page, receive structured data, and then claim "site A has better performance than site B" without citing what it measured, without noting that security data failed to load, and without adjusting its confidence to reflect that gap.

Three failure modes recur:

  1. Untyped output. Findings are free-text prose. Nothing enforces required fields.
  2. Invisible gaps. When a data source times out, nothing records that absence.
  3. Uncalibrated confidence. Your AI claims "high confidence" regardless of whether supporting evidence loaded.

Evidence mode addresses all three.

What evidence mode adds

Evidence mode is what you get when method mode's composite tools return typed evidence records instead of ad-hoc objects.

| Layer | What it adds | Variance |
| --- | --- | --- |
| Code Mode | Search + execute | Medium |
| Method Mode | Composite tools, normalized output | Low |
| Evidence Mode | Typed records + enforcement | Minimal |

Each layer constrains more. Evidence mode constrains the most.

Four core concepts

| Concept | What it does |
| --- | --- |
| Evidence Record | Typed return value from extraction methods. Structured data with explicit null fields. |
| Gap | What data is missing, why it is missing, and whether it affects confidence. |
| Finding | Validated research claim with required fields. Enforced at the tool level. |
| Quality | Computed from gaps. Tells callers how much to trust the evidence. |

Evidence on the Crawlio App side

Evidence records

Each analyze_page call creates an observation log entry and returns an evidenceId. The record includes:

  • url, timestamp, captureTriggered
  • enrichment (framework, network, console, DOM)
  • enrichmentStatus (ok or timeout)
  • crawlStatus (download state for the URL)

Evidence gaps

Gaps are tied to HTTP outcomes:

| Gap | Triggered by |
| --- | --- |
| captureRejected | Capture endpoint returned non-202 |
| captureUnreachable | Capture transport failure |
| enrichmentTimeout | Polling exhausted without data |
| enrichmentError | Server error during polling |
| crawlStatusMissing | No crawl data for the URL |

Evidence quality

Quality is computed from gaps:

| Quality | Condition |
| --- | --- |
| complete | No gaps |
| partial | Gaps present but capture succeeded |
| degraded | Capture-level failure |
| unavailable | Cannot produce usable evidence |
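The quality rules above can be sketched as a small function. This is an illustrative sketch, not Crawlio's implementation: the `Gap` shape, the `hasUsableEvidence` flag, and the `computeQuality` name are assumptions.

```typescript
// Illustrative sketch of the quality rules (not Crawlio's actual code).
type Gap = { kind: string };
type Quality = "complete" | "partial" | "degraded" | "unavailable";

// Gaps that indicate a capture-level failure rather than a partial result.
const CAPTURE_FAILURES = new Set(["captureRejected", "captureUnreachable"]);

function computeQuality(gaps: Gap[], hasUsableEvidence = true): Quality {
  if (!hasUsableEvidence) return "unavailable"; // cannot produce usable evidence
  if (gaps.length === 0) return "complete";     // no gaps at all
  const captureFailed = gaps.some((g) => CAPTURE_FAILURES.has(g.kind));
  return captureFailed ? "degraded" : "partial";
}
```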

Comparison readiness

compare_pages produces a readiness signal from both sides:

| Readiness | Condition |
| --- | --- |
| ready | Both sides complete |
| cautious | One side partial |
| unreliable | Either side degraded |

The comparison also includes symmetric (whether both sides have the same gap profile), degradationNotes, timingDelta, and paired evidenceId values for verification via get_observation.
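The readiness rules reduce to a worst-side check over the two quality values. A minimal sketch, assuming a `comparisonReadiness` helper (the name and function are illustrative, not Crawlio's code):

```typescript
// Illustrative sketch: deriving comparison readiness from per-side quality.
type Quality = "complete" | "partial" | "degraded" | "unavailable";
type Readiness = "ready" | "cautious" | "unreliable";

function comparisonReadiness(a: Quality, b: Quality): Readiness {
  // Either side degraded (or worse) makes the comparison unreliable.
  const bad = (q: Quality) => q === "degraded" || q === "unavailable";
  if (bad(a) || bad(b)) return "unreliable";
  // One partial side means compare, but cautiously.
  if (a === "partial" || b === "partial") return "cautious";
  return "ready"; // both sides complete
}
```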

Creating findings

Use create_finding to record a research claim with evidence:

```
create_finding(
  title: "React hydration errors on /products",
  url: "https://example.com/products",
  evidence: ["obs-1", "obs-2"],
  synthesis: "SSR/client HTML mismatch causing hydration failures.",
  confidence: "high",
  category: "framework"
)
```

Findings are persisted in the observation log. Retrieve them with get_findings.

Evidence on the browser side

The browser agent implements the same concepts with substrate-appropriate mechanisms.

Typed findings

smart.finding() validates every research claim synchronously. Required fields: claim, evidence (array of strings), sourceUrl, confidence, method, dimension. Malformed input throws immediately.
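The synchronous validation can be pictured as a guard like the one below. The field list mirrors the required fields above, but the function body and the `validateFinding` name are assumptions, not the browser agent's source:

```typescript
// Sketch of synchronous finding validation (assumed, not the agent's source).
type Confidence = "high" | "medium" | "low";

interface Finding {
  claim: string;
  evidence: string[];
  sourceUrl: string;
  confidence: Confidence;
  method: string;
  dimension: string;
}

function validateFinding(input: Partial<Finding>): Finding {
  const { claim, evidence, sourceUrl, confidence, method, dimension } = input;
  if (typeof claim !== "string" || claim.length === 0) {
    throw new TypeError("finding.claim must be a non-empty string");
  }
  if (!Array.isArray(evidence) || evidence.some((e) => typeof e !== "string")) {
    throw new TypeError("finding.evidence must be an array of strings");
  }
  if (typeof sourceUrl !== "string") {
    throw new TypeError("finding.sourceUrl is required");
  }
  if (confidence !== "high" && confidence !== "medium" && confidence !== "low") {
    throw new TypeError("finding.confidence must be high | medium | low");
  }
  if (typeof method !== "string") throw new TypeError("finding.method is required");
  if (typeof dimension !== "string") throw new TypeError("finding.dimension is required");
  return { claim, evidence, sourceUrl, confidence, method, dimension };
}
```

Malformed input fails the first check it hits and throws before anything is recorded, which is what makes the enforcement synchronous.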

Coverage gaps

When extractPage() runs, it fires seven parallel operations. If a supplementary call fails, its field returns null and a gap is recorded:

```
{
  dimension: "performance",
  reason: "CDP domain disabled",
  impact: "method-failed",
  reducesConfidence: true
}
```

Not all gaps reduce confidence:

| Supplementary call | Dimension | Reduces confidence |
| --- | --- | --- |
| Performance metrics | performance | Yes |
| Security state | security | Yes |
| Font detection | fonts | No |
| Accessibility tree | accessibility | No |
| Mobile readiness | mobile-readiness | No |
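One way to picture the null-plus-gap behavior is a wrapper around each supplementary call. The `withGap` helper below is hypothetical; only the recorded gap shape comes from the example above:

```typescript
// Hypothetical wrapper: a failed supplementary call yields null and
// records a gap (the gap shape matches the coverage-gap example above).
interface CoverageGap {
  dimension: string;
  reason: string;
  impact: "method-failed";
  reducesConfidence: boolean;
}

async function withGap<T>(
  dimension: string,
  reducesConfidence: boolean,
  gaps: CoverageGap[],
  op: () => Promise<T>,
): Promise<T | null> {
  try {
    return await op();
  } catch (err) {
    gaps.push({
      dimension,
      reason: err instanceof Error ? err.message : String(err),
      impact: "method-failed",
      reducesConfidence,
    });
    return null; // the field comes back null instead of throwing
  }
}
```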

Confidence propagation

When a finding's dimension matches an active gap with reducesConfidence: true, the runtime caps confidence one level down:

| Input | Active gap? | Output |
| --- | --- | --- |
| high | Yes | medium (capped, with cappedBy field) |
| medium | Yes | low (capped) |
| low | Yes | low (floor) |
| any | No | unchanged |

This is automatic. Your AI does not choose whether to cap. The runtime enforces it based on what data loaded.
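The cap table can be sketched as a lookup plus a guard. Names here (`capConfidence`, the `"coverage-gap"` value for cappedBy) are illustrative assumptions, not the runtime's actual identifiers:

```typescript
// Sketch of the one-level confidence cap (names are illustrative).
type Confidence = "high" | "medium" | "low";

const ONE_LEVEL_DOWN: Record<Confidence, Confidence> = {
  high: "medium",
  medium: "low",
  low: "low", // already at the floor
};

function capConfidence(
  requested: Confidence,
  activeReducingGap: boolean,
): { confidence: Confidence; cappedBy?: string } {
  if (!activeReducingGap) return { confidence: requested }; // no gap: unchanged
  const capped = ONE_LEVEL_DOWN[requested];
  return capped === requested
    ? { confidence: requested } // low stays low; nothing was actually capped
    : { confidence: capped, cappedBy: "coverage-gap" };
}
```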

Session aggregation

smart.findings() returns all findings in the session. smart.clearFindings() resets both findings and gaps. Gaps are append-only until cleared.

Comparison scaffolds

comparePages() returns a comparison scaffold with 11 fixed dimensions:

| Dimension | Data source |
| --- | --- |
| framework | Detected framework |
| performance | Performance metrics + gaps |
| security | Security state |
| seo | Title, canonical URL, structured data |
| accessibility | Accessibility tree summary |
| error-surface | Console errors |
| third-party-load | Network requests |
| architecture | Framework analysis |
| content-delivery | Protocol and TLS info |
| mobile-readiness | Viewport meta, media queries |
| data-structure | Structured data presence |

A dimension is comparable: true only when both sides are present.
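The comparable flag reduces to a presence check on both sides. A minimal sketch, with the `buildDimension` name and entry shape assumed for illustration:

```typescript
// Minimal sketch: a dimension is comparable only when both sides are present.
interface DimensionEntry<T> {
  left: T | null;
  right: T | null;
  comparable: boolean;
}

function buildDimension<T>(left: T | null, right: T | null): DimensionEntry<T> {
  return { left, right, comparable: left !== null && right !== null };
}
```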

Key differences between substrates

| Aspect | Crawlio App | Browser Agent |
| --- | --- | --- |
| Gap model | Per-HTTP-outcome | Per-CDP-call |
| Quality signal | Explicit enum | Implicit (gaps.length === 0) |
| Confidence propagation | Not yet automatic | Automatic via reducesConfidence |
| Finding enforcement | create_finding HTTP POST | smart.finding() synchronous validation |
| Persistence | Observation log (durable) | Session memory (ephemeral) |
| Evidence chain | evidenceId via get_observation | findings() returns copy |

Known limits

  1. Sequential comparison. Both substrates compare sites sequentially. Site B timing is affected by site A processing.
  2. Append-only gaps. Session gaps persist until explicitly cleared. Old gaps affect later findings.
  3. One-level confidence cap. A gap drops high to medium, not high to low. Two gaps on different dimensions cap independently but cannot compound on the same finding.
  4. Crawlio App confidence propagation pending. The app accepts confidence on create_finding but does not auto-adjust based on evidence quality yet.
  5. Accessibility depth limited. The accessibility tree is capped at depth 3.
  6. Mobile readiness is read-only. Viewport analysis only. No viewport resizing or touch target testing.

Next steps

  • Method Mode: the composite tools that evidence mode builds on
  • Code Mode: the search-and-execute pattern underneath
  • Tool Reference: all 49 full-mode tools with parameters
  • JIT Context: how the aggregator handles context loading
© 2026 Crawlio. All rights reserved.