Changelog

All notable changes to Crawlio are documented here.

v1.4.1: April 2026

New Features

  • Preflight Phase 4: robots.txt gate. New Stage 6 check blocks disallowed crawl paths before they enter the frontier. Resolution uses the target host's robots.txt with fallback caching.
  • Preflight Phase 4: resume-existing-crawl gate. Stage 0 detects an on-disk crawl state and prompts to resume with a settings hash check. Hash mismatch aborts the resume attempt.
  • CLI flags. --assume-yes / -y auto-accepts defaults on confirm prompts; --follow-redirect silently follows redirects that cross registrable domains without prompting. Mutex validation ensures --no-preflight cannot be combined with --assume-yes or --follow-redirect.
  • Interceptor email intelligence. FlowBuffer.extractEmails() captures email addresses from response bodies using an RFC-5322-lite regex, dedupes per flow, and filters asset-path noise. New interceptor_emails MCP tool exposes extraction with optional host filtering and result limits.
  • Interceptor TLS stealth profiles. Three ClientHello spoofing profiles available on the interceptor proxy — chrome-131 (free tier; default), safari-17 (Pro tier), firefox-120 (Pro tier). New --stealth-profile CLI flag and four MCP tools: interceptor_list_profiles, interceptor_set_profile, interceptor_ja3, interceptor_emails.
  • TLS spoofing skill. New aggregator skill tls_spoofing.md documents profiles, tier gating, and verification workflow.
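
The email extraction described above (a simplified address regex, per-flow deduplication, and asset-path filtering) can be sketched roughly as follows. This is an illustrative Python sketch, not Crawlio's implementation: the pattern, the suffix list, and the extract_emails name are all assumptions.

```python
import re
from typing import List, Optional

# Simplified ("lite") address pattern. This is an assumption, not Crawlio's regex.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

# Hypothetical suffix list: matches ending in these look like asset paths
# such as "logo@2x.png" rather than real addresses.
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")


def extract_emails(body: str, limit: Optional[int] = None) -> List[str]:
    """Return unique, lowercased addresses in first-seen order."""
    seen = set()
    found = []
    for match in EMAIL_RE.finditer(body):
        address = match.group(0).lower()
        if address.endswith(ASSET_SUFFIXES):
            continue  # asset-path noise, e.g. retina image names
        if address in seen:
            continue  # already captured for this body
        seen.add(address)
        found.append(address)
        if limit is not None and len(found) >= limit:
            break
    return found
```

Lowercasing before deduplication treats Sales@Example.com and sales@example.com as one address, which is a simplification: only the domain part is guaranteed to be case-insensitive.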

Improvements

  • macOS 13 (Ventura) and macOS 14 (Sonoma) runtime compatibility restored. MACOSX_DEPLOYMENT_TARGET=13.0 is now pinned in release.sh so Swift 6.2+ on the macOS 26 SDK no longer silently raises the floor to 15.0. Universal binary (arm64 + x86_64) verified via vtool -show-build | grep minos.
  • Test seams. DNSResolving and PreflightHTTPClientProtocol protocols allow actor-based mocking. 12 new tests cover all preflight stages.

Fixes

  • Preflight pipeline regression coverage. 40/40 regression tests pass alongside the 12 new Phase 4 tests.

2026-04-11

Documentation rewrite. All 35 pages updated with accurate numbers, expanded CLI reference (22 commands), full MCP tool catalog (49 full-mode + 6 code-mode), and updated platform requirements (macOS 13+).

v1.4.0: April 2026

New Features

  • MCP architecture redesign. Tiered entitlement gating via JWT: free users get the Swift CrawlioMCP tools, Core unlocks the headless browser, and Pro unlocks the interceptor and intelligence stack.
  • App Store distribution. A free Mac App Store listing acts as the distribution funnel. The GUI is identical to the direct DMG build; the license key validates, then redirects to crawlio.app for Pro activation.
  • Intelligence client. Ship distribution routes perception, judgment, and cortex through api.crawlio.app REST endpoints instead of local engines.

Improvements

  • macOS 14 Sonoma support. Lowered deployment target from macOS 15 to macOS 14, expanding compatibility to Sonoma users.
  • Sandbox compliance. Added #if !APPSTORE guards for Process(), native messaging, and video download operations.
  • App Store build pipeline. Sparkle stripping, StoreKit removal, encryption flag declaration, TEAM_ID resolution.

Browser Agent v1.6.4: March 2026

New Features

  • Auto-connect on wake. Extension probes for MCP server on tab switch and page load. No popup "Connect" click required after install.
  • OCR in code mode. ocrScreenshot() now available in the execute sandbox — macOS Vision.framework text extraction from any page.
  • 9 full-stack skills. All skills rewritten to teach Code Mode + Method Mode + Evidence Mode. Every skill produces validated findings with confidence propagation via smart.finding().

Improvements

  • Platform thesis alignment. Extension stripped of passive detection (heartbeat badge, SERP auto-detection, auto-overlays). Extension is now a dumb CDP bridge per the platform thesis.
  • CDP recovery hardened. 3-attempt backoff with increasing delays (1s, 3s, 5s) on domain re-enable after navigation. Heavy SPAs no longer trigger "recovery failed" errors.
  • DOM timeout bumped. DOM.getDocument from 5s to 10s for accessibility snapshots on React/Next.js sites.
  • Execute tool description. Anti-patterns embedded in tool description: no smart.screenshot(), no sleep() loops, no location.href navigation.
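
The CDP recovery schedule above (three attempts at 1s, 3s, and 5s) can be illustrated with a small retry helper. This is a hedged Python sketch rather than the extension's code: retry_with_delays is a hypothetical name, and waiting before each attempt (instead of after each failure) is an assumption.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def retry_with_delays(
    operation: Callable[[], T],
    delays: Sequence[float] = (1.0, 3.0, 5.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation once per delay, waiting that delay before each attempt.

    Returns the first successful result. Raises RuntimeError, chained to the
    last underlying error, when every attempt fails.
    """
    last_error: Optional[Exception] = None
    for delay in delays:
        sleep(delay)  # assumption: give the page time to settle before retrying
        try:
            return operation()
        except Exception as error:  # a real caller would catch a narrower type
            last_error = error
    raise RuntimeError("recovery failed") from last_error
```

Passing the sleep function as a parameter keeps the helper testable without real waiting.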

Browser Agent v1.6.1: March 2026

New Features

  • 114 tools (was 100). Added tracking pixel parsing, technographic fingerprinting, SEO auditing, and dataLayer inspection.
  • Detection wedge. Parse Facebook/GA4/TikTok/LinkedIn/Pinterest pixels, validate against vendor schemas, inspect runtime tracker state, detect duplicate fires.
  • SEO intelligence. 11-dimension on-page audit, raw-vs-rendered comparison, CrUX field metrics, SERP overlay, robots.txt and sitemap validation.
  • Technology detection. Wappalyzer-style fingerprinting with confidence scoring and implies/excludes relationship graph.
  • 147 searchable commands in code mode (was 133), 17 smart.* higher-order methods (was 8).

Fixes

  • Fixed parseInt masking zero values in fingerprint confidence scoring.
  • User-controlled parameter values are now truncated in validation error messages.
  • Levenshtein distance input capped at 256 chars to prevent DoS.
  • Tracking heuristic regex corrected.
  • tabSerpState migration added to tabs.onReplaced handler.
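
To illustrate the Levenshtein cap listed above: edit distance costs time proportional to the product of the two input lengths, so unbounded user input can make a single comparison arbitrarily expensive. The Python sketch below bounds the work by truncating both inputs. Truncating (rather than rejecting) over-long input is an assumption, and capped_levenshtein is a hypothetical name.

```python
MAX_LEN = 256  # cap taken from the changelog entry


def capped_levenshtein(a: str, b: str, max_len: int = MAX_LEN) -> int:
    """Edit distance computed on inputs truncated to max_len characters."""
    a, b = a[:max_len], b[:max_len]
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]
```

With the cap in place, the worst case is a 256 x 256 table regardless of how long the supplied strings are.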

v1.3.4: March 2026

Improvements

  • macOS 14 Sonoma support. Lowered deployment target from macOS 15 to macOS 14, enabling Intel Mac users on Sonoma to run Crawlio.
  • Intel Mac compatibility. Universal binary already shipped arm64 + x86_64; this release removes the macOS 15 floor that blocked older systems.

v1.3.3: March 2026

Improvements

  • Always-visible sidebar toggle. Sidebar toggle button now appears in all states (empty, single project, multi project) with a "New Project" button at the bottom of the sidebar.
  • Toolbar titlebar layout fix. Fixed NavigationSplitView title positioning in single-project mode — no more mispositioned or flickering title text.
  • Cleaner URL input area. Removed the inline "+" button from the URL input bar; new projects are created from the sidebar instead.

v1.3.2: March 2026

Fixes

  • Sparkle update permissions. When macOS App Management blocks the update, the app now shows a guided dialog directing users to System Settings → Privacy & Security → App Management, with a button to open the settings pane directly.

Browser Agent v1.6.0: March 2026

Fixes

  • CWS extension bridge connection. Added Chrome Private Network Access (PNA) headers to WebSocket bridge HTTP responses, enabling CWS-installed extensions to connect to localhost MCP server.
  • Persistent reconnect alarm. Reconnect alarm now created before probing, ensuring the extension always retries server discovery even when initial probes fail.

v1.3.0: March 2026

Fixes

  • License activation pipeline. Fixed build-time token injection — activation now works in all distributed builds.
  • Credential cache key scoping. Auth credentials are now keyed by scheme://host:port, preventing cross-scheme and cross-port credential leakage.
  • Keychain empty host guard. Prevents accidentally operating on all keychain items when URL has no host.
  • EnrichmentStore persistence. Added deinit flush to prevent data loss on rapid deallocation.
  • Script removal performance. Combined 27 sequential pattern checks into a single O(n) regex alternation.
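
The credential cache key scoping and empty-host guard above can be sketched as follows. This is an illustrative Python version, not the shipped Swift code: credential_cache_key and the default-port table are assumptions.

```python
from urllib.parse import urlsplit

# Assumed defaults so that URLs with and without an explicit port share a key.
DEFAULT_PORTS = {"http": 80, "https": 443}


def credential_cache_key(url: str) -> str:
    """Build a scheme://host:port key so credentials never cross scheme or port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname  # lowercased by urlsplit; None when the URL has no host
    if not host:
        # Guard: never fall back to a key that could match every stored item.
        raise ValueError(f"URL has no host: {url!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    if port is None:
        raise ValueError(f"no port known for scheme {scheme!r}")
    return f"{scheme}://{host}:{port}"
```

Keying on all three components means credentials stored for https://example.com are not offered to http://example.com or to a service on another port of the same host.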

Improvements

  • 180 of 189 codebase review items fixed across security, performance, and correctness
  • CookieJar converted from actor to struct (wraps thread-safe HTTPCookieStorage)
  • Proxy host redacted in log messages for privacy
  • PublicKeyPinningError now Sendable-conformant
  • 15 new regression tests covering all Phase 11 fixes
  • 491 tests passing across 28 suites

v2.6.0: March 2026

New Features

  • WARC Web Archives. ISO 28500 with SHA-1 digests, CDX sidecar indexing, payload deduplication, per-record gzip compression, and configurable file splitting.
  • Proxy Support. HTTP, HTTPS, and SOCKS5 proxy tunneling with environment variable fallback and no_proxy bypass.
  • Certificate Pinning. SHA-256 public key pinning with hard fail on mismatch.
  • HSTS Database. Strict-Transport-Security with time-based expiry, includeSubDomains, and HTTP→HTTPS auto-upgrade.
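
The HSTS behavior listed above (time-based expiry, includeSubDomains, and HTTP→HTTPS upgrade) can be sketched with a small in-memory store. This Python sketch is an illustration only: the HSTSStore class and its method names are hypothetical and do not reflect Crawlio's internals.

```python
import time
from urllib.parse import urlsplit, urlunsplit


class HSTSStore:
    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # host -> (expires_at, include_subdomains)

    def record(self, host, header):
        """Store a Strict-Transport-Security header value received over HTTPS."""
        max_age = None
        include_subdomains = False
        for directive in header.split(";"):
            name, _, value = directive.strip().partition("=")
            name = name.strip().lower()
            if name == "max-age":
                try:
                    max_age = int(value.strip().strip('"'))
                except ValueError:
                    return  # malformed header: ignore it entirely
            elif name == "includesubdomains":
                include_subdomains = True
        if max_age is None:
            return
        host = host.lower()
        if max_age == 0:
            self._entries.pop(host, None)  # max-age=0 removes the policy
        else:
            self._entries[host] = (self._clock() + max_age, include_subdomains)

    def should_upgrade(self, host):
        labels = host.lower().split(".")
        now = self._clock()
        # Check the host itself (i == 0), then each parent domain.
        for i in range(len(labels)):
            candidate = ".".join(labels[i:])
            entry = self._entries.get(candidate)
            if entry is None:
                continue
            expires_at, include_subdomains = entry
            if expires_at <= now:
                del self._entries[candidate]  # time-based expiry
                continue
            if i == 0 or include_subdomains:
                return True
        return False

    def upgrade(self, url):
        parts = urlsplit(url)
        if parts.scheme != "http" or not parts.hostname:
            return url
        if not self.should_upgrade(parts.hostname):
            return url
        netloc = parts.netloc
        if parts.port == 80:
            netloc = netloc.rsplit(":", 1)[0]  # drop the explicit HTTP default port
        return urlunsplit(parts._replace(scheme="https", netloc=netloc))
```

The clock is injected so expiry can be exercised without waiting. A persistent database would also need to save and reload the entries, which this sketch omits.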

Improvements

  • MCP server: WARC configuration passthrough in export_site, proxy/pinning in update_settings
  • 447 tests passing across 22 suites
  • E2E black-box test suite validates full pipeline

Fixes

  • Fixed proxy/pinning settings being wiped when starting crawl via MCP API
  • Fixed SIGSEGV crash in window configuration (AttributeGraph KVO race)

v2.5.0: March 2026

New Features

  • Vision OCR Pipeline. Extract text from images during crawls using Apple Vision framework. Opt-in via Settings > OCR.
  • Code Mode for MCP. 3 meta-tools replace 34 individual tools, cutting AI token usage by 92%.
  • Browser Agent JIT Context. Runtime context injection for browser automation tools.
  • CLI Autonomous Loop. AI-powered crawl agent that plans, executes, and adapts crawls automatically.

Improvements

  • Organic radial gradient thumbnails on docs home page
  • Framework detection expanded to 25+ frameworks
  • IDNA punycode canonicalization for international domain names
  • Host health tracking with circuit breaker pattern
  • Cross-domain CDN asset downloading in localizer
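
The host health tracking with a circuit breaker mentioned above can be pictured with the following minimal Python sketch. The thresholds, the HostCircuitBreaker name, and the half-open probe behavior are assumptions for illustration, not Crawlio's actual parameters.

```python
import time


class HostCircuitBreaker:
    """Stop sending requests to a host after repeated consecutive failures."""

    def __init__(self, failure_threshold=3, cooldown=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._cooldown = cooldown
        self._clock = clock
        self._failures = {}   # host -> consecutive failure count
        self._opened_at = {}  # host -> time the circuit last opened

    def allow(self, host):
        """Return True if a request to host may be attempted now."""
        opened = self._opened_at.get(host)
        if opened is None:
            return True  # circuit closed: host is considered healthy
        # Circuit open: permit a probe request once the cooldown has passed.
        return self._clock() - opened >= self._cooldown

    def record_success(self, host):
        # Any success resets the host to healthy.
        self._failures.pop(host, None)
        self._opened_at.pop(host, None)

    def record_failure(self, host):
        count = self._failures.get(host, 0) + 1
        self._failures[host] = count
        if count >= self._threshold:
            # Open (or re-open after a failed probe) and restart the cooldown.
            self._opened_at[host] = self._clock()
```

A crawler would call allow(host) before each fetch and report the outcome afterward, so a failing host is skipped for the cooldown period instead of consuming retries.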

Fixes

  • Fixed thread safety crash in DateFormatter (SIGSEGV in autorelease pool)
  • Fixed OSAllocatedUnfairLock SIGBUS with closure tuples
  • Fixed WebKit WKWebView teardown race condition
  • Improved robots.txt parsing for edge cases

v2.4.0: February 2026

New Features

  • AI Enrichment Pipeline. Automatic metadata extraction, summary generation, and content classification.
  • Enrichment Inspector. New sidebar panel showing per-page AI enrichment data.
  • Deploy Bundle Export. deploy.json with enrichment data for static site generators.

Improvements

  • RE-guided (reverse-engineering-guided) hardening: 22 phases of engine improvements validated against SiteSucker and HTTrack binaries
  • Request classifier with Chrome-validated tracker/analytics detection
  • SiteSucker-style 7-gate URL filtering

v2.3.0: January 2026

New Features

  • MCP Server. 36 tools for AI agent integration via Model Context Protocol.
  • Browser Agent. Chrome extension + MCP server for 96 CDP browser commands.
  • Crawlio CLI. 13 commands with interactive shell for terminal-based crawling.

Improvements

  • Export formats expanded to 7: folder mirror, ZIP, single HTML, WARC, PDF, text, deploy bundle
  • Framework detection for React, Next.js, Vue, Svelte, Angular, and 20+ more
  • Link localizer with root-absolute path resolution

For the complete commit history, see the GitHub releases.

© 2026 Crawlio. All rights reserved.