Crawlio Docs

Troubleshooting

App not running (CLI or MCP can't connect)

The CLI and MCP server connect to Crawlio.app through a 4-tier fallback chain:

  1. Unix Domain Socket at ~/Library/Logs/Crawlio/control.sock
  2. TCP port from ~/Library/Logs/Crawlio/control.port
  3. State file at ~/Library/Logs/Crawlio/state.json (read-only, no control)
  4. Headless engine at port from ~/.crawlio/headless.port

If all four fail, the app is not running.
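The chain above can be probed by hand. The following is a minimal sketch of the same tier order — the paths are the documented defaults, while the CRAWLIO_DIR override and the crawlio_endpoint function name are inventions of this example:

```shell
# Probe the four tiers in the documented order and report the first hit.
# CRAWLIO_DIR defaults to the real log directory; overriding it is only a test hook.
CRAWLIO_DIR="${CRAWLIO_DIR:-$HOME/Library/Logs/Crawlio}"

crawlio_endpoint() {
  if [ -S "$CRAWLIO_DIR/control.sock" ]; then
    echo "unix:$CRAWLIO_DIR/control.sock"                  # tier 1
  elif [ -f "$CRAWLIO_DIR/control.port" ]; then
    echo "tcp:$(cat "$CRAWLIO_DIR/control.port")"          # tier 2
  elif [ -f "$CRAWLIO_DIR/state.json" ]; then
    echo "state-file:read-only"                            # tier 3
  elif [ -f "$HOME/.crawlio/headless.port" ]; then
    echo "headless:$(cat "$HOME/.crawlio/headless.port")"  # tier 4
  else
    echo "not-running"
  fi
}

crawlio_endpoint
```

If this prints not-running, launch Crawlio.app and retry.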

Fix:

  1. Launch Crawlio.app
  2. Check the socket exists: ls -la ~/Library/Logs/Crawlio/control.sock
  3. Check the port file: cat ~/Library/Logs/Crawlio/control.port
  4. If both files are missing, quit and relaunch Crawlio

Stale port file

If the app crashed or was force-quit, the port file may still exist but point to a dead process.

Symptoms: CLI commands hang or return "Connection refused."

Fix:

# Check if the port is responding
PORT=$(cat ~/Library/Logs/Crawlio/control.port)
curl -s http://localhost:$PORT/health
 
# If no response, delete the stale file and relaunch
rm ~/Library/Logs/Crawlio/control.port
rm ~/Library/Logs/Crawlio/control.sock
# Then open Crawlio.app

Crawl stuck

If a crawl appears frozen (progress counters stopped), check these causes:

Circuit breaker tripped

The engine automatically backs off from hosts returning repeated errors (429, 503, connection failures). After several consecutive failures, the host is temporarily blocked.

Check: Look at the crawl log for "circuit breaker" messages.

curl --unix-socket ~/Library/Logs/Crawlio/control.sock "http://localhost/logs?level=warning&limit=20"

Fix: The circuit breaker resets automatically after a cooldown period. If the target server is healthy, stop and restart the crawl.
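The behavior can be pictured with a generic consecutive-failure breaker. The sketch below is illustrative only — Crawlio's actual threshold and cooldown are internal, and the value 5 here is an arbitrary assumption:

```shell
# Illustrative single-host breaker: open after THRESHOLD consecutive
# failures, reset on the first success. Not Crawlio's actual code.
THRESHOLD=5
FAILS=0
OPEN=0

record_failure() {
  FAILS=$((FAILS + 1))
  if [ "$FAILS" -ge "$THRESHOLD" ]; then OPEN=1; fi
}

record_success() {
  FAILS=0
  OPEN=0
}
```

While the breaker is open, requests to that host are skipped rather than failed loudly, which is why the crawl can look idle instead of erroring.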

Bandwidth throttle

If you set a bandwidth limit, downloads may appear stalled when the limit is reached.

Fix: Increase or remove the bandwidth limit in Settings.

Per-host limits

The default per-host connection limit is 6. If all connections are waiting on a slow host, the crawl looks stuck.

Fix: Increase per-host connections, or add a crawl delay to spread requests over time.

Crawl delay

A high crawl delay (e.g., 5 seconds) with low concurrency makes the crawl very slow by design.

Fix: Reduce crawlDelay in settings. Values below 1.0 are typical.
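Back-of-envelope math shows why: a per-host delay caps a single host at one page per crawlDelay seconds, no matter how high concurrency is set. For a 1,000-page single-host site at a 5-second delay:

```shell
# Minimum wall-clock time, ignoring response time: pages * crawlDelay seconds.
awk 'BEGIN { pages = 1000; delay = 5.0; printf "%.0f minutes\n", pages * delay / 60 }'
# → 83 minutes
```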


MCP not connecting

Check installation

# Verify the binary exists
which crawlio-mcp
 
# Test basic startup
crawlio-mcp --help

Check config file

Run crawlio-mcp init to write the correct config for your AI client. The command auto-detects installed clients (Claude Code, Claude Desktop, VS Code, Cursor, Windsurf).

crawlio-mcp init

Verify the config

Check that the config file points to the correct binary path:

Client          Config path
Claude Code     ~/.claude.json or .mcp.json
Claude Desktop  ~/Library/Application Support/Claude/claude_desktop_config.json
VS Code         ~/Library/Application Support/Code/User/mcp.json
Cursor          ~/.cursor/mcp.json
Windsurf        ~/.codeium/windsurf/mcp_config.json
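For clients that use an mcpServers map (Claude Desktop and Cursor do; VS Code's mcp.json uses a slightly different schema), a minimal entry looks roughly like the snippet below. The binary path is an assumption — substitute the output of which crawlio-mcp, or simply let crawlio-mcp init write the file for you:

```json
{
  "mcpServers": {
    "crawlio": {
      "command": "/opt/homebrew/bin/crawlio-mcp"
    }
  }
}
```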

Test the connection

# Check if Crawlio.app is running and the API is accessible
curl --unix-socket ~/Library/Logs/Crawlio/control.sock http://localhost/health

If the health check fails, the MCP server cannot reach the app. Launch Crawlio.app first.


Browser agent not connecting

The Chrome extension communicates with Crawlio through native messaging and a WebSocket bridge.

Bridge file missing

The native messaging manifest must exist at the correct path for your browser.

Fix:

# Install native messaging for all detected browsers
crawlio-mcp init

Or install manually through Settings > Advanced > Browser Extension Integration.

WebSocket port conflict

If another process is using the bridge port, the connection fails silently.

Fix: Check ~/Library/Logs/Crawlio/crawl.jsonl for port-conflict errors, then restart the app to rebind.

Extension not installed

Make sure the Crawlio for Chrome extension is installed and enabled in your browser.


Export fails

Disk space

Large sites can produce multi-gigabyte exports, especially WARC format with gzip disabled.

Fix: Check available disk space before exporting. Use WARC with compression enabled for the most compact output.

Permissions

The export destination directory must be writable.

Fix: Choose a destination in your home directory (e.g., ~/Downloads/ or ~/Desktop/). Avoid system directories.
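A quick way to confirm the destination is writable before starting a long export (~/Downloads here is just an example path):

```shell
# Exits 0 either way; the message says whether the export can proceed.
dest="$HOME/Downloads"
if [ -d "$dest" ] && [ -w "$dest" ]; then
  echo "writable: $dest"
else
  echo "not writable: $dest"
fi
```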

Broken links in the export

If links are broken in the exported folder, the link localizer did not recognize a URL pattern.

Common causes:

  • JavaScript-generated links: these do not exist in the HTML source, so they cannot be rewritten. Enable WebKit mode.
  • Fragment-only links (e.g., #section). These should work without rewriting.
  • Cross-domain assets on nested pages. Deep pages with CDN references may have incorrect relative paths.

WARC validation

# Check the file is not empty
wc -c export.warc.gz
 
# View the first few WARC records
gunzip -c export.warc.gz | head -50

WARC files conform to ISO 28500 and include SHA-1 digests, a CDX index, and deduplication. Load into ReplayWeb.page for playback verification.
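One more quick sanity check: every gzip member, including a .warc.gz, starts with the magic bytes 1f 8b, so a file that begins with anything else is corrupt or mislabeled:

```shell
# Print the first two bytes in hex; a healthy gzip file shows "1f 8b".
if [ -f export.warc.gz ]; then
  head -c 2 export.warc.gz | od -An -tx1
fi
```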


"Pro required" error

Some features require a Pro license. If you see a tier-gating error, you are on the Free or Core tier.

What requires Pro

  • Intelligence endpoints: /tech-stack, /seo-findings, /design-intel, /keyword-intel, /duplicate-content
  • The intelligence tier in the MCP tools
  • Advanced analysis features in the app

How to check your tier

curl --unix-socket ~/Library/Logs/Crawlio/control.sock http://localhost/license

How to activate

  1. Purchase a license at crawlio.app/buy/pro
  2. Open Settings > License in the app
  3. Enter your license key
  4. The app validates and activates immediately

Free tier limits

Free tier allows 5 crawls per week. The limit resets weekly. The /start endpoint returns 429 with a resetsAt timestamp when the limit is reached.


Slow crawl

Crawlio defaults to conservative concurrency to avoid overwhelming servers.

Speed up

# Increase parallel connections (default: 10, max: 40)
crawlio settings set settings.maxConcurrent 20
 
# Reduce per-host delay
crawlio settings set settings.crawlDelay 0.0

Slow down (avoid rate limiting)

# Add delay between requests
crawlio settings set settings.crawlDelay 1.0
 
# Reduce concurrency
crawlio settings set settings.maxConcurrent 3

High memory on large sites

For sites with 10,000+ pages:

  • Set max pages or max depth to cap the crawl
  • Disable WebKit mode if not needed (rendering uses more memory)
  • Close other apps to free memory

SSL errors

Certificate validation

Crawlio validates SSL certificates by default. Self-signed or expired certificates cause download failures.

Fix for sites you own: Disable strict certificate validation in Settings > Advanced. Only do this for trusted sites.

Certificate pinning

If you configured certificate pinning in settings, only connections matching the pinned public key are allowed.

Fix: Remove or update the pinned key if the server certificate has changed.

HTTP-to-HTTPS upgrade

Crawlio automatically upgrades HTTP URLs to HTTPS when HSTS headers are present. If a site has broken HTTPS, this can cause failures.

Fix: Disable HSTS enforcement in settings if the site does not support HTTPS properly.


Site downloads but pages are blank

This usually means the site is a JavaScript-rendered SPA (React, Next.js, Vue, etc.): Crawlio downloads the HTML, but the content is filled in by JavaScript at runtime, so the saved source is an empty shell.

Fix: Enable WebKit mode in Settings. This renders JavaScript before saving the page.

Check the framework detection field in the status response:

curl --unix-socket ~/Library/Logs/Crawlio/control.sock http://localhost/status

If it shows react, nextjs, vue, or similar, WebKit mode will help.


App won't open (Gatekeeper)

macOS may block Crawlio because it is distributed outside the App Store.

  1. Open System Settings > Privacy & Security
  2. Scroll down to find "Crawlio was blocked"
  3. Click Open Anyway
  4. You only need to do this once

CLI not found after Homebrew install

If crawlio is not found after brew install crawlio-app/tap/crawlio:

# Check installation
brew list crawlio
 
# Make sure Homebrew bin is in your PATH (Apple Silicon)
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
source ~/.zshrc
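To confirm the fix took effect, check that the Homebrew bin directory is actually on your PATH (/opt/homebrew/bin on Apple Silicon; Intel Macs use /usr/local/bin):

```shell
# Prints "on PATH" or a hint to run brew shellenv.
if echo "$PATH" | tr ':' '\n' | grep -qx '/opt/homebrew/bin'; then
  echo "on PATH"
else
  echo "missing: add /opt/homebrew/bin via brew shellenv"
fi
```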

Getting help

If your issue is not covered here:



© 2026 Crawlio. All rights reserved.