
File Locations

Overview

Crawlio writes several files to disk that MCP tools, the CLI, and external scripts can read. Most live in standard macOS locations under ~/Library; a few live in dot-directories in your home folder, such as ~/.crawlio/.


State and control files

| File | Path | Purpose |
|---|---|---|
| Control socket | ~/Library/Logs/Crawlio/control.sock | Unix domain socket for the HTTP API (primary transport) |
| Control port | ~/Library/Logs/Crawlio/control.port | TCP port number (fallback transport) |
| Crawl state | ~/Library/Logs/Crawlio/state.json | Live crawl snapshot, updated every 500ms |
| Crawl log | ~/Library/Logs/Crawlio/crawl.jsonl | Streaming structured event log |
| Headless port | ~/.crawlio/headless.port | HTTP port number of the headless engine |

control.sock

Unix domain socket for the HTTP API. This is the primary transport used by the CLI and the MCP server. Its permissions are 0600 (read/write for the owner only).

```shell
curl --unix-socket ~/Library/Logs/Crawlio/control.sock http://localhost/status
```
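Since the socket is the primary transport and control.port is only a fallback, a script can try the socket first and use TCP only when no socket file exists. The following is a minimal shell sketch of that order; the function names and the CRAWLIO_LOG_DIR override are hypothetical, introduced only so the logic can be pointed at a scratch directory.

```shell
#!/bin/sh
# Sketch: reach the Crawlio HTTP API over the Unix socket when it exists,
# otherwise over the TCP port listed in control.port.
# CRAWLIO_LOG_DIR is a hypothetical override; it defaults to the real location.
CRAWLIO_LOG_DIR="${CRAWLIO_LOG_DIR:-$HOME/Library/Logs/Crawlio}"

# Print "socket" or "port <number>"; fail if neither file exists.
crawlio_transport() {
  if [ -S "$CRAWLIO_LOG_DIR/control.sock" ]; then
    echo "socket"
  elif [ -f "$CRAWLIO_LOG_DIR/control.port" ]; then
    echo "port $(tr -d '[:space:]' < "$CRAWLIO_LOG_DIR/control.port")"
  else
    return 1  # no socket and no port file: the app is not running
  fi
}

# GET an API path such as /status over whichever transport is available.
crawlio_get() {
  transport=$(crawlio_transport) || {
    echo "Crawlio is not running" >&2
    return 1
  }
  case $transport in
    socket) curl -s --unix-socket "$CRAWLIO_LOG_DIR/control.sock" "http://localhost$1" ;;
    port*)  curl -s "http://localhost:${transport#"port "}$1" ;;
  esac
}
```

With the app running, crawlio_get /status should return the same response as the curl command above.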

control.port

Contains a single integer: the HTTP port number. Written on app launch, deleted on quit.

```shell
PORT=$(cat ~/Library/Logs/Crawlio/control.port)
curl "http://localhost:$PORT/status"
```

Finding the port programmatically: read ~/Library/Logs/Crawlio/control.port. If the file does not exist, the app is not running. If the file exists but nothing responds on that port, the app exited without cleaning up; delete the stale file and relaunch the app.
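The check described above can be scripted. This is a sketch, not part of Crawlio: crawlio_live_port and the CRAWLIO_LOG_DIR override are hypothetical names, and treating any HTTP reply within two seconds as "responding" is an assumption.

```shell
#!/bin/sh
# Sketch: read control.port, confirm that something answers on that port,
# and delete the file if nothing does.
# CRAWLIO_LOG_DIR is a hypothetical override for the log directory.
CRAWLIO_LOG_DIR="${CRAWLIO_LOG_DIR:-$HOME/Library/Logs/Crawlio}"

# Prints the port and returns 0 when the app answers.
# Returns 1 if the app is not running, 2 if the port file was stale.
crawlio_live_port() {
  port_file="$CRAWLIO_LOG_DIR/control.port"
  if [ ! -f "$port_file" ]; then
    echo "Crawlio is not running" >&2
    return 1
  fi
  port=$(tr -d '[:space:]' < "$port_file")
  # Any HTTP response counts as responding; a refused connection or a
  # 2-second timeout does not.
  if ! curl -s -o /dev/null --max-time 2 "http://localhost:$port/status"; then
    echo "Stale $port_file (nothing on port $port); removing it" >&2
    rm -f "$port_file"
    return 2
  fi
  echo "$port"
}
```

The distinct return codes (1 for not running, 2 for a stale file) let a calling script report which case it hit before relaunching the app.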

state.json

Full snapshot of the current crawl. Updated every 500ms while crawling. Cleared when the app quits.

```json
{
  "engineState": "crawling",
  "url": "https://example.com",
  "progress": {
    "discovered": 150,
    "downloaded": 85,
    "failed": 2,
    "queued": 63,
    "localized": 80
  },
  "items": [
    {
      "url": "https://example.com/index.html",
      "status": "completed",
      "size": 15234,
      "contentType": "text/html",
      "localPath": "/Users/you/Downloads/Crawlio/example.com/index.html",
      "startTime": "2026-02-14T10:30:00Z",
      "endTime": "2026-02-14T10:30:01Z"
    }
  ]
}
```
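| Field | Type | Description |
|---|---|---|
| engineState | string | One of idle, crawling, paused, completed, failed |
| url | string | The URL being crawled |
| progress | object | Aggregate counters: discovered, downloaded, failed, queued, localized |
| items | array | One entry per individual download item |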
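To turn the progress counters into a completion percentage from a script, something like the following sketch works on the sample above. It avoids jq by matching "key": number pairs with sed, which assumes each counter name appears only once in the file; the function names and the STATE_FILE override are hypothetical.

```shell
#!/bin/sh
# Sketch: read aggregate counters from state.json without jq.
# Assumes each counter name appears once, as a key followed by a number.
STATE_FILE="${STATE_FILE:-$HOME/Library/Logs/Crawlio/state.json}"

# Print one numeric counter from the progress object, e.g. "downloaded".
progress_counter() {
  sed -n "s/.*\"$1\": *\([0-9][0-9]*\).*/\1/p" "$STATE_FILE" | head -n 1
}

# Print downloaded / discovered as a whole-number percentage (0 if unknown).
progress_percent() {
  done_count=$(progress_counter downloaded)
  total=$(progress_counter discovered)
  if [ -z "$total" ] || [ "$total" -eq 0 ]; then
    echo 0
  else
    echo $(( ${done_count:-0} * 100 / total ))
  fi
}
```

For the sample snapshot, 85 of 150 discovered items are downloaded, so progress_percent prints 56 (integer division). If jq is installed, jq .progress ~/Library/Logs/Crawlio/state.json reads the same counters with a real JSON parser.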

crawl.jsonl

One JSON object per line. Written continuously during a crawl.

```json
{"timestamp":"2026-02-14T10:30:00.123Z","category":"engine","level":"info","message":"Crawl started for https://example.com"}
{"timestamp":"2026-02-14T10:30:01.456Z","category":"download","level":"info","message":"Downloaded https://example.com/index.html (15.2 KB)"}
```

Categories: engine, download, parser, localizer, network

Levels: debug, info, notice, warning, error, fault
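Because each line is a self-contained JSON object, line-oriented tools can filter the log. The sketch below keeps only warning, error, and fault events; it matches the compact "level":"value" spacing seen in the sample lines, so treat that as an assumption. The function names and the CRAWL_LOG override are hypothetical.

```shell
#!/bin/sh
# Sketch: show crawl.jsonl events at warning level or above.
# Relies on the compact "level":"..." form shown in the sample lines;
# a real JSON parser would not depend on that spacing.
CRAWL_LOG="${CRAWL_LOG:-$HOME/Library/Logs/Crawlio/crawl.jsonl}"

# Print warning, error, and fault events already in the log.
crawl_problems() {
  grep -E '"level":"(warning|error|fault)"' "$CRAWL_LOG"
}

# Follow the log live and print problems as they are written.
# tail -F reopens the file if it is rotated or recreated.
crawl_follow_problems() {
  tail -F "$CRAWL_LOG" | grep --line-buffered -E '"level":"(warning|error|fault)"'
}
```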

Log rotation

Log files rotate at 10 MB. Up to 3 rotated files are kept. Files older than 7 days are deleted automatically.


Download locations

Default

Downloaded files are saved to:

~/Downloads/Crawlio/{domain}/

For example, downloading https://example.com creates:

```text
~/Downloads/Crawlio/example.com/
  index.html
  about/
    index.html
  css/
    style.css
  images/
    logo.png
```
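The folder name follows from the URL's host, so a script can predict where a crawl of a given URL lands by default. This sketch handles only the plain case shown above; how Crawlio names folders for URLs with a port, a www. prefix, or user info is not documented here, and crawlio_default_dest is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: predict the default download folder for a start URL.
# Assumes the folder is named after the host exactly as written in the URL.
crawlio_default_dest() {
  host=${1#*://}          # drop the scheme, e.g. "https://"
  host=${host%%[/?#]*}    # drop everything from the first /, ?, or #
  printf '%s/Downloads/Crawlio/%s/\n' "$HOME" "$host"
}
```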

Custom destination

Set a custom destination for an individual crawl through any of:

  • The project settings panel in the app
  • The --dest flag in the CLI
  • The destinationPath field in POST /start (HTTP API)
  • The destination parameter in start_crawl (MCP)
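For the HTTP API route, a request over the control socket might look like the sketch below. Only POST /start and the destinationPath field come from this page; the url field name, the JSON content type, and the absence of other required fields are assumptions, and both helper names are hypothetical. The body is built with printf and is not JSON-escaped.

```shell
#!/bin/sh
# Sketch: start a crawl into a custom folder through the HTTP API.
# ASSUMPTION: the start URL goes in a field named "url".
# Values are not JSON-escaped, so avoid quotes and backslashes in them.
crawlio_start_body() {
  printf '{"url":"%s","destinationPath":"%s"}\n' "$1" "$2"
}

# POST the body to /start over the control socket.
crawlio_start() {
  curl -s --unix-socket "$HOME/Library/Logs/Crawlio/control.sock" \
    -X POST -H 'Content-Type: application/json' \
    -d "$(crawlio_start_body "$1" "$2")" \
    http://localhost/start
}
```

For example: crawlio_start https://example.com "$HOME/Sites/example-mirror".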

Application support

| File | Path | Purpose |
|---|---|---|
| Preferences | ~/Library/Preferences/com.crawlio.app.plist | User preferences |
| Projects | ~/Library/Application Support/Crawlio/projects/ | Saved crawl configurations |
| Enrichment | ~/Library/Application Support/Crawlio/enrichment/ | Runtime capture data (JSON) |
| Checkpoints | ~/Library/Application Support/Crawlio/checkpoints/ | Crawl resume data |
| Export presets | ~/Library/Application Support/Crawlio/export-presets.json | Saved CSV export presets |
| License | ~/Library/Application Support/Crawlio/license.json | Local license key storage |
| OCR cache | ~/Library/Caches/Crawlio/ocr/ | OCR result cache |

Checkpoints

Checkpoints hold crawl resume data. Crawlio rotates through three checkpoints and writes each one atomically, so a crawl survives an app crash or a system restart. To resume a crawl, relaunch the app.

Enrichment

Runtime capture data (framework detection, network requests, console logs, DOM snapshots), stored as JSON and scoped to each project.


Browser extension bridge

| File | Path | Purpose |
|---|---|---|
| Bridge directory | ~/.crawlio/bridge/ | Chrome extension bridge files |
| Native messaging host | ~/Library/Application Support/{browser}/NativeMessagingHosts/com.crawlio.agent.json | Native messaging manifest |
| Wrapper script | ~/Library/Application Support/Crawlio/native-messaging-host.sh | Native messaging wrapper |

{browser} is one of: Google/Chrome, Chromium, BraveSoftware/Brave-Browser, Microsoft Edge.
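To check which browsers have the native messaging manifest installed, a script can walk the four {browser} directories listed above. A minimal sketch; crawlio_nm_status is a hypothetical name, and it only tests that the manifest file exists, not that its contents are valid.

```shell
#!/bin/sh
# Sketch: report which supported browsers have the Crawlio native
# messaging manifest in place. Only checks that the file exists.
crawlio_nm_status() {
  for browser in "Google/Chrome" "Chromium" "BraveSoftware/Brave-Browser" "Microsoft Edge"; do
    manifest="$HOME/Library/Application Support/$browser/NativeMessagingHosts/com.crawlio.agent.json"
    if [ -f "$manifest" ]; then
      echo "installed: $browser"
    else
      echo "missing: $browser"
    fi
  done
}
```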


MCP and CLI files

MCP server binary

| File | Path | Purpose |
|---|---|---|
| npm binary cache | ~/.crawlio/bin/CrawlioMCP | Auto-downloaded from GitHub Releases |
| MCP config | ~/.crawlio-mcp.json | MCP server configuration |

launchd service (portal mode)

| File | Path | Purpose |
|---|---|---|
| Service plist | ~/Library/LaunchAgents/com.crawlio.mcp.plist | Portal mode auto-start |
| stdout log | ~/Library/Logs/Crawlio/mcp-server.log | Portal stdout |
| stderr log | ~/Library/Logs/Crawlio/mcp-server.err | Portal stderr |
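launchctl can report whether the portal-mode agent is loaded, and the two log files show its output. This sketch assumes the agent's launchd label equals the plist name, com.crawlio.mcp, and that it runs in the logged-in user's gui domain; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: inspect the portal-mode service and its logs.
# ASSUMPTION: the Label key in the plist is com.crawlio.mcp.
CRAWLIO_MCP_LABEL="com.crawlio.mcp"

# launchd target for the current user's GUI session, e.g. gui/501/com.crawlio.mcp
crawlio_mcp_target() {
  echo "gui/$(id -u)/$CRAWLIO_MCP_LABEL"
}

# Show launchd's view of the service (state, PID, last exit status).
crawlio_mcp_status() {
  launchctl print "$(crawlio_mcp_target)"
}

# Show the most recent stdout and stderr output from the portal.
crawlio_mcp_logs() {
  tail -n 50 "$HOME/Library/Logs/Crawlio/mcp-server.log" \
             "$HOME/Library/Logs/Crawlio/mcp-server.err"
}
```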

MCP client config files

The crawlio-mcp init command writes to whichever of these client config files it detects:

| Path | Client |
|---|---|
| ~/.mcp.json | Global fallback |
| .mcp.json | Per-project |
| ~/.claude.json | Claude Code |
| ~/Library/Application Support/Claude/claude_desktop_config.json | Claude Desktop |
| ~/Library/Application Support/Code/User/mcp.json | VS Code |
| ~/.cursor/mcp.json | Cursor |
| ~/.codeium/windsurf/mcp_config.json | Windsurf |
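To see which of these files exist on a machine, and whether each already mentions Crawlio, a script can test each path in turn. In this sketch the per-project .mcp.json is looked up in the current directory, and the case-insensitive match on "crawlio" is an assumption about how init names its server entry; crawlio_mcp_configs is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list the MCP client config files that exist and whether each
# one mentions crawlio. ASSUMPTION: the server entry contains "crawlio".
crawlio_mcp_configs() {
  for f in \
    "$HOME/.mcp.json" \
    "./.mcp.json" \
    "$HOME/.claude.json" \
    "$HOME/Library/Application Support/Claude/claude_desktop_config.json" \
    "$HOME/Library/Application Support/Code/User/mcp.json" \
    "$HOME/.cursor/mcp.json" \
    "$HOME/.codeium/windsurf/mcp_config.json"
  do
    [ -f "$f" ] || continue
    if grep -qi crawlio "$f"; then
      echo "configured: $f"
    else
      echo "present: $f"
    fi
  done
}
```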

Skills

| Path | Skill |
|---|---|
| ~/.claude/skills/crawlio-mcp/SKILL.md | Full tool reference |
| ~/.claude/skills/crawl-site/SKILL.md | Crawl workflow |
| ~/.claude/skills/audit-site/SKILL.md | Site audit |
| ~/.claude/skills/observe/SKILL.md | Observation querying |
| ~/.claude/skills/finding/SKILL.md | Evidence findings |

Homebrew binaries

| Path | Description |
|---|---|
| /opt/homebrew/bin/crawlio | CLI binary (Apple Silicon) |
| /opt/homebrew/bin/crawlio-mcp | MCP binary (Apple Silicon) |
| /usr/local/bin/crawlio | CLI binary (Intel) |
| /usr/local/bin/crawlio-mcp | MCP binary (Intel) |
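Which pair of paths applies depends on the Mac's architecture, so a script can choose the directory from uname -m. A sketch (crawlio_brew_bin is a hypothetical name); a shell running under Rosetta reports x86_64, so brew --prefix is the more reliable answer when Homebrew is on PATH.

```shell
#!/bin/sh
# Sketch: pick the Homebrew bin directory that holds crawlio and crawlio-mcp.
# Apple Silicon installs live under /opt/homebrew, Intel under /usr/local.
# Pass an architecture explicitly, or let it default to `uname -m`.
crawlio_brew_bin() {
  case "${1:-$(uname -m)}" in
    arm64) echo /opt/homebrew/bin ;;
    *)     echo /usr/local/bin ;;
  esac
}
```

For example, ls -l "$(crawlio_brew_bin)"/crawlio* lists whichever binaries are installed.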

Vault

| File | Path | Purpose |
|---|---|---|
| Audit log | ~/.crawlio/vault-audit.jsonl | Per-domain session access audit trail |

Vault sessions are stored in the macOS Keychain, not on disk.



© 2026 Crawlio. All rights reserved.