File Locations
Overview
Crawlio writes several files to disk that MCP tools, the CLI, and external scripts can read. All files use standard macOS locations.
State and control files
| File | Path | Purpose |
|---|---|---|
| Control socket | ~/Library/Logs/Crawlio/control.sock | Unix domain socket for the HTTP API (primary transport) |
| Control port | ~/Library/Logs/Crawlio/control.port | TCP port number (fallback transport) |
| Crawl state | ~/Library/Logs/Crawlio/state.json | Live crawl snapshot, updated every 500ms |
| Crawl log | ~/Library/Logs/Crawlio/crawl.jsonl | Streaming structured event log |
| Headless port | ~/.crawlio/headless.port | Headless engine HTTP port number |
control.sock
Unix domain socket for the HTTP API. This is the primary transport used by the CLI and MCP server. Permissions are set to 0600 (owner only).
```bash
curl --unix-socket ~/Library/Logs/Crawlio/control.sock http://localhost/status
```
control.port
Contains a single integer: the HTTP port number. Written on app launch, deleted on quit.
```bash
PORT=$(cat ~/Library/Logs/Crawlio/control.port)
curl "http://localhost:$PORT/status"
```
Finding the port programmatically: read ~/Library/Logs/Crawlio/control.port. If the file does not exist, the app is not running. If the file exists but the port does not respond, the app quit without cleaning up; delete the stale file and relaunch the app.
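A script that talks to a running Crawlio instance can apply the rules above in order: try the socket first, fall back to the port file, and treat a port that does not answer as stale. The following is a minimal Python sketch that assumes only what this page states (the file locations and the GET /status endpoint). The helpers `find_transport`, `UnixHTTPConnection`, and `get_status` are hypothetical names, and the response body is returned as raw bytes because its format is not documented here.
```python
import http.client
import socket
from pathlib import Path
from typing import Optional, Tuple, Union

LOG_DIR = Path.home() / "Library" / "Logs" / "Crawlio"


def _unix_alive(path: str, timeout: float = 0.5) -> bool:
    """True if a listener accepts connections on the Unix socket at `path`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
            return True
        except OSError:
            return False


def _tcp_alive(port: int, timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on localhost:`port`."""
    try:
        with socket.create_connection(("localhost", port), timeout=timeout):
            return True
    except OSError:
        return False


def find_transport(log_dir: Path = LOG_DIR,
                   delete_stale: bool = False) -> Optional[Tuple[str, Union[str, int]]]:
    """Return ("unix", socket_path) or ("tcp", port); None means the app is not running."""
    sock_path = log_dir / "control.sock"
    if sock_path.exists() and _unix_alive(str(sock_path)):
        return ("unix", str(sock_path))  # primary transport
    port_file = log_dir / "control.port"
    try:
        port = int(port_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None  # no usable port file: the app is not running
    if _tcp_alive(port):
        return ("tcp", port)  # fallback transport
    if delete_stale:
        port_file.unlink(missing_ok=True)  # stale file left by an unclean quit
    return None


class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection that speaks HTTP over a Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


def get_status(log_dir: Path = LOG_DIR) -> Tuple[int, bytes]:
    """GET /status over the best available transport; returns (HTTP status, raw body)."""
    transport = find_transport(log_dir)
    if transport is None:
        raise ConnectionError("Crawlio does not appear to be running")
    kind, target = transport
    if kind == "unix":
        conn = UnixHTTPConnection(str(target))
    else:
        conn = http.client.HTTPConnection("localhost", int(target), timeout=5.0)
    try:
        conn.request("GET", "/status")
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```
Deleting the stale port file is opt-in (`delete_stale=True`) so that a read-only monitoring script never removes files on its own. The same port-file logic would apply to ~/.crawlio/headless.port, assuming it follows the same single-integer format.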
state.json
Full snapshot of the current crawl. Updated every 500ms while crawling. Cleared when the app quits.
```json
{
  "engineState": "crawling",
  "url": "https://example.com",
  "progress": {
    "discovered": 150,
    "downloaded": 85,
    "failed": 2,
    "queued": 63,
    "localized": 80
  },
  "items": [
    {
      "url": "https://example.com/index.html",
      "status": "completed",
      "size": 15234,
      "contentType": "text/html",
      "localPath": "/Users/you/Downloads/Crawlio/example.com/index.html",
      "startTime": "2026-02-14T10:30:00Z",
      "endTime": "2026-02-14T10:30:01Z"
    }
  ]
}
```
| Field | Type | Description |
|---|---|---|
| engineState | string | idle, crawling, paused, completed, failed |
| url | string | The URL being crawled |
| progress | object | Aggregate counters |
| items | array | Every individual download item |
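As a sketch of how an external script might consume this snapshot, the Python below loads state.json and condenses it into a completion percentage and a list of failed URLs. Two points are assumptions rather than documented behavior: the percentage formula ((downloaded + failed) / discovered) is illustrative, and the item status value "failed" is guessed, since the sample only shows "completed". The helper names `load_state` and `summarize` are hypothetical.
```python
import json
from pathlib import Path
from typing import Optional

STATE_PATH = Path.home() / "Library" / "Logs" / "Crawlio" / "state.json"


def load_state(path: Path = STATE_PATH) -> Optional[dict]:
    """Return the parsed snapshot, or None if it is missing, empty, or unparsable."""
    try:
        return json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def summarize(state: dict) -> dict:
    """Condense a snapshot into engine state, percent finished, and failed URLs."""
    progress = state.get("progress", {})
    discovered = progress.get("discovered", 0)
    # Illustrative definition: an item is "finished" once it downloaded or failed.
    finished = progress.get("downloaded", 0) + progress.get("failed", 0)
    percent = round(100 * finished / discovered, 1) if discovered else 0.0
    # Assumption: failed items carry status "failed" (only "completed" is documented).
    failed_urls = [item["url"] for item in state.get("items", [])
                   if item.get("status") == "failed"]
    return {"engineState": state.get("engineState"),
            "percent": percent,
            "failedUrls": failed_urls}
```
Because the file is refreshed every 500ms and cleared when the app quits, a poller can call `load_state` on a similar interval and simply skip any round where it returns None.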
crawl.jsonl
One JSON object per line, written continuously during a crawl.
```json
{"timestamp":"2026-02-14T10:30:00.123Z","category":"engine","level":"info","message":"Crawl started for https://example.com"}
{"timestamp":"2026-02-14T10:30:01.456Z","category":"download","level":"info","message":"Downloaded https://example.com/index.html (15.2 KB)"}
```
Categories: engine, download, parser, localizer, network
Levels: debug, info, notice, warning, error, fault
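Because each line is a standalone JSON object, the log can be filtered one line at a time without loading the whole file. The sketch below assumes the levels are ordered by severity exactly as listed above (debug lowest, fault highest); that ordering, and the helper name `filter_events`, are assumptions for illustration.
```python
import json
from typing import Iterable, Iterator, Optional

# Assumed severity order: the levels as listed on this page, lowest to highest.
LEVELS = ("debug", "info", "notice", "warning", "error", "fault")


def filter_events(lines: Iterable[str], min_level: str = "warning",
                  category: Optional[str] = None) -> Iterator[dict]:
    """Yield parsed events at or above `min_level`, optionally limited to one category."""
    threshold = LEVELS.index(min_level)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a final line that is still being written
        level = event.get("level")
        if level not in LEVELS or LEVELS.index(level) < threshold:
            continue
        if category is not None and event.get("category") != category:
            continue
        yield event
```
An open handle to ~/Library/Logs/Crawlio/crawl.jsonl can be passed directly as `lines`, since iterating a text file yields one line at a time. This sketch reads only the current file, not rotated copies.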
Log rotation
Log files rotate at 10 MB. Up to 3 rotated files are kept. Files older than 7 days are deleted automatically.
Download locations
Default
Downloaded files are saved to:
```
~/Downloads/Crawlio/{domain}/
```
For example, downloading https://example.com creates:
```
~/Downloads/Crawlio/example.com/
  index.html
  about/
    index.html
  css/
    style.css
  images/
    logo.png
```
Custom destination
Set a custom destination per crawl through:
- The project settings panel in the app
- The `--dest` flag in the CLI
- The `destinationPath` field in `POST /start` (HTTP API)
- The `destination` parameter in `start_crawl` (MCP)
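When a script needs to predict where a page will be saved, the example tree suggests a simple mapping: the host name becomes the top-level folder, the URL path is mirrored beneath it, and directory URLs are saved as index.html. The sketch below encodes only that inference; it is an assumption, not Crawlio's actual localization logic (query strings, redirects, percent-encoding, and renamed files are not handled), so prefer the localPath field in state.json when it is available. `guess_local_path` is a hypothetical helper, and its `root` argument stands in for a custom destination.
```python
from pathlib import Path
from urllib.parse import urlsplit

DEFAULT_ROOT = Path.home() / "Downloads" / "Crawlio"


def guess_local_path(url: str, root: Path = DEFAULT_ROOT) -> Path:
    """Approximate where `url` is saved (inferred from the example tree, not Crawlio's rules)."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory URLs appear to be saved as index.html
    # hostname is lowercased by urlsplit and stands in for {domain}.
    return Path(root) / parts.hostname / path.lstrip("/")
```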
Application support
| File | Path | Purpose |
|---|---|---|
| Preferences | ~/Library/Preferences/com.crawlio.app.plist | User preferences |
| Projects | ~/Library/Application Support/Crawlio/projects/ | Saved crawl configurations |
| Enrichment | ~/Library/Application Support/Crawlio/enrichment/ | Runtime capture data (JSON) |
| Checkpoints | ~/Library/Application Support/Crawlio/checkpoints/ | Crawl resume data |
| Export presets | ~/Library/Application Support/Crawlio/export-presets.json | Saved CSV export presets |
| License | ~/Library/Application Support/Crawlio/license.json | Local license key storage |
| OCR cache | ~/Library/Caches/Crawlio/ocr/ | OCR result cache |
Checkpoints
Crawl resume data, written atomically with 3-checkpoint rotation. As a result, crawls survive app crashes and system restarts; to resume a crawl, relaunch the app.
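Rotation plus atomic writes is what lets an interrupted crawl recover: a new checkpoint is fully written and flushed to a temporary file before it replaces anything, and older generations remain as fallbacks. The Python below is a sketch of that general technique only. The file names (checkpoint.0 to checkpoint.2) and the helpers `write_checkpoint` and `latest_checkpoint` are hypothetical and do not describe Crawlio's on-disk format, so use the pattern in your own tooling rather than writing into Crawlio's checkpoints directory.
```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_checkpoint(directory: Path, data: bytes, keep: int = 3) -> Path:
    """Atomically write `data` as the newest checkpoint, keeping `keep` generations."""
    directory.mkdir(parents=True, exist_ok=True)
    # 1. Write and fsync a temp file in the same directory so the final rename is atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".checkpoint-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    except BaseException:
        os.unlink(tmp)
        raise
    # 2. Shift older generations (.1 -> .2, .0 -> .1); the oldest is overwritten.
    for i in range(keep - 1, 0, -1):
        older = directory / f"checkpoint.{i - 1}"
        if older.exists():
            os.replace(older, directory / f"checkpoint.{i}")
    # 3. Promote the temp file to the newest slot in a single atomic rename.
    newest = directory / "checkpoint.0"
    os.replace(tmp, newest)
    return newest


def latest_checkpoint(directory: Path, keep: int = 3) -> Optional[Path]:
    """Return the newest surviving checkpoint, falling back to older generations."""
    for i in range(keep):
        candidate = directory / f"checkpoint.{i}"
        if candidate.is_file():
            return candidate
    return None
```
If a crash lands between steps 2 and 3, checkpoint.0 is missing but checkpoint.1 holds the previous newest state, which `latest_checkpoint` picks up. For full durability against power loss, a production version would also fsync the directory after the final rename.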
Enrichment
Runtime capture data (framework detection, network requests, console logs, DOM snapshots), persisted as JSON and scoped per project.
Browser extension bridge
| File | Path | Purpose |
|---|---|---|
| Bridge directory | ~/.crawlio/bridge/ | Chrome extension bridge files |
| Native messaging host | ~/Library/Application Support/{browser}/NativeMessagingHosts/com.crawlio.agent.json | Native messaging manifest |
| Wrapper script | ~/Library/Application Support/Crawlio/native-messaging-host.sh | Native messaging wrapper |
{browser} is one of: Google/Chrome, Chromium, BraveSoftware/Brave-Browser, Microsoft Edge.
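To check which browsers have the bridge installed, a script can expand {browser} for each supported directory and inspect the manifest. Chrome-style native messaging manifests carry a `path` field naming the host program; the sketch below reports whether that program exists. That the path points at the wrapper script above is an assumption, and `manifest_paths` and `check_manifests` are hypothetical helpers.
```python
import json
from pathlib import Path
from typing import Dict, Optional

BROWSERS = ("Google/Chrome", "Chromium", "BraveSoftware/Brave-Browser", "Microsoft Edge")
HOST_NAME = "com.crawlio.agent"


def manifest_paths(home: Optional[Path] = None) -> Dict[str, Path]:
    """Map each supported browser directory to where its Crawlio manifest would live."""
    base = (home or Path.home()) / "Library" / "Application Support"
    return {b: base / b / "NativeMessagingHosts" / f"{HOST_NAME}.json" for b in BROWSERS}


def check_manifests(home: Optional[Path] = None) -> Dict[str, str]:
    """Report, per browser, whether the manifest exists and its host program is present."""
    report = {}
    for browser, path in manifest_paths(home).items():
        if not path.is_file():
            report[browser] = "not installed"
            continue
        try:
            host = json.loads(path.read_text()).get("path", "")
        except json.JSONDecodeError:
            report[browser] = "invalid JSON"
            continue
        report[browser] = "ok" if host and Path(host).is_file() else f"missing host: {host}"
    return report
```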
MCP and CLI files
MCP server binary
| File | Path | Purpose |
|---|---|---|
| npm binary cache | ~/.crawlio/bin/CrawlioMCP | Auto-downloaded from GitHub Releases |
| MCP config | ~/.crawlio-mcp.json | MCP server configuration |
launchd service (portal mode)
| File | Path | Purpose |
|---|---|---|
| Service plist | ~/Library/LaunchAgents/com.crawlio.mcp.plist | Portal mode auto-start |
| stdout log | ~/Library/Logs/Crawlio/mcp-server.log | Portal stdout |
| stderr log | ~/Library/Logs/Crawlio/mcp-server.err | Portal stderr |
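To confirm that portal mode is installed and see where its output goes, a script can read the service plist with Python's standard plistlib module. The sketch relies only on standard launchd keys (Label, Program or ProgramArguments, StandardOutPath, StandardErrorPath); whether Crawlio's plist points those log keys at the files in the table above is an assumption, and `describe_service` is a hypothetical helper.
```python
import plistlib
from pathlib import Path
from typing import Optional

PLIST_PATH = Path.home() / "Library" / "LaunchAgents" / "com.crawlio.mcp.plist"


def describe_service(plist_path: Path = PLIST_PATH) -> Optional[dict]:
    """Summarize the portal-mode launch agent, or return None if it is not installed."""
    try:
        with open(plist_path, "rb") as f:
            job = plistlib.load(f)
    except FileNotFoundError:
        return None
    # launchd accepts either ProgramArguments or Program; report whichever is set.
    command = job.get("ProgramArguments")
    if not command and "Program" in job:
        command = [job["Program"]]
    return {
        "label": job.get("Label"),
        "command": command or [],
        "stdout": job.get("StandardOutPath"),
        "stderr": job.get("StandardErrorPath"),
    }
```
This only reads the file on disk; `launchctl list com.crawlio.mcp` reports whether the job is actually loaded.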
MCP client config files
The crawlio-mcp init command writes to detected client config files:
| Path | Client |
|---|---|
| ~/.mcp.json | Global fallback |
| .mcp.json | Per-project |
| ~/.claude.json | Claude Code |
| ~/Library/Application Support/Claude/claude_desktop_config.json | Claude Desktop |
| ~/Library/Application Support/Code/User/mcp.json | VS Code |
| ~/.cursor/mcp.json | Cursor |
| ~/.codeium/windsurf/mcp_config.json | Windsurf |
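After running crawlio-mcp init, a quick way to see which clients were configured is to check which of these files exist. The sketch below does that and applies a deliberately crude heuristic, searching each file's text for "crawlio", because the server entry name init writes is not documented here; `find_client_configs` is a hypothetical helper, and the per-project path is resolved against a project directory you supply.
```python
from pathlib import Path
from typing import Dict, Optional

# Locations from the table above; "./" entries are relative to the project directory.
CLIENT_CONFIGS = {
    "Global fallback": "~/.mcp.json",
    "Per-project": "./.mcp.json",
    "Claude Code": "~/.claude.json",
    "Claude Desktop": "~/Library/Application Support/Claude/claude_desktop_config.json",
    "VS Code": "~/Library/Application Support/Code/User/mcp.json",
    "Cursor": "~/.cursor/mcp.json",
    "Windsurf": "~/.codeium/windsurf/mcp_config.json",
}


def find_client_configs(home: Optional[Path] = None,
                        project: Optional[Path] = None) -> Dict[str, bool]:
    """Map each existing client config to whether its text mentions "crawlio"."""
    home = home or Path.home()
    project = project or Path.cwd()
    found = {}
    for client, raw in CLIENT_CONFIGS.items():
        base = home if raw.startswith("~/") else project
        path = base / raw[2:]
        if path.is_file():
            # Heuristic only: the exact server key written by init is an assumption.
            found[client] = "crawlio" in path.read_text(errors="replace").lower()
    return found
```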
Skills
| Path | Skill |
|---|---|
| ~/.claude/skills/crawlio-mcp/SKILL.md | Full tool reference |
| ~/.claude/skills/crawl-site/SKILL.md | Crawl workflow |
| ~/.claude/skills/audit-site/SKILL.md | Site audit |
| ~/.claude/skills/observe/SKILL.md | Observation querying |
| ~/.claude/skills/finding/SKILL.md | Evidence findings |
Homebrew binaries
| Path | Description |
|---|---|
| /opt/homebrew/bin/crawlio | CLI binary (Apple Silicon) |
| /opt/homebrew/bin/crawlio-mcp | MCP binary (Apple Silicon) |
| /usr/local/bin/crawlio | CLI binary (Intel) |
| /usr/local/bin/crawlio-mcp | MCP binary (Intel) |
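Scripts that shell out to the CLI or MCP binary can pick the Homebrew directory from the machine architecture, as in the table, and fall back to PATH for other installs. A minimal sketch, with `homebrew_bin` and `find_binary` as hypothetical helpers:
```python
import platform
import shutil
from pathlib import Path
from typing import Optional


def homebrew_bin(machine: Optional[str] = None) -> Path:
    """Default Homebrew bin directory for the given CPU architecture."""
    machine = machine or platform.machine()
    # Apple Silicon reports "arm64"; a process running under Rosetta reports "x86_64".
    return Path("/opt/homebrew/bin") if machine == "arm64" else Path("/usr/local/bin")


def find_binary(name: str, machine: Optional[str] = None) -> Optional[Path]:
    """Prefer the Homebrew location from the table, then fall back to PATH."""
    candidate = homebrew_bin(machine) / name
    if candidate.is_file():
        return candidate
    found = shutil.which(name)
    return Path(found) if found else None
```
For example, `find_binary("crawlio-mcp")` returns /opt/homebrew/bin/crawlio-mcp on an Apple Silicon Mac with the Homebrew install, and None if the binary is not found anywhere.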
Vault
| File | Path | Purpose |
|---|---|---|
| Audit log | ~/.crawlio/vault-audit.jsonl | Per-domain session access audit trail |
Vault sessions are stored in the macOS Keychain, not on disk.
Next steps
- See the HTTP API for using the control socket and port
- Check Troubleshooting for stale port files and connection issues
- See MCP Tools for the tool reference