Crawlio Docs

HTTP API

Overview

Crawlio.app runs a local HTTP server while active. All 45 endpoints are localhost-only and require no authentication. Request and response bodies are JSON.

Transport

The primary transport is a Unix domain socket. A TCP port is also available as a fallback.

Property Value
Protocol AF_UNIX (Unix Domain Socket)
Path ~/Library/Logs/Crawlio/control.sock
Permissions 0600 (owner only)
Max connections 64 concurrent

The TCP port number is written to ~/Library/Logs/Crawlio/control.port on app launch and deleted on quit.

Connect via UDS

curl --unix-socket ~/Library/Logs/Crawlio/control.sock http://localhost/status

Connect via TCP (fallback)

PORT=$(cat ~/Library/Logs/Crawlio/control.port)
curl http://localhost:$PORT/status
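The same socket can be used from Python by overriding `http.client.HTTPConnection.connect` to dial the Unix domain socket instead of TCP. This is a minimal sketch; `UDSHTTPConnection` is a hypothetical helper name, not part of Crawlio, and the socket path comes from the Transport table above.

```python
import http.client
import os
import socket

class UDSHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that tunnels HTTP over a Unix domain socket."""

    def __init__(self, socket_path: str):
        # The host is only used for the Host: header; routing is via the socket.
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

SOCKET_PATH = os.path.expanduser("~/Library/Logs/Crawlio/control.sock")

# Usage (requires Crawlio to be running):
# conn = UDSHTTPConnection(SOCKET_PATH)
# conn.request("GET", "/status")
# print(conn.getresponse().read())
```
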

Engine control

Method Path Description
GET /status Engine state, progress counters, enrichment summary, output path. Pass ?since=N for change detection (returns 304 if unchanged)
POST /start Start a crawl. Body: { "url": "..." } or { "urls": [...] } with optional destinationPath
POST /stop Stop the current crawl. Cancels all in-flight requests
POST /pause Pause the current crawl. In-flight downloads complete, no new requests start
POST /resume Resume a paused crawl
POST /recrawl Re-inject URLs into the frontier. Body: { "urls": [...] }. Returns 200 if a crawl is active, 202 if it starts a new one

Engine states

idle, crawling, stopping, localizing, extracting, paused, completed, failed

Start crawl examples

Single URL:

{ "url": "https://example.com" }

Multiple URLs with custom destination:

{
  "urls": ["https://example.com", "https://example.org"],
  "destinationPath": "~/Downloads/Crawlio/multi"
}
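The two body shapes above differ only in the `url` vs. `urls` key. A small builder can pick the right shape; `make_start_body` is a hypothetical helper for illustration, not part of Crawlio.

```python
import json

def make_start_body(urls, destination_path=None):
    """Build the JSON body for POST /start: {"url": ...} for a single URL,
    {"urls": [...]} for several, plus the optional destinationPath."""
    if len(urls) == 1:
        body = {"url": urls[0]}
    else:
        body = {"urls": list(urls)}
    if destination_path is not None:
        body["destinationPath"] = destination_path
    return json.dumps(body)

print(make_start_body(["https://example.com"]))
print(make_start_body(["https://example.com", "https://example.org"],
                      "~/Downloads/Crawlio/multi"))
```
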

Status response

{
  "engineState": "crawling",
  "seedURL": "https://example.com",
  "seq": 42,
  "progress": {
    "totalDiscovered": 150,
    "downloaded": 85,
    "failed": 2,
    "queued": 63,
    "localized": 80
  },
  "enrichment": {
    "pagesEnriched": 12,
    "frameworksDetected": 3,
    "totalNetworkRequests": 245,
    "consoleErrors": 1,
    "consoleWarnings": 5
  },
  "outputPath": "/Users/you/Downloads/Crawlio/example.com"
}
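The progress counters can be reduced to a completion fraction by treating downloaded and failed URLs as settled. `percent_complete` is a hypothetical helper; the field names match the response above.

```python
def percent_complete(progress: dict) -> float:
    """Fraction of discovered URLs that are settled (downloaded or failed)."""
    total = progress.get("totalDiscovered", 0)
    if total == 0:
        return 0.0
    settled = progress.get("downloaded", 0) + progress.get("failed", 0)
    return settled / total

status = {
    "progress": {"totalDiscovered": 150, "downloaded": 85,
                 "failed": 2, "queued": 63, "localized": 80}
}
print(f"{percent_complete(status['progress']):.0%}")  # → 58%
```

For change detection, remember the `seq` value from each response and pass it back as `?since=N`; a 304 means nothing changed.
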

Settings

Method Path Description
GET, PATCH /settings Read or update download settings and crawl policy. PATCH uses merge patch (only include the fields to change). Returns 409 if the engine is active

PATCH example

{
  "settings": { "maxConcurrent": 20 },
  "policy": { "maxDepth": 5 }
}

GET response

{
  "settings": {
    "maxConcurrent": 10,
    "crawlDelay": 0.0,
    "timeout": 60,
    "downloadImages": true,
    "downloadVideo": true,
    "stripTrackingParams": true,
    "maxRetries": 3
  },
  "policy": {
    "scopeMode": "sameDomain",
    "maxDepth": 0,
    "maxPagesPerCrawl": 0,
    "respectRobotsTxt": true
  }
}
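The merge-patch behavior described above means a PATCH touches only the fields it names. The sketch below mirrors that semantics client-side so you can predict the result; `merge_patch` is a hypothetical helper, not a Crawlio API.

```python
def merge_patch(current: dict, patch: dict) -> dict:
    """Recursively overlay patch onto current; keys absent from the
    patch are left unchanged."""
    merged = dict(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_patch(merged[key], value)
        else:
            merged[key] = value
    return merged

current = {"settings": {"maxConcurrent": 10, "timeout": 60},
           "policy": {"maxDepth": 0, "respectRobotsTxt": True}}
patch = {"settings": {"maxConcurrent": 20}, "policy": {"maxDepth": 5}}

# maxConcurrent -> 20 and maxDepth -> 5; timeout and respectRobotsTxt keep
# their current values because the patch does not mention them.
print(merge_patch(current, patch))
```
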

Projects

Method Path Description
GET /projects List all saved projects
POST /projects Save the current project. Optional body: { "name": "..." }
GET /projects/{id} Get full project details (settings, policy, seed URL, destination)
DELETE /projects/{id} Delete a saved project
POST /projects/{id}/load Load a saved project, restoring settings and state

List response

{
  "projects": [
    {
      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "name": "Stripe Docs",
      "seedURL": "https://docs.stripe.com",
      "createdAt": "2026-02-14T10:30:00Z",
      "destinationPath": "/Users/you/Downloads/Crawlio/docs.stripe.com"
    }
  ]
}
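To load a project by its seed URL you first need its `id` from the list response. A lookup sketch, using the sample response above; `find_project_id` is a hypothetical helper.

```python
def find_project_id(list_response: dict, seed_url: str):
    """Return the id of the first saved project whose seedURL matches,
    or None if there is no match."""
    for project in list_response.get("projects", []):
        if project.get("seedURL") == seed_url:
            return project["id"]
    return None

listing = {"projects": [{
    "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "name": "Stripe Docs",
    "seedURL": "https://docs.stripe.com",
}]}
print(find_project_id(listing, "https://docs.stripe.com"))
# The returned id is what you POST to /projects/{id}/load.
```
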

Downloads and logs

Method Path Description
GET /downloads Downloaded files with status, size, and timing
GET /failed-urls Failed downloads with error messages
GET /site-tree ASCII directory tree of downloaded files
GET /crawled-urls All URLs with filtering and pagination. Query: ?status=completed&type=html&limit=1000&offset=0
GET /logs Log entries. Query: ?category=engine&level=info&limit=50

Crawled URLs response

{
  "total": 150,
  "offset": 0,
  "limit": 1000,
  "urls": [
    {
      "url": "https://example.com/index.html",
      "status": "completed",
      "bytesReceived": 15234,
      "contentType": "text/html",
      "statusCode": 200
    }
  ]
}

Status filter values: completed, downloading, failed, queued

Log categories: engine, download, parser, localizer, network

Log levels: debug, info, notice, warning, error, fault
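A query-string builder for /crawled-urls that validates the status filter against the values listed above. `crawled_urls_query` is a hypothetical helper; the parameter names match the query table.

```python
from urllib.parse import urlencode

VALID_STATUSES = {"completed", "downloading", "failed", "queued"}

def crawled_urls_query(status=None, type_=None, limit=1000, offset=0):
    """Build the /crawled-urls path with filtering and pagination params."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status filter: {status}")
    params = {"limit": limit, "offset": offset}
    if status is not None:
        params["status"] = status
    if type_ is not None:
        params["type"] = type_
    return "/crawled-urls?" + urlencode(params)

print(crawled_urls_query(status="completed", type_="html"))
# → /crawled-urls?limit=1000&offset=0&status=completed&type=html
```
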


Export and extraction

Method Path Description
POST /export Start an async export. Body: { "format": "warc", "destinationPath": "~/archives/site.warc" }
GET /export/status Export progress. States: idle, exporting, completed, failed
POST /extract Start the content extraction pipeline. Body: { "destinationPath": "~/extracted" }
GET /extract/status Extraction progress with phase and completion stats

Export formats

folder, zip, singleHTML, warc, extracted, deploy

Export status response

In progress:

{ "state": "exporting", "format": "warc", "progress": 0.45 }

Completed:

{ "state": "completed", "path": "~/archives/site.warc" }
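Since /export is async, a client typically polls /export/status until it reaches a terminal state. A transport-agnostic polling sketch: `wait_for_export` is a hypothetical helper that takes a `fetch_status` callable (your HTTP GET) so the loop itself needs no live server.

```python
import time

TERMINAL_STATES = {"completed", "failed"}

def wait_for_export(fetch_status, poll_seconds=1.0, max_polls=600):
    """Poll /export/status via fetch_status until the export completes
    or fails, or until max_polls is exhausted."""
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("export did not reach a terminal state in time")

# Demo with canned responses instead of a live server:
responses = iter([{"state": "exporting", "format": "warc", "progress": 0.45},
                  {"state": "completed", "path": "~/archives/site.warc"}])
print(wait_for_export(lambda: next(responses), poll_seconds=0))
```

The same loop works for /extract/status, which uses the same terminal states.
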

Extraction status response

{
  "state": "completed",
  "totalPages": 247,
  "totalAssets": 1203,
  "missingSources": 3,
  "frameworkMode": "nextjs"
}

Enrichment

Method Path Description
GET /enrichment Get stored enrichment data. Pass ?url=... to filter by URL
POST /enrichment/bundle Submit a complete enrichment bundle (all data types)
POST /enrichment/framework Submit framework detection data
POST /enrichment/network Submit network request data. Discovered URLs are offered to the crawl engine
POST /enrichment/console Submit console log data
POST /enrichment/dom Submit a DOM snapshot

Bundle request

{
  "url": "https://example.com",
  "framework": { "name": "nextjs", "version": "14.1", "confidence": 0.95 },
  "networkRequests": [
    { "url": "https://cdn.example.com/bundle.js", "method": "GET", "status": 200, "type": "script" }
  ],
  "consoleLogs": [
    { "level": "error", "message": "Failed to load resource", "timestamp": 1708000000 }
  ],
  "domSnapshotJSON": "{...}"
}

Bundle response

{
  "stored": true,
  "url": "https://example.com",
  "fields": ["framework", "networkRequests", "consoleLogs", "domSnapshotJSON"],
  "urlsOffered": 1
}
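A bundle may carry any subset of the four data types, so a builder should omit absent sections rather than send empty ones. `make_bundle` is a hypothetical helper; the key names match the bundle request above.

```python
import json

def make_bundle(url, framework=None, network_requests=None,
                console_logs=None, dom_snapshot_json=None):
    """Assemble an /enrichment/bundle body, including only the
    sections that were actually captured."""
    bundle = {"url": url}
    if framework is not None:
        bundle["framework"] = framework
    if network_requests is not None:
        bundle["networkRequests"] = network_requests
    if console_logs is not None:
        bundle["consoleLogs"] = console_logs
    if dom_snapshot_json is not None:
        bundle["domSnapshotJSON"] = dom_snapshot_json
    return json.dumps(bundle)

print(make_bundle("https://example.com",
                  framework={"name": "nextjs", "version": "14.1",
                             "confidence": 0.95}))
```
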

Capture

Method Path Description
POST /capture Trigger a WebKit runtime capture for a URL. Runs framework detection, intercepts network requests, captures console logs, and takes a DOM snapshot

Request

{ "url": "https://example.com" }

Response (202)

{ "status": "capturing", "url": "https://example.com" }

Analysis

Method Path Description
GET /structured-data JSON-LD, tables, microdata, and RDFa. Query: ?url=...&type=...
GET /observations Query the observation log. Query: ?host=...&op=...&source=...&since=T&limit=N
GET /observation/{id} Look up a single observation by ID (prefix match)
POST /finding Create a curated finding with evidence
GET /findings Query curated findings. Query: ?host=...&limit=N

Observation sources

extension, webkit, agent

Create finding request

{
  "title": "React hydration errors on /products page",
  "url": "https://example.com/products",
  "evidence": ["obs-id-1", "obs-id-2"],
  "synthesis": "Multiple hydration mismatches suggest SSR/client HTML divergence."
}

Create finding response (201)

{ "status": "created", "id": "finding-uuid-..." }

Intelligence (Pro tier)

Method Path Description
GET /tech-stack Detected technologies with name, categories, confidence, version, and signals
GET /seo-findings SEO analysis findings. Query: ?severity=...&category=...
GET /design-intel Design system data (colors, typography, spacing, breakpoints, components)
GET /keyword-intel Keyword analysis (top keywords, co-occurring groups, density)
GET /duplicate-content Exact duplicates and near-duplicates with similarity scores

These endpoints require a Pro license. They return a tier-gating error on Free and Core tiers.


Vault

Method Path Description
POST /vault/request-login Open the auth browser so you can log in. The session is captured and stored. Body: { "domain": "example.com", "loginURL": "https://example.com/login" }

Debug

Method Path Description
GET /debug/metrics Runtime metrics
GET, POST /debug/log-level Read or set the current log level
POST /debug/dump-state Dump the full engine state for debugging

System

Method Path Description
GET /health Server health check. Returns { "status": "ok", "version": "..." }
GET /license License status and tier
POST /open-folder Open the download output directory in Finder

Error handling

All endpoints return JSON. On error:

{ "error": "Description of the error" }

HTTP status codes

Code Meaning
200 Success
201 Created (new finding, new project)
202 Accepted (async operation started)
304 Not Modified (no changes since ?since=N)
400 Bad request (missing fields, invalid parameters)
404 Not found (project, enrichment data)
409 Conflict (settings change while crawling)
429 Rate limited (free tier weekly crawl limit)
500 Internal server error

Free tier rate limit (429)

{
  "error": "free_tier_limit",
  "message": "Free tier limit reached (5/5 crawls this week).",
  "resetsAt": "2026-02-15T00:00:00Z"
}
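A client hitting the free-tier limit can use `resetsAt` to schedule a retry instead of hammering the endpoint. `seconds_until_reset` is a hypothetical helper; the field names match the 429 body above.

```python
from datetime import datetime, timezone

def seconds_until_reset(error_body: dict, now=None):
    """Seconds until the free-tier weekly limit resets, per the 429 body."""
    # fromisoformat predates full "Z" support, so normalize the suffix.
    resets_at = datetime.fromisoformat(
        error_body["resetsAt"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (resets_at - now).total_seconds())

body = {"error": "free_tier_limit", "resetsAt": "2026-02-15T00:00:00Z"}
reference = datetime(2026, 2, 14, 23, 0, tzinfo=timezone.utc)
print(seconds_until_reset(body, now=reference))  # → 3600.0
```
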

© 2026 Crawlio. All rights reserved.