HTTP API
Overview
Crawlio.app runs a local HTTP server while active. All 45 endpoints are localhost-only and require no authentication. Request and response bodies are JSON.
Transport
Primary transport is a Unix Domain Socket. A TCP fallback port is also available.
| Property | Value |
|---|---|
| Protocol | AF_UNIX (Unix Domain Socket) |
| Path | ~/Library/Logs/Crawlio/control.sock |
| Permissions | 0600 (owner only) |
| Max connections | 64 concurrent |
The TCP port number is written to ~/Library/Logs/Crawlio/control.port on app launch and deleted on quit.
Connect via UDS:

```shell
curl --unix-socket ~/Library/Logs/Crawlio/control.sock http://localhost/status
```

Connect via TCP (fallback):

```shell
PORT=$(cat ~/Library/Logs/Crawlio/control.port)
curl "http://localhost:$PORT/status"
```

Engine control
| Method | Path | Description |
|---|---|---|
| GET | /status | Engine state, progress counters, enrichment summary, output path. Pass ?since=N for change detection (returns 304 if unchanged) |
| POST | /start | Start a crawl. Body: { "url": "..." } or { "urls": [...] } with optional destinationPath |
| POST | /stop | Stop the current crawl. Cancels all in-flight requests |
| POST | /pause | Pause the current crawl. In-flight downloads complete; no new requests start |
| POST | /resume | Resume a paused crawl |
| POST | /recrawl | Re-inject URLs into the frontier. Body: { "urls": [...] }. Returns 200 if a crawl is active, 202 if a new crawl is started |
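Using the UDS transport above, a crawl can be driven end to end. A minimal sketch (the socket-existence guard makes it a no-op when the app is not running; the URL and destination path are placeholders):

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"
BODY='{"url":"https://example.com","destinationPath":"~/Downloads/Crawlio/example"}'

if [ -S "$SOCK" ]; then
  # Start a crawl, pause it, resume it, then stop it.
  curl -s --unix-socket "$SOCK" -H 'Content-Type: application/json' \
       -d "$BODY" http://localhost/start
  curl -s --unix-socket "$SOCK" -X POST http://localhost/pause
  curl -s --unix-socket "$SOCK" -X POST http://localhost/resume
  curl -s --unix-socket "$SOCK" -X POST http://localhost/stop
fi
```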
Engine states
idle, crawling, stopping, localizing, extracting, paused, completed, failed
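The ?since=N parameter on /status supports cheap change polling: pass the seq value from a previous response, and a 304 means nothing has changed. A sketch:

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"
LAST_SEQ=42   # "seq" from a previous /status response

if [ -S "$SOCK" ]; then
  CODE=$(curl -s -o /tmp/crawlio-status.json -w '%{http_code}' \
         --unix-socket "$SOCK" "http://localhost/status?since=$LAST_SEQ")
  if [ "$CODE" = "304" ]; then
    echo "no change since seq $LAST_SEQ"
  else
    cat /tmp/crawlio-status.json   # new state; read its "seq" for the next poll
  fi
fi
```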
Start crawl examples
Single URL:
```json
{ "url": "https://example.com" }
```

Multiple URLs with custom destination:

```json
{
  "urls": ["https://example.com", "https://example.org"],
  "destinationPath": "~/Downloads/Crawlio/multi"
}
```

Status response
```json
{
  "engineState": "crawling",
  "seedURL": "https://example.com",
  "seq": 42,
  "progress": {
    "totalDiscovered": 150,
    "downloaded": 85,
    "failed": 2,
    "queued": 63,
    "localized": 80
  },
  "enrichment": {
    "pagesEnriched": 12,
    "frameworksDetected": 3,
    "totalNetworkRequests": 245,
    "consoleErrors": 1,
    "consoleWarnings": 5
  },
  "outputPath": "/Users/you/Downloads/Crawlio/example.com"
}
```

Settings
| Method | Path | Description |
|---|---|---|
| GET, PATCH | /settings | Read or update download settings and crawl policy. PATCH uses merge-patch semantics (include only the fields to change). Returns 409 if the engine is active |
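For example, raising concurrency and capping crawl depth with a merge patch (this returns 409 if a crawl is running; the guard skips the call when the app is not up):

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"
PATCH='{"settings":{"maxConcurrent":20},"policy":{"maxDepth":5}}'

if [ -S "$SOCK" ]; then
  curl -s --unix-socket "$SOCK" -X PATCH \
       -H 'Content-Type: application/json' \
       -d "$PATCH" http://localhost/settings
fi
```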
PATCH example
```json
{
  "settings": { "maxConcurrent": 20 },
  "policy": { "maxDepth": 5 }
}
```

GET response

```json
{
  "settings": {
    "maxConcurrent": 10,
    "crawlDelay": 0.0,
    "timeout": 60,
    "downloadImages": true,
    "downloadVideo": true,
    "stripTrackingParams": true,
    "maxRetries": 3
  },
  "policy": {
    "scopeMode": "sameDomain",
    "maxDepth": 0,
    "maxPagesPerCrawl": 0,
    "respectRobotsTxt": true
  }
}
```

Projects
| Method | Path | Description |
|---|---|---|
| GET | /projects | List all saved projects |
| POST | /projects | Save the current project. Optional body: { "name": "..." } |
| GET | /projects/{id} | Get full project details (settings, policy, seed URL, destination) |
| DELETE | /projects/{id} | Delete a saved project |
| POST | /projects/{id}/load | Load a saved project, restoring settings and state |
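A typical round trip: save the current project under a name, list projects to find its id, then load it later. A sketch (the id and name are placeholders):

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"
PROJECT_ID="a1b2c3d4-e5f6-7890-abcd-ef1234567890"   # placeholder id from a list response

if [ -S "$SOCK" ]; then
  curl -s --unix-socket "$SOCK" -H 'Content-Type: application/json' \
       -d '{"name":"Stripe Docs"}' http://localhost/projects
  curl -s --unix-socket "$SOCK" http://localhost/projects
  curl -s --unix-socket "$SOCK" -X POST "http://localhost/projects/$PROJECT_ID/load"
fi
```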
List response
```json
{
  "projects": [
    {
      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "name": "Stripe Docs",
      "seedURL": "https://docs.stripe.com",
      "createdAt": "2026-02-14T10:30:00Z",
      "destinationPath": "/Users/you/Downloads/Crawlio/docs.stripe.com"
    }
  ]
}
```

Downloads and logs
| Method | Path | Description |
|---|---|---|
| GET | /downloads | Downloaded files with status, size, and timing |
| GET | /failed-urls | Failed downloads with error messages |
| GET | /site-tree | ASCII directory tree of downloaded files |
| GET | /crawled-urls | All URLs with filtering and pagination. Query: ?status=completed&type=html&limit=1000&offset=0 |
| GET | /logs | Log entries. Query: ?category=engine&level=info&limit=50 |
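The filter and pagination parameters compose in the query string; for example, the first hundred completed HTML pages:

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"
QUERY='status=completed&type=html&limit=100&offset=0'

if [ -S "$SOCK" ]; then
  curl -s --unix-socket "$SOCK" "http://localhost/crawled-urls?$QUERY"
fi
```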
Crawled URLs response
```json
{
  "total": 150,
  "offset": 0,
  "limit": 1000,
  "urls": [
    {
      "url": "https://example.com/index.html",
      "status": "completed",
      "bytesReceived": 15234,
      "contentType": "text/html",
      "statusCode": 200
    }
  ]
}
```

Status filter values: completed, downloading, failed, queued
Log categories: engine, download, parser, localizer, network
Log levels: debug, info, notice, warning, error, fault
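Combining the category and level filters above, e.g. to pull recent engine errors:

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"

if [ -S "$SOCK" ]; then
  curl -s --unix-socket "$SOCK" \
       'http://localhost/logs?category=engine&level=error&limit=50'
fi
```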
Export and extraction
| Method | Path | Description |
|---|---|---|
| POST | /export | Start an async export. Body: { "format": "warc", "destinationPath": "~/archives/site.warc" } |
| GET | /export/status | Export progress. States: idle, exporting, completed, failed |
| POST | /extract | Start the content extraction pipeline. Body: { "destinationPath": "~/extracted" } |
| GET | /extract/status | Extraction progress with phase and completion stats |
Export formats
folder, zip, singleHTML, warc, extracted, deploy
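Because export is asynchronous, a client starts it and then polls /export/status until the state settles. A sketch using the warc format:

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"
BODY='{"format":"warc","destinationPath":"~/archives/site.warc"}'

if [ -S "$SOCK" ]; then
  curl -s --unix-socket "$SOCK" -H 'Content-Type: application/json' \
       -d "$BODY" http://localhost/export
  # Poll until "state" is "completed" or "failed".
  curl -s --unix-socket "$SOCK" http://localhost/export/status
fi
```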
Export status response
In progress:
```json
{ "state": "exporting", "format": "warc", "progress": 0.45 }
```

Completed:

```json
{ "state": "completed", "path": "~/archives/site.warc" }
```

Extraction status response

```json
{
  "state": "completed",
  "totalPages": 247,
  "totalAssets": 1203,
  "missingSources": 3,
  "frameworkMode": "nextjs"
}
```

Enrichment
| Method | Path | Description |
|---|---|---|
| GET | /enrichment | Get stored enrichment data. Pass ?url=... to filter by URL |
| POST | /enrichment/bundle | Submit a complete enrichment bundle (all data types) |
| POST | /enrichment/framework | Submit framework detection data |
| POST | /enrichment/network | Submit network request data. Discovered URLs are offered to the crawl engine |
| POST | /enrichment/console | Submit console log data |
| POST | /enrichment/dom | Submit a DOM snapshot |
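A single data type can also be submitted on its own, for instance framework detection. The exact body shape here is an assumption, mirroring the framework object of the bundle schema:

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"
# Assumed shape: top-level url plus a framework object, as in the bundle request.
BODY='{"url":"https://example.com","framework":{"name":"nextjs","version":"14.1","confidence":0.95}}'

if [ -S "$SOCK" ]; then
  curl -s --unix-socket "$SOCK" -H 'Content-Type: application/json' \
       -d "$BODY" http://localhost/enrichment/framework
fi
```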
Bundle request
```json
{
  "url": "https://example.com",
  "framework": { "name": "nextjs", "version": "14.1", "confidence": 0.95 },
  "networkRequests": [
    { "url": "https://cdn.example.com/bundle.js", "method": "GET", "status": 200, "type": "script" }
  ],
  "consoleLogs": [
    { "level": "error", "message": "Failed to load resource", "timestamp": 1708000000 }
  ],
  "domSnapshotJSON": "{...}"
}
```

Bundle response

```json
{
  "stored": true,
  "url": "https://example.com",
  "fields": ["framework", "networkRequests", "consoleLogs", "domSnapshotJSON"],
  "urlsOffered": 1
}
```

Capture
| Method | Path | Description |
|---|---|---|
| POST | /capture | Trigger a WebKit runtime capture for a URL. Runs framework detection, intercepts network requests, captures console logs, and takes a DOM snapshot |
Request
```json
{ "url": "https://example.com" }
```

Response (202)

```json
{ "status": "capturing", "url": "https://example.com" }
```

Analysis
| Method | Path | Description |
|---|---|---|
| GET | /structured-data | JSON-LD, tables, microdata, and RDFa. Query: ?url=...&type=... |
| GET | /observations | Query the observation log. Query: ?host=...&op=...&source=...&since=T&limit=N |
| GET | /observation/{id} | Look up a single observation by ID (prefix match) |
| POST | /finding | Create a curated finding with evidence |
| GET | /findings | Query curated findings. Query: ?host=...&limit=N |
Observation sources
extension, webkit, agent
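Observation queries combine the filters above; for example, the latest WebKit-sourced observations for one host:

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"

if [ -S "$SOCK" ]; then
  curl -s --unix-socket "$SOCK" \
       'http://localhost/observations?host=example.com&source=webkit&limit=20'
fi
```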
Create finding request
```json
{
  "title": "React hydration errors on /products page",
  "url": "https://example.com/products",
  "evidence": ["obs-id-1", "obs-id-2"],
  "synthesis": "Multiple hydration mismatches suggest SSR/client HTML divergence."
}
```

Create finding response (201)

```json
{ "status": "created", "id": "finding-uuid-..." }
```

Intelligence (Pro tier)
| Method | Path | Description |
|---|---|---|
| GET | /tech-stack | Detected technologies with name, categories, confidence, version, and signals |
| GET | /seo-findings | SEO analysis findings. Query: ?severity=...&category=... |
| GET | /design-intel | Design system data (colors, typography, spacing, breakpoints, components) |
| GET | /keyword-intel | Keyword analysis (top keywords, co-occurring groups, density) |
| GET | /duplicate-content | Exact duplicates and near-duplicates with similarity scores |
These endpoints require a Pro license. They return a tier-gating error on Free and Core tiers.
Vault
| Method | Path | Description |
|---|---|---|
| POST | /vault/request-login | Open the auth browser so you can log in. The session is captured and stored. Body: { "domain": "example.com", "loginURL": "https://example.com/login" } |
Debug
| Method | Path | Description |
|---|---|---|
| GET | /debug/metrics | Runtime metrics |
| GET, POST | /debug/log-level | Read or set the current log level |
| POST | /debug/dump-state | Dump the full engine state for debugging |
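Reading and then raising the log level, for example while reproducing a bug. The { "level": ... } body shape is an assumption; the valid level names are listed under Downloads and logs:

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"
LEVEL='{"level":"debug"}'   # assumed body shape

if [ -S "$SOCK" ]; then
  curl -s --unix-socket "$SOCK" http://localhost/debug/log-level
  curl -s --unix-socket "$SOCK" -H 'Content-Type: application/json' \
       -d "$LEVEL" http://localhost/debug/log-level
fi
```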
System
| Method | Path | Description |
|---|---|---|
| GET | /health | Server health check. Returns { "status": "ok", "version": "..." } |
| GET | /license | License status and tier |
| POST | /open-folder | Open the download output directory in Finder |
Error handling
All endpoints return JSON. On error:
```json
{ "error": "Description of the error" }
```

HTTP status codes
| Code | Meaning |
|---|---|
| 200 | Success |
| 201 | Created (new finding, new project) |
| 202 | Accepted (async operation started) |
| 304 | Not Modified (no changes since ?since=N) |
| 400 | Bad request (missing fields, invalid parameters) |
| 404 | Not found (project, enrichment data) |
| 409 | Conflict (settings change while crawling) |
| 429 | Rate limited (free tier weekly crawl limit) |
| 500 | Internal server error |
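Clients can branch on these codes with curl's -w '%{http_code}'. A sketch around /start (which codes a given call can return depends on engine state and license tier):

```shell
SOCK="$HOME/Library/Logs/Crawlio/control.sock"

if [ -S "$SOCK" ]; then
  CODE=$(curl -s -o /tmp/crawlio-resp.json -w '%{http_code}' \
         --unix-socket "$SOCK" -H 'Content-Type: application/json' \
         -d '{"url":"https://example.com"}' http://localhost/start)
  case "$CODE" in
    2??) echo "crawl accepted" ;;
    429) echo "free tier limit reached; check resetsAt in the body" ;;
    *)   echo "error $CODE"; cat /tmp/crawlio-resp.json ;;
  esac
fi
```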
Free tier rate limit (429)
```json
{
  "error": "free_tier_limit",
  "message": "Free tier limit reached (5/5 crawls this week).",
  "resetsAt": "2026-02-15T00:00:00Z"
}
```

Next steps
- See Architecture for how the crawl pipeline works
- Use MCP Tools for AI-native access to these endpoints
- Check Troubleshooting if you have connection issues