# CLI Commands
## crawlio (default command)

The one-liner that does everything. Crawl, enrich via browser agent, and export in one command.

```bash
crawlio <url>
crawlio run <url> [options]
```

If no URL is given, prints the welcome banner with usage hints.
| Option | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | URL or bare domain (e.g. `crawlio.app` or `https://crawlio.app`) |
| `--max-iterations` | int | 5 | Maximum autonomous loop iterations |
| `--target-coverage` | double | 0.95 | Target coverage ratio (0.0-1.0) |
| `--dest` | string | auto | Destination directory for downloads |
| `--export` | string | folder | Export format on completion |
| `--log-file` | string | -- | Path to write JSON report |
| `--no-agent` | bool | false | Skip browser agent enrichment |
| `--block-trackers` | bool | false | Block tracker/analytics requests via agent |
Behavior:

- Normalizes bare domains to `https://`
- Checks license tier (free tier: limited crawls/week)
- Auto-connects to `crawlio-browser` unless `--no-agent`
- Runs the autonomous crawl loop
- Exports on completion
Exit codes: 0 = target coverage reached, 1 = exhausted/stalled, 2 = error
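Those exit codes make the one-liner scriptable. A minimal wrapper sketch (the URL and retry policy are illustrative, not part of the CLI):

```bash
#!/usr/bin/env bash
# Sketch: branch on crawlio's documented exit codes.
crawlio https://example.com --export zip
case $? in
  0) echo "target coverage reached" ;;
  1) echo "loop exhausted or stalled -- retrying with a higher budget"
     crawlio https://example.com --max-iterations 10 ;;
  *) echo "crawl error" >&2; exit 1 ;;
esac
```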
## crawl
Direct engine control. Five subcommands.
### crawl start
Start a new crawl.
```bash
crawlio crawl start <url> [urls...] [--dest <path>] [--watch]
```

| Flag | Description |
|---|---|
| `<url>` | One or more URLs to crawl (required) |
| `--dest` | Destination directory for downloaded files |
| `--watch` | Stream live progress after starting |

```bash
crawlio crawl start https://docs.stripe.com --dest ~/stripe-docs --watch
```

Multiple URLs:

```bash
crawlio crawl start https://example.com https://example.org
```

### crawl stop
Stop the current crawl. All in-flight requests are cancelled.
```bash
crawlio crawl stop
```

### crawl pause
Pause the current crawl. In-flight downloads complete, but no new requests start.
```bash
crawlio crawl pause
```

### crawl resume
Resume a paused crawl.
```bash
crawlio crawl resume [--watch]
```

### crawl recrawl
Re-crawl specific URLs. If a crawl is active, URLs are injected into the frontier. Otherwise a new crawl starts.
```bash
crawlio crawl recrawl <url> [urls...] [--watch]
```

Retry all failed pages:

```bash
crawlio downloads failed --urls-only | xargs crawlio crawl recrawl
```

Watch mode (`--watch`): polls status every 2 seconds with live progress. Stops when the engine finishes crawling, localizing, or extracting.
## status
Show current crawl status and progress counters.
```bash
crawlio status [--watch] [--interval <seconds>] [--since <seq>]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--watch` | bool | false | Continuously poll for changes |
| `--interval` | double | 2.0 | Poll interval in seconds (with `--watch`) |
| `--since` | int | -- | Only show changes since this sequence number |
```bash
crawlio status
```

```
Status: crawling
URL: https://example.com
Discovered: 150
Downloaded: 85
Failed: 2
Queued: 63
Localized: 80
```

JSON output:
```bash
crawlio status --json
```

```json
{
  "engineState": "crawling",
  "seedURL": "https://example.com",
  "seq": 42,
  "progress": {
    "totalDiscovered": 150,
    "downloaded": 85,
    "failed": 2,
    "queued": 63,
    "localized": 80
  }
}
```

Works offline via file-based fallback (with a warning). If `--since` is provided and nothing changed, prints "No changes".
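The JSON form composes well with standard tooling. A minimal polling sketch, assuming `jq` is installed; the field names match the example output above:

```bash
# Block until the engine leaves the "crawling" state, then print the counters.
while [ "$(crawlio status --json | jq -r '.engineState')" = "crawling" ]; do
  sleep 5
done
crawlio status --json | jq '.progress'
```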
## downloads
View download items. Three subcommands (default: list).
### downloads list
List all download items.
```bash
crawlio downloads list [--status <status>] [--sort <field>] [--limit <n>]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--status` | string | -- | Filter: completed, downloading, failed, queued, skipped |
| `--sort` | string | -- | Sort by: url, status, size |
| `--limit` | int | 100 | Maximum items to show |
```bash
crawlio downloads list --status completed --sort size --limit 20
```

```
Downloads (20 items)

STATUS     CODE  SIZE      URL
completed  200   1.2 MB    https://example.com/images/hero.jpg
completed  200   456.0 KB  https://example.com/bundle.js
completed  200   15.2 KB   https://example.com/index.html
```

### downloads failed
Show failed downloads with error details.
```bash
crawlio downloads failed [--urls-only]
```

| Flag | Description |
|---|---|
| `--urls-only` | Output only URLs, one per line (for piping) |
```bash
crawlio downloads failed
```

```
Failed Downloads (3)

https://example.com/api/internal   HTTP 403 Forbidden
https://example.com/old-page       HTTP 404 Not Found
https://example.com/timeout        Connection timed out
```

### downloads tree
Show downloaded files as a directory tree.
```bash
crawlio downloads tree
```

```
Site Tree (47 files)

example.com/
├── index.html
├── about/
│   └── index.html
├── css/
│   └── style.css
└── images/
    ├── logo.png
    └── hero.jpg
```

## export
Export the downloaded site. Two subcommands (default: run). Tier: Core.
### export run
Export in a specific format.
```bash
crawlio export run <format> --dest <path> [--watch]
```

| Flag | Description |
|---|---|
| `<format>` | Export format (required): folder, zip, singleHTML, warc |
| `--dest` | Destination path (required) |
| `--watch` | Watch export progress until completion |
```bash
crawlio export run warc --dest ~/archives/example.warc --watch
```

```
Export started (warc)
Exporting... 100%
Export completed: ~/archives/example.warc
```

### export status

Check the current export state.

```bash
crawlio export status
```

States: idle, exporting (with progress %), completed, failed.
## extract
Run the content extraction pipeline. Two subcommands (default: run). Tier: Core.
### extract run
```bash
crawlio extract run [--dest <path>] [--watch]
```

| Flag | Description |
|---|---|
| `--dest` | Output directory for extracted artifacts |
| `--watch` | Watch extraction progress until completion |
```bash
crawlio extract run --dest ~/extracted --watch
```

```
Extraction started
Extracting (parsing)... 100%
Extraction completed
Framework: nextjs
Pages: 247
Assets: 1,203
Missing sources: 3
```

### extract status

Check the current extraction state.

```bash
crawlio extract status
```

States: idle, running (with phase + progress %), completed, failed.
## settings
View and modify engine settings. Three subcommands (default: show).
### settings show
Display current download settings and crawl policy.
```bash
crawlio settings show
```

```
Download Settings
crawlDelay: 0.0
downloadFonts: true
downloadImages: true
maxConcurrent: 10
maxRetries: 3
timeout: 60

Crawl Policy
maxDepth: 0
maxPagesPerCrawl: 0
respectRobotsTxt: true
scopeMode: sameDomain
```

### settings set
Update a single setting using dot notation.
```bash
crawlio settings set <key> <value>
```

Settings fields:

| Key | Default | Description |
|---|---|---|
| `settings.maxConcurrent` | 4 | Max concurrent downloads |
| `settings.crawlDelay` | 0.5 | Delay between requests (seconds) |
| `settings.timeout` | 30 | Request timeout (seconds) |
| `settings.downloadImages` | true | Download image files |
| `settings.downloadVideo` | false | Download video files |
| `settings.maxRetries` | 2 | Max retry attempts per URL |
| `settings.stripTrackingParams` | true | Strip tracking query parameters |
Policy fields:

| Key | Default | Description |
|---|---|---|
| `policy.scopeMode` | sameDomain | URL scope mode |
| `policy.maxDepth` | 5 | Maximum crawl depth |
| `policy.maxPagesPerCrawl` | 1000 | Maximum pages per crawl |
| `policy.respectRobotsTxt` | true | Respect robots.txt |
| `policy.excludePatterns` | [] | URL patterns to exclude |
| `policy.includePatterns` | [] | URL patterns to include |
| `policy.includeSupportingFiles` | true | Download supporting assets (CSS, JS, images) |
```bash
crawlio settings set settings.maxConcurrent 20
crawlio settings set policy.maxDepth 3
```

Settings can only be changed when the engine is idle. Stop any active crawl first (HTTP 409 if active).
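A typical retuning sequence therefore looks like this (URL illustrative):

```bash
# Stop the active crawl, retune while idle, then start fresh.
crawlio crawl stop
crawlio settings set settings.maxConcurrent 20
crawlio settings set settings.crawlDelay 1.0
crawlio crawl start https://example.com --watch
```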
### settings reset

Reset settings to defaults.

```bash
crawlio settings reset [--settings] [--policy]
```

| Flag | Description |
|---|---|
| `--settings` | Reset only download settings |
| `--policy` | Reset only crawl policy |
If neither flag is given, both are reset.
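For example, to discard policy tweaks while keeping tuned download settings:

```bash
crawlio settings reset --policy
```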
## project
Manage saved projects. Five subcommands (default: list).
### project list

```bash
crawlio project list
```

```
Saved Projects (2)

ID            NAME         URL                       CREATED
a1b2c3d4-...  Stripe Docs  https://docs.stripe.com   2026-02-14
e5f6g7h8-...  React Docs   https://react.dev         2026-02-13
```

### project save

Save the current project.

```bash
crawlio project save [--name <name>]
```

### project load

Load a saved project by ID.

```bash
crawlio project load <id>
```

### project run

Load a project and start crawling immediately.

```bash
crawlio project run <id> [--watch]
```

### project delete

Delete a saved project.

```bash
crawlio project delete <id> [--force]
```

| Flag | Description |
|---|---|
| `--force` | Skip confirmation prompt |
## logs

Show structured log entries from `~/Library/Logs/Crawlio/`.
```bash
crawlio logs [--category <cat>] [--level <lvl>] [--limit <n>] [--errors] [--follow]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--category` | string | -- | Filter by category |
| `--level` | string | -- | Filter by minimum level |
| `--limit` | int | 50 | Max entries to show |
| `--errors` | bool | false | Show only error/fault entries |
| `--follow` | bool | false | Tail continuously (like `tail -f`) |
Categories: engine, download, parser, localizer, network, ui
Levels: debug, info, default, error, fault
```bash
crawlio logs --category download --level error --follow
```

```
2026-02-14T10:30:02Z download error HTTP 429 for https://example.com/api -- retrying in 30s
2026-02-14T10:30:05Z download error Connection timeout for https://example.com/slow-page
```

Follow mode watches the log file and prints new entries as they appear. Category, level, and error filters still apply. Ctrl+C to stop.
## loop
Run the autonomous crawl-analyze-adjust loop. Tier: Core.
```bash
crawlio loop <url> [options]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `<url>` | string | required | URL to crawl |
| `--max-iterations` | int | 5 | Maximum loop iterations |
| `--target-coverage` | double | 0.95 | Target download/discovered ratio (0.0-1.0) |
| `--dest` | string | auto | Destination directory |
| `--export` | string | -- | Auto-export format on completion |
| `--log-file` | string | -- | JSON report path |
| `--agent` | bool | false | Enable browser agent |
| `--auth-url` | string | -- | URL for interactive browser auth before crawling |
| `--auth-file` | string | -- | Cookie file path (Netscape or JSON) |
| `--block-trackers` | bool | false | Block tracker/analytics requests via agent |
Algorithm: authenticate, gather pre-crawl intelligence, then iterate (crawl, run gap analysis, adjust settings, check coverage and the circuit breaker), and finally export.
Exit codes: 0 = target reached, 1 = exhausted/stalled, 2 = error
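Putting the options together, a hedged example of an authenticated, agent-enriched run (the URL and file paths are illustrative):

```bash
# Authenticate from a saved cookie file, enrich via the browser agent,
# block trackers, and write a WARC plus a JSON report on completion.
crawlio loop https://app.example.com \
  --auth-file ~/cookies/app.txt \
  --agent --block-trackers \
  --export warc \
  --log-file ~/reports/app-crawl.json
```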
See Autonomous Loop for details on coverage calculation, adjustments, and agent integration.
## pipeline
Stdio pipeline mode. JSON in on stdin, JSONL out on stdout. Designed for VM guest or scripted usage.
```bash
echo '{"url":"https://example.com"}' | crawlio pipeline
```

Reads a PipelineRequest JSON from stdin, starts the crawl, emits per-page JSONL results to stdout, and progress to stderr.
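A consumption sketch: the `.url` field name here is an assumption about the per-page JSONL records (adjust to the actual schema), and `jq` is assumed to be installed.

```bash
# Crawl via the pipeline and list each emitted page URL.
# Progress goes to stderr, so stdout stays clean for jq.
echo '{"url":"https://example.com"}' \
  | crawlio pipeline 2>progress.log \
  | jq -r '.url'
```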
## investigate
Deep investigation. Intercept and analyze all traffic from a target URL. Tier: Pro.
```bash
crawlio investigate <url> [options]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `<url>` | string | required | Target URL |
| `--mode` | string | host | `vm` or `host` |
| `--timeout` | int | 300 | Max seconds |
| `--format` | string | summary | Output format: json, openapi, har, summary |
| `--output` | string | -- | Output directory |
| `--include-flows` | bool | false | Include raw flows |
```bash
crawlio investigate https://api.example.com --format openapi --output ~/api-spec
```

## intel
Intelligence analysis. Five subcommands (default: tech-stack). Tier: Pro.
### intel tech-stack

```bash
crawlio intel tech-stack
```

Detected technology stack from the current crawl.

### intel seo

```bash
crawlio intel seo [--severity <level>] [--category <cat>]
```

SEO findings with optional severity and category filters.

### intel design

```bash
crawlio intel design
```

Design intelligence data (colors, fonts, layout patterns).

### intel keywords

```bash
crawlio intel keywords
```

Keyword frequency and distribution analysis.

### intel duplicates

```bash
crawlio intel duplicates
```

Duplicate and near-duplicate content detection.
## observe
Manage observations. Three subcommands (default: list).
### observe list

```bash
crawlio observe list [--host <host>] [--op <op>] [--source <src>] [--sid <id>] [--since <ts>] [--limit <n>]
```

Query the observation log with optional filters.

### observe get

```bash
crawlio observe get <id>
```

Get a single observation by ID (prefix match supported).

### observe create

```bash
crawlio observe create --url <url> --op <op> --source <src> [--meta key=value...]
```

Create a new observation entry.
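For example (the `--op`, `--source`, and `--meta` values below are illustrative; use whatever vocabulary your workflow defines):

```bash
crawlio observe create \
  --url https://example.com/api/v2/users \
  --op request \
  --source agent \
  --meta method=GET --meta status=200
```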
## finding

Manage findings. Two subcommands (default: list).

### finding list

```bash
crawlio finding list [--host <host>] [--limit <n>]
```

### finding create
```bash
crawlio finding create --title <title> --url <url> --evidence <text> --synthesis <text> [--confidence <0.0-1.0>] [--category <cat>]
```
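A worked example (every value is illustrative):

```bash
crawlio finding create \
  --title "Orphaned staging endpoint" \
  --url https://example.com/staging \
  --evidence "Linked from footer; returns HTTP 200 with debug headers" \
  --synthesis "A staging environment is reachable from production pages" \
  --confidence 0.8 \
  --category security
```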
## capture

Trigger a WebKit runtime capture. Tier: Pro.
```bash
crawlio capture <url> [--timeout 30] [--format json|summary]
```

Sends the URL to the capture endpoint. Returns frameworks detected, network request count, and console message count.
```bash
crawlio capture https://spa-example.com --format json
```

## debug
Debug and diagnostics. Four subcommands (default: metrics). Tier: Core.
### debug metrics

```bash
crawlio debug metrics
```

Engine metrics with per-host breakdown (requests, errors, latency, bandwidth).

### debug log-level

```bash
crawlio debug log-level           # Get current level
crawlio debug log-level <level>   # Set level
```

Levels: debug, info, default, error, fault.

### debug dump-state

```bash
crawlio debug dump-state [--output <path>]
```

Dump the full engine state snapshot. Write to file with `--output`.
## enrichment
Browser enrichment data. Six subcommands (default: show). Tier: Core.
### enrichment show

```bash
crawlio enrichment show [--url <url>]
```

Display stored enrichment data for a URL or the current crawl.

### enrichment framework

Submit framework detection results.

```bash
crawlio enrichment framework [--file <path>]
```

### enrichment network

Submit network request data.

```bash
crawlio enrichment network [--file <path>]
```

### enrichment console

Submit console log data.

```bash
crawlio enrichment console [--file <path>]
```

### enrichment dom

Submit DOM snapshot data.

```bash
crawlio enrichment dom [--file <path>]
```

### enrichment bundle

Submit all enrichment types in one payload.

```bash
crawlio enrichment bundle [--file <path>]
```

All submit subcommands read JSON from stdin (pipe from crawlio-agent) or from a file via `--file <path>`.
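Both input styles, sketched with hypothetical file names:

```bash
# From a file:
crawlio enrichment framework --file ./framework.json

# From stdin (any tool that emits the expected JSON works here):
cat ./network.json | crawlio enrichment network
```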
## data
Structured data extraction. Tier: Core.
```bash
crawlio data show [--url <url>] [--type jsonld|tables|microdata|rdfa]
```

Shows JSON-LD, HTML tables, microdata, or RDFa extracted from the crawled pages.
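For example, to pull only the extracted tables for one page (URL illustrative):

```bash
crawlio data show --url https://example.com/pricing --type tables
```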
## license

License management. Two subcommands (default: show).

### license show

```bash
crawlio license show
```

Shows current tier, remaining crawls, and feature flags. Works offline via Keychain.

### license activate

```bash
crawlio license activate <key>
```

Stores the license key in Keychain and validates via the app if connected.
## shell

Start an interactive REPL session. Tier: Core.

```bash
crawlio shell
```

Features: connection-aware prompt, command history, auto-agent connection, SIGINT handling. All crawl and agent commands available without the `crawlio` prefix.

See Interactive Shell for the full command list and example session.
## version

Print version info.

```bash
crawlio version
```

```
Crawlio CLI v1.0.0
```

With `--json`:

```json
{"version": "1.0.0"}
```

## Next steps
- Learn about the Interactive Shell for hands-on exploration
- Learn about the Autonomous Loop for hands-free crawling
- Return to the CLI Overview for connection and install details