CLI Commands

crawlio (default command)

The one-liner that does everything. Crawl, enrich via browser agent, and export in one command.

crawlio <url>
crawlio run <url> [options]

If no URL is given, prints the welcome banner with usage hints.

Option             Type    Default   Description
url                string  required  URL or bare domain (e.g. crawlio.app or https://crawlio.app)
--max-iterations   int     5         Maximum autonomous loop iterations
--target-coverage  double  0.95      Target coverage ratio (0.0-1.0)
--dest             string  auto      Destination directory for downloads
--export           string  folder    Export format on completion
--log-file         string  --        Path to write JSON report
--no-agent         bool    false     Skip browser agent enrichment
--block-trackers   bool    false     Block tracker/analytics requests via agent

Behavior:

  1. Normalizes bare domains to https://
  2. Checks license tier (free tier: limited crawls/week)
  3. Auto-connects to crawlio-browser unless --no-agent
  4. Runs the autonomous crawl loop
  5. Exports on completion

Exit codes: 0 = target coverage reached, 1 = exhausted/stalled, 2 = error
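The exit codes make the default command easy to wrap in scripts. A minimal sketch (the seed URL and messages are illustrative):

crawlio run https://example.com --target-coverage 0.9 --export zip
case $? in
  0) echo "coverage target reached" ;;
  1) echo "crawl exhausted or stalled before reaching the target" ;;
  2) echo "crawl failed" >&2 ;;
esac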


crawl

Direct engine control. Five subcommands.

crawl start

Start a new crawl.

crawlio crawl start <url> [urls...] [--dest <path>] [--watch]

Flag     Description
<url>    One or more URLs to crawl (required)
--dest   Destination directory for downloaded files
--watch  Stream live progress after starting

crawlio crawl start https://docs.stripe.com --dest ~/stripe-docs --watch

Multiple URLs:

crawlio crawl start https://example.com https://example.org

crawl stop

Stop the current crawl. All in-flight requests are cancelled.

crawlio crawl stop

crawl pause

Pause the current crawl. In-flight downloads complete, but no new requests start.

crawlio crawl pause

crawl resume

Resume a paused crawl.

crawlio crawl resume [--watch]

crawl recrawl

Re-crawl specific URLs. If a crawl is active, URLs are injected into the frontier. Otherwise a new crawl starts.

crawlio crawl recrawl <url> [urls...] [--watch]

Retry all failed pages:

crawlio downloads failed --urls-only | xargs crawlio crawl recrawl

Watch mode (--watch): Polls status every 2 seconds with live progress. Stops when the engine finishes crawling, localizing, or extracting.


status

Show current crawl status and progress counters.

crawlio status [--watch] [--interval <seconds>] [--since <seq>]

Flag        Type    Default  Description
--watch     bool    false    Continuously poll for changes
--interval  double  2.0      Poll interval in seconds (with --watch)
--since     int     --       Only show changes since this sequence number

crawlio status
  Status:   crawling
  URL:      https://example.com
 
  Discovered:  150
  Downloaded:  85
  Failed:      2
  Queued:      63
  Localized:   80

JSON output:

crawlio status --json
{
  "engineState": "crawling",
  "seedURL": "https://example.com",
  "seq": 42,
  "progress": {
    "totalDiscovered": 150,
    "downloaded": 85,
    "failed": 2,
    "queued": 63,
    "localized": 80
  }
}

Works offline via file-based fallback (with a warning). If --since is provided and nothing changed, prints "No changes".
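
For scripted polling without --watch, the seq field in the JSON output can drive a change-detection loop. A minimal sketch, assuming jq is available and using the JSON shape shown above:

# Print progress only when the engine's sequence number advances.
seq=0
while sleep 2; do
  out=$(crawlio status --json)
  new=$(echo "$out" | jq .seq)
  if [ "$new" != "$seq" ]; then
    echo "$out" | jq .progress
    seq=$new
  fi
done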


downloads

View download items. Three subcommands (default: list).

downloads list

List all download items.

crawlio downloads list [--status <status>] [--sort <field>] [--limit <n>]

Flag      Type    Default  Description
--status  string  --       Filter: completed, downloading, failed, queued, skipped
--sort    string  --       Sort by: url, status, size
--limit   int     100      Maximum items to show

crawlio downloads list --status completed --sort size --limit 20
  Downloads (20 items)
 
  STATUS     CODE      SIZE  URL
  completed   200    1.2 MB  https://example.com/images/hero.jpg
  completed   200  456.0 KB  https://example.com/bundle.js
  completed   200   15.2 KB  https://example.com/index.html

downloads failed

Show failed downloads with error details.

crawlio downloads failed [--urls-only]

Flag         Description
--urls-only  Output only URLs, one per line (for piping)

crawlio downloads failed
  Failed Downloads (3)
 
  https://example.com/api/internal  HTTP 403  Forbidden
  https://example.com/old-page      HTTP 404  Not Found
  https://example.com/timeout       Connection timed out

downloads tree

Show downloaded files as a directory tree.

crawlio downloads tree
  Site Tree (47 files)
 
  example.com/
  ├── index.html
  ├── about/
  │   └── index.html
  ├── css/
  │   └── style.css
  └── images/
      ├── logo.png
      └── hero.jpg

export

Export the downloaded site. Two subcommands (default: run). Tier: Core.

export run

Export in a specific format.

crawlio export run <format> --dest <path> [--watch]

Flag      Description
<format>  Export format (required): folder, zip, singleHTML, warc
--dest    Destination path (required)
--watch   Watch export progress until completion

crawlio export run warc --dest ~/archives/example.warc --watch
  Export started (warc)
  Exporting... 100%
  Export completed: ~/archives/example.warc

export status

Check the current export state.

crawlio export status

States: idle, exporting (with progress %), completed, failed.
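
For scripts that cannot hold a --watch session open, the export state can be polled instead. A minimal sketch, assuming the state names above appear literally in the status output:

crawlio export run zip --dest ~/site.zip
until crawlio export status | grep -qE 'completed|failed'; do
  sleep 2
done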


extract

Run the content extraction pipeline. Two subcommands (default: run). Tier: Core.

extract run

crawlio extract run [--dest <path>] [--watch]

Flag     Description
--dest   Output directory for extracted artifacts
--watch  Watch extraction progress until completion

crawlio extract run --dest ~/extracted --watch
  Extraction started
  Extracting (parsing)... 100%
  Extraction completed
  Framework:       nextjs
  Pages:           247
  Assets:          1,203
  Missing sources: 3

extract status

Check the current extraction state.

crawlio extract status

States: idle, running (with phase + progress %), completed, failed.


settings

View and modify engine settings. Three subcommands (default: show).

settings show

Display current download settings and crawl policy.

crawlio settings show
  Download Settings
  crawlDelay: 0.0
  downloadFonts: true
  downloadImages: true
  maxConcurrent: 10
  maxRetries: 3
  timeout: 60
 
  Crawl Policy
  maxDepth: 0
  maxPagesPerCrawl: 0
  respectRobotsTxt: true
  scopeMode: sameDomain

settings set

Update a single setting using dot notation.

crawlio settings set <key> <value>

Settings fields:

Key                           Default  Description
settings.maxConcurrent        4        Max concurrent downloads
settings.crawlDelay           0.5      Delay between requests (seconds)
settings.timeout              30       Request timeout (seconds)
settings.downloadImages       true     Download image files
settings.downloadVideo        false    Download video files
settings.maxRetries           2        Max retry attempts per URL
settings.stripTrackingParams  true     Strip tracking query parameters

Policy fields:

Key                            Default     Description
policy.scopeMode               sameDomain  URL scope mode
policy.maxDepth                5           Maximum crawl depth
policy.maxPagesPerCrawl        1000        Maximum pages per crawl
policy.respectRobotsTxt        true        Respect robots.txt
policy.excludePatterns         []          URL patterns to exclude
policy.includePatterns         []          URL patterns to include
policy.includeSupportingFiles  true        Download supporting assets (CSS, JS, images)

crawlio settings set settings.maxConcurrent 20
crawlio settings set policy.maxDepth 3

Settings can only be changed while the engine is idle; stop any active crawl first (the engine rejects changes with HTTP 409 while a crawl is active).
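
A typical scripted change therefore brackets the settings update with crawl control commands:

crawlio crawl stop
crawlio settings set settings.maxConcurrent 20
crawlio settings set policy.maxDepth 3
crawlio crawl start https://example.com --watch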

settings reset

Reset settings to defaults.

crawlio settings reset [--settings] [--policy]

Flag        Description
--settings  Reset only download settings
--policy    Reset only crawl policy

If neither flag is given, both are reset.


project

Manage saved projects. Five subcommands (default: list).

project list

crawlio project list
  Saved Projects (2)
 
  ID                                    NAME              URL                           CREATED
  a1b2c3d4-...                          Stripe Docs       https://docs.stripe.com       2026-02-14
  e5f6g7h8-...                          React Docs        https://react.dev             2026-02-13

project save

Save the current project.

crawlio project save [--name <name>]

project load

Load a saved project by ID.

crawlio project load <id>

project run

Load a project and start crawling immediately.

crawlio project run <id> [--watch]

project delete

Delete a saved project.

crawlio project delete <id> [--force]

Flag     Description
--force  Skip confirmation prompt

logs

Show structured log entries from ~/Library/Logs/Crawlio/.

crawlio logs [--category <cat>] [--level <lvl>] [--limit <n>] [--errors] [--follow]

Flag        Type    Default  Description
--category  string  --       Filter by category
--level     string  --       Filter by minimum level
--limit     int     50       Max entries to show
--errors    bool    false    Show only error/fault entries
--follow    bool    false    Tail continuously (like tail -f)

Categories: engine, download, parser, localizer, network, ui

Levels: debug, info, default, error, fault

crawlio logs --category download --level error --follow
  2026-02-14T10:30:02Z  download  error  HTTP 429 for https://example.com/api -- retrying in 30s
  2026-02-14T10:30:05Z  download  error  Connection timeout for https://example.com/slow-page

Follow mode watches the log file and prints new entries as they appear. Category, level, and error filters still apply. Ctrl+C to stop.


loop

Run the autonomous crawl-analyze-adjust loop. Tier: Core.

crawlio loop <url> [options]

Flag               Type    Default   Description
<url>              string  required  URL to crawl
--max-iterations   int     5         Maximum loop iterations
--target-coverage  double  0.95      Target downloaded/discovered ratio (0.0-1.0)
--dest             string  auto      Destination directory
--export           string  --        Auto-export format on completion
--log-file         string  --        JSON report path
--agent            bool    false     Enable browser agent
--auth-url         string  --        URL for interactive browser auth before crawling
--auth-file        string  --        Cookie file path (Netscape or JSON)
--block-trackers   bool    false     Block tracker/analytics requests via agent

Algorithm: authenticate (if configured), gather pre-crawl intelligence, then iterate (crawl, run gap analysis, adjust settings, check coverage and the circuit breaker), then export.

Exit codes: 0 = target reached, 1 = exhausted/stalled, 2 = error

See Autonomous Loop for details on coverage calculation, adjustments, and agent integration.
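
A typical authenticated run combining the options above (URL and paths are illustrative):

crawlio loop https://app.example.com \
  --auth-file ~/cookies.txt \
  --target-coverage 0.9 \
  --export warc \
  --log-file ~/reports/app-example.json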


pipeline

Stdio pipeline mode. JSON in on stdin, JSONL out on stdout. Designed for VM guest or scripted usage.

echo '{"url":"https://example.com"}' | crawlio pipeline

Reads a PipelineRequest JSON from stdin, starts the crawl, emits per-page JSONL results to stdout, and progress to stderr.
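
Since results arrive as JSONL, each line can be processed independently. A minimal sketch using jq; the .url field is an assumption about the per-page result shape, and 2>/dev/null discards the progress stream on stderr:

echo '{"url":"https://example.com"}' | crawlio pipeline 2>/dev/null | jq -r '.url'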


investigate

Deep investigation. Intercept and analyze all traffic from a target URL. Tier: Pro.

crawlio investigate <url> [options]

Flag             Type    Default   Description
<url>            string  required  Target URL
--mode           string  host      Run mode: vm or host
--timeout        int     300       Maximum run time in seconds
--format         string  summary   Output format: json, openapi, har, summary
--output         string  --        Output directory
--include-flows  bool    false     Include raw flows

crawlio investigate https://api.example.com --format openapi --output ~/api-spec

intel

Intelligence analysis. Five subcommands (default: tech-stack). Tier: Pro.

intel tech-stack

crawlio intel tech-stack

Detected technology stack from the current crawl.

intel seo

crawlio intel seo [--severity <level>] [--category <cat>]

SEO findings with optional severity and category filters.

intel design

crawlio intel design

Design intelligence data (colors, fonts, layout patterns).

intel keywords

crawlio intel keywords

Keyword frequency and distribution analysis.

intel duplicates

crawlio intel duplicates

Duplicate and near-duplicate content detection.


observe

Manage observations. Three subcommands (default: list).

observe list

crawlio observe list [--host <host>] [--op <op>] [--source <src>] [--sid <id>] [--since <ts>] [--limit <n>]

Query the observation log with optional filters.

observe get

crawlio observe get <id>

Get a single observation by ID (prefix match supported).

observe create

crawlio observe create --url <url> --op <op> --source <src> [--meta key=value...]

Create a new observation entry.
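
For example (the op, source, and meta values are illustrative):

crawlio observe create --url https://example.com/login \
  --op navigate --source agent --meta step=1 --meta session=abc123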


finding

Manage findings. Two subcommands (default: list).

finding list

crawlio finding list [--host <host>] [--limit <n>]

finding create

crawlio finding create --title <title> --url <url> --evidence <text> --synthesis <text> [--confidence <0.0-1.0>] [--category <cat>]
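
For example (all field values are illustrative):

crawlio finding create \
  --title "Login wall blocks /account pages" \
  --url https://example.com/account \
  --evidence "Every /account/* request redirects to /login" \
  --synthesis "The account section requires an authenticated crawl" \
  --confidence 0.8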

capture

Trigger a WebKit runtime capture. Tier: Pro.

crawlio capture <url> [--timeout 30] [--format json|summary]

Sends the URL to the capture endpoint. Returns frameworks detected, network request count, and console message count.

crawlio capture https://spa-example.com --format json

debug

Debug and diagnostics. Three subcommands (default: metrics). Tier: Core.

debug metrics

crawlio debug metrics

Engine metrics with per-host breakdown (requests, errors, latency, bandwidth).

debug log-level

crawlio debug log-level              # Get current level
crawlio debug log-level <level>      # Set level

Levels: debug, info, default, error, fault.

debug dump-state

crawlio debug dump-state [--output <path>]

Dump the full engine state snapshot. Write to file with --output.
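
For example, writing the snapshot to a file for later inspection (the path is illustrative):

crawlio debug dump-state --output ~/crawlio-state.json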


enrichment

Browser enrichment data. Six subcommands (default: show). Tier: Core.

enrichment show

crawlio enrichment show [--url <url>]

Display stored enrichment data for a URL or the current crawl.

enrichment framework

Submit framework detection results.

crawlio enrichment framework [--file <path>]

enrichment network

Submit network request data.

crawlio enrichment network [--file <path>]

enrichment console

Submit console log data.

crawlio enrichment console [--file <path>]

enrichment dom

Submit DOM snapshot data.

crawlio enrichment dom [--file <path>]

enrichment bundle

Submit all enrichment types in one payload.

crawlio enrichment bundle [--file <path>]

All submit subcommands read JSON from stdin (pipe from crawlio-agent) or from a file via --file <path>.
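
For example, submitting a previously captured bundle from a file, or piping a framework payload via stdin (file names are illustrative):

crawlio enrichment bundle --file ~/captures/example-bundle.json
cat ~/captures/framework.json | crawlio enrichment framework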


data

Structured data extraction. Tier: Core.

crawlio data show [--url <url>] [--type jsonld|tables|microdata|rdfa]

Shows JSON-LD, HTML tables, microdata, or RDFa extracted from the crawled pages.
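
For example, showing only the JSON-LD blocks for a single page (the URL is illustrative):

crawlio data show --url https://example.com/pricing --type jsonld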


license

License management. Two subcommands (default: show).

license show

crawlio license show

Shows current tier, remaining crawls, and feature flags. Works offline via Keychain.

license activate

crawlio license activate <key>

Stores the license key in Keychain and validates via the app if connected.


shell

Start an interactive REPL session. Tier: Core.

crawlio shell

Features: connection-aware prompt, command history, auto-agent connection, SIGINT handling. All crawl and agent commands available without the crawlio prefix.

See Interactive Shell for the full command list and example session.


version

Print version info.

crawlio version
  Crawlio CLI v1.0.0

With --json:

{"version": "1.0.0"}
