CLI Commands

crawlio (default command)

The one-liner that does everything. Crawl, enrich via browser agent, and export in one command.

crawlio <url>
crawlio run <url> [options]

If no URL is given, prints the welcome banner with usage hints.

Option             Type    Default   Description
url                string  required  URL or bare domain (e.g. crawlio.app or https://crawlio.app)
--max-iterations   int     5         Maximum autonomous loop iterations
--target-coverage  double  0.95      Target coverage ratio (0.0-1.0)
--dest             string  auto      Destination directory for downloads
--export           string  folder    Export format on completion
--log-file         string  --        Path to write JSON report
--no-agent         bool    false     Skip browser agent enrichment
--block-trackers   bool    false     Block tracker/analytics requests via agent

Behavior:

  1. Normalizes bare domains to https://
  2. Checks license tier (free tier: limited crawls/week)
  3. Auto-connects to crawlio-browser unless --no-agent
  4. Runs the autonomous crawl loop
  5. Exports on completion

Exit codes: 0 = target coverage reached, 1 = exhausted/stalled, 2 = error
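The exit codes make the default command easy to wrap in scripts. A minimal sketch (the seed URL and messages are illustrative):

crawlio run https://example.com --target-coverage 0.9 --export zip
case $? in
  0) echo "coverage target reached" ;;
  1) echo "crawl exhausted or stalled before reaching the target" ;;
  2) echo "crawl failed" >&2 ;;
esac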


crawl

Direct engine control. Five subcommands.

crawl start

Start a new crawl.

crawlio crawl start <url> [urls...] [--dest <path>] [--watch]

Flag     Description
<url>    One or more URLs to crawl (required)
--dest   Destination directory for downloaded files
--watch  Stream live progress after starting

crawlio crawl start https://docs.stripe.com --dest ~/stripe-docs --watch

Multiple URLs:

crawlio crawl start https://example.com https://example.org

crawl stop

Stop the current crawl. All in-flight requests are cancelled.

crawlio crawl stop

crawl pause

Pause the current crawl. In-flight downloads complete, but no new requests start.

crawlio crawl pause

crawl resume

Resume a paused crawl.

crawlio crawl resume [--watch]

crawl recrawl

Re-crawl specific URLs. If a crawl is active, URLs are injected into the frontier. Otherwise a new crawl starts.

crawlio crawl recrawl <url> [urls...] [--watch]

Retry all failed pages:

crawlio downloads failed --urls-only | xargs crawlio crawl recrawl

Watch mode (--watch): Polls status every 2 seconds with live progress. Stops when the engine finishes crawling, localizing, or extracting.


status

Show current crawl status and progress counters.

crawlio status [--watch] [--interval <seconds>] [--since <seq>]

Flag        Type    Default  Description
--watch     bool    false    Continuously poll for changes
--interval  double  2.0      Poll interval in seconds (with --watch)
--since     int     --       Only show changes since this sequence number

crawlio status
  Status:   crawling
  URL:      https://example.com
 
  Discovered:  150
  Downloaded:  85
  Failed:      2
  Queued:      63
  Localized:   80

JSON output:

crawlio status --json
{
  "engineState": "crawling",
  "seedURL": "https://example.com",
  "seq": 42,
  "progress": {
    "totalDiscovered": 150,
    "downloaded": 85,
    "failed": 2,
    "queued": 63,
    "localized": 80
  }
}

Works offline via file-based fallback (with a warning). If --since is provided and nothing changed, prints "No changes".
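
For scripted polling without --watch, the seq field in the JSON output can drive a change-detection loop. A minimal sketch, assuming jq is available and using the JSON shape shown above:

# Print progress only when the engine's sequence number advances.
seq=0
while sleep 2; do
  out=$(crawlio status --json)
  new=$(echo "$out" | jq .seq)
  if [ "$new" != "$seq" ]; then
    echo "$out" | jq .progress
    seq=$new
  fi
done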


downloads

View download items. Three subcommands (default: list).

downloads list

List all download items.

crawlio downloads list [--status <status>] [--sort <field>] [--limit <n>]

Flag      Type    Default  Description
--status  string  --       Filter: completed, downloading, failed, queued, skipped
--sort    string  --       Sort by: url, status, size
--limit   int     100      Maximum items to show

crawlio downloads list --status completed --sort size --limit 20
  Downloads (20 items)
 
  STATUS     CODE      SIZE  URL
  completed   200    1.2 MB  https://example.com/images/hero.jpg
  completed   200  456.0 KB  https://example.com/bundle.js
  completed   200   15.2 KB  https://example.com/index.html

downloads failed

Show failed downloads with error details.

crawlio downloads failed [--urls-only]

Flag         Description
--urls-only  Output only URLs, one per line (for piping)

crawlio downloads failed
  Failed Downloads (3)
 
  https://example.com/api/internal  HTTP 403  Forbidden
  https://example.com/old-page      HTTP 404  Not Found
  https://example.com/timeout       Connection timed out

downloads tree

Show downloaded files as a directory tree.

crawlio downloads tree
  Site Tree (47 files)
 
  example.com/
  ├── index.html
  ├── about/
  │   └── index.html
  ├── css/
  │   └── style.css
  └── images/
      ├── logo.png
      └── hero.jpg

export

Export the downloaded site. Two subcommands (default: run). Tier: Core.

export run

Export in a specific format.

crawlio export run <format> --dest <path> [--watch]

Flag      Description
<format>  Export format (required): folder, zip, singleHTML, warc
--dest    Destination path (required)
--watch   Watch export progress until completion

crawlio export run warc --dest ~/archives/example.warc --watch
  Export started (warc)
  Exporting... 100%
  Export completed: ~/archives/example.warc

export status

Check the current export state.

crawlio export status

States: idle, exporting (with progress %), completed, failed.
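
For scripts that cannot hold a --watch session open, the export state can be polled instead. A minimal sketch, assuming the state names above appear literally in the status output:

crawlio export run zip --dest ~/site.zip
until crawlio export status | grep -qE 'completed|failed'; do
  sleep 2
done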


extract

Run the content extraction pipeline. Two subcommands (default: run). Tier: Core.

extract run

crawlio extract run [--dest <path>] [--watch]

Flag     Description
--dest   Output directory for extracted artifacts
--watch  Watch extraction progress until completion

crawlio extract run --dest ~/extracted --watch
  Extraction started
  Extracting (parsing)... 100%
  Extraction completed
  Framework:       nextjs
  Pages:           247
  Assets:          1,203
  Missing sources: 3

extract status

Check the current extraction state.

crawlio extract status

States: idle, running (with phase + progress %), completed, failed.


settings

View and modify engine settings. Three subcommands (default: show).

settings show

Display current download settings and crawl policy.

crawlio settings show
  Download Settings
  crawlDelay: 0.0
  downloadFonts: true
  downloadImages: true
  maxConcurrent: 10
  maxRetries: 3
  timeout: 60
 
  Crawl Policy
  maxDepth: 0
  maxPagesPerCrawl: 0
  respectRobotsTxt: true
  scopeMode: sameDomain

settings set

Update a single setting using dot notation.

crawlio settings set <key> <value>

Settings fields:

Key                           Default  Description
settings.maxConcurrent        4        Max concurrent downloads
settings.crawlDelay           0.5      Delay between requests (seconds)
settings.timeout              30       Request timeout (seconds)
settings.downloadImages       true     Download image files
settings.downloadVideo        false    Download video files
settings.maxRetries           2        Max retry attempts per URL
settings.stripTrackingParams  true     Strip tracking query parameters

Policy fields:

Key                            Default     Description
policy.scopeMode               sameDomain  URL scope mode
policy.maxDepth                5           Maximum crawl depth
policy.maxPagesPerCrawl        1000        Maximum pages per crawl
policy.respectRobotsTxt        true        Respect robots.txt
policy.excludePatterns         []          URL patterns to exclude
policy.includePatterns         []          URL patterns to include
policy.includeSupportingFiles  true        Download supporting assets (CSS, JS, images)

crawlio settings set settings.maxConcurrent 20
crawlio settings set policy.maxDepth 3

Settings can only be changed while the engine is idle; stop any active crawl first (the engine rejects changes with HTTP 409 while a crawl is active).
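
A typical scripted change therefore brackets the settings update with crawl control commands:

crawlio crawl stop
crawlio settings set settings.maxConcurrent 20
crawlio settings set policy.maxDepth 3
crawlio crawl start https://example.com --watch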

settings reset

Reset settings to defaults.

crawlio settings reset [--settings] [--policy]

Flag        Description
--settings  Reset only download settings
--policy    Reset only crawl policy

If neither flag is given, both are reset.


project

Manage saved projects. Five subcommands (default: list).

project list

crawlio project list
  Saved Projects (2)
 
  ID                                    NAME              URL                           CREATED
  a1b2c3d4-...                          Stripe Docs       https://docs.stripe.com       2026-02-14
  e5f6g7h8-...                          React Docs        https://react.dev             2026-02-13

project save

Save the current project.

crawlio project save [--name <name>]

project load

Load a saved project by ID.

crawlio project load <id>

project run

Load a project and start crawling immediately.

crawlio project run <id> [--watch]

project delete

Delete a saved project.

crawlio project delete <id> [--force]

Flag     Description
--force  Skip confirmation prompt

logs

Show structured log entries from ~/Library/Logs/Crawlio/.

crawlio logs [--category <cat>] [--level <lvl>] [--limit <n>] [--errors] [--follow]

Flag        Type    Default  Description
--category  string  --       Filter by category
--level     string  --       Filter by minimum level
--limit     int     50       Max entries to show
--errors    bool    false    Show only error/fault entries
--follow    bool    false    Tail continuously (like tail -f)

Categories: engine, download, parser, localizer, network, ui

Levels: debug, info, default, error, fault

crawlio logs --category download --level error --follow
  2026-02-14T10:30:02Z  download  error  HTTP 429 for https://example.com/api -- retrying in 30s
  2026-02-14T10:30:05Z  download  error  Connection timeout for https://example.com/slow-page

Follow mode watches the log file and prints new entries as they appear. Category, level, and error filters still apply. Ctrl+C to stop.


loop

Run the autonomous crawl-analyze-adjust loop. Tier: Core.

crawlio loop <url> [options]

Flag               Type    Default   Description
<url>              string  required  URL to crawl
--max-iterations   int     5         Maximum loop iterations
--target-coverage  double  0.95      Target downloaded/discovered ratio (0.0-1.0)
--dest             string  auto      Destination directory
--export           string  --        Auto-export format on completion
--log-file         string  --        JSON report path
--agent            bool    false     Enable browser agent
--auth-url         string  --        URL for interactive browser auth before crawling
--auth-file        string  --        Cookie file path (Netscape or JSON)
--block-trackers   bool    false     Block tracker/analytics requests via agent

Algorithm: authenticate (if configured), gather pre-crawl intelligence, then iterate (crawl, run gap analysis, adjust settings, check coverage and the circuit breaker), then export.

Exit codes: 0 = target reached, 1 = exhausted/stalled, 2 = error

See Autonomous Loop for details on coverage calculation, adjustments, and agent integration.
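
A typical authenticated run combining the options above (URL and paths are illustrative):

crawlio loop https://app.example.com \
  --auth-file ~/cookies.txt \
  --target-coverage 0.9 \
  --export warc \
  --log-file ~/reports/app-example.json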


pipeline

Stdio pipeline mode. JSON in on stdin, JSONL out on stdout. Designed for VM guest or scripted usage.

echo '{"url":"https://example.com"}' | crawlio pipeline

Reads a PipelineRequest JSON from stdin, starts the crawl, emits per-page JSONL results to stdout, and progress to stderr.
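
Since results arrive as JSONL, each line can be processed independently. A minimal sketch using jq; the .url field is an assumption about the per-page result shape, and 2>/dev/null discards the progress stream on stderr:

echo '{"url":"https://example.com"}' | crawlio pipeline 2>/dev/null | jq -r '.url'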


investigate

Deep investigation. Intercept and analyze all traffic from a target URL. Tier: Pro.

crawlio investigate <url> [options]

Flag             Type    Default   Description
<url>            string  required  Target URL
--mode           string  host      Run mode: vm or host
--timeout        int     300       Maximum run time in seconds
--format         string  summary   Output format: json, openapi, har, summary
--output         string  --        Output directory
--include-flows  bool    false     Include raw flows

crawlio investigate https://api.example.com --format openapi --output ~/api-spec

intel

Intelligence analysis. Five subcommands (default: tech-stack). Tier: Pro.

intel tech-stack

crawlio intel tech-stack

Detected technology stack from the current crawl.

intel seo

crawlio intel seo [--severity <level>] [--category <cat>]

SEO findings with optional severity and category filters.

intel design

crawlio intel design

Design intelligence data (colors, fonts, layout patterns).

intel keywords

crawlio intel keywords

Keyword frequency and distribution analysis.

intel duplicates

crawlio intel duplicates

Duplicate and near-duplicate content detection.


observe

Manage observations. Three subcommands (default: list).

observe list

crawlio observe list [--host <host>] [--op <op>] [--source <src>] [--sid <id>] [--since <ts>] [--limit <n>]

Query the observation log with optional filters.

observe get

crawlio observe get <id>

Get a single observation by ID (prefix match supported).

observe create

crawlio observe create --url <url> --op <op> --source <src> [--meta key=value...]

Create a new observation entry.
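
For example (the op, source, and meta values are illustrative):

crawlio observe create --url https://example.com/login \
  --op navigate --source agent --meta step=1 --meta session=abc123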


finding

Manage findings. Two subcommands (default: list).

finding list

crawlio finding list [--host <host>] [--limit <n>]

finding create

crawlio finding create --title <title> --url <url> --evidence <text> --synthesis <text> [--confidence <0.0-1.0>] [--category <cat>]
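
For example (all field values are illustrative):

crawlio finding create \
  --title "Login wall blocks /account pages" \
  --url https://example.com/account \
  --evidence "Every /account/* request redirects to /login" \
  --synthesis "The account section requires an authenticated crawl" \
  --confidence 0.8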

capture

Trigger a WebKit runtime capture. Tier: Pro.

crawlio capture <url> [--timeout 30] [--format json|summary]

Sends the URL to the capture endpoint. Returns frameworks detected, network request count, and console message count.

crawlio capture https://spa-example.com --format json

debug

Debug and diagnostics. Three subcommands (default: metrics). Tier: Core.

debug metrics

crawlio debug metrics

Engine metrics with per-host breakdown (requests, errors, latency, bandwidth).

debug log-level

crawlio debug log-level              # Get current level
crawlio debug log-level <level>      # Set level

Levels: debug, info, default, error, fault.

debug dump-state

crawlio debug dump-state [--output <path>]

Dump the full engine state snapshot. Write to file with --output.
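
For example, writing the snapshot to a file for later inspection (the path is illustrative):

crawlio debug dump-state --output ~/crawlio-state.json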


enrichment

Browser enrichment data. Six subcommands (default: show). Tier: Core.

enrichment show

crawlio enrichment show [--url <url>]

Display stored enrichment data for a URL or the current crawl.

enrichment framework

Submit framework detection results.

crawlio enrichment framework [--file <path>]

enrichment network

Submit network request data.

crawlio enrichment network [--file <path>]

enrichment console

Submit console log data.

crawlio enrichment console [--file <path>]

enrichment dom

Submit DOM snapshot data.

crawlio enrichment dom [--file <path>]

enrichment bundle

Submit all enrichment types in one payload.

crawlio enrichment bundle [--file <path>]

All submit subcommands read JSON from stdin (pipe from crawlio-agent) or from a file via --file <path>.
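
For example, submitting a previously captured bundle from a file, or piping a framework payload via stdin (file names are illustrative):

crawlio enrichment bundle --file ~/captures/example-bundle.json
cat ~/captures/framework.json | crawlio enrichment framework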


data

Structured data extraction. Tier: Core.

crawlio data show [--url <url>] [--type jsonld|tables|microdata|rdfa]

Shows JSON-LD, HTML tables, microdata, or RDFa extracted from the crawled pages.
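
For example, showing only the JSON-LD blocks for a single page (the URL is illustrative):

crawlio data show --url https://example.com/pricing --type jsonld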


license

License management. Two subcommands (default: show).

license show

crawlio license show

Shows current tier, remaining crawls, and feature flags. Works offline via Keychain.

license activate

crawlio license activate <key>

Stores the license key in Keychain and validates via the app if connected.


shell

Start an interactive REPL session. Tier: Core.

crawlio shell

Features: connection-aware prompt, command history, auto-agent connection, SIGINT handling. All crawl and agent commands available without the crawlio prefix.

See Interactive Shell for the full command list and example session.


version

Print version info.

crawlio version
  Crawlio CLI v1.0.0

With --json:

{"version": "1.0.0"}
