Crawlio Docs

Interactive Shell

The Crawlio shell is an interactive REPL for live crawl management. It maintains a persistent connection to the Crawlio app and the browser agent, so you can start crawls, check status, adjust settings, and control the browser from one session.

Tier: Core.

Launch

crawlio shell
 ██████╗██████╗  █████╗ ██╗    ██╗██╗     ██╗ ██████╗
██╔════╝██╔══██╗██╔══██╗██║    ██║██║     ██║██╔═══██╗
██║     ██████╔╝███████║██║ █╗ ██║██║     ██║██║   ██║
██║     ██╔══██╗██╔══██║██║███╗██║██║     ██║██║   ██║
╚██████╗██║  ██║██║  ██║╚███╔███╔╝███████╗██║╚██████╔╝
 ╚═════╝╚═╝  ╚═╝╚═╝  ╚═╝ ╚══╝╚══╝ ╚══════╝╚═╝ ╚═════╝
 
Interactive Shell v1.0.0
Type help for commands, quit to exit
 
crawlio> _

Features

  • Connection-aware prompt. Shows crawlio[connected]> or crawlio[offline]>, with |agent suffix when the browser agent is connected.
  • Command history. Arrow keys to recall previous commands, persisted across sessions.
  • History recall. !N replays command history entry N.
  • Auto-agent connection. If crawlio-browser is running, connects automatically via WebSocket.
  • Connection refresh. Re-checks connectivity every 10 commands.
  • SIGINT handling. Ctrl+C cancels the current line without exiting the shell.
  • EOF exit. Ctrl+D exits the shell.

Built-in commands

Command                           Description
crawl <url> [--dest <path>]       Start a crawl
stop                              Stop current crawl
pause                             Pause current crawl
resume                            Resume paused crawl
recrawl <url> [<url>...]          Re-crawl specific URLs
status                            Show crawl status
watch                             Watch progress live (Ctrl+C to stop)
downloads [failed|tree]           List downloads
export <format> --dest <path>     Export the site
extract [--dest <path>]           Run extraction pipeline
settings [show|set|reset]         View/modify settings
project [list|save|load|delete]   Manage projects
agent [subcommand]                Browser agent commands
history                           Show command history
!N                                Recall history entry N
clear                             Clear the terminal
help / ?                          Show available commands
quit / exit / q                   Exit shell

Agent subcommands

When the browser agent is connected, you can control the browser directly from the shell:

Command           Arguments              Description
agent status      --                     Agent health + connected tabs
agent connect     [host:port]            Connect to agent (default: 127.0.0.1:9333-9342)
agent disconnect  --                     Disconnect from agent
agent tabs        --                     List browser tabs
agent detect      [tabId]                Detect frameworks on page
agent capture     [tabId]                Full page capture (framework + network + console)
agent network     [tabId]                Get network requests
agent console     [tabId]                Get console logs
agent cookies     [domain]               Get cookies
agent screenshot  [tabId]                Take screenshot (saves to temp dir)
agent dom         [tabId]                Get DOM snapshot
agent navigate    <url>                  Navigate browser to URL
agent click       <selector>             Click a DOM element
agent type        <selector> <text>      Type text into input
agent press       <key>                  Press keyboard key
agent hover       <selector>             Hover over element
agent select      <selector> <value>     Select dropdown option
agent wait        <seconds>              Wait/sleep (max 30s)
agent intercept   <pattern> <action>     Intercept requests (block/mock)
agent intercept   disable                Disable all interception

Example session

crawlio[connected]> crawl https://react.dev
  Crawl started for https://react.dev
 
crawlio[connected]> watch
  Status:   crawling
  URL:      https://react.dev
  Downloaded: 45 / 312  (14%)
  ^C
 
crawlio[connected]> settings set settings.maxConcurrent 20
  settings.maxConcurrent = 20
 
crawlio[connected]> downloads failed
  No failed downloads
 
crawlio[connected|agent]> agent detect
  Framework: React 18.2.0
  Bundler: webpack 5.x
  Router: react-router 6.x
 
crawlio[connected|agent]> agent network
  142 requests captured
  87 documents, 32 scripts, 15 stylesheets, 8 images
 
crawlio[connected]> export warc --dest ~/react-docs.warc
  Export started (warc)
  Exporting... 100%
  Export completed: ~/react-docs.warc
 
crawlio[connected]> quit
  Goodbye!

© 2026 Crawlio. All rights reserved.