# First Crawl

## Your first download
- Launch Crawlio. Open the app. You will see an empty project window with a URL input field at the top.
- Paste a URL. Type or paste any website URL (for example, https://example.com). Crawlio adds `https://` automatically if you type a bare domain.
- Hit Start. Click the download button or press Cmd+Return.
- Watch the waterfall. The waterfall view shows every file being downloaded in real time, color-coded by content type (HTML, CSS, JS, images, fonts, media).
- Browse offline. When the crawl finishes, open the downloaded folder and browse the site locally.
Crawlio rewrites all links so the site works offline. Stylesheets, images, fonts, and internal links all point to local files.
## Default destination
Downloads save to ~/Downloads/Crawlio/ by default. Change this in Settings (Cmd+,) under the General tab, or per-project in the project settings panel.
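To jump straight to that folder from the terminal, macOS's built-in `open` command reveals it in Finder (this assumes the default location has not been changed):

```bash
# Open the default Crawlio download folder in Finder (macOS built-in command)
open ~/Downloads/Crawlio/
```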
## What gets downloaded
Crawlio follows links and downloads all linked resources:
| Content type | Examples |
|---|---|
| HTML pages | Every page reachable from your starting URL |
| Stylesheets | CSS files including @import chains and url() references |
| Images | JPEG, PNG, GIF, WebP, SVG, ICO, AVIF, BMP, TIFF |
| Fonts | WOFF, WOFF2, TTF, OTF, EOT |
| Scripts | JavaScript files referenced in HTML |
| Media | MP4, WebM, MP3, WAV, OGG, and other audio/video |
| Documents | PDFs (with link extraction), XML, JSON |
| Other | Favicons, manifests, robots.txt, sitemaps |
Twelve specialized parsers handle URL discovery across HTML, CSS, SVG, PDF, JavaScript, sitemaps, manifests, and more.
## Crawl settings
Open Settings (Cmd+,) to configure how Crawlio crawls. Key settings:
| Setting | Default | Description |
|---|---|---|
| Max Depth | 5 | How many links deep to follow |
| Concurrent Downloads | 4 | Parallel connections (1 to 40) |
| Crawl Delay | 0.5s | Pause between requests per host |
| Scope | Same Domain | Stay on domain, allow subdomains, or custom list |
| Respect robots.txt | On | Honor site crawl rules |
| Cross-Domain Assets | On | Download CSS, JS, fonts, images from external domains |
| Max File Size | 50 MB | Skip files larger than this |
| Max Total Size | 500 MB | Stop when total download reaches this limit |
For large sites, start with a lower max depth (3 to 5) to test your settings before doing a full crawl.
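To try that from the terminal, the same flags shown in Using the CLI below work for a shallow trial run before committing to a deeper crawl (the depth value here is illustrative):

```bash
# Shallow test crawl: verify scope and settings before a full-depth run
crawlio crawl start https://example.com --depth 3 --scope same-domain
```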
## Export your download
When a crawl completes, you can export the archive in any of 7 formats:
| Format | Best for |
|---|---|
| Folder | Default. Offline browsing with rewritten links |
| ZIP | Sharing compressed archives |
| Single HTML | One-file page snapshots |
| WARC | ISO 28500 web archives (Wayback Machine compatible) |
| PDF | Full-page PDF rendering via WebKit |
| Extracted | Clean text and Markdown for AI pipelines |
| Deploy | Deploy-ready static site with manifest and sitemap |
See Export Formats for details on each format.
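The WARC format in particular works with standard web-archiving tooling. As a sketch, assuming you have the third-party pywb package installed (it is not part of Crawlio, and the file name below is illustrative), you can replay an exported WARC locally:

```bash
# Replay a Crawlio-exported WARC with pywb (third-party tool, installed separately)
pip install pywb                         # one-time install
wb-manager init my-archive               # create a local replay collection
wb-manager add my-archive crawl.warc.gz  # add the exported WARC (illustrative file name)
wayback                                  # browse the archive at http://localhost:8080
```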
## Using the CLI
Start a crawl from the terminal:
```bash
crawlio crawl start https://example.com --depth 5 --scope same-domain
```

The CLI connects to the running Crawlio.app and gives you terminal control over crawls. See CLI Overview for the full command reference.
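Assuming the CLI follows common conventions, `--help` should list the remaining subcommands and flags (this is an assumption, not taken from the command reference):

```bash
# Discover available subcommands and options (assumes standard --help support)
crawlio --help
crawlio crawl --help
```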
## Next steps
- Export Formats: All 7 export formats in detail
- Connect AI: Set up MCP for AI-driven crawls
- Common Workflows: Practical recipes for common tasks
- Settings Reference: All configuration options