Crawlio Docs

MCP Tools Reference

Overview

Crawlio exposes ~362 tools across 3 pillars. This page covers:

  1. Aggregator meta-tools (5 tools) that unify all pillars
  2. Crawlio App tools (49 tools in Pillar 3) for crawl control, export, intelligence, and vault
  3. References to the other pillar tool sets

In code mode (the default), the 49 Pillar 3 tools are replaced by 6 code-mode tools: search_api, execute_api, trigger_capture, analyze_page, compare_pages, and extract_text_from_image. Every tool below maps to an endpoint accessible through execute_api.
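In code mode the interaction pattern is: call search_api with a natural-language query, then wrap the endpoint it returns in an execute_api call. A minimal sketch of building that call payload follows — the `{endpoint, args}` argument shape and the `/crawl/status` path are illustrative assumptions, not confirmed by these docs:

```python
# Hypothetical sketch of the code-mode flow. The tool names
# (search_api, execute_api) come from the docs above; the payload
# shape and endpoint path are assumptions for illustration.

def build_execute_api_call(endpoint: str, args: dict) -> dict:
    """Wrap an endpoint discovered via search_api in an execute_api call."""
    return {"tool": "execute_api", "arguments": {"endpoint": endpoint, "args": args}}

# e.g. after search_api("crawl status") pointed at a status endpoint:
call = build_execute_api_call("/crawl/status", {"since": 42})
```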


Section 1: Aggregator meta-tools

These 5 tools are what your AI sees when using the aggregator. They route across all 3 pillars.

crawlio_discover

Search the available tools across all pillars. Returns only the tool schemas that match the query.

Parameter Type Required Description
query string Yes Describe what you need (e.g. "crawl and export", "browser automation")

Returns: Array of matching tool schemas with names, descriptions, and parameters.


crawlio_call

Route a tool call to the correct pillar.

Parameter Type Required Description
tool string Yes Tool name
args object No Tool arguments

Returns: The tool's response, routed to the appropriate pillar.


crawlio_do

Execute a high-level task with automatic pillar selection.

Parameter Type Required Description
task string Yes Natural language task description

Returns: Task result. The aggregator picks the best pillar based on session state.


crawlio_cortex

Query intelligence data across pillar boundaries.

Parameter Type Required Description
query string Yes Intelligence query

Returns: Combined data from enrichment, browser detection, and crawl analysis.


crawlio_consult

Multi-pillar consultation for complex tasks.

Parameter Type Required Description
question string Yes What you need help with

Returns: Coordinated response from multiple pillars.


Section 2: Crawlio App tools (Pillar 3)

49 tools across 11 categories.

Crawl monitoring (7 tools)

get_crawl_status

Returns current engine state and progress counters.

Parameter Type Required Description
since integer No Last-seen seq value; the tool returns "no changes" if the sequence has not advanced past it

Returns: { engineState, seedURL, seq, progress: { totalDiscovered, downloaded, failed, queued, localized }, enrichment: { pagesEnriched, frameworksDetected, ... } }
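The since parameter supports change-aware polling: pass the last seq you saw and skip work when nothing has advanced. A minimal sketch of that loop, assuming "idle" is a terminal engineState (the actual state names are not enumerated here); fetch_status stands in for the real tool call:

```python
def poll_until_done(fetch_status, start_seq=0):
    """Process only status responses whose seq has advanced.

    fetch_status is a stand-in for calling get_crawl_status; a real
    client would sleep between polls instead of spinning.
    """
    seq = start_seq
    states = []
    while True:
        status = fetch_status(since=seq)
        if status == "no changes":
            continue                          # seq has not advanced; skip
        seq = status["seq"]
        states.append(status["engineState"])
        if status["engineState"] == "idle":   # assumed terminal state
            return states
```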


get_crawl_logs

Returns recent log entries with optional filtering.

Parameter Type Required Description
category string No engine, download, parser, localizer, network, ui
level string No debug, info, default, error, fault
limit integer No Max entries (default: 50)

get_errors

Returns error-level and fault-level log entries only.

Parameter Type Required Description
limit integer No Max entries (default: 50)

get_downloads

Returns all download items with status, size, and content type.

Parameter Type Required Description
status string No pending, downloading, completed, failed

get_failed_urls

Returns only failed download items with error details.

Parameters: None.


get_site_tree

Returns an ASCII directory tree of downloaded files.

Parameter Type Required Description
max_depth integer No Maximum tree depth (default: 5)

get_crawled_urls

Returns downloaded URLs with filtering and pagination.

Parameter Type Required Description
status string No completed, downloading, failed, queued
type string No Content type substring (e.g. html)
limit integer No Max results (default: 1000)
offset integer No Skip first N results
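With a default limit of 1000, large crawls need more than one call. A minimal pagination sketch; fetch_page stands in for calling get_crawled_urls, and treating a short page as the end of the result set is an assumption about the API's behavior:

```python
def fetch_all_urls(fetch_page, page_size=1000):
    """Collect every URL record by paging with limit/offset.

    fetch_page is a stand-in for the real get_crawled_urls call and
    returns a list of records for the given window.
    """
    urls, offset = [], 0
    while True:
        batch = fetch_page(limit=page_size, offset=offset)
        urls.extend(batch)
        if len(batch) < page_size:   # short page => assumed last page
            return urls
        offset += page_size
```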

Crawl control (5 tools)

start_crawl

Start downloading a website.

Parameter Type Required Description
url string One of url/urls Single URL to crawl
urls array One of url/urls Multiple seed URLs
destinationPath string No Local path to save files

stop_crawl

Stop the current download. All in-flight requests are cancelled.

Parameters: None.


pause_crawl

Pause the current download. In-flight requests complete, no new requests start.

Parameters: None.


resume_crawl

Resume a paused download.

Parameters: None.


recrawl_urls

Re-inject URLs into the crawl frontier.

Parameter Type Required Description
urls array Yes URLs to re-crawl

Settings (2 tools)

get_settings

Returns current download settings and crawl policy.

Parameters: None.

Returns: { settings: {...}, policy: {...} }


update_settings

Update download settings and/or crawl policy via merge patch. Only works when the engine is idle.

Parameter Type Required Description
settings object No Download settings fields to merge
policy object No Crawl policy fields to merge
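The docs call this a merge patch but do not pin down the exact semantics. A sketch assuming JSON-Merge-Patch-style rules (RFC 7386: nested objects recurse, null deletes a key, anything else overwrites); the field names in the test example are hypothetical:

```python
def merge_patch(target: dict, patch: dict) -> dict:
    """JSON-Merge-Patch-style merge: dicts recurse, None deletes,
    everything else overwrites. Assumed semantics -- the server's
    actual merge behavior is not specified in these docs."""
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_patch(result[key], value)
        else:
            result[key] = value
    return result
```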

Projects (5 tools)

list_projects

List all saved projects.

Parameters: None.

Returns: Array of projects with id, name, seedURL, createdAt.


get_project

Get full details for a saved project.

Parameter Type Required Description
id string Yes Project UUID

save_project

Save the current project state.

Parameter Type Required Description
name string No Project name (auto-generated if omitted)

load_project

Load a saved project, restoring settings and state.

Parameter Type Required Description
id string Yes Project UUID

delete_project

Delete a saved project.

Parameter Type Required Description
id string Yes Project UUID

Export and extraction (4 tools)

export_site

Start an asynchronous export of the downloaded site.

Parameter Type Required Description
format string Yes folder, zip, singleHTML, warc, extracted, deploy
destinationPath string Yes Where to write the export
warcConfiguration object No WARC options: { compressionEnabled, maxFileSize, cdxEnabled, dedupEnabled }

Poll get_export_status to track progress.


get_export_status

Returns the current export state and progress.

Parameters: None.

Returns: { state, format, progress, path, error }
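A typical export flow starts export_site and then polls get_export_status until a terminal state. A minimal sketch; fetch_export_status stands in for the real tool call, and the terminal state names "completed"/"failed" are assumptions inferred from the { state, ..., error } return shape:

```python
import time

def wait_for_export(fetch_export_status, interval=1.0, max_polls=100):
    """Poll until the export reaches an assumed terminal state."""
    for _ in range(max_polls):
        status = fetch_export_status()
        if status["state"] in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("export did not finish within max_polls")
```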


extract_site

Run the content extraction pipeline on a completed crawl.

Parameter Type Required Description
destinationPath string No Output directory

Poll get_extraction_status to track progress.


get_extraction_status

Returns the current extraction state.

Parameters: None.

Returns: { state, phase, progress, totalPages, totalAssets }


Enrichment (8 tools)

trigger_capture

Trigger a WebKit runtime capture for a URL. Runs framework detection JS, intercepts network requests, captures console logs, and takes a DOM snapshot.

Parameter Type Required Description
url string Yes URL to capture

get_enrichment

Returns browser enrichment data for a URL or all URLs.

Parameter Type Required Description
url string No Specific URL (omit for all)

Returns: Enrichment objects with framework, networkRequests, consoleLogs, domSnapshotJSON.


get_structured_data

Returns JSON-LD, HTML tables, microdata, and RDFa extracted from crawled pages.

Parameter Type Required Description
url string No Specific URL (omit for site-wide aggregate)

submit_enrichment_bundle

Submit a complete enrichment bundle with all data types.

Parameter Type Required Description
url string Yes The page URL
framework object No { name, version?, confidence? }
networkRequests array No Captured network requests
consoleLogs array No Console output entries
domSnapshotJSON string No DOM snapshot as JSON
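Assembling a bundle from captured data might look like the following sketch. Only url is required; optional fields are attached when present. The field names mirror the parameter table above, and the framework value in the test is hypothetical:

```python
def build_enrichment_bundle(url, framework=None, network_requests=None,
                            console_logs=None, dom_snapshot_json=None):
    """Build a submit_enrichment_bundle payload, omitting absent fields."""
    bundle = {"url": url}
    if framework is not None:
        bundle["framework"] = framework          # e.g. {"name": ..., "confidence": ...}
    if network_requests is not None:
        bundle["networkRequests"] = network_requests
    if console_logs is not None:
        bundle["consoleLogs"] = console_logs
    if dom_snapshot_json is not None:
        bundle["domSnapshotJSON"] = dom_snapshot_json
    return bundle
```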

submit_enrichment_framework

Submit framework detection data only.

Parameter Type Required Description
url string Yes The page URL
framework object Yes { name, version?, confidence? }

submit_enrichment_network

Submit network request data. Discovered URLs are offered to the crawl engine.

Parameter Type Required Description
url string Yes The page URL
networkRequests array Yes Array of { url, method, status, type }

submit_enrichment_console

Submit console log data.

Parameter Type Required Description
url string Yes The page URL
consoleLogs array Yes Array of { level, message, timestamp }

submit_enrichment_dom

Submit a DOM snapshot.

Parameter Type Required Description
url string Yes The page URL
domSnapshotJSON string Yes DOM snapshot as JSON

Observations and findings (4 tools)

get_observations

Query the append-only observation log.

Parameter Type Required Description
host string No Filter by hostname
op string No Filter by operation type
source string No extension, webkit, agent
since number No Unix timestamp
limit integer No Max entries

get_observation

Look up a single observation or finding by ID.

Parameter Type Required Description
id string Yes Observation ID

create_finding

Create a curated finding with evidence.

Parameter Type Required Description
title string Yes Finding title
url string No Related URL
evidence array No Observation IDs
synthesis string No Summary analysis
confidence string No Confidence level
category string No Finding category

get_findings

Returns curated findings.

Parameter Type Required Description
host string No Filter by hostname
limit integer No Max entries

Composite analysis (3 tools)

analyze_page

Composite: trigger capture + poll enrichment with backoff + return unified evidence record.

Parameter Type Required Description
url string Yes URL to analyze

Returns: { url, timestamp, captureTriggered, enrichment, enrichmentStatus, crawlStatus }

Timeout: 60s
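The "poll enrichment with backoff" step can be pictured as a plain exponential-backoff loop bounded by the tool's 60s timeout. A sketch under those assumptions; fetch_enrichment stands in for the real lookup and returns None until data arrives:

```python
import time

def poll_with_backoff(fetch_enrichment, timeout=60.0, base=0.5, factor=2.0):
    """Poll with exponentially growing delays until data arrives
    or the deadline passes."""
    deadline = time.monotonic() + timeout
    delay = base
    while time.monotonic() < deadline:
        result = fetch_enrichment()
        if result is not None:
            return result
        time.sleep(min(delay, max(0.0, deadline - time.monotonic())))
        delay *= factor
    return None  # enrichment never arrived within the timeout
```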


compare_pages

Composite: run analyze_page on two URLs, return structured comparison.

Parameter Type Required Description
urlA string Yes First URL
urlB string Yes Second URL

Returns: { siteA, siteB, comparisonSummary }

Timeout: 120s


synthesize_openapi

Composite: chain traffic analysis + schema extraction + OpenAPI 3.0.3 YAML export.

Parameter Type Required Description
exhaustDir string No Path to flows.jsonl directory
title string No API title
serverURL string No Base server URL

Intelligence (5 tools)

get_tech_stack

Returns detected technologies with name, categories, confidence, version, and detection signals.

Parameters: None.


get_seo_findings

Returns SEO analysis: title, meta description, headings, canonical, word count, readability.

Parameter Type Required Description
severity string No Filter by severity
category string No Filter by category

get_design_intel

Returns design system data: colors, typography, spacing, breakpoints, components.

Parameters: None.


get_keyword_intel

Returns keyword analysis: top keywords by frequency, co-occurring groups, density.

Parameters: None.


get_duplicate_content

Returns duplicate content detection: exact duplicates and near-duplicates with similarity scores.

Parameters: None.


Vault (5 tools)

vault_list_domains

List all domains with stored auth sessions.

Parameters: None.


vault_get_session

Retrieve a stored auth session for a domain. Audit-logged.

Parameter Type Required Description
domain string Yes Domain (e.g. example.com)

Returns: { cookies, userAgent, isExpired }


vault_mark_expired

Mark a stored session as expired without deleting it.

Parameter Type Required Description
domain string Yes Domain

vault_delete

Delete a stored auth session permanently.

Parameter Type Required Description
domain string Yes Domain

vault_request_login

Open the auth browser in Crawlio so you can log in. The session is captured and stored in the vault.

Parameter Type Required Description
domain string Yes Domain to authenticate against
loginURL string Yes Login page URL

OCR (1 tool)

extract_text_from_image

Run Vision OCR on a local image file. Does not require Crawlio.app to be running.

Parameter Type Required Description
path string Yes Absolute path to the image
languages array No Language codes (e.g. ["en-US"])
recognitionLevel string No accurate (default) or fast

Supported formats: PNG, JPEG, TIFF, BMP, WebP. SVG and GIF are not supported.


Timeout reference

Category Timeout
Read-only (get_, list_) 5s
Control (start, stop, settings) 15s
Enrichment (submit_*) 10s
Capture (trigger_capture) 60s
Export (export_site, extract_site) 120s
Composite (analyze_page) 60s
Composite (compare_pages, synthesize_openapi) 120s
OCR (extract_text_from_image) 15s
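A client choosing its own per-call deadlines could derive them from tool names, approximating the table above. The prefix rules below are a sketch, not the server's actual dispatch logic (note, for example, that vault_get_session does not match the get_ prefix and falls through to the default):

```python
# Explicit per-tool timeouts from the table above; everything else is
# approximated by name prefix.
EXPLICIT_TIMEOUTS = {
    "trigger_capture": 60, "export_site": 120, "extract_site": 120,
    "analyze_page": 60, "compare_pages": 120, "synthesize_openapi": 120,
    "extract_text_from_image": 15,
}

def timeout_for(tool: str) -> int:
    """Approximate a tool's timeout in seconds from its name."""
    if tool in EXPLICIT_TIMEOUTS:
        return EXPLICIT_TIMEOUTS[tool]
    if tool.startswith(("get_", "list_")):
        return 5           # read-only
    if tool.startswith("submit_"):
        return 10          # enrichment submissions
    return 15              # control and everything else
```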

Tool annotations

All tools carry MCP annotations:

Annotation Meaning Applies to
readOnlyHint: true Does not modify state All get_* and list_* tools
destructiveHint: true Irreversible action stop_crawl, delete_project
idempotentHint: true Safe to repeat pause_crawl, resume_crawl, update_settings
openWorldHint: true Interacts with external systems start_crawl, recrawl_urls, trigger_capture, vault_request_login

Section 3: Other pillars

Chrome Extension (Pillar 1)

~114 tools for live browser automation via CDP. Requires the Crawlio for Chrome extension.

Covers: tab management, navigation, screenshots, DOM interaction, network interception, console capture, accessibility tree, performance metrics, security state, and framework detection.

See Browser Agent Tools for the full reference.

Headless Agent (Pillar 2)

~199 tools across 5 tiers:

Tier What it covers
Browser Headless Chromium automation
Converter Format conversion (PDF, screenshots)
mgrep/RE Pattern matching
Interceptor Network interception
Core File I/O, orchestration

The headless agent runs without a visible browser. It handles background tasks and is the fallback when no Chrome tab is connected.

Next steps

  • MCP Overview: the 3-pillar architecture
  • Code Mode: use search_api + execute_api instead of 49 individual tools
  • Resources: MCP resources, prompts, and skills
  • JIT Context: how the aggregator loads context on demand
© 2026 Crawlio. All rights reserved.