# MCP Tools Reference

## Overview
Crawlio exposes ~362 tools across 3 pillars. This page covers:
- Aggregator meta-tools (5 tools) that unify all pillars
- Crawlio App tools (49 tools in Pillar 3) for crawl control, export, intelligence, and vault
- References to the other pillar tool sets
In code mode (the default), the 49 Pillar 3 tools are replaced by 6 code-mode tools: `search_api`, `execute_api`, `trigger_capture`, `analyze_page`, `compare_pages`, and `extract_text_from_image`. Every tool below maps to an endpoint accessible through `execute_api`.
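Whichever mode you use, every tool in this reference is invoked as a standard MCP tool call. A minimal TypeScript sketch using the official MCP SDK, assuming the server speaks stdio; the launch command is a placeholder, not Crawlio's actual binary:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch command is a placeholder: substitute however your Crawlio
// MCP server is actually started.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["crawlio-mcp"], // hypothetical package name
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Ask the aggregator which tools fit the task (see crawlio_discover below).
const discovered = await client.callTool({
  name: "crawlio_discover",
  arguments: { query: "crawl and export" },
});
console.log(JSON.stringify(discovered, null, 2));
```

The later sketches on this page assume a connected `client` like this one.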
## Section 1: Aggregator meta-tools
These 5 tools are what your AI sees when using the aggregator. They route across all 3 pillars.
### crawlio_discover

List available tools across all pillars. Returns only schemas matching the current task.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | Describe what you need (e.g. "crawl and export", "browser automation") |

Returns: Array of matching tool schemas with names, descriptions, and parameters.
### crawlio_call

Route a tool call to the correct pillar.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `tool` | string | Yes | Tool name |
| `args` | object | No | Tool arguments |

Returns: The tool's response, routed to the appropriate pillar.
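A sketch of routing a Pillar 3 tool through the aggregator, reusing a connected `client` from the setup sketch above; the inner tool name and arguments come from Section 2:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Route a Section 2 tool through crawlio_call instead of calling it
// directly; the aggregator picks the right pillar.
async function crawlStatusViaAggregator(client: Client) {
  return client.callTool({
    name: "crawlio_call",
    arguments: {
      tool: "get_crawl_status", // any tool name from Section 2
      args: {},                 // that tool's own arguments
    },
  });
}
```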
### crawlio_do

Execute a high-level task with automatic pillar selection.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `task` | string | Yes | Natural language task description |

Returns: Task result. The aggregator picks the best pillar based on session state.
### crawlio_cortex

Query intelligence data across pillar boundaries.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | Intelligence query |

Returns: Combined data from enrichment, browser detection, and crawl analysis.
### crawlio_consult

Multi-pillar consultation for complex tasks.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `question` | string | Yes | What you need help with |

Returns: Coordinated response from multiple pillars.
## Section 2: Crawlio App tools (Pillar 3)

49 tools across 11 categories.
### Crawl monitoring (7 tools)

#### get_crawl_status

Returns current engine state and progress counters.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `since` | integer | No | Returns "no changes" when the sequence number has not advanced past this value |

Returns: `{ engineState, seedURL, seq, progress: { totalDiscovered, downloaded, failed, queued, localized }, enrichment: { pagesEnriched, frameworksDetected, ... } }`
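The `since` parameter and the returned `seq` enable cheap change-driven polling. A minimal sketch, assuming the status JSON arrives as the first text content item and that `engineState` reports `idle` when the crawl finishes (neither detail is specified here, so verify against your server):

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Change-driven polling: pass the last seen `seq` as `since` so the
// server can answer "no changes" cheaply instead of resending state.
async function pollCrawl(client: Client): Promise<void> {
  let lastSeq = 0;
  for (;;) {
    const res = await client.callTool({
      name: "get_crawl_status",
      arguments: { since: lastSeq },
    });
    const text = (res.content as Array<{ type: string; text?: string }>)
      .find((c) => c.type === "text")?.text ?? "";
    if (!text.includes("no changes")) {
      const status = JSON.parse(text);
      lastSeq = status.seq;
      console.log(status.engineState, status.progress);
      if (status.engineState === "idle") return; // assumed terminal state
    }
    await new Promise((r) => setTimeout(r, 2000)); // back off between polls
  }
}
```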
#### get_crawl_logs

Returns recent log entries with optional filtering.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `category` | string | No | engine, download, parser, localizer, network, ui |
| `level` | string | No | debug, info, default, error, fault |
| `limit` | integer | No | Max entries (default: 50) |

#### get_errors

Returns error-level and fault-level log entries only.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `limit` | integer | No | Max entries (default: 50) |
#### get_downloads

Returns all download items with status, size, and content type.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `status` | string | No | pending, downloading, completed, failed |

#### get_failed_urls

Returns only failed download items with error details.

Parameters: None.

#### get_site_tree

Returns an ASCII directory tree of downloaded files.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `max_depth` | integer | No | Maximum tree depth (default: 5) |
#### get_crawled_urls

Returns downloaded URLs with filtering and pagination.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `status` | string | No | completed, downloading, failed, queued |
| `type` | string | No | Content type substring (e.g. html) |
| `limit` | integer | No | Max results (default: 1000) |
| `offset` | integer | No | Skip first N results |
### Crawl control (5 tools)

#### start_crawl

Start downloading a website.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | One of url/urls | Single URL to crawl |
| `urls` | array | One of url/urls | Multiple seed URLs |
| `destinationPath` | string | No | Local path to save files |
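A sketch of kicking off a single-seed crawl; the URL and destination path are illustrative:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Exactly one of `url` or `urls` must be provided; `destinationPath`
// is optional. Values below are examples only.
async function startExampleCrawl(client: Client) {
  return client.callTool({
    name: "start_crawl",
    arguments: {
      url: "https://example.com",
      destinationPath: "/tmp/example-crawl",
    },
  });
}
```

Pair this with the `get_crawl_status` polling sketch above to watch progress.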
#### stop_crawl

Stop the current download. All in-flight requests are cancelled.

Parameters: None.

#### pause_crawl

Pause the current download. In-flight requests complete; no new requests start.

Parameters: None.

#### resume_crawl

Resume a paused download.

Parameters: None.
#### recrawl_urls

Re-inject URLs into the crawl frontier.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `urls` | array | Yes | URLs to re-crawl |
### Settings (2 tools)

#### get_settings

Returns current download settings and crawl policy.

Parameters: None.

Returns: `{ settings: {...}, policy: {...} }`

#### update_settings

Update download settings and/or crawl policy via merge patch. Only works when the engine is idle.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `settings` | object | No | Download settings fields to merge |
| `policy` | object | No | Crawl policy fields to merge |
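Because `update_settings` is a merge patch, a safe pattern is to read the current values first, then send only the fields to change. The field names below are hypothetical; take the real ones from the `get_settings` response, and remember this only works while the engine is idle:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Merge-patch sketch: fields you omit keep their current values.
async function tightenCrawl(client: Client) {
  const current = await client.callTool({ name: "get_settings", arguments: {} });
  console.log(JSON.stringify(current)); // inspect the real field names here

  return client.callTool({
    name: "update_settings",
    arguments: {
      settings: { maxConcurrency: 2 },    // hypothetical field name
      policy: { respectRobotsTxt: true }, // hypothetical field name
    },
  });
}
```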
### Projects (5 tools)

#### list_projects

List all saved projects.

Parameters: None.

Returns: Array of projects with id, name, seedURL, createdAt.

#### get_project

Get full details for a saved project.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Project UUID |

#### save_project

Save the current project state.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `name` | string | No | Project name (auto-generated if omitted) |

#### load_project

Load a saved project, restoring settings and state.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Project UUID |

#### delete_project

Delete a saved project.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Project UUID |
### Export and extraction (4 tools)

#### export_site

Start an asynchronous export of the downloaded site.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `format` | string | Yes | folder, zip, singleHTML, warc, extracted, deploy |
| `destinationPath` | string | Yes | Where to write the export |
| `warcConfiguration` | object | No | WARC options: `{ compressionEnabled, maxFileSize, cdxEnabled, dedupEnabled }` |

Poll `get_export_status` to track progress.

#### get_export_status

Returns the current export state and progress.

Parameters: None.

Returns: `{ state, format, progress, path, error }`
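A sketch of the full export flow: start the export, then poll until a terminal state. The `"completed"` state name is an assumption; check the `state` values your server actually returns:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Start a ZIP export, then poll get_export_status until done.
// Paths and the terminal state name are assumptions.
async function exportAsZip(client: Client): Promise<void> {
  await client.callTool({
    name: "export_site",
    arguments: { format: "zip", destinationPath: "/tmp/site-export" },
  });

  for (;;) {
    const res = await client.callTool({ name: "get_export_status", arguments: {} });
    const text = (res.content as Array<{ type: string; text?: string }>)
      .find((c) => c.type === "text")?.text ?? "{}";
    const status = JSON.parse(text);
    if (status.error) throw new Error(status.error);
    if (status.state === "completed") return; // assumed terminal state
    await new Promise((r) => setTimeout(r, 2000));
  }
}
```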
#### extract_site

Run the content extraction pipeline on a completed crawl.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `destinationPath` | string | No | Output directory |

Poll `get_extraction_status` to track progress.

#### get_extraction_status

Returns the current extraction state.

Parameters: None.

Returns: `{ state, phase, progress, totalPages, totalAssets }`
### Enrichment (8 tools)

#### trigger_capture

Trigger a WebKit runtime capture for a URL. Runs framework detection JS, intercepts network requests, captures console logs, and takes a DOM snapshot.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | URL to capture |

#### get_enrichment

Returns browser enrichment data for a URL or all URLs.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | No | Specific URL (omit for all) |

Returns: Enrichment objects with framework, networkRequests, consoleLogs, domSnapshotJSON.
#### get_structured_data

Returns JSON-LD, HTML tables, microdata, and RDFa extracted from crawled pages.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | No | Specific URL (omit for site-wide aggregate) |

#### submit_enrichment_bundle

Submit a complete enrichment bundle with all data types.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | The page URL |
| `framework` | object | No | `{ name, version?, confidence? }` |
| `networkRequests` | array | No | Captured network requests |
| `consoleLogs` | array | No | Console output entries |
| `domSnapshotJSON` | string | No | DOM snapshot as JSON |
#### submit_enrichment_framework

Submit framework detection data only.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | The page URL |
| `framework` | object | Yes | `{ name, version?, confidence? }` |

#### submit_enrichment_network

Submit network request data. Discovered URLs are offered to the crawl engine.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | The page URL |
| `networkRequests` | array | Yes | Array of `{ url, method, status, type }` |
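A sketch of submitting externally captured traffic; entries follow the documented `{ url, method, status, type }` shape, and all values are examples:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Feed captured traffic back into Crawlio. Any URLs in the payload
// are offered to the crawl frontier.
async function reportTraffic(client: Client) {
  return client.callTool({
    name: "submit_enrichment_network",
    arguments: {
      url: "https://example.com/app",
      networkRequests: [
        { url: "https://example.com/api/v1/items", method: "GET", status: 200, type: "xhr" },
        { url: "https://cdn.example.com/app.js", method: "GET", status: 200, type: "script" },
      ],
    },
  });
}
```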
#### submit_enrichment_console

Submit console log data.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | The page URL |
| `consoleLogs` | array | Yes | Array of `{ level, message, timestamp }` |

#### submit_enrichment_dom

Submit a DOM snapshot.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | The page URL |
| `domSnapshotJSON` | string | Yes | DOM snapshot as JSON |
### Observations and findings (4 tools)

#### get_observations

Query the append-only observation log.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `host` | string | No | Filter by hostname |
| `op` | string | No | Filter by operation type |
| `source` | string | No | extension, webkit, agent |
| `since` | number | No | Unix timestamp |
| `limit` | integer | No | Max entries |

#### get_observation

Look up a single observation or finding by ID.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Observation ID |
#### create_finding

Create a curated finding with evidence.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `title` | string | Yes | Finding title |
| `url` | string | No | Related URL |
| `evidence` | array | No | Observation IDs |
| `synthesis` | string | No | Summary analysis |
| `confidence` | string | No | Confidence level |
| `category` | string | No | Finding category |

#### get_findings

Returns curated findings.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `host` | string | No | Filter by hostname |
| `limit` | integer | No | Max entries |
### Composite analysis (3 tools)

#### analyze_page

Composite: trigger capture + poll enrichment with backoff + return unified evidence record.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | URL to analyze |

Returns: `{ url, timestamp, captureTriggered, enrichment, enrichmentStatus, crawlStatus }`

Timeout: 60s
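A usage sketch. An explicit per-request timeout matters here because the tool's own 60s budget can exceed typical default client timeouts; the option shown is the MCP TypeScript SDK's request option, and the URL is illustrative:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// One call instead of trigger_capture plus repeated get_enrichment polls.
async function analyze(client: Client) {
  return client.callTool(
    { name: "analyze_page", arguments: { url: "https://example.com" } },
    undefined,           // keep the default result schema
    { timeout: 65_000 }, // headroom over the tool's 60s budget
  );
}
```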
#### compare_pages

Composite: run analyze_page on two URLs, return structured comparison.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `urlA` | string | Yes | First URL |
| `urlB` | string | Yes | Second URL |

Returns: `{ siteA, siteB, comparisonSummary }`

Timeout: 120s
#### synthesize_openapi

Composite: chain traffic analysis + schema extraction + OpenAPI 3.0.3 YAML export.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `exhaustDir` | string | No | Path to flows.jsonl directory |
| `title` | string | No | API title |
| `serverURL` | string | No | Base server URL |
### Intelligence (5 tools)

#### get_tech_stack

Returns detected technologies with name, categories, confidence, version, and detection signals.

Parameters: None.

#### get_seo_findings

Returns SEO analysis: title, meta description, headings, canonical, word count, readability.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `severity` | string | No | Filter by severity |
| `category` | string | No | Filter by category |

#### get_design_intel

Returns design system data: colors, typography, spacing, breakpoints, components.

Parameters: None.

#### get_keyword_intel

Returns keyword analysis: top keywords by frequency, co-occurring groups, density.

Parameters: None.

#### get_duplicate_content

Returns duplicate content detection: exact duplicates and near-duplicates with similarity scores.

Parameters: None.
### Vault (5 tools)

#### vault_list_domains

List all domains with stored auth sessions.

Parameters: None.

#### vault_get_session

Retrieve a stored auth session for a domain. Audit-logged.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `domain` | string | Yes | Domain (e.g. example.com) |

Returns: `{ cookies, userAgent, isExpired }`
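A sketch of a check-and-refresh flow, assuming the session JSON arrives as the first text content item; the login URL is illustrative:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Fetch stored credentials for a domain, refusing to use an expired
// session. Every vault_get_session call is audit-logged.
async function sessionFor(client: Client, domain: string) {
  const res = await client.callTool({
    name: "vault_get_session",
    arguments: { domain },
  });
  const text = (res.content as Array<{ type: string; text?: string }>)
    .find((c) => c.type === "text")?.text ?? "{}";
  const session = JSON.parse(text);
  if (session.isExpired) {
    // Re-authenticate interactively; the loginURL here is illustrative.
    await client.callTool({
      name: "vault_request_login",
      arguments: { domain, loginURL: `https://${domain}/login` },
    });
    return null;
  }
  return session; // { cookies, userAgent, isExpired }
}
```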
#### vault_mark_expired

Mark a stored session as expired without deleting it.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `domain` | string | Yes | Domain |

#### vault_delete

Delete a stored auth session permanently.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `domain` | string | Yes | Domain |

#### vault_request_login

Open the auth browser in Crawlio so you can log in. The session is captured and stored in the vault.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `domain` | string | Yes | Domain to authenticate against |
| `loginURL` | string | Yes | Login page URL |
### OCR (1 tool)

#### extract_text_from_image

Run Vision OCR on a local image file. Does not require Crawlio.app to be running.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `path` | string | Yes | Absolute path to the image |
| `languages` | array | No | Language codes (e.g. ["en-US"]) |
| `recognitionLevel` | string | No | accurate (default) or fast |

Supported formats: PNG, JPEG, TIFF, BMP, WebP. SVG and GIF are not supported.
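A usage sketch; the image path is illustrative and must be absolute:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// OCR a local screenshot. The path must point at a supported raster
// format (PNG/JPEG/TIFF/BMP/WebP); SVG and GIF will be rejected.
async function ocr(client: Client) {
  return client.callTool({
    name: "extract_text_from_image",
    arguments: {
      path: "/tmp/screenshot.png",
      languages: ["en-US"],
      recognitionLevel: "accurate", // or "fast" for lower latency
    },
  });
}
```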
## Timeout reference

| Category | Timeout |
|---|---|
| Read-only (`get_*`, `list_*`) | 5s |
| Control (start, stop, settings) | 15s |
| Enrichment (`submit_*`) | 10s |
| Capture (trigger_capture) | 60s |
| Export (export_site, extract_site) | 120s |
| Composite (analyze_page) | 60s |
| Composite (compare_pages, synthesize_openapi) | 120s |
| OCR (extract_text_from_image) | 15s |
## Tool annotations

All tools carry MCP annotations:

| Annotation | Meaning | Applies to |
|---|---|---|
| `readOnlyHint: true` | Does not modify state | All `get_*` and `list_*` tools |
| `destructiveHint: true` | Irreversible action | stop_crawl, delete_project |
| `idempotentHint: true` | Safe to repeat | pause_crawl, resume_crawl, update_settings |
| `openWorldHint: true` | Interacts with external systems | start_crawl, recrawl_urls, trigger_capture, vault_request_login |
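Clients can read these annotations from the tool list, for example to let an agent call read-only tools freely while gating destructive ones behind approval. A sketch using the MCP TypeScript SDK's `listTools()`:

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Collect the names of tools annotated as read-only, i.e. safe to
// call without side effects or user confirmation.
async function safeTools(client: Client): Promise<string[]> {
  const { tools } = await client.listTools();
  return tools
    .filter((t) => t.annotations?.readOnlyHint === true)
    .map((t) => t.name);
}
```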
## Section 3: Other pillars

### Chrome Extension (Pillar 1)
~114 tools for live browser automation via CDP. Requires the Crawlio for Chrome extension.
Covers: tab management, navigation, screenshots, DOM interaction, network interception, console capture, accessibility tree, performance metrics, security state, and framework detection.
See Browser Agent Tools for the full reference.
### Headless Agent (Pillar 2)
~199 tools across 5 tiers:
| Tier | What it covers |
|---|---|
| Browser | Headless Chromium automation |
| Converter | Format conversion (PDF, screenshots) |
| mgrep/RE | Pattern matching |
| Interceptor | Network interception |
| Core | File I/O, orchestration |
The headless agent runs without a visible browser. It handles background tasks and is the fallback when no Chrome tab is connected.
## Next steps

- MCP Overview: the 3-pillar architecture
- Code Mode: use `search_api` + `execute_api` instead of 49 individual tools
- Resources: MCP resources, prompts, and skills
- JIT Context: how the aggregator loads context on demand