JIT Context
The problem
Crawlio has ~362 tools across 3 MCP servers. Loading all of them into your AI's context window would consume over 15,000 tokens on schema alone, before any work begins. Attention degrades as tool count grows. Your AI spends reasoning capacity on tool selection instead of task execution.
The solution
The aggregator never dumps all ~362 tool schemas into the context window. Instead, 5 meta-tools are always present. When your AI calls crawlio_discover, it gets back only the tool schemas relevant to its current task. Everything else stays off the context window.
Without JIT context:
362 tool schemas loaded → ~15,000 tokens consumed → attention diluted
With JIT context:
5 meta-tool schemas loaded → ~800 tokens consumed → tools loaded on demand
How discovery works
Your AI calls crawlio_discover with a description of what it needs:
crawlio_discover("I need to crawl a site and export as WARC")
--> Returns schemas for: start_crawl, get_crawl_status, export_site, get_export_status
crawlio_discover("I need to capture a page and check its framework")
--> Returns schemas for: trigger_capture, get_enrichment, get_tech_stack
The aggregator searches across all 3 pillars and returns only matching tool schemas. Your AI learns the parameters on demand, then calls crawlio_call or crawlio_do to execute.
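The matching step can be sketched as a keyword filter over a tool registry. The registry entries and the scoring rule below are illustrative assumptions, not Crawlio's real index:

```typescript
// Hypothetical discovery sketch: match a task description against tool
// descriptions by keyword. Registry contents are illustrative only.
interface ToolSchema {
  name: string;
  description: string;
}

const registry: ToolSchema[] = [
  { name: "start_crawl", description: "start a crawl of a site" },
  { name: "export_site", description: "export a crawled site as WARC" },
  { name: "trigger_capture", description: "capture a page snapshot" },
  { name: "get_tech_stack", description: "detect the framework of a page" },
];

function discover(query: string): ToolSchema[] {
  const words = query.toLowerCase().split(/\W+/);
  // Keep a tool if any meaningful query word appears in its description.
  return registry.filter((t) =>
    words.some((w) => w.length > 3 && t.description.includes(w))
  );
}

console.log(discover("I need to crawl a site and export as WARC").map((t) => t.name));
```

A real aggregator would use semantic search rather than substring matching, but the contract is the same: a free-text request in, a small set of relevant schemas out.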
How routing works
crawlio_call
Routes a tool call to the correct pillar:
crawlio_call("start_crawl", { url: "https://example.com" })
--> Routed to Pillar 3 (Crawlio App)
crawlio_call("browser_navigate", { url: "https://example.com" })
--> Routed to Pillar 1 (Chrome Extension) or Pillar 2 (Headless Agent)
Your AI does not need to know which pillar owns a tool. The aggregator handles routing.
crawlio_do
Handles high-level tasks with automatic pillar selection:
crawlio_do("capture this page")
--> Uses Chrome Extension if a tab is connected, Headless Agent if not
crawlio_do picks the best pillar based on current session state. If the Chrome extension has an active tab, it uses that. If not, it falls back to the headless agent. For tasks that only Pillar 3 handles (crawl control, export, vault), it routes there directly.
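The selection logic reduces to a small decision: Pillar-3-only task categories route directly, browser tasks follow session state. A minimal sketch, with illustrative task categories and pillar labels:

```typescript
// Hypothetical sketch of crawlio_do's pillar selection. The task
// categories and labels here are illustrative, not the real API.
type Pillar = "chrome-extension" | "headless-agent" | "crawlio-app";
type TaskKind = "browser" | "crawl" | "export" | "vault";

function selectPillar(task: TaskKind, tabConnected: boolean): Pillar {
  // Crawl control, export, and vault are Pillar-3-only: route directly.
  if (task !== "browser") return "crawlio-app";
  // Browser tasks are session-sticky: prefer a connected extension tab.
  return tabConnected ? "chrome-extension" : "headless-agent";
}

console.log(selectPillar("browser", true));  // "chrome-extension"
console.log(selectPillar("browser", false)); // "headless-agent"
console.log(selectPillar("export", true));   // "crawlio-app"
```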
crawlio_cortex
Queries intelligence data across pillar boundaries:
crawlio_cortex("what technologies does this site use?")
--> Combines tech stack data from enrichment, browser detection, and crawl analysis
crawlio_consult
Multi-pillar consultation for complex tasks that need data from multiple sources.
Routing rules
Session-sticky browser routing
The aggregator prefers the Chrome extension when a tab is connected. If no tab is connected, browser commands go to the headless agent. This is automatic. Your AI does not choose.
| Browser connected? | Browser tools route to |
|---|---|
| Yes (tab connected) | Pillar 1 (Chrome Extension) |
| No | Pillar 2 (Headless Agent) |
Tool deduplication
Pillars 1 and 2 both have browser automation tools. The aggregator exposes overlapping tools once and routes per session state. Your AI sees browser_navigate, not pillar1_browser_navigate and pillar2_browser_navigate.
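The deduplication itself is a set union over the two pillars' tool names; the tool lists below are illustrative examples, not the real registries:

```typescript
// Illustrative deduplication: overlapping browser tools from Pillars 1
// and 2 are exposed once, under a single unprefixed name.
const pillar1Tools = ["browser_navigate", "browser_click", "trigger_capture"];
const pillar2Tools = ["browser_navigate", "browser_click", "browser_screenshot"];

// Union the two tool sets so each shared name appears exactly once.
const exposed = [...new Set([...pillar1Tools, ...pillar2Tools])];

console.log(exposed);
```

Which pillar actually services a deduplicated name is then decided at call time by session state, not by the name itself.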
Crawlio App tools are additive
Pillar 3 tools (crawl control, export, vault, intelligence, OCR) do not overlap with Pillar 1 or 2. They are always routed to Pillar 3.
Pillar resolution order
For any tool call, the aggregator resolves in this order:
1. Exact match in Pillar 3. If the tool name matches a Crawlio App tool, route there.
2. Browser tool with active session. If the tool is a browser command and a Chrome tab is connected, route to Pillar 1.
3. Browser tool without session. Route to Pillar 2 (Headless Agent).
4. Unknown tool. Return an error with available alternatives from crawlio_discover.
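The four steps can be sketched as a single resolver function. The tool tables and session flag below are illustrative stand-ins for the aggregator's internal state:

```typescript
// Sketch of the resolution order. Tool sets are illustrative examples.
const pillar3Tools = new Set(["start_crawl", "export_site", "get_enrichment"]);
const browserTools = new Set(["browser_navigate", "browser_click"]);

type Route = { pillar: 1 | 2 | 3 } | { error: string };

function resolve(tool: string, tabConnected: boolean): Route {
  if (pillar3Tools.has(tool)) return { pillar: 3 };      // 1. exact Pillar 3 match
  if (browserTools.has(tool)) {
    return tabConnected ? { pillar: 1 } : { pillar: 2 }; // 2./3. session-sticky browser
  }
  // 4. unknown tool: point back at discovery
  return { error: `Unknown tool: ${tool}. Call crawlio_discover for alternatives.` };
}

console.log(resolve("start_crawl", true));       // { pillar: 3 }
console.log(resolve("browser_navigate", false)); // { pillar: 2 }
```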
Skills reduce context further
Skills encode domain knowledge that your AI would otherwise need to figure out. Instead of your AI discovering which tools to use, composing them, and handling edge cases, a skill provides a tested workflow.
Five skills are installed by crawlio-mcp init:
| Skill | What it does |
|---|---|
| crawlio-mcp | Full tool reference with parameters and examples |
| crawl-site | Intelligent crawl workflow: start, monitor, adjust, export |
| audit-site | Multi-phase site audit: crawl, capture, enrich, analyze, report |
| observe | Query the observation log with filters |
| finding | Create evidence-backed findings from observations |
Skills work with both the aggregator and direct Pillar 3 access. They reduce what your AI needs to reason about by providing pre-built sequences.
The browser-side execution runtime
On the browser side, JIT context goes deeper. The execute tool runs JavaScript in a sandbox with framework-aware instrumentation injected at runtime.
Framework detection
Before your AI's code executes, the runtime probes the browser for framework signatures. Based on what it finds, it constructs a polymorphic smart object with the appropriate namespace methods:
| Tier | Frameworks |
|---|---|
| Core | React, Vue, Angular, Svelte |
| Meta-frameworks | Next.js, Nuxt, Remix, Gatsby |
| E-commerce | Shopify, WooCommerce |
| Content systems | WordPress, Drupal, Laravel, Django |
| Libraries | Redux, Alpine.js, jQuery |
If the target page runs React, your AI's script gains smart.react with methods to read the React devtools hook, query component trees, and extract rendered state. If the page is a Shopify storefront, smart.shopify appears with cart state and shop configuration. The runtime detects the environment and shapes the SDK to match.
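The shaping step can be sketched as conditional namespace construction. The detection labels and namespace members below are illustrative assumptions, not the runtime's real surface:

```typescript
// Hypothetical sketch: build the polymorphic smart object from detected
// framework signatures. Namespace contents are illustrative stubs.
type Namespace = Record<string, (...args: unknown[]) => unknown>;

function buildSmart(detected: string[]): Record<string, Namespace> {
  const smart: Record<string, Namespace> = {};
  if (detected.includes("react")) {
    // Stub: a real namespace would read the React devtools hook.
    smart.react = { componentTree: () => [] };
  }
  if (detected.includes("shopify")) {
    // Stub: a real namespace would expose cart state and shop config.
    smart.shopify = { cart: () => ({ items: [] }) };
  }
  return smart;
}

const smart = buildSmart(["react"]);
console.log("react" in smart, "shopify" in smart); // true false
```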
Execution context
Scripts run via execute land in a scope with:
| Variable | Description |
|---|---|
| bridge | Send CDP commands to the browser |
| crawlio | HTTP client for Crawlio App endpoints |
| smart | 7 core + 17 higher-order methods + up to 17 framework namespaces |
| sleep | Async wait (max 30s per call) |
| ocrScreenshot | macOS Vision OCR |
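A script in that scope might look like the sketch below. The stubs stand in for the injected runtime globals (bridge, smart, sleep) so the example is self-contained; real scripts receive these pre-bound, and the method names on the stubs are assumptions:

```typescript
// Illustrative shape of a script run via execute. Stubs replace the
// injected globals; their signatures are assumptions for this sketch.
const sleep = (ms: number) =>
  new Promise<void>((r) => setTimeout(r, Math.min(ms, 30_000))); // cap at 30s
const bridge = {
  send: async (method: string, params: object) => ({ method, params }),
};
const smart = {
  textContent: async (selector: string) => `<text of ${selector}>`,
};

async function script() {
  await bridge.send("Page.navigate", { url: "https://example.com" }); // CDP command
  await sleep(250);                                                   // wait for load
  return smart.textContent("h1");                                     // read the heading
}

script().then(console.log);
```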
Actionability checks
When your AI calls smart.click(selector), the runtime does not immediately dispatch a click. It polls until the element is ready: it must exist, have non-zero dimensions, be CSS-visible, be enabled, and not be obscured by overlapping elements. If the element is not ready within the timeout budget, the runtime returns a structured error explaining why.
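The poll reduces to a readiness predicate retried until it passes or the budget runs out. The predicate fields below mirror the checks listed above but are a simplified sketch, not the runtime's exact implementation:

```typescript
// Simplified actionability sketch: check readiness, retry on failure,
// and surface a structured reason when the timeout budget is exhausted.
interface ElementState {
  exists: boolean;
  width: number;
  height: number;
  visible: boolean;
  enabled: boolean;
  obscured: boolean;
}

function isActionable(el: ElementState): { ready: boolean; reason?: string } {
  if (!el.exists) return { ready: false, reason: "element not found" };
  if (el.width === 0 || el.height === 0) return { ready: false, reason: "zero-size element" };
  if (!el.visible) return { ready: false, reason: "not CSS-visible" };
  if (!el.enabled) return { ready: false, reason: "element disabled" };
  if (el.obscured) return { ready: false, reason: "obscured by another element" };
  return { ready: true };
}

async function waitForActionable(
  probe: () => ElementState,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<{ ready: boolean; reason?: string }> {
  const deadline = Date.now() + timeoutMs;
  let last = isActionable(probe());
  while (!last.ready && Date.now() < deadline) {
    await new Promise((r) => setTimeout(r, intervalMs));
    last = isActionable(probe());
  }
  return last; // on timeout, last.reason explains why the element wasn't ready
}
```

Returning the last failing reason, rather than a bare timeout, is what lets the AI adjust its next attempt instead of retrying blindly.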
Persistent browser state
The runtime maintains a persistent connection to the browser. If a script fails, the browser is still in the exact same state. Your AI reads the error, adjusts its approach, and calls execute again against the same DOM. No context is lost between cycles.
Direct access vs aggregator
You can use Crawlio MCP in two ways:
| Approach | What your AI sees | Best for |
|---|---|---|
| Aggregator | 5 meta-tools, JIT context loading | Multi-pillar workflows, browser + crawl |
| Direct Pillar 3 | 6 code-mode tools or 49 full-mode tools | Crawl-only workflows, simpler setup |
Both approaches give full access to Pillar 3 capabilities. The aggregator adds cross-pillar routing and browser access.
Next steps
- MCP Overview: the 3-pillar architecture
- Code Mode: direct Pillar 3 access with 6 tools
- Method Mode: composite tools for multi-step workflows
- Tool Reference: all 49 Crawlio App tools
- Browser Agent: Chrome automation setup