Framework Detection
Overview
Crawlio identifies the web technologies powering every site it crawls. Detection runs automatically and needs no configuration. Results are available in the app UI, exports, and MCP tool responses.
Detection layers
Four layers run in sequence. Each catches technologies the previous layers miss.
| Layer | Method | Scope |
|---|---|---|
| 1. Quick detection | HTML string matching | 5 frameworks, per-host cached |
| 2. Static detection | 6 signal types with confidence scoring | 59 technologies, first page per host |
| 3. WordPress detection | Weighted scoring with version extraction | WordPress + 3 page builders |
| 4. Runtime detection | JavaScript injection in off-screen WebKit | 13 frameworks via window globals and DOM state |
Results from all layers are merged. When the same framework is detected by multiple layers, signals are unioned and the highest-confidence detection wins.
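The merge rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Detection` shape, the layer names, and the confidence values are hypothetical and are not Crawlio's internal types.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    """One layer's report for one framework (hypothetical shape)."""
    framework: str
    confidence: float
    layer: str
    signals: set[str] = field(default_factory=set)


def merge_layers(detections: list[Detection]) -> dict[str, Detection]:
    """Union signals per framework and keep the highest-confidence detection."""
    merged: dict[str, Detection] = {}
    for det in detections:
        current = merged.get(det.framework)
        if current is None:
            # First report for this framework: copy it so inputs stay untouched.
            merged[det.framework] = Detection(
                det.framework, det.confidence, det.layer, set(det.signals)
            )
            continue
        all_signals = current.signals | det.signals
        winner = det if det.confidence > current.confidence else current
        merged[det.framework] = Detection(
            winner.framework, winner.confidence, winner.layer, all_signals
        )
    return merged


# Illustrative inputs only; the confidence values are made up.
results = merge_layers([
    Detection("Next.js", 0.50, "static", {"header:x-nextjs-cache", "html:/_next/"}),
    Detection("Next.js", 0.90, "runtime", {"window.__NEXT_DATA__"}),
    Detection("React", 0.20, "static", {"html:data-reactroot"}),
])
```

Here the Next.js entry keeps the runtime layer's confidence while carrying all three signals from both layers.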
Signal types and weights (Layer 2)
The static detector examines 6 signal types, each contributing a fixed confidence weight:
| Signal | Weight | Source |
|---|---|---|
| HTTP Headers | 0.30 | Response headers (e.g., x-nextjs-cache, server: cloudflare) |
| Meta Generator | 0.30 | <meta name="generator"> tag |
| HTML Patterns | 0.20 | Data attributes, class names, IDs in body content |
| Script URLs | 0.15 | <script src> values |
| DOM Patterns | 0.15 | Data attributes and element selectors |
| Cookies | 0.20 | Set-Cookie header names |
A framework is reported when cumulative confidence exceeds the 0.15 threshold. A single strong signal (header or meta at 0.30) is enough. Multiple weak signals accumulate. Maximum confidence is capped at 1.0.
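As a rough sketch of this scoring, the snippet below sums the weights from the table and applies the cap and threshold. It assumes each matched signal adds its type's weight once; the key names and function names are hypothetical. The comparison is written as strictly greater, following the word "exceeds"; whether a score of exactly 0.15 passes in the real detector is not stated here.

```python
# Weights copied from the signal table; the dictionary keys are hypothetical names.
SIGNAL_WEIGHTS = {
    "header": 0.30,
    "meta_generator": 0.30,
    "html_pattern": 0.20,
    "cookie": 0.20,
    "script_url": 0.15,
    "dom_pattern": 0.15,
}
THRESHOLD = 0.15


def score(matched_signals: list[str]) -> float:
    """Sum the weight of every matched signal, capped at 1.0."""
    total = sum(SIGNAL_WEIGHTS[name] for name in matched_signals)
    return min(total, 1.0)


def is_reported(matched_signals: list[str]) -> bool:
    """Report the framework only when the score exceeds the threshold."""
    return score(matched_signals) > THRESHOLD


# A single header match is enough on its own.
single_header = is_reported(["header"])
# Two weaker signals accumulate.
two_weak = score(["html_pattern", "script_url"])
# Many strong signals are capped.
capped = score(["header", "meta_generator", "html_pattern", "cookie", "script_url"])
```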
Detected technologies (59 total)
JS Runtimes and Libraries (19)
| Technology | Key Detection Signals |
|---|---|
| Next.js | Headers: x-nextjs-cache. HTML: /_next/, __NEXT_DATA__. Implies: React |
| Nuxt | Meta: Nuxt. HTML: __nuxt, /_nuxt/. Implies: Vue |
| Angular | HTML: ng-version=, ng-app. Scripts: angular.min.js |
| SvelteKit | HTML: __sveltekit, data-sveltekit. Implies: Svelte |
| Remix | HTML: __remix, data-remix. Implies: React |
| Qwik | HTML: q:container, q:base |
| SolidJS | HTML: data-hk=. Scripts: solid-js |
| Stencil | HTML: s-id=. Scripts: stencil |
| Marko | HTML: data-marko-key |
| React | HTML: data-reactroot. Scripts: react.min.js |
| Vue | HTML: data-v-[a-f0-9]. Scripts: vue.min.js |
| Svelte | HTML: <svelte:, __svelte |
| Alpine.js | HTML: x-data=, x-bind:, x-show |
| HTMX | HTML: hx-get=, hx-post=, hx-swap= |
| Turbo | HTML: data-turbo=, <turbo-frame |
| Lit | Scripts: lit.min.js, @lit/ |
| Preact | Scripts: preact.min.js |
| jQuery | Scripts: jquery.min.js |
| Backbone.js | Scripts: backbone.min.js. Implies: Underscore.js |
Static Site Generators (8)
| Technology | Key Detection Signals |
|---|---|
| Astro | Meta: ^Astro. HTML: astro-island, /_astro/ |
| Gatsby | Headers: x-gatsby-cache. HTML: ___gatsby. Implies: React |
| Hugo | Meta: ^hugo |
| Jekyll | Meta: ^jekyll. HTML: begin jekyll seo tag |
| Hexo | Meta: ^hexo |
| Docusaurus | Meta: ^docusaurus. Implies: React |
| VuePress | Meta: ^vuepress. Implies: Vue |
| Eleventy | Meta: ^eleventy |
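The anchored `Meta:` patterns in this table can be illustrated with a small matcher over the `<meta name="generator">` tag. This is a sketch, not Crawlio's implementation: the class and function names are hypothetical, and case-insensitive matching is an assumption (the table mixes `^Astro` with lowercase patterns).

```python
import re
from html.parser import HTMLParser

# Patterns taken from the table above; case-insensitive matching is an assumption.
GENERATOR_PATTERNS = {
    "Astro": r"^astro",
    "Hugo": r"^hugo",
    "Jekyll": r"^jekyll",
    "Hexo": r"^hexo",
    "Docusaurus": r"^docusaurus",
    "VuePress": r"^vuepress",
    "Eleventy": r"^eleventy",
}


class GeneratorParser(HTMLParser):
    """Collects the content of every <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "generator" and attr.get("content"):
            self.generators.append(attr["content"])


def detect_from_generator(html: str) -> list[str]:
    """Return generators whose anchored pattern matches a generator tag."""
    parser = GeneratorParser()
    parser.feed(html)
    found: list[str] = []
    for content in parser.generators:
        for name, pattern in GENERATOR_PATTERNS.items():
            if re.search(pattern, content, re.IGNORECASE) and name not in found:
                found.append(name)
    return found
```

Because the patterns are anchored with `^`, a tag such as `content="Powered by Hugo"` would not match, while `content="Hugo 0.120.4"` would.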
CMS, E-commerce, and Builders (18)
| Technology | Key Detection Signals |
|---|---|
| WordPress | Meta: WordPress. HTML: /wp-content/, /wp-includes/ |
| Shopify | Headers: powered-by: shopify. Cookies: _shopify_s |
| WooCommerce | HTML: class="woocommerce". Implies: WordPress |
| Webflow | Meta: Webflow. HTML: data-wf-site |
| Wix | Headers: x-wix-renderer-server. Implies: React |
| Framer | Headers: server: ^framer/. Implies: React |
| Squarespace | Headers: server: squarespace |
| Drupal | Headers: x-drupal-cache. Meta: ^drupal |
| Magento | HTML: text/x-magento-init. Cookies: mage-cache-storage |
| Joomla | Headers: x-content-encoded-by: joomla! |
| Ghost | Headers: x-ghost-cache-status |
| Bubble | Headers: x-bubble-capacity-limit |
| BigCommerce | Scripts: bigcommerce |
| PrestaShop | Headers: powered-by: ^prestashop$ |
| OpenCart | Cookies: ocsessid |
| Hydrogen | Headers: powered-by: hydrogen. Implies: Shopify, React |
| Tilda | Scripts: tildacdn, tilda.ws |
| Duda | Scripts: dd-cdn.multiscreensite.com |
CSS Frameworks (2)
| Technology | Key Detection Signals |
|---|---|
| Bootstrap | HTML: bootstrap.min.css |
| Tailwind CSS | HTML: tailwind.min.css |
Backend and Hosting (9)
| Technology | Key Detection Signals |
|---|---|
| Vercel | Headers: server: vercel, x-vercel-cache |
| Netlify | Headers: server: ^netlify, x-nf-request-id |
| Cloudflare | Headers: cf-cache-status, server: ^cloudflare$ |
| Laravel | Cookies: laravel_session |
| Django | HTML: csrfmiddlewaretoken |
| Ruby on Rails | HTML: csrf-param.*authenticity_token |
| GoDaddy Website Builder | Meta: go daddy website builder |
| Weebly | Scripts: cdn*.editmysite.com |
| Gridsome | Meta: ^gridsome. Implies: Vue |
WordPress page builders
When WordPress is confirmed, Crawlio further identifies the page builder:
| Page Builder | Signals |
|---|---|
| Elementor | HTML: elementor-widget- classes, elementorFrontend |
| DIVI | HTML: et_pb_ classes, /wp-content/themes/Divi/ |
| Beaver Builder | HTML: fl-builder classes, /wp-content/plugins/bb-plugin/ |
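A simplified way to picture the page-builder check is a substring scan gated on WordPress having been confirmed. The sketch below uses the markers from the table; the function name and return shape are hypothetical, and it does not reproduce the weighted scoring or version extraction that Layer 3 performs.

```python
# Markers copied from the table above.
PAGE_BUILDER_MARKERS = {
    "Elementor": ("elementor-widget-", "elementorFrontend"),
    "DIVI": ("et_pb_", "/wp-content/themes/Divi/"),
    "Beaver Builder": ("fl-builder", "/wp-content/plugins/bb-plugin/"),
}


def detect_page_builders(html: str, wordpress_confirmed: bool) -> dict[str, list[str]]:
    """Return each page builder found, with the markers that matched."""
    if not wordpress_confirmed:
        # Page builders are only checked once WordPress itself is confirmed.
        return {}
    hits: dict[str, list[str]] = {}
    for builder, markers in PAGE_BUILDER_MARKERS.items():
        matched = [marker for marker in markers if marker in html]
        if matched:
            hits[builder] = matched
    return hits


sample = '<div class="elementor-widget-heading"></div><script>var elementorFrontend = {};</script>'
builders = detect_page_builders(sample, wordpress_confirmed=True)
```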
Runtime detection (Layer 4)
The runtime layer loads pages in an off-screen WebKit view and runs JavaScript to probe window globals and DOM state that only exist after client-side hydration.
| Framework | Runtime Probes |
|---|---|
| Next.js | window.__NEXT_DATA__, window.__next_f, #__next |
| Nuxt | window.__NUXT__, window.__nuxt, #__nuxt |
| React | window.__REACT_DEVTOOLS_GLOBAL_HOOK__, [data-reactroot] |
| Vue | window.__vue_app__, window.__VUE__ |
| Angular | [ng-version] (extracts version from attribute) |
| Svelte | [class*='svelte-'] CSS classes |
| SvelteKit | window.__sveltekit |
| Remix | window.__remixContext |
| Gatsby | window.___gatsby |
| Astro | <astro-island> custom elements |
| WordPress | link[href*='wp-content'] |
| Webflow | [data-wf-site] data attribute |
Runtime detection also extracts SSR mode (hybrid vs static), framework subtypes, and build IDs when available.
Detection sources
When both static and runtime detection find the same framework, results are merged:
| Field | Merge rule |
|---|---|
| Confidence | Runtime confidence wins (it had more evidence) |
| Signals | Union of all signals from both sources |
| Version | Runtime version preferred, static as fallback |
| Category | Static category preserved |
Each detection is tagged with its source: static, dynamic (runtime), or merged.
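The four merge rules map directly onto a small function. The following is a minimal sketch; the `Finding` type, its field names, and the sample values are hypothetical and chosen only to mirror the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """A single detection result (hypothetical shape)."""
    framework: str
    confidence: float
    signals: frozenset
    version: Optional[str]
    category: Optional[str]
    source: str  # "static", "dynamic", or "merged"


def merge_static_runtime(static: Finding, runtime: Finding) -> Finding:
    """Apply the merge rules from the table to one framework."""
    return Finding(
        framework=static.framework,
        confidence=runtime.confidence,            # runtime confidence wins
        signals=static.signals | runtime.signals,  # union of all signals
        version=runtime.version or static.version,  # runtime first, static fallback
        category=static.category,                  # static category preserved
        source="merged",
    )


# Illustrative values only.
static_hit = Finding("Angular", 0.35, frozenset({"html:ng-version="}), None, "JS framework", "static")
runtime_hit = Finding("Angular", 0.90, frozenset({"dom:[ng-version]"}), "17.0.2", None, "dynamic")
merged = merge_static_runtime(static_hit, runtime_hit)
```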
Implied technologies
Some frameworks imply the presence of underlying technologies. These are included automatically:
- Next.js, Gatsby, Remix, Docusaurus, Wix, Framer, Hydrogen --> React
- Nuxt, VuePress, Gridsome --> Vue
- SvelteKit --> Svelte
- WooCommerce --> WordPress
- Hydrogen --> Shopify
- Backbone.js --> Underscore.js
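One way to apply these implications is a lookup table expanded until nothing new is added. The sketch below is an assumption about how that could work, not Crawlio's code; the `IMPLIES` name and function are hypothetical, and the Hydrogen to Shopify edge is taken from the Hydrogen row of the detection table.

```python
# Edges taken from the list above and the "Implies:" notes in the detection tables.
IMPLIES = {
    "Next.js": ["React"],
    "Gatsby": ["React"],
    "Remix": ["React"],
    "Docusaurus": ["React"],
    "Wix": ["React"],
    "Framer": ["React"],
    "Hydrogen": ["Shopify", "React"],
    "Nuxt": ["Vue"],
    "VuePress": ["Vue"],
    "Gridsome": ["Vue"],
    "SvelteKit": ["Svelte"],
    "WooCommerce": ["WordPress"],
    "Backbone.js": ["Underscore.js"],
}


def expand_implied(detected: set[str]) -> set[str]:
    """Return the detected set plus everything it implies, transitively."""
    result = set(detected)
    queue = list(detected)
    while queue:
        tech = queue.pop()
        for implied in IMPLIES.get(tech, []):
            if implied not in result:
                result.add(implied)
                queue.append(implied)
    return result


expanded = expand_implied({"Hydrogen", "Nuxt"})
```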
How detection affects crawl behavior
When frameworks are detected, Crawlio adjusts automatically:
- Next.js: Enables RSC extraction, discovers /_next/static/ chunk URLs from webpack manifests
- Astro: Extracts component-url and renderer-url attributes, discovers /_astro/ paths
- Svelte/SvelteKit: Extracts /_app/immutable/ chunk paths
- WordPress: Strips ?ver= cache busters, suggests excluding /wp-admin/ and /wp-login.php
- SPA shells detected: Triggers WebKit re-render when enableJSRendering is active
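To make the WordPress adjustment concrete, here is a sketch of stripping the `ver` query parameter and flagging the two suggested exclusions. The function names are hypothetical and the logic is an assumption about what such a normalization could look like, using only the Python standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Paths the text says Crawlio suggests excluding on WordPress sites.
SUGGESTED_EXCLUDES = ("/wp-admin/", "/wp-login.php")


def strip_ver_param(url: str) -> str:
    """Drop the ?ver= cache-busting parameter and keep every other parameter."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "ver"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def is_suggested_exclude(url: str) -> bool:
    """True when the URL path starts with one of the suggested exclusions."""
    return urlsplit(url).path.startswith(SUGGESTED_EXCLUDES)


clean = strip_ver_param("https://example.com/wp-content/themes/x/style.css?ver=6.4.2")
```

Normalizing URLs this way lets two references to the same asset with different `ver` values collapse into one download.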
Where detection data appears
- App UI: Framework badge in the downloads list and site inspector panel
- Exports: deploy.json enrichment data, crawl-manifest.json metadata
- MCP: get_crawl_status and get_enrichment tool responses
- CLI: --format json output includes framework detections
Next steps
- AI Enrichment: Browser capture, OCR, and the enrichment store
- Settings Reference: Enable JS rendering for SPAs
- Export Formats: How detection data flows into exports