Crawlio Docs

Framework Detection

Overview

Crawlio identifies the web technologies powering every site it crawls. Detection runs automatically and needs no configuration. Results are available in the app UI, exports, and MCP tool responses.

Detection layers

Four layers run in sequence. Each catches technologies the previous layers miss.

| Layer | Method | Scope |
|---|---|---|
| 1. Quick detection | HTML string matching | 5 frameworks, per-host cached |
| 2. Static detection | 6 signal types with confidence scoring | 59 technologies, first page per host |
| 3. WordPress detection | Weighted scoring with version extraction | WordPress + 3 page builders |
| 4. Runtime detection | JavaScript injection in off-screen WebKit | 13 frameworks via window globals and DOM state |

Results from all layers are merged. When the same framework is detected by multiple layers, signals are unioned and the highest-confidence detection wins.

Signal types and weights (Layer 2)

The static detector examines 6 signal types, each with weighted confidence:

| Signal | Weight | Source |
|---|---|---|
| HTTP Headers | 0.30 | Response headers (e.g., x-nextjs-cache, server: cloudflare) |
| Meta Generator | 0.30 | <meta name="generator"> tag |
| HTML Patterns | 0.20 | Data attributes, class names, IDs in body content |
| Script URLs | 0.15 | <script src> values |
| DOM Patterns | 0.15 | Data attributes and element selectors |
| Cookies | 0.20 | Set-Cookie header names |

A framework is reported when cumulative confidence exceeds the 0.15 threshold. A single strong signal (header or meta at 0.30) is enough. Multiple weak signals accumulate. Maximum confidence is capped at 1.0.
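The scoring rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not Crawlio's implementation: the dictionary keys, the function names, and the choice to add one weight per matched signal are hypothetical. The weights, the 0.15 threshold, and the 1.0 cap come from the table and paragraph above. Because the text says confidence must "exceed" the threshold, the sketch uses a strict comparison, kept in one place so it is easy to change.

```python
# Weights copied from the signal table; the key names are illustrative only.
SIGNAL_WEIGHTS = {
    "http_header": 0.30,
    "meta_generator": 0.30,
    "html_pattern": 0.20,
    "script_url": 0.15,
    "dom_pattern": 0.15,
    "cookie": 0.20,
}
THRESHOLD = 0.15
MAX_CONFIDENCE = 1.0


def confidence(matched_signals):
    """Sum the weight of every matched signal, capped at MAX_CONFIDENCE."""
    total = sum(SIGNAL_WEIGHTS[signal] for signal in matched_signals)
    return min(total, MAX_CONFIDENCE)


def is_reported(matched_signals):
    """Assumption: 'exceeds' is read as a strict greater-than comparison."""
    return confidence(matched_signals) > THRESHOLD


# A single header match is enough on its own.
print(is_reported(["http_header"]))
# Two weak signals accumulate past the threshold.
print(is_reported(["script_url", "dom_pattern"]))
```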

Detected technologies (59 total)

JS Runtimes and Libraries (19)

| Technology | Key Detection Signals |
|---|---|
| Next.js | Headers: x-nextjs-cache. HTML: /_next/, __NEXT_DATA__. Implies: React |
| Nuxt | Meta: Nuxt. HTML: __nuxt, /_nuxt/. Implies: Vue |
| Angular | HTML: ng-version=, ng-app. Scripts: angular.min.js |
| SvelteKit | HTML: __sveltekit, data-sveltekit. Implies: Svelte |
| Remix | HTML: __remix, data-remix. Implies: React |
| Qwik | HTML: q:container, q:base |
| SolidJS | HTML: data-hk=. Scripts: solid-js |
| Stencil | HTML: s-id=. Scripts: stencil |
| Marko | HTML: data-marko-key |
| React | HTML: data-reactroot. Scripts: react.min.js |
| Vue | HTML: data-v-[a-f0-9]. Scripts: vue.min.js |
| Svelte | HTML: <svelte:, __svelte |
| Alpine.js | HTML: x-data=, x-bind:, x-show |
| HTMX | HTML: hx-get=, hx-post=, hx-swap= |
| Turbo | HTML: data-turbo=, <turbo-frame |
| Lit | Scripts: lit.min.js, @lit/ |
| Preact | Scripts: preact.min.js |
| jQuery | Scripts: jquery.min.js |
| Backbone.js | Scripts: backbone.min.js. Implies: Underscore.js |

Static Site Generators (8)

| Technology | Key Detection Signals |
|---|---|
| Astro | Meta: ^Astro. HTML: astro-island, /_astro/ |
| Gatsby | Headers: x-gatsby-cache. HTML: ___gatsby. Implies: React |
| Hugo | Meta: ^hugo |
| Jekyll | Meta: ^jekyll. HTML: begin jekyll seo tag |
| Hexo | Meta: ^hexo |
| Docusaurus | Meta: ^docusaurus. Implies: React |
| VuePress | Meta: ^vuepress. Implies: Vue |
| Eleventy | Meta: ^eleventy |

CMS, E-commerce, and Builders (18)

| Technology | Key Detection Signals |
|---|---|
| WordPress | Meta: WordPress. HTML: /wp-content/, /wp-includes/ |
| Shopify | Headers: powered-by: shopify. Cookies: _shopify_s |
| WooCommerce | HTML: class="woocommerce". Implies: WordPress |
| Webflow | Meta: Webflow. HTML: data-wf-site |
| Wix | Headers: x-wix-renderer-server. Implies: React |
| Framer | Headers: server: ^framer/. Implies: React |
| Squarespace | Headers: server: squarespace |
| Drupal | Headers: x-drupal-cache. Meta: ^drupal |
| Magento | HTML: text/x-magento-init. Cookies: mage-cache-storage |
| Joomla | Headers: x-content-encoded-by: joomla! |
| Ghost | Headers: x-ghost-cache-status |
| Bubble | Headers: x-bubble-capacity-limit |
| BigCommerce | Scripts: bigcommerce |
| PrestaShop | Headers: powered-by: ^prestashop$ |
| OpenCart | Cookies: ocsessid |
| Hydrogen | Headers: powered-by: hydrogen. Implies: Shopify, React |
| Tilda | Scripts: tildacdn, tilda.ws |
| Duda | Scripts: dd-cdn.multiscreensite.com |

CSS Frameworks (2)

| Technology | Key Detection Signals |
|---|---|
| Bootstrap | HTML: bootstrap.min.css |
| Tailwind CSS | HTML: tailwind.min.css |

Backend and Hosting (9)

| Technology | Key Detection Signals |
|---|---|
| Vercel | Headers: server: vercel, x-vercel-cache |
| Netlify | Headers: server: ^netlify, x-nf-request-id |
| Cloudflare | Headers: cf-cache-status, server: ^cloudflare$ |
| Laravel | Cookies: laravel_session |
| Django | HTML: csrfmiddlewaretoken |
| Ruby on Rails | HTML: csrf-param.*authenticity_token |
| GoDaddy Website Builder | Meta: go daddy website builder |
| Weebly | Scripts: cdn*.editmysite.com |
| Gridsome | Meta: ^gridsome. Implies: Vue |
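Several entries in the tables above look like anchored regular expressions (for example ^hugo and ^prestashop$). The sketch below shows how such patterns might be matched against the meta generator value. It assumes the patterns are regular expressions applied case-insensitively, which the tables suggest but do not state. The `GENERATOR_TAG` expression and the `detect_from_generator` helper are hypothetical, and the tag regex is deliberately simplistic (it expects name before content).

```python
import re

# A few generator patterns copied from the tables above.
META_GENERATOR_PATTERNS = {
    "Hugo": r"^hugo",
    "Jekyll": r"^jekyll",
    "Astro": r"^Astro",
    "Gridsome": r"^gridsome",
}

# Simplified extraction of <meta name="generator" content="...">.
GENERATOR_TAG = re.compile(
    r'<meta\s+name=["\']generator["\']\s+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def detect_from_generator(html):
    """Return the technologies whose pattern matches the generator value."""
    match = GENERATOR_TAG.search(html)
    if match is None:
        return []
    value = match.group(1)
    return [
        name
        for name, pattern in META_GENERATOR_PATTERNS.items()
        if re.search(pattern, value, re.IGNORECASE)
    ]


print(detect_from_generator('<meta name="generator" content="Hugo 0.120.4">'))
```

Because of the ^ anchor, a generator value that merely mentions a tool later in the string (such as "Built with Hugo") would not match in this sketch.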

WordPress page builders

When WordPress is confirmed, Crawlio further identifies the page builder:

| Page Builder | Signals |
|---|---|
| Elementor | HTML: elementor-widget- classes, elementorFrontend |
| DIVI | HTML: et_pb_ classes, /wp-content/themes/Divi/ |
| Beaver Builder | HTML: fl-builder classes, /wp-content/plugins/bb-plugin/ |
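The gating described above (page builders are only checked once WordPress is confirmed) might look like the following. This is a simplified sketch: it uses plain substring checks instead of the weighted scoring and version extraction that Layer 3 is described as using, and the names `WORDPRESS_MARKERS`, `PAGE_BUILDER_SIGNALS`, and `detect_page_builders` are hypothetical. The signal strings are taken from the tables in this document.

```python
# WordPress HTML markers from the CMS table.
WORDPRESS_MARKERS = ["/wp-content/", "/wp-includes/"]

# Page builder signals from the table above.
PAGE_BUILDER_SIGNALS = {
    "Elementor": ["elementor-widget-", "elementorFrontend"],
    "DIVI": ["et_pb_", "/wp-content/themes/Divi/"],
    "Beaver Builder": ["fl-builder", "/wp-content/plugins/bb-plugin/"],
}


def detect_page_builders(html):
    """Return matching page builders, but only for pages that look like WordPress."""
    if not any(marker in html for marker in WORDPRESS_MARKERS):
        return []  # WordPress not confirmed, so skip the builder checks
    return [
        builder
        for builder, needles in PAGE_BUILDER_SIGNALS.items()
        if any(needle in html for needle in needles)
    ]


page = '<link href="/wp-content/plugins/x.css"><div class="elementor-widget-heading"></div>'
print(detect_page_builders(page))
```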

Runtime detection (Layer 4)

The runtime layer loads pages in an off-screen WebKit view and runs JavaScript to probe window globals and DOM state that only exist after client-side hydration.

| Framework | Runtime Probes |
|---|---|
| Next.js | window.__NEXT_DATA__, window.__next_f, #__next |
| Nuxt | window.__NUXT__, window.__nuxt, #__nuxt |
| React | window.__REACT_DEVTOOLS_GLOBAL_HOOK__, [data-reactroot] |
| Vue | window.__vue_app__, window.__VUE__ |
| Angular | [ng-version] (extracts version from attribute) |
| Svelte | [class*='svelte-'] CSS classes |
| SvelteKit | window.__sveltekit |
| Remix | window.__remixContext |
| Gatsby | window.___gatsby |
| Astro | <astro-island> custom elements |
| WordPress | link[href*='wp-content'] |
| Webflow | [data-wf-site] data attribute |

Runtime detection also extracts SSR mode (hybrid vs static), framework subtypes, and build IDs when available.

Detection sources

When both static and runtime detection find the same framework, results are merged:

| Field | Merge rule |
|---|---|
| Confidence | Runtime confidence wins (it had more evidence) |
| Signals | Union of all signals from both sources |
| Version | Runtime version preferred, static as fallback |
| Category | Static category preserved |

Each detection is tagged with its source: static, dynamic (runtime), or merged.
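The merge rules above can be expressed as a small function. The `Detection` dataclass and its field names are hypothetical and only mirror the fields in the table. The version string in the usage example is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Detection:
    # Hypothetical record shape; Crawlio's real data model is not shown in this page.
    name: str
    confidence: float
    signals: Set[str] = field(default_factory=set)
    version: Optional[str] = None
    category: Optional[str] = None
    source: str = "static"  # "static", "dynamic", or "merged"


def merge(static: Detection, runtime: Detection) -> Detection:
    """Combine a static and a runtime detection of the same framework."""
    return Detection(
        name=static.name,
        confidence=runtime.confidence,             # runtime confidence wins
        signals=static.signals | runtime.signals,  # union of all signals
        version=runtime.version or static.version, # runtime preferred, static fallback
        category=static.category,                  # static category preserved
        source="merged",
    )


static = Detection("Next.js", 0.5, {"html:/_next/"}, "14.1.0", "JS Runtimes and Libraries")
runtime = Detection("Next.js", 0.9, {"window.__NEXT_DATA__"}, None, None, "dynamic")
print(merge(static, runtime))
```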

Implied technologies

Some frameworks imply the presence of underlying technologies. These are included automatically:

  • Next.js, Gatsby, Remix, Docusaurus, Wix, Framer, Hydrogen --> React
  • Nuxt, VuePress, Gridsome --> Vue
  • SvelteKit --> Svelte
  • WooCommerce --> WordPress
  • Hydrogen --> Shopify
  • Backbone.js --> Underscore.js
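One way to picture the expansion is a lookup table applied until no new technologies appear. The mapping below is copied from the list above, plus the Hydrogen row of the CMS table (which also lists Shopify). The `IMPLIES` name and the `expand_implied` function are hypothetical, and whether Crawlio follows implications transitively is an assumption.

```python
# Technology -> technologies it implies, taken from this page.
IMPLIES = {
    "Next.js": ["React"],
    "Gatsby": ["React"],
    "Remix": ["React"],
    "Docusaurus": ["React"],
    "Wix": ["React"],
    "Framer": ["React"],
    "Hydrogen": ["Shopify", "React"],
    "Nuxt": ["Vue"],
    "VuePress": ["Vue"],
    "Gridsome": ["Vue"],
    "SvelteKit": ["Svelte"],
    "WooCommerce": ["WordPress"],
    "Backbone.js": ["Underscore.js"],
}


def expand_implied(detected):
    """Return the detected set plus everything it implies (followed transitively)."""
    result = set(detected)
    pending = list(detected)
    while pending:
        tech = pending.pop()
        for implied in IMPLIES.get(tech, []):
            if implied not in result:
                result.add(implied)
                pending.append(implied)
    return result


print(sorted(expand_implied({"WooCommerce", "Nuxt"})))
```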

How detection affects crawl behavior

When frameworks are detected, Crawlio adjusts automatically:

  • Next.js: Enables React Server Components (RSC) extraction, discovers /_next/static/ chunk URLs from webpack manifests
  • Astro: Extracts component-url and renderer-url attributes, discovers /_astro/ paths
  • Svelte/SvelteKit: Extracts /_app/immutable/ chunk paths
  • WordPress: Strips ?ver= cache busters, suggests excluding /wp-admin/ and /wp-login.php
  • SPA shells detected: Triggers WebKit re-render when enableJSRendering is active
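As an illustration of the WordPress adjustment in the list above, stripping a ?ver= cache buster could be done with the standard library as shown below. The function name `strip_ver_param` is hypothetical, and the exact normalization Crawlio applies (for example, whether other query parameters are kept) is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_ver_param(url):
    """Remove the 'ver' query parameter and keep every other part of the URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "ver"
    ]
    # Note: urlencode re-encodes the remaining parameters, which may normalize escaping.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_ver_param("https://example.com/wp-includes/js/jquery.min.js?ver=3.7.1"))
```

With this normalization, two URLs that differ only in their ver value map to the same asset URL.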

Where detection data appears

  • App UI: Framework badge in the downloads list and site inspector panel
  • Exports: deploy.json enrichment data, crawl-manifest.json metadata
  • MCP: get_crawl_status and get_enrichment tool responses
  • CLI: --format json output includes framework detections

© 2026 Crawlio. All rights reserved.