Anti-bot detection in 2026: what actually blocks scrapers
A technical breakdown of how Cloudflare, DataDome, Akamai, PerimeterX, and Kasada actually detect bots in 2026 — and which tier of scraping tool beats each one. Real numbers from 2.3M production scrapes.
Five companies account for roughly 60% of the web's anti-bot infrastructure: Cloudflare, DataDome, Akamai, PerimeterX (now Human), and Kasada. Each one blocks scrapers differently, and each one has a known counter-tier. Most scraping tutorials treat them as "the same problem" — they aren't.
This post is a technical breakdown of what each provider actually checks, which scraping tier beats each one, and the production success rates from 2.3 million requests our router served between March 15 and April 15, 2026. We publish failures where they happened. The numbers below are not projections.
The short version
If you skip the rest of this post, here are the tier decisions that cover 95% of real-world scraping:
| Defense | What to use | Success rate (our data) |
|---|---|---|
| No defense / plain Apache / Nginx | Plain HTTP (Node fetch, Python requests) | 99%+ |
| Cloudflare Bot Fight Mode | curl_cffi with impersonate="chrome131" | 94% |
| Cloudflare Managed Challenge | Stealth Playwright + residential proxy | 78% |
| Cloudflare "I'm Under Attack" | Camoufox + residential proxy + CAPTCHA solver | 62% |
| DataDome | Camoufox + residential proxy | 88% |
| Akamai Bot Manager | Camoufox + residential proxy | 91% |
| PerimeterX / Human | Camoufox + residential proxy | 84% |
| Kasada | Camoufox + residential proxy + challenge solver | 54% |
The 54% on Kasada is embarrassing and we're working on it. Published so you know before you spend credits.
How the five providers actually work
Cloudflare — the three modes most engineers confuse
Cloudflare has three bot-detection modes that customers enable in cascading order, each catching a different class of scraper.
Bot Fight Mode (free tier, always on) runs before the HTTP response is even generated. It inspects the TLS ClientHello during the handshake and compares your JA4 fingerprint against a list of known non-browser signatures. Node's default TLS (from the tls module) has a JA4 hash that appears on Cloudflare's reject-list. Your request gets a 403 in under 10 milliseconds — before any application code runs.
This is cheap to bypass. curl_cffi with impersonate="chrome131" replays a real Chrome TLS handshake at the C level. The JA4 hash matches Chrome's, the handshake completes, and Cloudflare passes the request through to the origin. Our production data: 94% of Bot-Fight-Mode-protected sites succeed on curl_cffi. The 6% that fail are sites that chain Bot Fight Mode with Managed Challenge.
Managed Challenge runs JavaScript in the browser to score behavioral signals. The key check: did you load a JS runtime, execute a specific WebAssembly module, and submit a token within the expected window? Pure HTTP requests can't pass because there's no JS to execute. You need a headless browser.
Stealth Playwright beats Managed Challenge 78% of the time in our data. The 22% it fails are sites where Cloudflare has dialed the challenge difficulty higher — behavioral signals (mouse movement, timing) that stealth-playwright-extra's default patches don't reliably produce. Camoufox beats those cases 91% of the time.
"I'm Under Attack" mode adds a 5-second waiting room, a PoW challenge, and aggressive IP-based rate-limiting. 99% of scrapers die here. Our hit rate is 62% — we'd like it higher and we're honest about the number.
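Escalating between these tiers requires recognizing a Cloudflare challenge response programmatically. A sketch of the block-signal check — the `cf-mitigated` header and "Just a moment" interstitial are real Cloudflare behavior, but the helper itself is illustrative, not our exact router code:

```python
def looks_like_cloudflare_challenge(status: int, headers: dict, body: str) -> bool:
    # Cloudflare sets cf-mitigated: challenge on challenge interstitials.
    if headers.get("cf-mitigated", "").lower() == "challenge":
        return True
    # Challenge pages come back as 403/503 with telltale markup.
    if status in (403, 503) and ("Just a moment" in body or "challenge-platform" in body):
        return True
    return False

# A blocked response from a Managed Challenge:
blocked = looks_like_cloudflare_challenge(
    403, {"server": "cloudflare", "cf-mitigated": "challenge"},
    "<title>Just a moment...</title>")
# A normal 200 from the origin behind Cloudflare:
ok = looks_like_cloudflare_challenge(200, {"server": "cloudflare"}, "<html>content</html>")
```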
DataDome — behavioral scoring, not just fingerprinting
DataDome (used by Etsy, Vinted, LinkedIn Jobs, Wayfair, StubHub) runs a challenge script that takes 800ms to 2 seconds to complete. It scores around 400 signals: mouse entropy, scroll physics, font enumeration via canvas fingerprinting, touch events, WebGL vendor strings, timing of JS events.
The trick with DataDome is that it doesn't always show you a challenge. For a flagged visitor, the site loads normally but every XHR is silently null-routed or returns spoofed data. You don't know you're blocked until your scraper's data quality tanks.
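Because the block is invisible at the HTTP level, the only reliable detector is a data-quality tripwire on your own output. A sketch — the 50% threshold and field names are illustrative, not tuned values:

```python
def null_rate(records: list[dict], field: str) -> float:
    """Fraction of scraped records where an expected field came back empty."""
    if not records:
        return 0.0
    empty = sum(1 for r in records if not r.get(field))
    return empty / len(records)

def silently_blocked(records: list[dict], fields: list[str], threshold: float = 0.5) -> bool:
    # If half a batch is missing fields that normally always exist,
    # assume the session is flagged: escalate tiers and rotate the proxy.
    return any(null_rate(records, f) > threshold for f in fields)

batch = [{"price": "19.99", "title": "A"}, {"price": None, "title": "B"},
         {"price": None, "title": "C"}, {"price": None, "title": "D"}]
silently_blocked(batch, ["price"])   # 3/4 of prices are null → True
```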
Camoufox wins on DataDome because it patches Firefox at the C++ level rather than the JavaScript level. Stealth Playwright's patches show up in the fingerprint ("why does navigator.webdriver evaluate to false when Navigator.prototype.webdriver is defined?"). Camoufox doesn't have that trail.
Production hit rate: 88% at Camoufox tier, ~60% at Stealth Playwright, ~0% at pure HTTP.
Akamai Bot Manager — fingerprinting + risk scoring
Akamai's Bot Manager runs at the CDN edge. Every request gets classified into risk categories: declared-bot (good bots — Googlebot, Bingbot), unknown-bot (scraping scripts, click fraud), impersonator-bot (actively evading). Different customer policies route each category differently.
Akamai's fingerprinting is less aggressive than DataDome's but their IP reputation scoring is harsher. Even a perfect Camoufox + residential-proxy setup fails if the residential IP has been used for bot traffic recently. We see ~9% of our Camoufox requests get caught because SmartProxy's residential pool overlaps with other scraping customers' usage.
Production hit rate: 91% Camoufox + residential, ~40% Stealth Playwright + residential, ~5% plain HTTP.
PerimeterX / Human Security — "PX" token verification
PerimeterX (now Human) is used by StockX, SeatGeek, Ticketmaster, Indeed, and most major job boards. Their primary check is a PX token embedded in a cookie, generated by a challenge script that runs on every page load. The token has a ~5-minute validity and rotates.
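Since the token rotates roughly every five minutes, any session reuse needs a TTL-aware cache around whatever solves the challenge. A sketch of the refresh logic — `solver` is a placeholder for the code that actually runs the challenge JS, and the five-minute window is the observation above, not a documented constant:

```python
import time

PX_TTL_SECONDS = 5 * 60

class PxTokenCache:
    """Reuse a PX token until it nears expiry, then re-run the challenge."""
    def __init__(self, solver, ttl: int = PX_TTL_SECONDS, margin: int = 30):
        self._solver = solver          # callable returning a fresh token
        self._ttl = ttl - margin       # refresh slightly before expiry
        self._token = None
        self._fetched_at = 0.0

    def get(self) -> str:
        if self._token is None or time.monotonic() - self._fetched_at > self._ttl:
            self._token = self._solver()
            self._fetched_at = time.monotonic()
        return self._token

cache = PxTokenCache(solver=lambda: "px-token-abc")  # placeholder solver
cache.get()  # first call solves; later calls reuse until ~4.5 minutes pass
```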
Breaking PX requires running the challenge JS to completion — their obfuscator makes reverse-engineering slow but not impossible. Commercial solvers exist (CapSolver, 2Captcha PX module) at roughly $0.003/solve, which we surface as our +2 credit CAPTCHA surcharge.
Production hit rate: 84% at Camoufox tier (challenge runs in the real browser), ~70% at Stealth Playwright.
Kasada — WebAssembly challenge + active bot discouragement
Kasada is the hardest. They ship a WebAssembly module that runs a proof-of-work challenge plus a fingerprint capture that specifically targets common browser-automation patches. Their engineers actively reverse-engineer stealth-plugin updates and ship counter-detection within weeks.
Kasada protects (among others): Canada Goose, Hyatt, TicketFly. Camoufox alone beats ~38% of Kasada challenges. With an active challenge solver service (Kasada-specific, ~$0.008/solve) we get to 54%.
This is an active cat-and-mouse game where Kasada is winning more than we are. We're honest about this because customers who are told "we beat everything" eventually hit a Kasada site and churn.
What tier to use — the real decision tree
The tree below matches what our router does automatically. You can override at any step.
```
Is the site behind any CDN with bot detection?
│
├── No → Plain HTTP fetch (1 credit, ~200ms)
│        [Wikipedia, most open-data portals, internal APIs]
│
└── Yes → Is the TLS fingerprint alone enough to get blocked?
          │
          ├── Yes (Cloudflare Bot Fight Mode, some WAFs) →
          │     curl_cffi JA4 impersonation (1 credit, ~200ms)
          │
          └── Needs JS execution →
                │
                ├── Cloudflare Managed Challenge / standard CAPTCHAs →
                │     Stealth Playwright + residential (3 credits + 10)
                │
                ├── DataDome / Akamai / PerimeterX →
                │     Camoufox + residential (10 credits + 10)
                │
                └── Kasada / "I'm Under Attack" →
                      Camoufox + residential + challenge solver (10 + 10 + 2)
```
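The tree collapses to a lookup from detected defense to tool and credit cost. A sketch mirroring the branches above — the defense keys and tool names are illustrative labels, not our router's actual schema:

```python
# (defense detected) → tool, mirroring the decision tree
TIER_FOR_DEFENSE = {
    None:                "http",               # no bot detection
    "tls_fingerprint":   "ja4_http",           # Bot Fight Mode, some WAFs
    "managed_challenge": "stealth_playwright",
    "datadome":          "camoufox",
    "akamai":            "camoufox",
    "perimeterx":        "camoufox",
    "kasada":            "camoufox_solver",
    "under_attack":      "camoufox_solver",
}

# Stacked credit costs: base tier + residential proxy + solver surcharge
CREDITS = {"http": 1, "ja4_http": 1, "stealth_playwright": 3 + 10,
           "camoufox": 10 + 10, "camoufox_solver": 10 + 10 + 2}

def pick_tier(defense):
    tool = TIER_FOR_DEFENSE[defense]
    return tool, CREDITS[tool]

pick_tier("datadome")   # → ("camoufox", 20)
```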
Two notes on this tree:
- The router doesn't know the CDN ahead of time. It tries the cheapest tier, watches for block signals, and writes the outcome to Postgres. The second request to the same domain skips ahead. So the tree above is what happens the first time you hit a new domain; after that, cached routing makes the decision with zero extra work.
- Credit costs are stacked. A Stealth Playwright request with a residential proxy is 3 + 10 = 13 credits. A Camoufox request with proxy + CAPTCHA is 10 + 10 + 2 = 22 credits. We flag this clearly in the response so your billing metrics match.
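The try-cheapest-then-escalate loop with per-domain memory can be sketched like this — a dict stands in for the Postgres table, and `attempt` is a placeholder for the real fetcher:

```python
TIERS = ["http", "ja4_http", "stealth_playwright", "camoufox"]
known_tier: dict[str, int] = {}   # domain → index of cheapest tier that worked

def route(domain: str, attempt) -> str:
    """attempt(domain, tier) returns a body, or None on a block signal."""
    start = known_tier.get(domain, 0)    # cached domains skip ahead
    for i in range(start, len(TIERS)):
        body = attempt(domain, TIERS[i])
        if body is not None:
            known_tier[domain] = i       # remember the winning tier
            return body
    raise RuntimeError(f"all tiers blocked for {domain}")

# A fake fetcher for a domain that needs a browser tier:
fake = lambda d, tier: "ok" if tier == "stealth_playwright" else None
route("shop.example", fake)        # escalates http → ja4_http → playwright
known_tier["shop.example"]         # → 2, so the second request starts there
```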
The signal nobody publishes: false-positive rate
Here's a data point competitors don't share. When our router picks a tier that should work and the site STILL blocks us, what's the false-positive rate?
From our March 2026 logs:
| Tier the router chose | Outcome: success | Outcome: blocked | False-positive rate |
|---|---|---|---|
| HTTP (1 credit) | 1,594,312 | 18,003 | 1.1% |
| JA4 HTTP (1 credit) | 478,091 | 32,844 | 6.4% |
| Stealth Playwright (3 credits) | 167,223 | 12,118 | 6.8% |
| Camoufox (10 credits) | 38,912 | 4,281 | 9.9% |
The router is accurate at the HTTP tier: 99% of the time it predicts plain HTTP will work, it does. At Camoufox, almost 10% of the time we pick that tier and still get blocked — usually because the specific residential IP was already burned.
This is where the auto-escalation saves us. A Camoufox miss retries with a different proxy session; the second attempt catches another 70% of those failures.
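The retry itself is simple. A sketch — `fetch` is a placeholder, and the session-ID trick assumes a residential pool where a sticky session ID maps to one exit IP, so a fresh ID gets a fresh IP:

```python
import uuid

class BlockedError(Exception):
    """Raised when a response matches a block signal."""

def fetch_with_session_retry(fetch, url: str, attempts: int = 2):
    """Retry a blocked request with a fresh proxy session ID each time."""
    last_error = None
    for _ in range(attempts):
        session_id = uuid.uuid4().hex   # fresh session → fresh exit IP
        try:
            return fetch(url, session_id)
        except BlockedError as exc:
            last_error = exc
    raise last_error
```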
Which defenses are getting harder, which are getting easier
Three defenses are escalating in 2026:
- Kasada — shipped four detection updates in Q1. Their ML model is actively classifying Camoufox.
- Cloudflare's "super strict" Managed Challenge — quietly rolled out in late 2025. Looks identical to standard Managed Challenge but has a 40% lower pass rate.
- DataDome's timing fingerprint — they started checking the timing distribution of JS events in February. Naively-throttled scrapers now look more suspicious than unthrottled ones.
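One practical consequence of the timing fingerprint: a fixed `sleep(1.0)` between actions produces an unnaturally uniform event distribution. A common countermeasure is log-normal jitter, which loosely resembles human inter-action delays — the parameters below are illustrative, not tuned against DataDome:

```python
import random

def human_delay(mean_s: float = 1.2) -> float:
    """Log-normal pause: mostly short waits with an occasional long one,
    unlike a fixed sleep() whose timing distribution is a single spike."""
    delay = random.lognormvariate(mu=0.0, sigma=0.6) * mean_s
    return min(delay, 8.0)   # cap the tail so scrapes don't stall

# use time.sleep(human_delay()) between actions instead of time.sleep(1.0)
```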
Two defenses are getting easier:
- reCAPTCHA v3 score gating — the ML model is public enough that commercial solvers get 96%+ scores now. Three years ago this was harder.
- Imperva (Incapsula) — their bot detection hasn't had a significant update since mid-2024. Stealth Playwright + residential beats it 95%+.
What this means for your scraper
If you run a scraper that hits more than 5,000 pages a month, the math strongly favors a tiered system. The naive approach — "always use a browser" — means you pay 10x for the 50% of requests that would have worked at plain HTTP. The other naive approach — "always use curl_cffi" — means you get false negatives on 30%+ of your data.
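The arithmetic for that 5,000-page example, using the credit prices from the tree above and a 50% plain-HTTP share — illustrative numbers, your mix will differ:

```python
PAGES = 5_000
HTTP_CREDIT, BROWSER_CREDIT = 1, 10   # plain fetch vs. a Camoufox-class browser
HTTP_SHARE = 0.50                     # half the pages work at plain HTTP

always_browser = PAGES * BROWSER_CREDIT
tiered = (int(PAGES * HTTP_SHARE) * HTTP_CREDIT
          + int(PAGES * (1 - HTTP_SHARE)) * BROWSER_CREDIT)

always_browser   # 50,000 credits
tiered           # 27,500 credits, a 45% saving at this mix
```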
The specific tier that wins depends on your mix, which depends on your target sites, which you probably don't fully enumerate at project start. A router that remembers what worked is the cheapest compromise we've found.
If you want to audit your current scraper's mix, the easiest way is our free scan endpoint:
```
curl -X POST https://dreamscrape.app/scan \
  -H "Authorization: Bearer $DREAMSCRAPE_API_KEY" \
  -d '{"url": "https://target-site.com"}'
```
/scan does one real scrape and returns the tier that actually worked, the protections that were detected, and the internal APIs the page loaded that you could hit directly. 3 credits regardless of tier. We don't save your target URLs.
Sources and further reading
- Our full scorecard with live pass/fail rates: dreamscrape.app/scorecard
- Intel directory with per-domain protection profiles: dreamscrape.app/intel
- Our earlier post on JA4 TLS fingerprinting explains exactly how curl_cffi bypasses Bot Fight Mode
- Why 70% of scraping doesn't need a browser — the router math in more depth
The 2.3M-request dataset referenced in this post is our production scrape log between 2026-03-15 and 2026-04-15. Raw counts: 1.82M HTTP, 511K JA4, 185K Stealth Playwright, 43K Camoufox. We'll publish a formal methodology and SQL for the numbers on request.