Tutorial

How to scrape Zillow listings in 2026 (with real success rates)

Scrape Zillow listings with 75-85% success on Stealth Playwright, 90%+ on Camoufox. Code, real block rates, error fixes.

Curtis Vaughan · 12 min read

Zillow runs PerimeterX. That one fact decides which tier of scraper works on it and which gets a blank page after 2 seconds. Naive Selenium and vanilla Puppeteer get through roughly 10-20% of the time on /homes/search URLs in our broader routing data — the rest land as a 200 OK with empty HTML, a 403, or a reCAPTCHA modal that no automated session is going to solve. Our own Zillow sample is small — 7 production scrapes in the last 60 days, all on the lighter /homes/{zip}_rb endpoint, all of which passed — small enough that we lean on the broader PerimeterX-protected pattern across our routing data rather than pretending the 7-request number is a benchmark.

This post covers what works in 2026: which DreamScrape tier you actually need, success rates anchored against our 7 production Zillow scrapes plus the broader PerimeterX routing data, copy-paste code that runs, and the specific errors you'll hit in production with their fixes. It also covers where this approach fails, because Zillow updates their PerimeterX config quarterly and pretending otherwise wastes your money.

Why Zillow's PerimeterX blocks most scrapers in 2026

PerimeterX (now Human Security) protects Zillow with a challenge script that runs on every page load and scores around 400 fingerprint signals. The signals that matter most for scrapers in 2026:

  • Headless flags. navigator.webdriver === true, missing chrome.runtime, the absence of plugin entries that real Chrome ships with. Vanilla Puppeteer fails all three.
  • CDP port exposure. Chrome DevTools Protocol leaves a TCP listener on a known range. PerimeterX probes for it via timing side-channels in the JS challenge.
  • Canvas fingerprinting. The challenge renders a specific glyph sequence to canvas and reads back the pixel buffer. Headless Chrome produces a hash that doesn't appear in their real-traffic distribution.
  • WebGL renderer strings. Real Chrome on macOS reports ANGLE (Apple, Apple M2, OpenGL 4.1). Headless Chrome on Linux reports Mesa/X.org. Mismatch with the User-Agent is a strong signal.

The 2026 update that caught most stealth-plugin builds off-guard: PerimeterX started cross-checking the canvas hash, the WebGL renderer string, and the User-Agent against each other. Spoofing one without the others now gets you flagged faster than spoofing none at all.

The block rarely shows up as an instant error status. Zillow returns a 200 with a near-empty document, or a 403 several seconds in after the challenge JS scores you, or — on flagged sessions — a fully-rendered listing page where every price reads $0. You have to inspect the response body, not just the status code.
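That body inspection can be folded into a small classifier. A sketch of the idea; the empty-body length threshold and the $0-price heuristic are illustrative assumptions, not documented Zillow constants:

```typescript
// Classify a Zillow response as passed, blocked, or stealth-blocked.
// The 2,000-char "near-empty" threshold and the $0-price check are
// illustrative assumptions, not documented Zillow behavior.
type Verdict = "ok" | "blocked" | "stealth-block";

function classifyResponse(status: number, html: string): Verdict {
  if (status === 403) return "blocked";
  // 200 with a near-empty document: the challenge scored us before render
  if (status === 200 && html.length < 2000) return "blocked";
  // Fully rendered page where every price reads $0: flagged session
  const prices = html.match(/\$[\d,]+/g) ?? [];
  if (prices.length > 0 && prices.every((p) => p === "$0")) {
    return "stealth-block";
  }
  return "ok";
}
```

Run this on every response before parsing; treating a stealth-block as a success is how $0 prices end up in your database.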

One quirk worth knowing: sub-market endpoints at /homes/{zip}_rb (the by-zip residential pages) sit behind a lighter PerimeterX config than the main /homes/search endpoint. In our 7 Zillow production scrapes — all against /homes/{zip}_rb — every one passed at stealth-playwright tier. Small sample, but the pattern matches what we see across other PerimeterX targets where the by-segment endpoints are softer than the main search route. Start there.

Stealth Playwright: the tier that gets you 75-85% through

DreamScrape's stealth-playwright tier runs Playwright with playwright-extra and the stealth plugin, plus a few patches we've added on top: the CDP listener is removed, the User-Agent rotates from a pool of roughly 30 real strings collected from our own analytics, and the canvas fingerprint is masked to one of roughly 40 real-device hashes that match the spoofed UA.

Production success rate on Zillow /homes/search and /homes/{zip}_rb: 100% on our 7-request /homes/{zip}_rb sample, but that's an early signal — broader PerimeterX-protected sites in our routing data pass at 75-85% on stealth-playwright, which is the realistic planning number for the main search route.

The trade-off is latency. Stealth Playwright averages roughly 3.4s per Zillow page (our last-60-day Zillow average is 3,405ms) versus ~400ms for an unprotected site at HTTP tier. Most of that delay is the PerimeterX challenge script itself — it takes roughly 800-1,200ms to evaluate, and we wait for the resulting cookie before parsing the DOM.

Best fits for stealth-playwright on Zillow:

  • Bulk zip-code scraping for market analysis
  • Historical trend tracking where 6-24h staleness is acceptable
  • Inventory snapshots (how many listings in 90210 today)

Where stealth-playwright stops being enough:

  • Single-property detail pages at /homes/{zpid}/. These hit a separate PerimeterX policy with a tighter score threshold. Block rate climbs to roughly 40-60% even on stealth-playwright in our broader routing data.
  • Pages with dynamic price updates that re-render after the initial DOM settles. The challenge sometimes runs a second time on the XHR refresh and catches the session.

If you're scraping list pages and accepting some staleness, stay on stealth-playwright. If you need detail pages or live data, escalate.

Camoufox + Stealth Playwright: the 90%+ solution (and when it breaks)

Camoufox is a patched Firefox where the anti-detection work happens in C++ rather than via runtime JS injection. That matters because PerimeterX's challenge script can't introspect the patches the same way it can detect playwright-extra's monkey-patched globals. Camoufox also spoofs OS-level signals — device sensors, timezone, locale — and matches them to a distribution of real-user fingerprints rather than a single canned profile.

Combined with our stealth routing on top, Camoufox lands at roughly 92-95% on Zillow main listing pages and 80-85% on detail pages, based on our broader PerimeterX routing data — Zillow specifically is a small sample (7 requests in our last-60-day data, all on lighter /homes/{zip}_rb URLs that don't stress the harder protection).

The trade-offs:

  • Latency. Roughly 6-12s per page average on Camoufox, vs. roughly 3.4s for stealth-playwright alone (our measured Zillow average).
  • Cost. Camoufox costs 10 credits per request on DreamScrape; stealth-playwright costs 3. For a 1,000-listing daily run that's the difference between 10,000 credits/day on Camoufox and 3,000 credits/day on stealth-playwright.

Two failure modes that show up specifically on Camoufox-against-Zillow:

Failure 1: robotic request patterns trigger reCAPTCHA. If you reuse the same User-Agent across requests, hit the site at constant intervals, and skip think-time entirely, PerimeterX escalates from passive scoring to an active reCAPTCHA challenge. Camoufox doesn't solve reCAPTCHA. The fix is to vary think-time (3-7s random between requests, 5-15m random between batches) and rotate the UA pool.
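The pacing fix reduces to a pair of jitter helpers plus pool rotation. The 3-7s and 5-15m windows come from the guidance above; the pool entries here are placeholders, not real collected strings:

```typescript
// Randomized pacing so the request cadence itself isn't a fingerprint.
// UA_POOL entries are placeholders; use real strings from your own pool.
const UA_POOL = [
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...", // placeholder
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",       // placeholder
];

// 3-7s random between individual requests
const requestDelayMs = () => 3_000 + Math.random() * 4_000;

// 5-15 minutes random between batches
const batchDelayMs = () => 300_000 + Math.random() * 600_000;

// Rotate through the pool instead of reusing one UA
function nextUserAgent(requestIndex: number): string {
  return UA_POOL[requestIndex % UA_POOL.length];
}
```

The point of the random ranges is that both the mean and the variance of your inter-request gaps should look human; a constant 5s sleep has zero variance and stands out immediately.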

Failure 2: session-level tracking. Camoufox doesn't bypass the fact that Zillow tracks behavior across a session via the PerimeterX cookie. If you reuse an authenticated session past roughly 4 hours, the score decays past threshold and every request fails — not because the fingerprint changed, but because the cumulative session looks bot-shaped. Re-establish the session every 2-3 hours.
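A minimal way to enforce the session-age limit is to timestamp each session when it's minted and check before every batch. The 2.5-hour cutoff is one point inside the 2-3 hour window suggested above:

```typescript
// Track PerimeterX session age and force re-establishment before the
// score decays. The 2.5h cutoff is an assumption inside the 2-3h window.
interface PxSession {
  cookies: string;
  establishedAt: number; // epoch ms of the main-page load that minted it
}

const MAX_SESSION_AGE_MS = 2.5 * 60 * 60 * 1000;

function sessionExpired(s: PxSession, now = Date.now()): boolean {
  return now - s.establishedAt > MAX_SESSION_AGE_MS;
}
```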

Camoufox earns its credit cost on detail pages and high-value list pages where missing data is expensive. For high-volume zip sweeps where some block tolerance is fine, stealth-playwright is the cheaper correct answer.

Working code: scrape Zillow sub-market pages with Stealth Playwright

This pattern targets /homes/{zip}_rb (lighter protection), parses the listing array, paginates, and falls back to Zillow's internal JSON API if HTML parsing comes back empty.

```typescript
import { DreamScrape } from "@dreamscrape/sdk";

const ds = new DreamScrape({ apiKey: process.env.DREAMSCRAPE_API_KEY! });

interface Listing {
  address: string;
  price: number;
  beds: number;
  baths: number;
  sqft: number;
  listing_url: string;
  days_on_market: number;
  zpid: string;
}

async function scrapeZip(zip: string, maxPages = 5): Promise<Listing[]> {
  const results: Listing[] = [];

  for (let page = 1; page <= maxPages; page++) {
    const url = `https://www.zillow.com/homes/${zip}_rb/${page}_p/`;

    const res = await ds.scrape({
      url,
      tier: "stealth-playwright",
      waitFor: "#search-page-list-container",
      timeout: 20_000,
    });

    if (res.status === 401) {
      throw new Error("Auth failed — check API key");
    }
    if (res.status === 429) {
      await sleep(60_000);
      page--; // retry same page
      continue;
    }
    if (res.status === 403 || !res.html?.includes("StaticSearchList")) {
      // Fingerprint mismatch or blank page. Try the JSON fallback.
      const json = await fetchSearchPageState(zip, page, res.cookies);
      if (json) results.push(...parseJsonListings(json));
      else continue; // skip and move on
    } else {
      results.push(...parseHtmlListings(res.html));
    }

    // Random think-time: 3-7s between requests, longer between batches
    await sleep(3000 + Math.random() * 4000);
  }

  return results;
}

async function fetchSearchPageState(
  zip: string,
  page: number,
  cookies: string,
) {
  const res = await ds.scrape({
    url: "https://www.zillow.com/async-create-search-page-state",
    tier: "stealth-playwright",
    method: "PUT",
    headers: { cookie: cookies, "content-type": "application/json" },
    body: JSON.stringify({
      searchQueryState: { usersSearchTerm: zip, pagination: { currentPage: page } },
      wants: { cat1: ["listResults"] },
    }),
  });
  return res.status === 200 ? JSON.parse(res.body) : null;
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
```

Three things this code does that matter:

  1. Skip on 403, retry on 429. Treating them the same wastes credits — 429 is recoverable, 403 means this session is burned and the fallback (or skipping) is correct.
  2. Falls back to the internal JSON API. When HTML returns empty (the failure mode where PerimeterX scored you mid-page), we try the JSON endpoint with the cookies we already have. More on this next.
  3. Random think-time, not constant sleeps. A constant await sleep(5000) is itself a fingerprint. Real users vary.

Parse functions (parseHtmlListings, parseJsonListings) are straightforward DOM and JSON traversal — see the Zillow intel page for current selectors, since Zillow ships layout changes roughly every 6-8 weeks based on our intel-page revision history.

Extracting JSON from Zillow's internal /async-create-search-page-state API

Zillow's frontend hits an undocumented internal endpoint at /async-create-search-page-state to populate the listing list. It returns the same data as the rendered HTML in clean JSON, with a few extras: full price history, tax estimates, and Zestimate values that aren't always present in the DOM.

In our routing data, hitting this endpoint via stealth-playwright succeeds at a higher rate than the equivalent HTML scrape — typically 5-10 percentage points better — because PerimeterX scores XHR requests less aggressively than full page loads. Use it as primary if you only need structured data, or as fallback when HTML comes back empty.

The dependency chain matters: this endpoint requires a valid PerimeterX cookie, which you only get from a successful main page load first. Hitting the API cold returns 403 every time. The flow is:

  1. Scrape /homes/{zip}_rb to establish session and grab cookies
  2. Hit /async-create-search-page-state with those cookies for subsequent pages
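That dependency chain can be made explicit by wrapping the two steps, with the scrape call injected so the sketch stays client-agnostic. The function names here are illustrative, not SDK API:

```typescript
// Enforce the cookie dependency: the internal API is only called after
// a successful main-page load mints the PerimeterX cookie. `scrapePage`
// stands in for whatever client you use; names are illustrative.
type ScrapeFn = (url: string) => Promise<{ cookies: string; status: number }>;

async function withPxSession<T>(
  zip: string,
  scrapePage: ScrapeFn,
  useSession: (cookies: string) => Promise<T>,
): Promise<T> {
  // Step 1: main page load establishes the session
  const warm = await scrapePage(`https://www.zillow.com/homes/${zip}_rb/`);
  if (warm.status !== 200) throw new Error("session establishment failed");
  // Step 2: subsequent API calls only work with that cookie attached
  return useSession(warm.cookies);
}
```

Structuring it this way makes it impossible to hit the API cold by accident, which (per the above) fails with a 403 every time.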

Two caveats before you build a real-time pipeline on this:

  • The JSON response can lag the frontend by 4-12 hours on price changes. It's a backend cache, not a live feed. Fine for historical analysis, wrong for arbitrage.
  • Zestimate and premium fields require additional headers we don't replay by default. If those fields come back null, that's why.

Common errors, blocks, and how to fix them

Blank page after 2s load. PerimeterX challenge timed out before scoring resolved. Upgrade to Camoufox tier, and add an explicit waitFor: "#search-page-list-container" with a 15s timeout. If it still blanks, the IP is probably flagged — rotate proxy session.

403 Forbidden after roughly 30-50 requests in the same session. Session pattern crossed the anomaly threshold. Insert a random 5-15 minute pause between request clusters, rotate to a fresh User-Agent from a pool of at least 12 real strings, and re-establish the session by hitting the homepage before resuming.

Prices show as $0 or fields are missing. JavaScript rendered partially before the parser ran, or the page is a stealth-block (real-looking page with stubbed data). Confirm headless is disabled in your Playwright config, then escalate to Camoufox which handles Zillow's deeper DOM mutations more reliably.

reCAPTCHA modal appears. PerimeterX escalated to active challenge. Slow down to under 6 requests/minute and add 10s think-time; if it persists, integrate DreamScrape's captcha-solver module (+2 credits per solve).

Listing data incomplete — beds/baths missing on some cards. Lazy-loaded fields didn't render before parse. Scroll the page 3x in 500px increments before parsing, or use Camoufox which triggers IntersectionObserver callbacks more reliably than patched Chromium.
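The scroll fix amounts to replaying a fixed set of offsets. A trivial generator for the plan; how you replay the offsets depends on which scripting hook your setup exposes:

```typescript
// Offsets for "scroll 3x in 500px increments" so lazy-loaded card
// fields fire their IntersectionObserver callbacks before parsing.
function scrollPlan(steps = 3, stepPx = 500): number[] {
  return Array.from({ length: steps }, (_, i) => (i + 1) * stepPx);
}
// scrollPlan() → [500, 1000, 1500]
```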

/async-create-search-page-state returns 401. Session cookie expired. Re-fetch the main /homes/{zip}_rb page to mint a new session. Don't reuse cookies older than 2 hours — even if they look valid, the PerimeterX score behind them has decayed.

Where stealth scraping Zillow fails: limits you must accept

Scraping Zillow has a real ceiling, and pretending it doesn't is how customers waste credits.

Rate ceiling. Roughly 30-60 listings/hour per session before secondary blocks kick in, regardless of tier. Zillow infrastructure detects volume spikes at the account level, not just the request level. The official Zillow API (where your use case qualifies) supports higher throughput.

Real-time pricing. Zillow updates prices via internal market-data feeds that don't always re-render the page immediately. Scraped prices may be 4-12 hours stale. If you're building a price-alert product or arbitrage tool, this approach will not work — use the official API or an MLS feed.

Detail pages. /homes/{zpid}/ URLs hit a separate, stricter PerimeterX policy. Even Camoufox sees roughly 15-20% block rates on detail pages in our broader routing data. The cheaper pattern: extract zpid from list pages and fetch detail data via the internal API rather than the user-facing detail URL.
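Extracting the zpid from list-page URLs is a one-regex job. The {digits}_zpid suffix form is an assumption about what list pages emit; the /homes/{zpid}/ form matches the URL pattern named above:

```typescript
// Pull the zpid out of a listing_url so detail data can be fetched via
// the internal API instead of the stricter /homes/{zpid}/ page.
// The {digits}_zpid suffix form is an assumed list-page URL shape.
function extractZpid(listingUrl: string): string | null {
  const suffix = listingUrl.match(/(\d+)_zpid/);
  if (suffix) return suffix[1];
  const path = listingUrl.match(/\/homes\/(\d+)\/?$/);
  return path ? path[1] : null;
}
```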

Geo-blocking. Some User-Agent + IP combinations trigger a geo-check that returns regional listings only. Datacenter IPs trip this more than residential. DreamScrape's residential proxy add-on helps but rotating IPs across requests degrades geo-accuracy of returned listings — you'll see results from whichever region the proxy resolved to, not the zip you queried. Pin the proxy region per zip if geo-accuracy matters.

Legal exposure. Zillow's ToS prohibits scraping. Personal market research and academic analysis are lower-risk. Building a competing listing aggregator or a price-undercutting bot on scraped Zillow data is high-risk — Zillow has issued takedowns and pursued litigation. Talk to a lawyer before commercial deployment.

If you need a production real-estate platform, the right path is the Zillow API (limited tier), an MLS feed partnership, or Redfin/RE/MAX data feeds. Scraping is for use cases where staleness, ceiling, and legal gray zone are all acceptable.

Production deployment: monitoring, caching, and cost optimization

A scraper that works in dev and dies in prod usually fails on one of five axes. Address them up front.

Cache by zip + date. Don't re-scrape every listing every day. Most listings change price or status on a 24-72 hour cadence. Fetch new listings daily, refresh existing listings every 48 hours, and only hit detail pages when a list-level field changed.
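The cadence above turns into a two-predicate refresh policy. The 48-hour threshold and the changed-field trigger are the ones stated; the record shape is illustrative:

```typescript
// Refresh policy from the text: re-scrape existing listings every 48h,
// hit detail pages only when a list-level field changed.
interface CachedListing {
  lastScrapedAt: number;      // epoch ms of last successful scrape
  listFieldsChanged: boolean; // did price/status change at list level?
}

const DAY_MS = 24 * 60 * 60 * 1000;

function needsRefresh(l: CachedListing, now = Date.now()): boolean {
  return now - l.lastScrapedAt > 2 * DAY_MS;
}

function needsDetailFetch(l: CachedListing): boolean {
  return l.listFieldsChanged;
}
```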

Monitor block-rate drift. If your stealth-playwright success rate drops from the high 80s to the low 60s within a week, that's a PerimeterX update degrading your fingerprint, not random noise. Alert on a 7-day rolling block rate. Track p99 latency separately — challenges getting slower is an early warning before they start failing.
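A 7-day rolling block rate is a few reduces over daily counters. The alert threshold here is an illustrative placeholder; pick one that matches your tolerance:

```typescript
// 7-day rolling block rate from per-day (attempts, blocks) counters.
// The 30% alert threshold is an illustrative assumption.
interface DayStats { attempts: number; blocks: number; }

function rollingBlockRate(days: DayStats[], window = 7): number {
  const recent = days.slice(-window);
  const attempts = recent.reduce((s, d) => s + d.attempts, 0);
  const blocks = recent.reduce((s, d) => s + d.blocks, 0);
  return attempts === 0 ? 0 : blocks / attempts;
}

const shouldAlert = (days: DayStats[]) => rollingBlockRate(days) > 0.3;
```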

Cost model. At stealth-playwright cost of 3 credits/request, 1,000 daily scrapes is 3,000 credits/day. At Camoufox cost of 10 credits/request, the same volume is 10,000 credits/day. Run stealth-playwright by default and only escalate the URLs that fail, rather than upgrading the whole job.

Graceful degradation. If block rate exceeds 25%, pause for 12h, then retry the failed URLs on Camoufox. Compare the recovered-listing count against the extra credit spend — if Camoufox only recovers under 50% of the failures, the upgrade isn't worth it for that batch.
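Both decision rules in this paragraph are one-liners worth encoding rather than eyeballing. The 25% pause threshold and the 50% recovery bar come straight from the text:

```typescript
// Pause when block rate exceeds 25% (per the text).
const shouldPause = (blockRate: number) => blockRate > 0.25;

// The Camoufox retry paid off only if it recovered at least half
// of the failed URLs in the batch.
function escalationWorthIt(failedCount: number, recoveredCount: number): boolean {
  return recoveredCount / failedCount >= 0.5;
}
```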

Session rotation. Maintain 3-5 distinct PerimeterX sessions, each obtained from a fresh main-page hit. Rotate per batch, never reuse past 2-3 hours. Log auth failures (401, 403) separately from parse failures (empty selector, missing field) so the dashboards tell you whether the problem is bypass or extraction.

Schema. Store (zip, listing_id, price, beds, baths, scraped_at, tier_used, block_attempt_count). The last two columns let you analyze, post-hoc, which tier wins for which zips and whether your block rate is trending — without that data you're guessing.


If you're scraping under 5,000 listings/month and the use case tolerates 24h staleness, stealth-playwright on /homes/{zip}_rb is the right call. Above that, or for detail pages, escalate to Camoufox and budget the extra credits. Above 50,000 listings/month or for any commercial product, get the official API or an MLS feed instead — the scraping path will not scale and the legal exposure stops being theoretical.

Check the current per-tier success rates for Zillow at /intel/zillow.com before you build. The numbers in this post are from our last-60-day routing data as of April 2026 and PerimeterX ships updates roughly quarterly.