Docs

Everything you need to start scraping with DreamScrape.

Quickstart

Get your first scrape in 30 seconds.

1. Get an API key

2. Make your first request

curl

curl -X POST https://api.dreamscrape.app/scrape \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com", "renderMode": "auto"}'

Node.js

const res = await fetch('https://api.dreamscrape.app/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ds_sk_your_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://news.ycombinator.com',
    renderMode: 'auto',
  }),
});

const data = await res.json();
console.log(data.markdown);       // Clean markdown
console.log(data.engineTier);     // "http", "stealth-playwright", etc.
console.log(data.timing.fetchMs); // Latency in ms

Python

import requests

res = requests.post('https://api.dreamscrape.app/scrape',
    headers={'Authorization': 'Bearer ds_sk_your_key'},
    json={'url': 'https://news.ycombinator.com', 'renderMode': 'auto'})

data = res.json()
print(data['markdown'])       # Clean markdown
print(data['engineTier'])     # Which engine tier was used
print(data['timing']['fetchMs'])  # Latency

3. Read the response

Every response includes:

• markdown — Clean extracted content
• engineTier — Which engine served the request ("http", "stealth-playwright", "camoufox")
• timing.fetchMs — Request latency in milliseconds
• metadata — Page title, description, language
• links — All extracted links from the page
• credits — Credits charged, remaining, limit, and reset date

Full response example

{
  "status": "completed",
  "url": "https://news.ycombinator.com",
  "engineTier": "http",
  "markdown": "# Hacker News\n1. Show HN: ...",
  "metadata": {
    "title": "Hacker News",
    "description": "",
    "language": "en"
  },
  "links": [
    "https://news.ycombinator.com/item?id=12345",
    "https://example.com/article"
  ],
  "timing": { "fetchMs": 142, "cleanMs": 45 },
  "credits": {
    "charged": 1,
    "remaining": 999,
    "limit": 1000,
    "resetAt": "2026-05-01T00:00:00.000Z"
  }
}
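
As a rough sketch of how these fields fit together, the snippet below condenses a response into a one-line summary. The helper name summarize is hypothetical and not part of any DreamScrape SDK; the sample dict is a trimmed copy of the example above.

```python
# Trimmed copy of the response example above.
EXAMPLE = {
    "status": "completed",
    "url": "https://news.ycombinator.com",
    "engineTier": "http",
    "markdown": "# Hacker News\n1. Show HN: ...",
    "metadata": {"title": "Hacker News", "description": "", "language": "en"},
    "links": [
        "https://news.ycombinator.com/item?id=12345",
        "https://example.com/article",
    ],
    "timing": {"fetchMs": 142, "cleanMs": 45},
    "credits": {
        "charged": 1,
        "remaining": 999,
        "limit": 1000,
        "resetAt": "2026-05-01T00:00:00.000Z",
    },
}


def summarize(response: dict) -> str:
    """Build a one-line summary of a /scrape response (hypothetical helper)."""
    meta = response["metadata"]
    credits = response["credits"]
    return (
        f"{meta['title']}: {len(response['links'])} links, "
        f"tier={response['engineTier']}, fetch {response['timing']['fetchMs']} ms, "
        f"{credits['charged']} credit(s) charged, "
        f"{credits['remaining']}/{credits['limit']} remaining"
    )


if __name__ == "__main__":
    print(summarize(EXAMPLE))
```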

Async mode

Add async: true to get a 202 response with a job ID, then poll for results:

# Start async scrape
curl -X POST https://api.dreamscrape.app/scrape \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "async": true}'
# Response: {"jobId": "abc-123", "status": "queued"}

# Poll for results
curl https://api.dreamscrape.app/jobs/abc-123 \
  -H "Authorization: Bearer ds_sk_your_key"
# Response: {"status": "completed", "pages": [...], ...}
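
The polling step could look something like this in Python. This is a minimal sketch using only the standard library. The helper names (get_job, poll_job), the one-second interval, and the attempt limit are assumptions. Only the "queued" and "completed" statuses appear in this page, so the loop stops on "completed" and otherwise gives up after max_attempts.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.dreamscrape.app"


def get_job(job_id: str, api_key: str) -> dict:
    """Fetch the current state of an async job via GET /jobs/:id."""
    req = urllib.request.Request(
        f"{API_BASE}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_job(fetch_status: Callable[[], dict],
             interval_s: float = 1.0,
             max_attempts: int = 30) -> dict:
    """Call fetch_status() until the job reports "completed".

    Any other status is treated as "still pending" because no failure
    status is documented here; adjust once you know the full set.
    """
    for _ in range(max_attempts):
        job = fetch_status()
        if job.get("status") == "completed":
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"job not completed after {max_attempts} polls")
```

Usage: poll_job(lambda: get_job("abc-123", "ds_sk_your_key")). Passing the fetcher as a callable keeps the loop easy to test without network access.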

Multi-page crawl

Crawl multiple pages from a starting URL:

curl -X POST https://api.dreamscrape.app/crawl \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": ["https://docs.example.com"],
    "maxPages": 50,
    "maxDepth": 2,
    "sameOriginOnly": true
  }'
# Returns 202 with crawl ID. Poll GET /crawls/:id for results.
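
For reference, here is one way the same crawl request might be built in Python with the standard library. The function name build_crawl_request is hypothetical. The defaults simply mirror the values in the curl example and are not documented API defaults.

```python
import json
import urllib.request

API_BASE = "https://api.dreamscrape.app"


def build_crawl_request(api_key: str, start_urls, max_pages: int = 50,
                        max_depth: int = 2,
                        same_origin_only: bool = True) -> urllib.request.Request:
    """Prepare (but do not send) a POST /crawl request."""
    body = {
        "startUrls": list(start_urls),
        "maxPages": max_pages,
        "maxDepth": max_depth,
        "sameOriginOnly": same_origin_only,
    }
    return urllib.request.Request(
        f"{API_BASE}/crawl",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send it with urllib.request.urlopen(req) and read the crawl ID from the 202 response body. The name of the ID field is not shown on this page, so inspect the response before relying on it.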

Structured extraction

Use /auto-extract for AI-powered schema detection, or pass an extract object to /scrape:

# Auto-extract (no schema needed)
curl -X POST https://api.dreamscrape.app/auto-extract \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.basketball-reference.com/leagues/NBA_2026_per_game.html",
    "hint": "player stats table"
  }'
# Returns structured JSON with detected data type and fields.

Credits

DreamScrape uses credits instead of flat request counts. Different engine tiers cost different amounts, matching the actual compute cost. The router picks the cheapest engine that works, so most requests are billed at Tier 0 (plain HTTP, 1 credit), the cheapest option.

Credit costs per tier

Engine tier	Credits	When used
HTTP	`1`	Plain fetch — most sites
JA4 HTTP	`1`	TLS fingerprint impersonation
Stealth Browser	`3`	Headless Chromium for JS-heavy sites
Anti-detect Firefox	`10`	Patched Firefox for the hardest targets
+ Residential proxy	`+10`	Added when useProxy: true
+ CAPTCHA solve	`+2`	Added when solveCaptcha: true
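
To estimate a request's cost before sending it, the table can be turned into a small lookup. This is a sketch: it assumes the add-ons stack additively on top of the base tier cost, and it keys tiers by the engineTier strings shown in responses. JA4 HTTP also costs 1 credit, but its engineTier string is not listed on this page, so it is left out.

```python
# Base cost per engine tier, taken from the table above.
BASE_CREDITS = {
    "http": 1,
    "stealth-playwright": 3,
    "camoufox": 10,
}
PROXY_SURCHARGE = 10    # added when useProxy: true
CAPTCHA_SURCHARGE = 2   # added when solveCaptcha: true


def estimate_credits(engine_tier: str, use_proxy: bool = False,
                     solve_captcha: bool = False) -> int:
    """Estimate the credits charged for one successful scrape."""
    cost = BASE_CREDITS[engine_tier]
    if use_proxy:
        cost += PROXY_SURCHARGE
    if solve_captcha:
        cost += CAPTCHA_SURCHARGE
    return cost
```

Under these assumptions, a camoufox scrape with both add-ons comes to 10 + 10 + 2 = 22 credits.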

Checking usage

Every scrape response includes a credits block:

{
  "status": "completed",
  "engineTier": "http",
  "credits": {
    "charged": 1,
    "remaining": 49999,
    "limit": 50000,
    "resetAt": "2026-05-01T00:00:00Z"
  }
}

For a full breakdown, call GET /usage, which returns per-tier usage and efficiency metrics, or visit the dashboard.

What happens when you run out

When your credits are exhausted, the API returns 402 Payment Required with:

{
  "error": {
    "code": "INSUFFICIENT_CREDITS",
    "message": "Credit quota exhausted (50000/50000 credits used)",
    "used": 50000,
    "limit": 50000,
    "resets": "2026-05-01T00:00:00Z",
    "suggestion": "Upgrade your plan for more credits"
  }
}

Credits reset automatically at the start of each billing month. Failed scrapes are not charged.
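
A client might handle this error along the following lines. The helper credit_error_summary is hypothetical. It relies only on the fields shown in the error body above and returns None for any other response.

```python
from datetime import datetime, timezone
from typing import Optional


def credit_error_summary(status_code: int, body: dict,
                         now: Optional[datetime] = None) -> Optional[str]:
    """Describe an INSUFFICIENT_CREDITS error, or return None if it is not one."""
    if status_code != 402:
        return None
    err = body.get("error", {})
    if err.get("code") != "INSUFFICIENT_CREDITS":
        return None
    # "Z" is replaced so fromisoformat() also works on older Python versions.
    resets = datetime.fromisoformat(err["resets"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    days_left = max((resets - now).days, 0)
    return (
        f"{err['used']}/{err['limit']} credits used; "
        f"resets in {days_left} day(s) on {resets.date()}"
    )
```

Call it with res.status_code and res.json() from a requests response, and pause or skip scraping until the reset date when it returns a message.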

API Reference

POST /scrape

Scrape a URL and get clean markdown + metadata.

Parameter	Type	Default	Description
url	string	(required)	URL to scrape
renderMode	"auto" | "http" | "browser"	"auto"	How to fetch the page
engine	"auto" | "stealth-playwright" | "camoufox"	—	Force a specific engine
timeout	number	30000	Request timeout in ms
async	boolean	false	Return 202 + job ID for polling
impersonateBrowser	"chrome" | "firefox"	—	TLS fingerprint impersonation
tlsFingerprint	"ja3" | "ja4"	—	JA3 (legacy) or JA4 (recommended)
acceptContentType	"html" | "json" | "any"	"html"	Accept JSON API responses
solveCaptcha	boolean	false	Auto-solve CAPTCHAs
useProxy	boolean	false	Use residential proxy
headers	object	—	Custom request headers
extract	object	—	Structured extraction schema
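
To catch typos before a request is sent, the parameter table can be mirrored in a small client-side check. This is a sketch only: build_scrape_payload is a hypothetical helper, and the API's own validation may be stricter or looser. Options are passed as a dict because async is a reserved word in Python and cannot be used as a keyword argument.

```python
# Allowed values, copied from the parameter table above.
ENUMS = {
    "renderMode": ("auto", "http", "browser"),
    "engine": ("auto", "stealth-playwright", "camoufox"),
    "impersonateBrowser": ("chrome", "firefox"),
    "tlsFingerprint": ("ja3", "ja4"),
    "acceptContentType": ("html", "json", "any"),
}
BOOLEANS = ("async", "solveCaptcha", "useProxy")
OBJECTS = ("headers", "extract")


def build_scrape_payload(url: str, options: dict = None) -> dict:
    """Return a JSON-ready body for POST /scrape, or raise ValueError."""
    if not isinstance(url, str) or not url:
        raise ValueError("url is required")
    payload = {"url": url}
    for name, value in (options or {}).items():
        if name in ENUMS:
            ok = value in ENUMS[name]
        elif name in BOOLEANS:
            ok = isinstance(value, bool)
        elif name in OBJECTS:
            ok = isinstance(value, dict)
        elif name == "timeout":
            # bool is a subclass of int, so exclude it explicitly.
            ok = (isinstance(value, (int, float))
                  and not isinstance(value, bool) and value > 0)
        else:
            raise ValueError(f"unknown parameter: {name}")
        if not ok:
            raise ValueError(f"invalid value for {name}: {value!r}")
        payload[name] = value
    return payload
```

The result can be passed straight to requests.post(..., json=payload).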

GET /jobs/:id

Poll for async scrape results (when async: true).

GET /usage

Get your current month's usage by engine tier.

GET /health

Check API status.

POST /race

Fire all engine tiers in parallel on a URL. Returns results for each tier with pass/fail, latency, content length, and a winner. Costs 3 credits.

curl -X POST https://api.dreamscrape.app/race \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.zillow.com"}'
# Returns: { results: [{tier, status, worked, latencyMs, credits}...], winner: {tier, latencyMs, savings} }

POST /scan

Analyze a site's anti-bot protections. Probes HTTP, JA4, and browser tiers. Returns protection profile + discovered internal APIs. Costs 3 credits.

curl -X POST https://api.dreamscrape.app/scan \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.zillow.com"}'
# Returns: { protections: {httpBlocked, tlsFingerprintRequired, captchaDetected}, minimumEngine, discoveredApis: [...] }

GET /intel/:domain

Public — no auth required. Look up any domain's protection profile, routing intelligence, and discovered APIs.

curl https://api.dreamscrape.app/intel/coinmarketcap.com
# Returns: { routing, discoveredApis: [{urlPattern, method, score}...], recentRequests }

GET /intel

Public — no auth required. Browse the full intelligence database. Returns all domains with request counts, success rates, and discovered API counts.

Error Taxonomy

Every error includes a blockReason that tells you exactly why it failed and a suggestion with what to try next. No more guessing at 403s.

ip: Datacenter IP blocked; needs residential proxy

Fix: Use useProxy: true with a residential proxy configured

stealth: Fingerprint detected; needs browser engine

Fix: Try engine: "stealth-playwright" or "camoufox"

captcha: CAPTCHA challenge detected

Fix: Use engine: "camoufox" with solveCaptcha: true

auth: Site requires authentication

Fix: Pass auth cookies/headers via the headers option

rate_limit: Target site is throttling requests

Fix: Reduce request frequency or use proxy rotation

timeout: Request timed out

Fix: Increase timeout or try a browser engine for JS-heavy sites

robots_txt: Disallowed by robots.txt

Fix: Check the target's robots.txt and respect their rules

geo: Content is geo-restricted

Fix: Use a proxy in the target region

paywall: Content behind a paywall

Fix: Pass subscription auth headers

not_found: 404; page does not exist

Fix: Verify the URL is correct

unknown: Unclassified failure

Fix: Check the error message for details and contact support

insufficient_credits: Monthly credit quota exhausted (402)

Fix: Upgrade your plan or wait for credits to reset at the start of next month
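
Some of these fixes can be applied automatically on retry. The sketch below maps a blockReason to an adjusted request body. The function name next_attempt and the choice to double the timeout are assumptions, and where blockReason sits in the error response is not shown on this page, so it is passed in directly. Reasons that need a human decision (auth, paywall, robots_txt, and so on) return None.

```python
from typing import Optional

DEFAULT_TIMEOUT_MS = 30000  # documented default for the timeout parameter


def next_attempt(payload: dict, block_reason: str) -> Optional[dict]:
    """Return an adjusted /scrape body for a retry, or None if no
    automatic fix applies (or the fix is already in place)."""
    retry = dict(payload)
    if block_reason == "ip":
        retry["useProxy"] = True
    elif block_reason == "stealth":
        # Escalate one step: stealth-playwright first, then camoufox.
        already_stealth = payload.get("engine") == "stealth-playwright"
        retry["engine"] = "camoufox" if already_stealth else "stealth-playwright"
    elif block_reason == "captcha":
        retry["engine"] = "camoufox"
        retry["solveCaptcha"] = True
    elif block_reason == "timeout":
        retry["timeout"] = payload.get("timeout", DEFAULT_TIMEOUT_MS) * 2
    else:
        return None
    # An unchanged payload would only repeat the same failure.
    return retry if retry != payload else None
```

Cap the number of retries in the calling code, since each escalation step costs more credits.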

Engine Tiers

DreamScrape automatically picks the cheapest engine that works for each domain. Every response includes engineTier so you always know which one was used.

HTTP (http)

~200ms, $0.0001/req

Plain fetch with JA4 TLS fingerprint. Handles 70%+ of sites.

Stealth Browser (stealth-playwright)

~3s, $0.0012/req

Headless Chromium with fingerprint injection. For JS-heavy SPAs.

Anti-detect Firefox (camoufox)

~10s, $0.006/req

Patched Firefox with C++ fingerprinting. For the hardest targets.
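
For a rough sense of how the tier mix affects a workload, the per-request figures above can be combined into a blended estimate. This is a back-of-the-envelope sketch: blended_estimate is a hypothetical helper, the latencies are the approximate values listed for each tier, and any request mix you feed it is your own guess rather than something the API reports in this form.

```python
# (typical latency in seconds, USD per request), as listed for each tier above.
TIERS = {
    "http": (0.2, 0.0001),
    "stealth-playwright": (3.0, 0.0012),
    "camoufox": (10.0, 0.006),
}


def blended_estimate(requests_by_tier: dict) -> tuple:
    """Return (total USD, average latency in seconds) for a mix of requests."""
    total_requests = sum(requests_by_tier.values())
    if total_requests == 0:
        return 0.0, 0.0
    total_usd = sum(n * TIERS[tier][1] for tier, n in requests_by_tier.items())
    total_latency = sum(n * TIERS[tier][0] for tier, n in requests_by_tier.items())
    return total_usd, total_latency / total_requests
```

For example, a hypothetical mix of 7,000 http, 2,000 stealth-playwright, and 1,000 camoufox requests works out to roughly $9.10 in total with an average latency of about 1.74 seconds.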

Changelog

2026-04-13

API Discovery + Tier Race + Intel Database

• API Discovery: Browser scrapes now auto-discover internal JSON APIs. Future requests replay the API directly — HTTP speed, 1 credit.
• Tier Race (POST /race): Fire all engines in parallel, see which wins and why.
• Scan Protections (POST /scan): Analyze any site's anti-bot defenses before scraping.
• Intel Database (/intel/:domain): Public lookup of any domain's protection profile + discovered APIs.
• Playground now has 3 tabs: Scrape, Tier Race, Scan Protections
• New pricing: Lite $9/mo (100K HTTP credits), Starter $19/mo, Pro $59/mo, Scale $179/mo

2026-04-11

Credit-based pricing

• Switched from flat scrape counts to credit-based pricing
• Credits match actual cost per tier: 1 for HTTP, 3 for stealth browser, 10 for anti-detect Firefox
• Every response includes credits block (charged, remaining, limit, resetAt)
• GET /usage returns tier breakdown + efficiency metrics
• Credit dashboard at /dashboard
• 402 INSUFFICIENT_CREDITS error when quota exhausted
• Existing keys grandfathered at 1:1 credit conversion

2026-04-10

Launch

• API keys with per-key rate limits, quotas, and engine ACLs
• 5 engine tiers: HTTP, JA4, StealthPlaywright, Camoufox, Browserless
• Async scrape via BullMQ (POST with async: true)
• 12-value error taxonomy with actionable suggestions
• CAPTCHA detection as hard failure (no silent garbage)
• Live scorecard at dreamscrape.app
• Domain-specific data packages (sports, crypto) via MCP

Roadmap

Coming soon:

• Python SDK (pip install dreamscrape) with batch helpers and retry logic
• Node.js SDK
• Webhook callbacks for async jobs
• Batch endpoint for multiple URLs in one call
• Scheduled/cron scrapes

Want to influence priorities? Email me what you need.

Questions? Email me. I reply fast.