Docs
Everything you need to start scraping with DreamScrape.
Quickstart
Get your first scrape in 30 seconds.
1. Get an API key
Sign up for a free API key, or try the demo in the playground first.
2. Make your first request
curl
```bash
curl -X POST https://api.dreamscrape.app/scrape \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com", "renderMode": "auto"}'
```
Node.js
```javascript
// Top-level await: run as an ES module (e.g. scrape.mjs) on Node 18+, which ships a global fetch.
const res = await fetch('https://api.dreamscrape.app/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ds_sk_your_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://news.ycombinator.com',
    renderMode: 'auto',
  }),
});
if (!res.ok) throw new Error(`Scrape failed: HTTP ${res.status}`);

const data = await res.json();
console.log(data.markdown);        // Clean markdown
console.log(data.engineTier);      // "http", "stealth-playwright", etc.
console.log(data.timing.fetchMs);  // Latency in ms
```
Python
```python
import requests

# json= serializes the body and sets Content-Type: application/json for you
res = requests.post(
    'https://api.dreamscrape.app/scrape',
    headers={'Authorization': 'Bearer ds_sk_your_key'},
    json={'url': 'https://news.ycombinator.com', 'renderMode': 'auto'},
)
res.raise_for_status()  # Raise on 4xx/5xx instead of reading an error body as data

data = res.json()
print(data['markdown'])           # Clean markdown
print(data['engineTier'])         # Which engine tier was used
print(data['timing']['fetchMs'])  # Latency in ms
```
3. Read the response
Every completed scrape response includes:
- `markdown`: Clean extracted content
- `engineTier`: Which engine served the request ("http", "stealth-playwright", "camoufox")
- `timing.fetchMs`: Request latency in milliseconds
- `metadata`: Page title, description, and language
- `links`: All links extracted from the page
- `credits`: Credits charged, remaining, limit, and reset date
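For Python callers, a small typed wrapper around these fields keeps key lookups in one place. This is a sketch for illustration, not part of an official SDK (the Python SDK is still on the roadmap): `ScrapeResult` and `parse_scrape_result` are hypothetical names, and the JSON keys are the ones listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ScrapeResult:
    """Typed view of a completed /scrape response (hypothetical helper)."""
    markdown: str
    engine_tier: str
    fetch_ms: int
    title: str
    links: List[str] = field(default_factory=list)
    credits_charged: int = 0
    credits_remaining: int = 0


def parse_scrape_result(data: Dict[str, Any]) -> ScrapeResult:
    """Map the documented JSON keys onto ScrapeResult.

    markdown and engineTier are required; the optional blocks
    (timing, metadata, links, credits) fall back to empty values.
    """
    timing = data.get("timing", {})
    metadata = data.get("metadata", {})
    credits = data.get("credits", {})
    return ScrapeResult(
        markdown=data["markdown"],
        engine_tier=data["engineTier"],
        fetch_ms=timing.get("fetchMs", 0),
        title=metadata.get("title", ""),
        links=list(data.get("links", [])),
        credits_charged=credits.get("charged", 0),
        credits_remaining=credits.get("remaining", 0),
    )
```

After the quickstart request, call it as `parse_scrape_result(res.json())`.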
Full response example
```json
{
  "status": "completed",
  "url": "https://news.ycombinator.com",
  "engineTier": "http",
  "markdown": "# Hacker News\n1. Show HN: ...",
  "metadata": {
    "title": "Hacker News",
    "description": "",
    "language": "en"
  },
  "links": [
    "https://news.ycombinator.com/item?id=12345",
    "https://example.com/article"
  ],
  "timing": { "fetchMs": 142, "cleanMs": 45 },
  "credits": {
    "charged": 1,
    "remaining": 999,
    "limit": 1000,
    "resetAt": "2026-05-01T00:00:00.000Z"
  }
}
```
Async mode
Add async: true to get a 202 response with a job ID, then poll for results:
```bash
# Start an async scrape
curl -X POST https://api.dreamscrape.app/scrape \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "async": true}'
# Response: {"jobId": "abc-123", "status": "queued"}

# Poll for results
curl https://api.dreamscrape.app/jobs/abc-123 \
  -H "Authorization: Bearer ds_sk_your_key"
# Response: {"status": "completed", "pages": [...], ...}
```
Multi-page crawl
Crawl multiple pages from a starting URL:
```bash
curl -X POST https://api.dreamscrape.app/crawl \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": ["https://docs.example.com"],
    "maxPages": 50,
    "maxDepth": 2,
    "sameOriginOnly": true
  }'
# Returns 202 with a crawl ID. Poll GET /crawls/:id for results.
```
Structured extraction
Use /auto-extract for AI-powered schema detection, or pass an extract object to /scrape:
```bash
# Auto-extract (no schema needed)
curl -X POST https://api.dreamscrape.app/auto-extract \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.basketball-reference.com/leagues/NBA_2026_per_game.html",
    "hint": "player stats table"
  }'
# Returns structured JSON with the detected data type and fields.
```
Credits
DreamScrape uses credits instead of flat request counts. Different engine tiers cost different amounts, matching their actual compute cost. The router picks the cheapest engine that works for each domain, so most of your credits go toward the cheapest tier (plain HTTP, 1 credit).
Credit costs per tier
| Engine tier | Credits | When used |
|---|---|---|
| HTTP | 1 | Plain fetch — most sites |
| JA4 HTTP | 1 | TLS fingerprint impersonation |
| Stealth Browser | 3 | Headless Chromium for JS-heavy sites |
| Anti-detect Firefox | 10 | Patched Firefox for the hardest targets |
| + Residential proxy | +10 | Added when useProxy: true |
| + CAPTCHA solve | +2 | Added when solveCaptcha: true |
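To budget a month of scraping, you can turn the table into a per-request estimate. A minimal sketch, assuming the proxy and CAPTCHA surcharges simply add to the tier's base cost, as the "+" entries suggest; `estimate_credits` and `requests_per_quota` are hypothetical helpers, and `"ja4"` is an assumed key because these docs don't give that tier's `engineTier` string.

```python
# Per-request credit costs copied from the table above.
# "http", "stealth-playwright" and "camoufox" are documented engineTier values;
# "ja4" is an assumed label for the JA4 HTTP tier.
BASE_CREDITS = {"http": 1, "ja4": 1, "stealth-playwright": 3, "camoufox": 10}
PROXY_SURCHARGE = 10   # added when useProxy: true
CAPTCHA_SURCHARGE = 2  # added when solveCaptcha: true


def estimate_credits(tier: str, use_proxy: bool = False, solve_captcha: bool = False) -> int:
    """Credits for one successful request (failed scrapes are not charged)."""
    if tier not in BASE_CREDITS:
        raise ValueError(f"unknown tier: {tier!r}")
    cost = BASE_CREDITS[tier]
    if use_proxy:
        cost += PROXY_SURCHARGE
    if solve_captcha:
        cost += CAPTCHA_SURCHARGE
    return cost


def requests_per_quota(quota: int, tier: str, use_proxy: bool = False, solve_captcha: bool = False) -> int:
    """How many requests a monthly quota covers if every request lands on one tier."""
    return quota // estimate_credits(tier, use_proxy, solve_captcha)
```

For example, a Camoufox request with both proxy and CAPTCHA solving costs 10 + 10 + 2 = 22 credits, so a 50,000-credit quota covers 2,272 of them, versus 50,000 plain HTTP requests.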
Checking usage
Every scrape response includes a credits block:
```json
{
  "status": "completed",
  "engineTier": "http",
  "credits": {
    "charged": 1,
    "remaining": 49999,
    "limit": 50000,
    "resetAt": "2026-05-01T00:00:00Z"
  }
}
```
You can also check your full usage with GET /usage, which returns a per-tier breakdown and efficiency metrics, or visit the dashboard.
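Because the credits block arrives with every scrape, a client can watch its balance without extra calls. A minimal sketch; `credits_status` is a hypothetical helper, and the 10% "low" threshold is an arbitrary choice, not an API setting.

```python
from datetime import datetime


def credits_status(response: dict, warn_fraction: float = 0.1) -> dict:
    """Summarize the credits block returned with each scrape.

    Flags the key as "low" when less than warn_fraction of the monthly
    limit remains.
    """
    credits = response["credits"]
    remaining, limit = credits["remaining"], credits["limit"]
    # Swap a trailing "Z" for "+00:00" so fromisoformat accepts it before Python 3.11.
    reset_at = datetime.fromisoformat(credits["resetAt"].replace("Z", "+00:00"))
    return {
        "remaining": remaining,
        "used_fraction": (limit - remaining) / limit if limit else 1.0,
        "low": remaining < limit * warn_fraction,
        "reset_at": reset_at,
    }
```

The returned `reset_at` is timezone-aware (UTC), so it compares safely against `datetime.now(timezone.utc)`.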
What happens when you run out
When your credits are exhausted, the API returns 402 Payment Required with:
```json
{
  "error": {
    "code": "INSUFFICIENT_CREDITS",
    "message": "Credit quota exhausted (50000/50000 credits used)",
    "used": 50000,
    "limit": 50000,
    "resets": "2026-05-01T00:00:00Z",
    "suggestion": "Upgrade your plan for more credits"
  }
}
```
Credits reset automatically at the start of each billing month. Failed scrapes are not charged.
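A 402 means retrying won't help until the reset date, so it is worth surfacing as its own exception rather than a generic HTTP error. A minimal standard-library sketch; `InsufficientCredits`, `check_response` and `scrape` are hypothetical names, and the fields read from the body are the ones in the error example above. Note that `urllib.request.urlopen` raises `HTTPError` for 4xx/5xx responses, but the JSON body can still be read from the exception.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.dreamscrape.app/scrape"


class InsufficientCredits(Exception):
    """Raised when the API answers 402 INSUFFICIENT_CREDITS."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "insufficient credits"))
        self.resets = error.get("resets")
        self.suggestion = error.get("suggestion")


def check_response(status: int, body: bytes) -> dict:
    """Decode a /scrape response, turning the 402 quota error into an exception."""
    data = json.loads(body)
    if status == 402 and data.get("error", {}).get("code") == "INSUFFICIENT_CREDITS":
        raise InsufficientCredits(data["error"])
    if status >= 400:
        raise RuntimeError(f"HTTP {status}: {data.get('error')}")
    return data


def scrape(payload: dict, api_key: str) -> dict:
    """POST a /scrape body and return the decoded JSON on success."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return check_response(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # 4xx/5xx land here; the error body is still readable from the exception.
        return check_response(err.code, err.read())
```

A batch job can then catch `InsufficientCredits`, log `err.resets`, and pause until that date instead of burning through retries.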
API Reference
POST /scrape
Scrape a URL and get clean markdown + metadata.
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | (required) | URL to scrape |
| renderMode | "auto" / "http" / "browser" | "auto" | How to fetch the page |
| engine | "auto" / "stealth-playwright" / "camoufox" | (none) | Force a specific engine |
| timeout | number | 30000 | Request timeout in ms |
| async | boolean | false | Return 202 + job ID for polling |
| impersonateBrowser | "chrome" / "firefox" | (none) | TLS fingerprint impersonation |
| tlsFingerprint | "ja3" / "ja4" | (none) | JA3 (legacy) or JA4 (recommended) |
| acceptContentType | "html" / "json" / "any" | "html" | Response content types to accept ("json" for JSON API responses) |
| solveCaptcha | boolean | false | Auto-solve CAPTCHAs |
| useProxy | boolean | false | Use residential proxy |
| headers | object | (none) | Custom request headers |
| extract | object | (none) | Structured extraction schema |
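Typos in parameter names or enum values are easy to miss in hand-written JSON. A minimal sketch of a client-side checker built from the table; `build_scrape_payload` is a hypothetical helper, and the server remains the authority on what it accepts. Options go in a dict rather than keyword arguments because `async` is a reserved word in Python.

```python
from typing import Any, Dict, Mapping, Optional

# Enumerated values copied from the parameter table above.
ALLOWED_VALUES = {
    "renderMode": {"auto", "http", "browser"},
    "engine": {"auto", "stealth-playwright", "camoufox"},
    "impersonateBrowser": {"chrome", "firefox"},
    "tlsFingerprint": {"ja3", "ja4"},
    "acceptContentType": {"html", "json", "any"},
}
OTHER_PARAMS = {"timeout", "async", "solveCaptcha", "useProxy", "headers", "extract"}


def build_scrape_payload(url: str, options: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Build a POST /scrape body, catching unknown names and unlisted values locally.

    Parameters you leave out are omitted, so the server applies its own defaults.
    """
    payload: Dict[str, Any] = {"url": url}
    for name, value in (options or {}).items():
        if name not in ALLOWED_VALUES and name not in OTHER_PARAMS:
            raise ValueError(f"unknown parameter: {name!r}")
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
        if name == "timeout" and (not isinstance(value, (int, float)) or value <= 0):
            raise ValueError("timeout must be a positive number of milliseconds")
        payload[name] = value
    return payload
```

For example, `build_scrape_payload("https://example.com", {"engine": "camoufox", "solveCaptcha": True})` returns a body ready for `json.dumps`, while `{"engine": "chrome"}` fails before any credits are spent.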
GET /jobs/:id
Poll for async scrape results (when async: true).
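Polling is easy to get subtly wrong, either by hammering the endpoint or by waiting forever. A minimal standard-library sketch with capped exponential backoff and a deadline; `wait_for_job`, `fetch_job_http` and `PENDING_STATUSES` are hypothetical names. Only the "queued" and "completed" statuses appear in these docs, so treating "running" as an in-progress status is an assumption.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# "queued" is documented; "running" is an assumed in-progress status.
# Any other status is treated as final and returned to the caller.
PENDING_STATUSES = {"queued", "running"}


def wait_for_job(
    fetch_job: Callable[[str], Dict],
    job_id: str,
    interval: float = 1.0,
    max_interval: float = 10.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the job leaves a pending status, doubling the wait up to max_interval."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") not in PENDING_STATUSES:
            return job
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)


def fetch_job_http(job_id: str, api_key: str) -> Dict:
    """One GET /jobs/:id request."""
    req = urllib.request.Request(
        f"https://api.dreamscrape.app/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage: `job = wait_for_job(lambda jid: fetch_job_http(jid, "ds_sk_your_key"), "abc-123")`. Injecting `fetch_job` and `sleep` keeps the loop testable without network access. The same loop may also suit GET /crawls/:id if crawl jobs report a similar status field, which these docs don't confirm.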
GET /usage
Get your current month's usage by engine tier.
GET /health
Check API status.
POST /race
Fire all engine tiers in parallel on a URL. Returns results for each tier with pass/fail, latency, content length, and a winner. Costs 3 credits.
```bash
curl -X POST https://api.dreamscrape.app/race \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.zillow.com"}'
# Returns: { results: [{tier, status, worked, latencyMs, credits}...], winner: {tier, latencyMs, savings} }
```
POST /scan
Analyze a site's anti-bot protections. Probes HTTP, JA4, and browser tiers. Returns protection profile + discovered internal APIs. Costs 3 credits.
```bash
curl -X POST https://api.dreamscrape.app/scan \
  -H "Authorization: Bearer ds_sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.zillow.com"}'
# Returns: { protections: {httpBlocked, tlsFingerprintRequired, captchaDetected}, minimumEngine, discoveredApis: [...] }
```
GET /intel/:domain
Public — no auth required. Look up any domain's protection profile, routing intelligence, and discovered APIs.
```bash
curl https://api.dreamscrape.app/intel/coinmarketcap.com
# Returns: { routing, discoveredApis: [{urlPattern, method, score}...], recentRequests }
```
GET /intel
Public — no auth required. Browse the full intelligence database. Returns all domains with request counts, success rates, and discovered API counts.
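The /intel/:domain path takes a bare domain, while you usually start from a page URL. A minimal sketch that builds the lookup URL and ranks discovered APIs; `intel_url`, `top_apis` and `fetch_intel` are hypothetical helpers, and it assumes `score` is numeric with higher meaning better, which these docs do not state. The host is used as-is (a `www.` prefix is not stripped).

```python
import json
import urllib.parse
import urllib.request

INTEL_BASE = "https://api.dreamscrape.app/intel/"


def intel_url(target: str) -> str:
    """Turn a full URL or a bare domain into the /intel/:domain lookup URL."""
    # Prefixing "//" makes urlsplit treat a bare domain as the network location.
    parsed = urllib.parse.urlsplit(target if "://" in target else f"//{target}")
    host = parsed.hostname
    if not host:
        raise ValueError(f"no domain in {target!r}")
    return INTEL_BASE + host


def top_apis(intel: dict, n: int = 3) -> list:
    """Return the n highest-scoring entries of discoveredApis."""
    apis = intel.get("discoveredApis", [])
    return sorted(apis, key=lambda api: api.get("score", 0), reverse=True)[:n]


def fetch_intel(target: str) -> dict:
    # Public endpoint: no Authorization header is sent.
    with urllib.request.urlopen(intel_url(target)) as resp:
        return json.load(resp)
```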
Error Taxonomy
Every error includes a blockReason that tells you exactly why it failed and a suggestion for what to try next. No more guessing at 403s.
| blockReason | Meaning | Fix |
|---|---|---|
| ip | Datacenter IP blocked; needs a residential proxy | Use useProxy: true with a residential proxy configured |
| stealth | Fingerprint detected; needs a browser engine | Try engine: "stealth-playwright" or "camoufox" |
| captcha | CAPTCHA challenge detected | Use engine: "camoufox" with solveCaptcha: true |
| auth | Site requires authentication | Pass auth cookies/headers via the headers option |
| rate_limit | Target site is throttling requests | Reduce request frequency or use proxy rotation |
| timeout | Request timed out | Increase timeout or try a browser engine for JS-heavy sites |
| robots_txt | Disallowed by robots.txt | Check the target's robots.txt and respect its rules |
| geo | Content is geo-restricted | Use a proxy in the target region |
| paywall | Content is behind a paywall | Pass subscription auth headers |
| not_found | 404: page does not exist | Verify the URL is correct |
| unknown | Unclassified failure | Check the error message for details and contact support |
| insufficient_credits | Monthly credit quota exhausted (402) | Upgrade your plan or wait for credits to reset at the start of next month |
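Because blockReason is machine-readable, a client can apply some of the fixes listed above automatically. A minimal sketch, assuming you have already pulled blockReason out of the error body (these docs don't show exactly where it sits in the JSON, so the function takes it as a plain string). `next_attempt`, `ESCALATE_ENGINE` and the 120-second timeout ceiling are hypothetical choices; reasons such as auth, geo, paywall and robots_txt are deliberately left to a human.

```python
from typing import Optional

# Stealth escalation order, following the Fix entry for "stealth".
ESCALATE_ENGINE = {None: "stealth-playwright", "auto": "stealth-playwright", "stealth-playwright": "camoufox"}


def next_attempt(request: dict, block_reason: str, max_timeout_ms: int = 120_000) -> Optional[dict]:
    """Return an adjusted /scrape body for one retry, or None if a retry won't help."""
    if block_reason == "ip":
        fix = {"useProxy": True}
    elif block_reason == "stealth":
        engine = ESCALATE_ENGINE.get(request.get("engine"))
        if engine is None:
            return None  # already on camoufox: nothing stronger to try
        fix = {"engine": engine}
    elif block_reason == "captcha":
        fix = {"engine": "camoufox", "solveCaptcha": True}
    elif block_reason == "timeout":
        current = request.get("timeout", 30_000)  # documented default
        if current >= max_timeout_ms:
            return None
        fix = {"timeout": min(current * 2, max_timeout_ms)}
    else:
        return None  # auth, geo, paywall, robots_txt, etc. need a human decision
    if all(request.get(k) == v for k, v in fix.items()):
        return None  # this fix was already applied; an identical retry would fail again
    return {**request, **fix}
```

Each escalation lands on a pricier tier (a CAPTCHA retry on Camoufox costs at least 12 credits), so cap the number of retries per URL.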
Engine Tiers
DreamScrape automatically picks the cheapest engine that works for each domain. Every response includes engineTier so you always know which one was used.
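- HTTP + JA4 (1 credit): Plain fetch with a JA4 TLS fingerprint. Handles 70%+ of sites.
- Stealth Browser (`stealth-playwright`, 3 credits): Headless Chromium with fingerprint injection, for JS-heavy SPAs.
- Anti-detect Firefox (`camoufox`, 10 credits): Patched Firefox with C++-level fingerprinting, for the hardest targets.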
Changelog
2026-04-13
API Discovery + Tier Race + Intel Database
- API Discovery: Browser scrapes now auto-discover internal JSON APIs. Future requests replay the API directly, at HTTP speed for 1 credit.
- Tier Race (POST /race): Fire all engines in parallel and see which wins and why.
- Scan Protections (POST /scan): Analyze any site's anti-bot defenses before scraping.
- Intel Database (/intel/:domain): Public lookup of any domain's protection profile and discovered APIs.
- Playground now has 3 tabs: Scrape, Tier Race, Scan Protections
- New pricing: Lite $9/mo (100K HTTP credits), Starter $19/mo, Pro $59/mo, Scale $179/mo
2026-04-11
Credit-based pricing
- Switched from flat scrape counts to credit-based pricing
- Credits match actual cost per tier: 1 for HTTP, 3 for stealth browser, 10 for anti-detect Firefox
- Every response includes a credits block (charged, remaining, limit, resetAt)
- GET /usage returns tier breakdown + efficiency metrics
- Credit dashboard at /dashboard
- 402 INSUFFICIENT_CREDITS error when quota is exhausted
- Existing keys grandfathered at 1:1 credit conversion
2026-04-10
Launch
- API keys with per-key rate limits, quotas, and engine ACLs
- 5 engine tiers: HTTP, JA4, StealthPlaywright, Camoufox, Browserless
- Async scrape via BullMQ (POST with async: true)
- 12-value error taxonomy with actionable suggestions
- CAPTCHA detection as hard failure (no silent garbage)
- Live scorecard at dreamscrape.app
- Domain-specific data packages (sports, crypto) via MCP
Roadmap
Coming soon:
- Python SDK (pip install dreamscrape) with batch helpers and retry logic
- Node.js SDK
- Webhook callbacks for async jobs
- Batch endpoint for multiple URLs in one call
- Scheduled/cron scrapes
Want to influence priorities? Email me what you need.
Questions? Email me. I reply fast.