Tutorial

How to Scrape Twitter/X in 2026 Without Paying $200/month

Scrape Twitter/X without the $200 API fee using guest tokens or stealth-playwright. Cost breakdown, working code, common errors, real success rates.

Curtis Vaughan · 12 min read

X's Basic API tier costs $200/month for 10,000 tweets, the Pro tier is $5,000/month, and Enterprise starts at $42,000/month. If you're scraping fewer than 50K public tweets per month for research, monitoring, or a startup prototype, you're paying for SLA guarantees and historical search you don't need.

This guide shows two scraping paths that work in 2026: guest-token HTTP requests (cheap, fast, public-only) and stealth-playwright with residential IPs (expensive, slower, handles profile pages). Production numbers from our intel database: guest tokens hit [TODO: insert guest-token success rate from logs] on public search and profile JSON endpoints, stealth-playwright hits 65% on rendered profile pages with 3.8s average latency at 13 credits per request. Full per-endpoint data lives at /intel/twitter.com and /intel/x.com.

Why the Official Twitter/X API Costs $200/mo and What Changed in 2025-2026

When Elon Musk took over Twitter in late 2022, the free API tier disappeared within months. By February 2023, the cheapest tier became $100/month with severe rate limits. The 2024 restructure pushed Basic to $200/month (10K tweets read, 50K posts), Pro to $5,000/month (1M tweets), and Enterprise to custom pricing starting around $42,000/month.

What the paid API actually gates:

  • Search volume — Basic tier caps you at 10K reads/month, which dies in a day for most monitoring use cases
  • Streaming — the firehose and filtered stream endpoints are Pro-tier only ($5K/month minimum)
  • Historical search — anything beyond ~7 days requires Pro tier; full archive search is Enterprise-only
  • High-volume posting — write endpoints scale with tier

There are real reasons to pay. If you need 99.9% uptime SLAs, support contracts, or are running a production app that depends on tweet data flowing reliably, scraping is the wrong tool. X actively blocks scrapers and the cat-and-mouse game eats engineering hours.

Scraping makes sense for: personal research projects, academic studies under 50K tweets, startup prototypes validating an idea before committing $200/month, monitoring 50-500 specific public accounts, and journalism workflows where you need a few thousand tweets per investigation.

The catch: most non-search content on X requires login. Home timeline, who-to-follow, recommendations — none of it works without authenticated scraping, which violates ToS Section 2.1 and risks account suspension. Public profiles, public search, and tweet detail pages still work without login through guest tokens or rendered pages.

The Viability Table: Guest Token vs. Stealth Playwright vs. Paid API

Approach | Cost | Success rate | Latency | Login required | Rate limit | Best for
Guest token (HTTP) | ~1-2 credits/req | [TODO: guest-token success rate from logs] | ~1.5s | No | ~50 req/token | Under 10K public tweets/month
Stealth Playwright + residential | 13 credits/req | 65% on profile pages | 3.8s | No | Per-IP, ~100/hr | Under 100K profile pages/month
Official API Basic | $200/month flat | 100% | <500ms | Yes (OAuth) | 10K reads/month | Production reliability
Official API Pro | $5,000/month flat | 100% | <500ms | Yes | 1M reads/month | Streaming, scale

Failure modes for each:

  • Guest token dies on a 401 Unauthorized after 60-90 minutes. Production code needs a refresh loop. The token also can't access protected accounts, home timeline, or anything login-walled.
  • Stealth Playwright is IP-dependent. The 35% failure rate is mostly IPs that X has flagged from prior scraping traffic. Retries on a fresh residential session catch about 70% of the failed requests.
  • Official API costs scale with volume and the rate limits are hard. 10K reads on Basic disappears in 4 hours of moderate monitoring.

The decision rule:

  • Need under 10K public tweets/month and can tolerate token refresh logic → guest token
  • Need rendered profile pages, follower counts, or content that lives behind X's frontend JS → stealth-playwright
  • Need streaming, historical search beyond 7 days, or production SLAs → pay the $200 (or $5K)
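The decision rule above can be encoded as a small helper. This is a sketch with illustrative names; the 10K threshold is the article's rule of thumb, not a hard limit:

```python
def pick_approach(tweets_per_month: int,
                  needs_rendered: bool = False,
                  needs_streaming: bool = False,
                  needs_history: bool = False,
                  needs_sla: bool = False) -> str:
    """Encode the decision rule: API-only features first, then rendering
    needs, then volume."""
    if needs_streaming or needs_history or needs_sla:
        return "official-api"
    if needs_rendered:
        return "stealth-playwright"
    if tweets_per_month < 10_000:
        return "guest-token"
    return "official-api"
```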

Working Code: Guest Token Scraping Without Login (HTTP Tier)

The guest-token flow has two steps: fetch a token from /1.1/guest/activate.json, then use it in the x-guest-token header on subsequent requests to public endpoints.

code
import time
import httpx
from datetime import datetime, timezone
 
DREAMSCRAPE_KEY = "ds_live_..."
DREAMSCRAPE_URL = "https://api.dreamscrape.app/scrape"
 
def fetch_guest_token():
    """Get a fresh guest token via DreamScrape HTTP tier."""
    resp = httpx.post(
        DREAMSCRAPE_URL,
        headers={"Authorization": f"Bearer {DREAMSCRAPE_KEY}"},
        json={
            "url": "https://api.x.com/1.1/guest/activate.json",
            "method": "POST",
            "engine": "http",
            "headers": {
                "Authorization": "Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA",
            }
        }
    )
    data = resp.json()
    return {
        "token": data["body"]["guest_token"],
        # timezone-aware timestamp; datetime.utcnow() is deprecated in 3.12+
        "fetched_at": datetime.now(timezone.utc),
    }
 
def search_tweets(query, guest):
    """Search public tweets with a guest token."""
    from urllib.parse import quote  # encode spaces and operators in the query
    resp = httpx.post(
        DREAMSCRAPE_URL,
        headers={"Authorization": f"Bearer {DREAMSCRAPE_KEY}"},
        json={
            "url": f"https://api.x.com/2/search/adaptive.json?q={quote(query)}&count=40",
            "engine": "http",
            "headers": {
                "Authorization": "Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA",
                "x-guest-token": guest["token"],
            }
        }
    )
    return resp.json()
 
# Refresh loop
guest = fetch_guest_token()
for query in ["dreamscrape", "web scraping", "ja4 fingerprint"]:
    result = search_tweets(query, guest)
    if result.get("status") == 401:
        # Token expired — refetch and retry once
        time.sleep(5)
        guest = fetch_guest_token()
        result = search_tweets(query, guest)
    print(f"{query}: {len(result['body'].get('globalObjects', {}).get('tweets', {}))} tweets")

The 401 you'll see when the token expires looks like this:

code
{
  "errors": [{
    "code": 239,
    "message": "Bad guest token"
  }]
}

Catch this specific error code, refetch the token, retry the request once. Don't retry in a tight loop — X rate-limits the activate endpoint to roughly [TODO: confirm guest-token activate rate limit] requests per IP per hour.

URL patterns that work with guest tokens:

  • api.x.com/2/search/adaptive.json?q=... — public search, works
  • api.x.com/graphql/<hash>/UserByScreenName?variables=... — public profile JSON, works
  • api.x.com/graphql/<hash>/TweetDetail?variables=... — single tweet with replies, works on public tweets
  • api.x.com/2/timeline/home.json — home timeline, blocked (requires auth)
  • Anything from a protected account — blocked

Credit math: a guest-token search returning 40 tweets costs ~2 credits on DreamScrape's HTTP tier. 100 tweets ≈ 5 credits. The same 100 tweets via stealth-playwright on rendered pages would cost 13 credits per page request, so ~26-39 credits depending on pagination. Guest token is 5-8x cheaper for public data.
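The comparison is easy to sanity-check in code. This is back-of-envelope math matching the figures above; the function names are purely illustrative:

```python
def guest_credits(tweet_count: int, credits_per_request: float = 2,
                  tweets_per_request: int = 40) -> float:
    """Approximate guest-token credit cost for a given tweet volume."""
    return tweet_count / tweets_per_request * credits_per_request

def playwright_credits(page_requests: int, credits_per_request: int = 13) -> int:
    """Stealth-playwright cost: a flat per-page-request rate."""
    return page_requests * credits_per_request
```

100 tweets via guest token comes out to 5 credits versus 26-39 for the 2-3 rendered pages the same data would take.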

Working Code: Stealth Playwright for Profile Pages and Login-Walled Content

When you need the rendered profile page — follower counts, bio, pinned tweet, tweet engagement stats — guest-token JSON often misses fields that the frontend JS computes. This is where stealth-playwright with a residential IP earns its 13 credits.

code
import httpx
 
DREAMSCRAPE_KEY = "ds_live_..."
 
def scrape_profile(handle):
    resp = httpx.post(
        "https://api.dreamscrape.app/scrape",
        headers={"Authorization": f"Bearer {DREAMSCRAPE_KEY}"},
        json={
            "url": f"https://x.com/{handle}",
            "engine": "stealth-playwright",
            "useProxy": True,
            "proxyType": "residential",
            "waitFor": "[data-testid='UserName']",
            "timeout": 15000,
            "extract": {
                "displayName": "[data-testid='UserName'] span",
                "bio": "[data-testid='UserDescription']",
                "followers": "a[href$='/verified_followers'] span",
                "following": "a[href$='/following'] span",
                "tweets": {
                    "selector": "article[data-testid='tweet']",
                    "type": "list",
                    "fields": {
                        "text": "[data-testid='tweetText']",
                        "likes": "[data-testid='like'] span",
                        "retweets": "[data-testid='retweet'] span",
                    }
                }
            }
        },
        timeout=30
    )
    return resp.json()
 
result = scrape_profile("elonmusk")
if result.get("status", 0) >= 400 or not result.get("data", {}).get("displayName"):
    # Failure case: timeout, bot detection, or rendered shell with no data
    print(f"Failed: {result.get('error', 'unknown')}")
    # Retry with fresh proxy session
    result = scrape_profile("elonmusk")
print(result["data"])

The 35% of requests that fail break down roughly:

  • ~20% hit a rendered "Something went wrong" error page (X's soft-block for flagged IPs)
  • ~10% time out at 15 seconds because the JS challenge doesn't complete
  • ~5% return the page shell with empty data containers — the JS detected automation mid-render
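Those three buckets can be triaged mechanically. In this sketch the field names ("error", "body", "data") are assumptions modeled on the earlier response examples; adjust them to the shape you actually receive:

```python
def classify_failure(result: dict) -> str:
    """Sort a stealth-playwright response into the failure buckets above."""
    if result.get("error") == "timeout":
        return "js-challenge-timeout"   # ~10%: JS challenge never completed
    if "Something went wrong" in (result.get("body") or ""):
        return "soft-block"             # ~20%: X flagged the IP
    data = result.get("data") or {}
    if not any(data.values()):
        return "empty-shell"            # ~5%: automation detected mid-render
    return "ok"
```

Logging the bucket per request tells you whether to rotate IPs (soft-block), raise timeouts (challenge), or retry immediately (empty shell).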

Latency budget at 3.8s average:

  • ~600ms TLS + initial HTML
  • ~1.8s React bundle parse and execute
  • ~1.4s data XHRs and DOM hydration

The waitFor selector is critical. Without it, the scraper returns before React has populated the DOM and you get an empty result that looks like a soft block but isn't. Always wait for a specific data-testid the page must render.

For login-walled content (DMs, notifications, advanced search filters): scraping won't help. You'd need to provide session cookies from a real account, which violates ToS Section 2.1 and risks suspension of that account. Use the official API.

Common Errors: Guest Token Expiry, IP Blocks, and Bot Detection

Error 1: Guest token expired (401 Unauthorized after 60-90 min)

code
{"errors": [{"code": 239, "message": "Bad guest token"}]}

Fix: refresh on 401 with exponential backoff. First retry after 5s, second after 15s, third after 45s. If three fetches in a row fail, the activate endpoint itself is rate-limited — back off for 10 minutes.
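A minimal sketch of that schedule, with the sleep injectable so the retry logic is testable; fetch_token is assumed to raise on failure:

```python
import time

BACKOFF = [5, 15, 45]  # seconds: first, second, third retry

def refresh_with_backoff(fetch_token, sleep=time.sleep):
    """Retry fetch_token on the 5s/15s/45s schedule described above.

    Returns the token, or None after three consecutive failures -- at that
    point the activate endpoint itself is likely rate-limited and the
    caller should back off for ~10 minutes.
    """
    for delay in BACKOFF:
        sleep(delay)
        try:
            return fetch_token()
        except Exception:
            continue
    return None
```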

Error 2: Rate limit on guest-token search (429)

code
{"errors": [{"code": 88, "message": "Rate limit exceeded"}]}

Each guest token gets ~50 requests before X starts returning 429s. Fix: rotate tokens (fetch a new one before hitting the limit), add 200-400ms jitter between requests, and spread requests across DreamScrape's residential IP pool by setting useProxy: true even on HTTP tier.
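Both fixes fit in a few lines. The rotation threshold of 45 and the helper names below are illustrative choices, not DreamScrape API surface:

```python
import random

def jitter_delay() -> float:
    """200-400 ms of jitter between requests, per the fix above."""
    return random.uniform(0.2, 0.4)

class TokenRotator:
    """Fetch a fresh guest token before hitting the ~50-request soft limit.

    `fetch` is any callable returning a new token string (e.g. a wrapper
    around fetch_guest_token from earlier); 45 leaves headroom below 50.
    """
    def __init__(self, fetch, limit: int = 45):
        self.fetch = fetch
        self.limit = limit
        self.token = None
        self.used = 0

    def get(self) -> str:
        if self.token is None or self.used >= self.limit:
            self.token = self.fetch()
            self.used = 0
        self.used += 1
        return self.token
```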

Error 3: Bot detected on stealth-playwright (rendered "Something went wrong")

The page loads but you get X's generic error screen instead of profile content. Detection signals: rapid sequential requests from the same residential IP, missing browser signals, or IP reputation. Fix: rotate residential IPs between requests (DreamScrape does this automatically when you set proxyRotation: "per-request"), add 2-5s random delays for sensitive targets, and avoid hitting more than ~30 profiles per IP per hour.

Error 4: Content not available (null fields)

code
{"data": {"displayName": "Account suspended", "tweets": []}}

Account suspended, deleted, protected, or content geo-blocked. No fix — this is real data. Handle nulls gracefully:

code
if not result["data"].get("tweets"):
    log.info(f"No tweets for {handle} — likely protected/suspended")
    continue

Error 5: Cookie jar mismatch on guest-token requests

If you reuse a token after the underlying DreamScrape session has rotated proxies, you may see:

code
{"errors": [{"code": 200, "message": "Forbidden"}]}

Fix: refetch the guest token whenever you change proxy session, and tag tokens with the session ID they were fetched under.
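One way to implement the tagging, assuming you track the current proxy session ID yourself (a minimal sketch, not a prescribed structure):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class GuestToken:
    """A guest token tagged with the proxy session it was fetched under."""
    token: str
    session_id: str
    fetched_at: datetime

def usable(gt: GuestToken, current_session: str) -> bool:
    # a token from a rotated-out session will 403 with code 200 -- drop it
    # instead of retrying
    return gt.session_id == current_session
```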

Where Scraping Fails and When You Need the Official API

Scraping is the wrong choice for these specific workloads:

  • Real-time streaming above ~100 tweets/sec — the firehose and filtered stream endpoints exist on Pro tier ($5K/month) for a reason. Scraping search adaptively misses tweets in the gap between polls and can't match streaming throughput.
  • Protected/private accounts — guest tokens cannot bypass the privacy flag. Stealth-playwright without authenticated cookies sees the same "These tweets are protected" screen a logged-out user sees.
  • Historical search beyond 7 days — X's search index for unauthenticated and guest-token requests caps at roughly the last week. Full-archive search requires Pro tier or higher.
  • Production SLA requirements — X actively breaks scrapers. We've seen DOM selectors change quarterly, GraphQL hash IDs rotate without notice, and JS challenges escalate. If your business depends on 99.9% uptime, scraping will fail you eventually.
  • Volume above 500K tweets/month — at this scale, stealth-playwright credits + residential IP rotation + maintenance labor exceeds the $200/month Basic API tier. Run the math, then pay the API.

X's ToS Section 2.1 explicitly prohibits automated scraping. Risk profile: scraping public unauthenticated endpoints (guest token, public profile pages) carries IP-ban risk but no account risk. Authenticated scraping (using your account's session cookies) carries account-suspension risk. We don't recommend authenticated scraping at any scale.

Cost Comparison: Guest Token vs. API vs. Stealth Playwright

Three scenarios with real credit math at DreamScrape's [TODO: confirm current credit price] per credit pricing:

Scenario 1: 50K public tweets/month via search

  • Guest token: ~50K / 40 tweets per request = 1,250 requests × 2 credits = 2,500 credits ≈ [TODO: dollar cost at current rate]
  • Official API Basic: $200 flat, but Basic only gives 10K reads/month — you'd need Pro ($5,000)
  • Stealth-playwright equivalent: 1,250 × 13 = 16,250 credits, ~6.5x more expensive

Winner: guest token, by a wide margin.

Scenario 2: 10K profile pages with rendered data per month

  • Stealth-playwright + residential: 10K × 13 = 130,000 credits ≈ [TODO: dollar cost]
  • Official API Basic ($200): doesn't return rendered profile JS state — you'd parse user objects, but engagement counts and pinned tweet rendering differ
  • Guest token: works for ~80% of fields, fails on rendered-only data

Winner: depends on whether you need the rendered fields. If yes, stealth-playwright. If raw user object is enough, guest token at 1/6 the cost.

Scenario 3: Real-time monitoring of 100 accounts, polled every 15 minutes

  • 100 accounts × 96 polls/day × 30 days = 288,000 requests/month
  • Stealth-playwright: 288K × 13 = 3.7M credits — [TODO: dollar cost, will exceed $200 API tier]
  • Guest token: 288K × 2 = 576K credits — [TODO: dollar cost vs. API tier]
  • Official API Pro: $5,000 flat with streaming, no polling needed

Above ~50K requests/month, the math starts favoring the API. Above 200K, it's not close.

Don't forget the labor cost. Scraping eats 2-5 hours per month on token refresh debugging, selector updates when X ships a frontend change, and IP-pool tuning. At $50/hour engineering cost, that's $100-250/month before credits.

Maintenance Reality: Token Refreshes, IP Rotation, and X's Anti-Scraping Evolution

Scraping X is not a "set it and forget it" workload. The ongoing operational tax:

Guest token refresh runs every 60-90 minutes. Production code needs a token cache with TTL, refresh logic on 401, and exponential backoff when the activate endpoint itself rate-limits. If your refresh logic breaks at 3am, your entire pipeline goes dark.
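A token cache with TTL and 401 invalidation can be small. This sketch uses an injectable clock so the refresh path is testable, and a 50-minute TTL to stay safely under the observed 60-90 minute expiry:

```python
import time

class GuestTokenCache:
    """Cache a guest token and refetch when stale or invalidated.

    `fetch` is whatever obtains a fresh token (e.g. fetch_guest_token from
    the earlier example); the injectable clock avoids real waits in tests.
    """
    def __init__(self, fetch, ttl_seconds: int = 50 * 60, clock=time.monotonic):
        self.fetch = fetch
        self.ttl = ttl_seconds
        self.clock = clock
        self.token = None
        self.fetched_at = 0.0

    def get(self) -> str:
        if self.token is None or self.clock() - self.fetched_at >= self.ttl:
            self.token = self.fetch()
            self.fetched_at = self.clock()
        return self.token

    def invalidate(self) -> None:
        # call on a 401/code-239 so the next get() refetches
        self.token = None
```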

IP rotation maintenance for stealth-playwright. Residential IPs from any pool — DreamScrape's included — overlap with other customers' traffic. Some IPs arrive already burned by X. You need to monitor success rates per IP session and let DreamScrape rotate flagged ones out of the pool.

X's bot detection improved measurably between 2024 and 2026. Stealth-playwright success on profile pages dropped from [TODO: 2024 baseline success rate] to 65% in our current data. The trajectory is downward — expect another 10-15 point drop by end of 2026 as X invests in fingerprinting.

Frontend changes hit quarterly. X has changed data-testid values, GraphQL operation hashes, and rendered DOM structure roughly every 90 days for the past two years. Your selectors will break. Plan for it: write tests that run weekly against a known-good profile and alert when fields go null.
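The weekly check can be as simple as diffing extracted fields against a known-good profile. Field names below match the extract map from the profile scraper earlier; the function name is illustrative:

```python
REQUIRED_FIELDS = ("displayName", "bio", "followers", "tweets")

def selector_drift(result: dict) -> list:
    """Return the extracted fields that came back null or empty.

    Run weekly against a known-good public profile; a non-empty list means
    X shipped a frontend change and the selectors need updating.
    """
    data = result.get("data") or {}
    return [field for field in REQUIRED_FIELDS if not data.get(field)]
```

Wire the non-empty case into whatever alerting you already have, so selector rot pages you before it silently nulls a month of data.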

Realistic labor estimate for a 50K-tweet/month scraper:

  • Token refresh debugging: ~1 hour/month
  • Selector updates after frontend changes: ~2 hours/quarter (~0.7 hours/month average)
  • IP-pool tuning and proxy debugging: ~1-2 hours/month
  • Total: 2-5 hours/month

At $50/hour, that's $100-250/month in labor before you've spent a single credit. Combined with credits, the breakeven against $200 API hits faster than the credit math alone suggests.
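The breakeven math is one line of arithmetic. dollars_per_credit stays a parameter here because the current credit price isn't confirmed in this article:

```python
def monthly_tco(credits: float, dollars_per_credit: float,
                labor_hours: float, hourly_rate: float = 50.0) -> float:
    """Total monthly cost of ownership: credit spend plus maintenance labor."""
    return credits * dollars_per_credit + labor_hours * hourly_rate
```

Compare the result against the $200 Basic tier before committing either way.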

Getting Started: Step-by-Step Setup on DreamScrape

The fastest path from zero to working scraper:

Step 1. Sign up at dreamscrape.app and fund the account with $20. At HTTP-tier pricing, that's roughly [TODO: number of guest-token requests for $20] guest-token requests — enough for a meaningful pilot.

Step 2. Fetch your first guest token:

code
curl -X POST https://api.dreamscrape.app/scrape \
  -H "Authorization: Bearer $DREAMSCRAPE_KEY" \
  -d '{
    "url": "https://api.x.com/1.1/guest/activate.json",
    "method": "POST",
    "engine": "http",
    "headers": {
      "Authorization": "Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA"
    }
  }'

Store the returned guest_token with a timestamp.

Step 3. Use the token to query public search:

code
curl -X POST https://api.dreamscrape.app/scrape \
  -H "Authorization: Bearer $DREAMSCRAPE_KEY" \
  -d '{
    "url": "https://api.x.com/2/search/adaptive.json?q=dreamscrape&count=40",
    "engine": "http",
    "headers": {
      "Authorization": "Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA",
      "x-guest-token": "<your-token>"
    }
  }'

Step 4. Implement refresh. On any 401 with code 239, sleep 5 seconds, refetch the token, retry the request once. If the retry also 401s, sleep 60 seconds and refetch again.
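Step 4 as code, a sketch that assumes the response dict carries X's errors array through unchanged, with sleep injectable for testing:

```python
import time

def request_with_refresh(do_request, refresh_token, sleep=time.sleep):
    """Retry once on token expiry, then back off harder.

    do_request() returns the parsed response dict; refresh_token() refetches
    the guest token. Error code 239 is X's "Bad guest token".
    """
    def expired(result: dict) -> bool:
        return any(e.get("code") == 239 for e in result.get("errors", []))

    result = do_request()
    if expired(result):
        sleep(5)
        refresh_token()
        result = do_request()
        if expired(result):
            sleep(60)
            refresh_token()
    return result
```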

Step 5. Test against a known public profile. @elonmusk and @DreamScrape both work as smoke tests — they're public, high-traffic, and unlikely to be deleted. Confirm you get tweet data back.

Step 6. Scale up. Add structured logging (which queries succeed, which 401, which 429), set credit-spend alerts in the DreamScrape dashboard at the $5 / $10 / $20 thresholds so you don't burn through the pilot budget unnoticed, and queue your target list.

For full endpoint documentation including the current GraphQL hash IDs and per-endpoint success rates, see /intel/twitter.com and /intel/x.com. Both pages update weekly with the latest selectors and any frontend changes we've detected.

If your monthly volume is under 50K public tweets and you can tolerate the maintenance overhead, start with the guest-token approach — credits will run an order of magnitude cheaper than $200/month. If you need rendered profile pages, go straight to stealth-playwright with useProxy: true.