How to Scrape Any Site at Scale Without Getting Blocked

The exact stack and tactics we use to scrape Cloudflare-protected sites at 100K+ requests/day with a 90%+ success rate.

Scraping at scale isn't about smarter selectors. It's about looking less like a bot. Here's the production stack.

The four pillars

  1. Browser, not requests: Playwright or Puppeteer for anything past static HTML
  2. Residential proxies: datacenter IPs tend to get flagged within the first 100 requests
  3. Stealth plugins: patch the obvious "I'm a headless browser" tells
  4. Pacing: humans don't make 50 requests per second
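Pillar 4 in code: one way to pace is to enforce a randomized minimum gap between requests to the same host, while letting different hosts proceed independently. For scale, 100,000 requests a day averages out to about 1.2 requests per second (100,000 / 86,400 s) across everything you crawl. This is a minimal sketch. `DomainPacer` is a hypothetical helper, and its 2 s base delay plus up to 1 s of jitter are illustrative values, not tuned numbers.

```python
import asyncio
import random
import time
from urllib.parse import urlparse


class DomainPacer:
    """Hypothetical helper: randomized minimum gap between requests to one host.

    base_delay and jitter are illustrative defaults, not tuned production values.
    """

    def __init__(self, base_delay: float = 2.0, jitter: float = 1.0):
        self.base_delay = base_delay
        self.jitter = jitter
        self._next_allowed = {}  # host -> earliest monotonic time for its next request
        self._locks = {}  # host -> asyncio.Lock, so concurrent callers queue per host

    def _gap(self) -> float:
        # Random jitter keeps requests from landing on a fixed, machine-like beat.
        return self.base_delay + random.uniform(0, self.jitter)

    async def wait(self, url: str) -> None:
        """Sleep until this URL's host may be hit again, then reserve the next slot."""
        host = urlparse(url).netloc
        lock = self._locks.setdefault(host, asyncio.Lock())
        async with lock:
            now = time.monotonic()
            ready_at = self._next_allowed.get(host, now)
            if ready_at > now:
                await asyncio.sleep(ready_at - now)
            self._next_allowed[host] = max(now, ready_at) + self._gap()
```

Call `await pacer.wait(url)` right before `page.goto(url)`. Keying on the host means one slow site doesn't throttle the rest of the crawl.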

Stack we use

Code skeleton (Python)

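```python
import asyncio
import random

from playwright.async_api import async_playwright


async def scrape(url: str) -> str:
    async with async_playwright() as p:
        # Route all browser traffic through the residential proxy gateway.
        browser = await p.chromium.launch(
            headless=True,
            proxy={
                "server": "http://gate.smartproxy.com:7000",
                "username": "...",
                "password": "...",
            },
        )
        try:
            # Realistic desktop user agent and viewport instead of headless defaults.
            ctx = await browser.new_context(
                user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...",
                viewport={"width": 1280, "height": 800},
            )
            page = await ctx.new_page()
            # Wait for network activity to settle so JS-rendered content is present.
            await page.goto(url, wait_until="networkidle")
            # Randomized pause so requests don't arrive at a machine-like rhythm.
            await asyncio.sleep(random.uniform(1, 3))
            # ... extract data
            return await page.content()
        finally:
            # Always release the browser, even if navigation or extraction fails.
            await browser.close()


if __name__ == "__main__":
    html = asyncio.run(scrape("https://example.com"))
```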
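The skeleton above handles one URL per call, and each call starts its own Chromium instance, so running it over a large URL list needs a ceiling on how many run at once. Below is a sketch using `asyncio.Semaphore`. `scrape_all`, its `fetch` parameter, and the default limit of 5 are hypothetical choices; `fetch` stands in for the `scrape` coroutine.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Iterable


async def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> dict[str, str | BaseException]:
    """Hypothetical helper: run `fetch` over many URLs with a concurrency cap."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> str:
        async with semaphore:  # at most max_concurrency fetches in flight at once
            return await fetch(url)

    url_list = list(urls)
    # return_exceptions=True keeps one blocked URL from aborting the whole batch;
    # its exception object is stored as that URL's result instead.
    results = await asyncio.gather(*(worker(u) for u in url_list), return_exceptions=True)
    return dict(zip(url_list, results))
```

Usage would look like `asyncio.run(scrape_all(urls, scrape, max_concurrency=5))`. Results are keyed by URL, so duplicate URLs in the input collapse to one entry.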

Pacing strategy
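At volume, some requests will still come back blocked or rate-limited (HTTP 429). A common complement to steady pacing is exponential backoff: after each failure, wait a random time inside a window that doubles, instead of retrying immediately. This is a sketch. `with_backoff` is a hypothetical helper, and its defaults (4 tries, 2 s base, 60 s cap) are illustrative. It assumes a blocked page surfaces as an exception. Playwright's `page.goto` returns a response and does not raise on HTTP error statuses, so you would check the returned response's `status` and raise yourself.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_backoff(
    attempt: Callable[[], Awaitable[T]],
    max_tries: int = 4,
    base: float = 2.0,
    cap: float = 60.0,
) -> T:
    """Hypothetical helper: retry `attempt` with exponential backoff and full jitter."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for n in range(max_tries - 1):
        try:
            return await attempt()
        except Exception:
            # Full jitter: wait a random time in [0, min(cap, base * 2**n)] seconds,
            # so the window doubles after each failure but retries don't synchronize.
            await asyncio.sleep(random.uniform(0, min(cap, base * 2**n)))
    return await attempt()  # last try: any error propagates to the caller
```

Pass a factory, not a coroutine, e.g. `await with_backoff(lambda: scrape(url))`, so every retry starts a fresh attempt.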

Anti-detection extras

Legal disclaimer

Respect robots.txt. Don't scrape personal data (GDPR). Don't violate the ToS of sites with explicit anti-scraping clauses. We at Nexora reject scraping gigs that cross these lines.
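To check the robots.txt rule programmatically, Python's standard library ships `urllib.robotparser`. This is a minimal sketch, assuming the robots.txt body has already been downloaded. The `check_robots` helper and the sample rules in the test are hypothetical.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt: str, user_agent: str, url: str) -> Tuple[bool, Optional[float]]:
    """Hypothetical helper: return (allowed, crawl_delay) for `url` under a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse an already-fetched file, no network I/O
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    allowed = parser.can_fetch(user_agent, url)
    return allowed, (float(delay) if delay is not None else None)
```

In production, `RobotFileParser.set_url()` followed by `read()` fetches the live file instead. A returned crawl delay is a natural floor for your per-host pacing.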


Need this built for you?

Hire a vetted Nexora expert. Escrow-protected. Fixed price. From $65.
