How to Scrape Any Site at Scale Without Getting Blocked

Scraping at scale isn't about smarter selectors. It's about looking less like a bot. Here's the production stack.

The four pillars

Browser, not requests: Playwright/Puppeteer for anything past basic HTML
Residential proxies: datacenter IPs are often flagged within the first 100 or so requests
Stealth plugins: patch the obvious "I'm a headless browser" tells
Pacing: humans don't make 50 req/sec

Stack we use

Playwright + playwright-stealth
SmartProxy residential ($8.50/GB)
2Captcha for image captchas ($3/1000)
A small VPS for the runner ($5/mo)

Code skeleton (Python)

```python
from playwright.async_api import async_playwright
import asyncio, random


async def scrape(url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy={
                "server": "http://gate.smartproxy.com:7000",
                "username": "...",
                "password": "...",
            },
        )
        ctx = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...",
            viewport={"width": 1280, "height": 800},
        )
        page = await ctx.new_page()
        await page.goto(url, wait_until="networkidle")
        await asyncio.sleep(random.uniform(1, 3))
        # ... extract data
        await browser.close()
```

Pacing strategy

Random delay of 1-3 seconds between page loads
At most 10 concurrent browsers
Rotate proxy IPs round-robin, switching every 5 requests
Back off exponentially on HTTP 429 responses (5s, 10s, 20s, 40s)
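
The pacing rules above can be sketched in plain asyncio. This is a rough illustration, not the exact code we run: `paced_fetch`, `proxy_rotation`, `backoff_delay`, and the `fetch` callable (assumed to return a `(status_code, body)` tuple) are hypothetical names invented for this example. The `sleep` parameter exists only so the waits can be swapped out in tests.

```python
import asyncio
import itertools
import random

MAX_CONCURRENCY = 10        # "at most 10 concurrent browsers"
REQUESTS_PER_PROXY = 5      # switch proxy after this many requests
BACKOFF_BASE_SECONDS = 5.0  # first wait after a 429


def backoff_delay(attempt: int) -> float:
    """Wait time after the attempt-th consecutive 429 (0-based): 5, 10, 20, 40."""
    return BACKOFF_BASE_SECONDS * (2 ** attempt)


def proxy_rotation(proxies, per_proxy=REQUESTS_PER_PROXY):
    """Yield proxies round-robin, repeating each one `per_proxy` times."""
    for proxy in itertools.cycle(proxies):
        for _ in range(per_proxy):
            yield proxy


async def paced_fetch(url, fetch, limiter, max_retries=4, sleep=asyncio.sleep):
    """Call the hypothetical `fetch(url)` under a concurrency limit.

    Retries on HTTP 429 with exponential backoff, then pauses 1-3 seconds
    before releasing the slot so page loads stay spaced out.
    """
    async with limiter:
        status, body = await fetch(url)
        for attempt in range(max_retries):
            if status != 429:
                break
            await sleep(backoff_delay(attempt))  # 5s, 10s, 20s, 40s
            status, body = await fetch(url)
        await sleep(random.uniform(1, 3))
        return status, body
```

In a runner, one `asyncio.Semaphore(MAX_CONCURRENCY)` would be shared by all tasks and `next(rotation)` would pick the proxy for each new request. Note that this sketch holds the concurrency slot while backing off, which keeps total pressure on the target low at the cost of throughput.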

Anti-detection extras

Rotate viewport sizes (1280x800, 1920x1080, 1366x768)
Inject realistic mouse movements before clicks (page.mouse.move)
Set permissions=[] and an Accept-Language header that matches the user agent's country
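
As a minimal sketch of how these extras might be wired into Playwright: the helper names `context_options` and `move_then_click` are made up for this example, and the `en-US` default is an assumption. Playwright's `locale` context option also controls the Accept-Language request header, so no separate header is set here.

```python
import random

# Viewport sizes taken from the list above.
VIEWPORTS = [
    {"width": 1280, "height": 800},
    {"width": 1920, "height": 1080},
    {"width": 1366, "height": 768},
]


def context_options(user_agent: str, locale: str = "en-US") -> dict:
    """Build keyword arguments for browser.new_context() (hypothetical helper)."""
    return {
        "user_agent": user_agent,
        "viewport": dict(random.choice(VIEWPORTS)),  # pick one size per context
        "locale": locale,       # also drives the Accept-Language header
        "permissions": [],      # grant no browser permissions
    }


async def move_then_click(page, x: float, y: float) -> None:
    """Move the pointer to (x, y) in several intermediate steps, then click."""
    await page.mouse.move(x, y, steps=random.randint(10, 25))
    await page.mouse.click(x, y)
```

A context would then be created with `ctx = await browser.new_context(**context_options(ua, "de-DE"))`, where `ua` is whatever user agent string the run uses.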

Legal disclaimer

Respect robots.txt. Don't scrape personal data (GDPR). Don't violate ToS for sites with explicit anti-scraping clauses. We at Nexora reject scraping gigs that cross these lines.
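
One way to make the robots.txt rule enforceable in code is to check every URL before it is queued. The sketch below uses Python's standard `urllib.robotparser`; the helper name `build_robots_checker`, the `example-bot` user agent, and the sample rules are illustrative assumptions, not a real site's policy.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str):
    """Return a function that reports whether `user_agent` may fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed


# Made-up rules for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

allowed = build_robots_checker(SAMPLE_ROBOTS, "example-bot")
```

Against a live site, the parser can load the file itself with `parser.set_url(...)` followed by `parser.read()` instead of parsing a string.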
