Scraping at scale isn't about smarter selectors. It's about looking less like a bot. Here's the production stack.
The four pillars
- Browser, not requests — Playwright/Puppeteer for anything past basic HTML
- Residential proxies — datacenter IPs are flagged within 100 requests
- Stealth plugins — patch the obvious "I'm a headless browser" tells
- Pacing — humans don't make 50 req/sec
Stack we use
- Playwright + playwright-stealth
- SmartProxy residential ($8.50/GB)
- 2Captcha for image captchas ($3/1000)
- A small VPS for the runner ($5/mo)
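To get a feel for how these prices add up, here is a rough back-of-the-envelope calculator. The per-unit prices are the ones listed above; the page count, average page weight, and captcha rate are hypothetical inputs, so treat the result as an illustration rather than a quote.

```python
# Rough monthly cost estimate for the stack above.
# Prices come from this post; every workload number is an assumption.
PROXY_USD_PER_GB = 8.50
CAPTCHA_USD_PER_1000 = 3.00
VPS_USD_PER_MONTH = 5.00


def monthly_cost(pages, mb_per_page, captcha_rate):
    """Estimate monthly spend in USD.

    pages:        page loads per month (hypothetical workload)
    mb_per_page:  average proxy traffic per page, in MB (hypothetical)
    captcha_rate: fraction of page loads that hit an image captcha (hypothetical)
    """
    proxy_gb = pages * mb_per_page / 1000  # decimal GB: 1 GB = 1000 MB
    proxy_cost = proxy_gb * PROXY_USD_PER_GB
    captcha_cost = pages * captcha_rate / 1000 * CAPTCHA_USD_PER_1000
    return proxy_cost + captcha_cost + VPS_USD_PER_MONTH


if __name__ == "__main__":
    # Example workload: 100k pages at about 2 MB each, 1% of them showing a captcha.
    print(f"${monthly_cost(100_000, 2.0, 0.01):,.2f}")
```

With those example inputs, proxy bandwidth is by far the largest line item, so the average page weight is the number worth measuring first.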
Code skeleton (Python)
```python
import asyncio
import random

from playwright.async_api import async_playwright


async def scrape(url):
    async with async_playwright() as p:
        # Launch headless Chromium behind the residential proxy gateway.
        browser = await p.chromium.launch(
            headless=True,
            proxy={
                "server": "http://gate.smartproxy.com:7000",
                "username": "...",
                "password": "...",
            },
        )
        # A fresh context with a desktop user agent and viewport.
        ctx = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...",
            viewport={"width": 1280, "height": 800},
        )
        page = await ctx.new_page()
        await page.goto(url, wait_until="networkidle")
        # Pause 1-3 seconds, like a person reading the page.
        await asyncio.sleep(random.uniform(1, 3))
        # ... extract data
        await browser.close()
```
Pacing strategy
- 1-3 sec random delay between page loads
- Max 10 concurrent browsers
- Rotate proxy IPs round-robin, switching every 5 requests
- Back off exponentially on 429 (5s, 10s, 20s, 40s)
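The pacing rules above can be sketched as a few small helpers. This is a minimal illustration, not the exact runner code: the names `backoff_delay`, `ProxyRotator`, and `fetch_politely` are hypothetical, and `fetch` is assumed to be a coroutine you supply that loads a URL through a given proxy and returns a `(status, body)` pair.

```python
import asyncio
import itertools
import random

MAX_CONCURRENT_BROWSERS = 10
REQUESTS_PER_PROXY = 5
BACKOFF_BASE_SECONDS = 5
MAX_RETRIES = 4


def backoff_delay(attempt):
    """Seconds to wait before retry number `attempt` (0-based): 5, 10, 20, 40."""
    return BACKOFF_BASE_SECONDS * 2 ** attempt


class ProxyRotator:
    """Round-robin over a proxy list, moving on after every N requests."""

    def __init__(self, proxies, requests_per_proxy=REQUESTS_PER_PROXY):
        self._cycle = itertools.cycle(proxies)
        self._per_proxy = requests_per_proxy
        self._used = 0
        self._current = next(self._cycle)

    def next_proxy(self):
        # Switch to the next proxy once the current one has served its quota.
        if self._used >= self._per_proxy:
            self._current = next(self._cycle)
            self._used = 0
        self._used += 1
        return self._current


async def fetch_politely(url, fetch, rotator, semaphore, sleep=asyncio.sleep):
    """Run `fetch(url, proxy)` under the concurrency, delay, and backoff rules."""
    async with semaphore:  # caps the number of browsers running at once
        for attempt in range(MAX_RETRIES + 1):
            status, body = await fetch(url, rotator.next_proxy())
            if status != 429:
                # Random 1-3 second pause before this slot loads another page.
                await sleep(random.uniform(1, 3))
                return body
            if attempt < MAX_RETRIES:
                await sleep(backoff_delay(attempt))
        raise RuntimeError(f"still rate-limited after {MAX_RETRIES} retries: {url}")
```

Create the semaphore inside your async entry point, for example `asyncio.Semaphore(MAX_CONCURRENT_BROWSERS)`, and share one `ProxyRotator` across all tasks so the rotation count is global rather than per task.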
Anti-detection extras
- Rotate viewport sizes (1280x800, 1920x1080, 1366x768)
- Inject realistic mouse movements before clicks (`page.mouse.move`)
- Set `permissions=[]` and accept-language to match user-agent country
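As a rough sketch of how these extras might fit together: the helper names `context_options`, `mouse_path`, and `human_click` are hypothetical, and the easing curve and jitter values are arbitrary choices rather than tuned numbers. The code relies on Playwright's `locale` context option, which also sets the Accept-Language header, and expects `page` to be a Playwright `Page`.

```python
import asyncio
import random

VIEWPORTS = [
    {"width": 1280, "height": 800},
    {"width": 1920, "height": 1080},
    {"width": 1366, "height": 768},
]


def context_options(user_agent, locale, rng=random):
    """Build keyword arguments for Playwright's browser.new_context()."""
    return {
        "user_agent": user_agent,
        "viewport": dict(rng.choice(VIEWPORTS)),  # pick a different size per context
        "permissions": [],                        # grant no extra browser permissions
        "locale": locale,                         # also drives Accept-Language
    }


def mouse_path(start, end, steps=12, jitter=3.0, rng=random):
    """Return `steps` points from start to end with a small random wobble."""
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps):
        t = i / steps
        t = t * t * (3 - 2 * t)  # smoothstep: slow start, slow finish
        points.append((
            x0 + (x1 - x0) * t + rng.uniform(-jitter, jitter),
            y0 + (y1 - y0) * t + rng.uniform(-jitter, jitter),
        ))
    points.append((x1, y1))  # land exactly on the target
    return points


async def human_click(page, x, y, start=(0, 0)):
    """Move the pointer along a wobbly path, then click at (x, y)."""
    for px, py in mouse_path(start, (x, y)):
        await page.mouse.move(px, py)
        await asyncio.sleep(random.uniform(0.01, 0.04))
    await page.mouse.click(x, y)
```

A context would then be created with something like `await browser.new_context(**context_options(ua, "de-DE"))`, choosing a locale that matches the rest of the fingerprint.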
Legal disclaimer
Respect robots.txt. Don't scrape personal data (GDPR). Don't violate ToS for sites with explicit anti-scraping clauses. We at Nexora reject scraping gigs that cross these lines.
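Checking robots.txt can be automated with Python's standard library. The sketch below parses robots.txt content that has already been downloaded; the `example-scraper` user agent, the helper names, and the sample rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-scraper"  # hypothetical bot name


def load_robots(robots_txt):
    """Parse the text of a robots.txt file that was fetched separately."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, url, user_agent=USER_AGENT):
    """True if robots.txt permits `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"
    robots = load_robots(sample)
    print(allowed(robots, "https://example.com/products"))
    print(allowed(robots, "https://example.com/private/data"))
    print(robots.crawl_delay(USER_AGENT))
```

With the sample rules, the first URL is permitted and the second is not. When a site declares a `Crawl-delay`, it is reasonable to use it as a floor for the delay between page loads.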
Need this built for you?
Hire a vetted Nexora expert. Escrow-protected. Fixed price. From $65.