GlobeScraper
A full-stack content, community, and rental data platform for English teachers relocating to Southeast Asia. Built from scratch with Next.js 14, a 970-line Prisma schema, a seven-source scraping pipeline, and an AI content engine powered by Google Gemini.
API routes: 55
DB models: 30+
Scraper sources: 7
Deploy: Vercel + Hetzner
How it works
Rental Pipeline
The discover phase crawls category pages and enqueues URLs. Workers atomically claim items (SQL UPDATE…LIMIT, no explicit locks), fetch and parse content, then upsert with content fingerprinting to prevent duplicates. An AI review stage uses Gemini to classify listings as residential or non-residential, correct property types, and rewrite descriptions.
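A minimal sketch of the fingerprint-then-upsert idea, not the project's actual code. The `Listing` fields, the choice of SHA-256, the normalization rules, and the in-memory `store` (standing in for a table with a unique fingerprint column) are all assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical listing shape; the real Prisma model is not shown here.
interface Listing {
  title: string;
  priceUsd: number | null;
  district: string;
  description: string;
}

// Normalize text so cosmetic differences (case, spacing) do not change the hash.
function normalize(text: string): string {
  return text.normalize("NFKC").toLowerCase().replace(/\s+/g, " ").trim();
}

// Hash a canonical serialization of the fields that define "the same listing".
function fingerprint(listing: Listing): string {
  const canonical = JSON.stringify([
    normalize(listing.title),
    listing.priceUsd,
    normalize(listing.district),
    normalize(listing.description),
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}

// In-memory stand-in for a table keyed by a unique fingerprint column.
const store = new Map<string, Listing>();

// Insert the listing if its fingerprint is new, otherwise overwrite the existing row.
function upsertListing(listing: Listing): "inserted" | "updated" {
  const key = fingerprint(listing);
  const existed = store.has(key);
  store.set(key, listing);
  return existed ? "updated" : "inserted";
}
```

With this sketch, re-scraping the same listing with different capitalization or spacing lands on the existing record, while a price change produces a new fingerprint. Which fields belong in the fingerprint is a design choice that depends on what should count as a duplicate.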
Platform Features
Rental marketplace with search filters, pagination, and image carousels. Community features: profiles, connections, DMs, meetups, trust panels, and moderation. The AI content engine researches competitors via Serper.dev, generates articles with Gemini, creates images with Imagen 4.0, and auto-publishes with SEO scoring. An analytics heatmap covers 300+ districts in Cambodia.
Key Decisions
- No Tailwind: vanilla CSS + BEM for full control.
- Playwright for Cloudflare bypass (Khmer24 blocks HTTP scrapers).
- Human-like pacing with jittered delays (1.2 to 2 s).
- Atomic queue claiming prevents race conditions.
- Gemini over GPT for speed, cost, and reliable JSON output.
- Hetzner VPS for browser automation (can't run Playwright on Vercel serverless).
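The jittered pacing mentioned above could look roughly like this. It is a sketch only: the uniform distribution, the function names, and the injectable `random` parameter are assumptions, and just the 1.2 to 2 second range comes from the description.

```typescript
// Draw a delay in milliseconds uniformly from [minMs, maxMs).
// `random` is injectable so the function can be tested deterministically.
function jitteredDelayMs(
  minMs: number = 1200,
  maxMs: number = 2000,
  random: () => number = Math.random,
): number {
  return minMs + random() * (maxMs - minMs);
}

// Promise-based sleep helper.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Visit URLs one at a time, pausing a jittered interval between requests.
// `visit` is a hypothetical callback, e.g. a Playwright page load.
async function pacedVisit<T>(
  urls: string[],
  visit: (url: string) => Promise<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) {
      await sleep(jitteredDelayMs());
    }
    results.push(await visit(urls[i]));
  }
  return results;
}
```

Randomizing each pause avoids the fixed request cadence that makes automated traffic easy to spot, at the cost of slower overall throughput.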
Security
- Human-like pacing prevents scraper detection and bans
- Content fingerprinting prevents duplicate records
- Authentication via Auth.js v5
- Atomic queue claiming prevents race conditions
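To illustrate why atomic claiming avoids race conditions, here is a small in-memory model of the claim step. It is not the project's implementation: the queue shape, status names, and the SQL in the comment (table and column names, MySQL-style syntax) are assumptions for illustration.

```typescript
type Status = "pending" | "claimed";

interface QueueItem {
  id: number;
  url: string;
  status: Status;
  claimedBy: string | null;
}

// Assumed shape of a single-statement claim (MySQL-style, hypothetical names):
//   UPDATE scrape_queue SET status = 'claimed', claimed_by = ?
//   WHERE status = 'pending' ORDER BY id LIMIT 1;
// Selecting and marking happen in one statement, so two workers cannot both
// read the same pending row before either of them marks it.

class InMemoryQueue {
  private items: QueueItem[];

  constructor(urls: string[]) {
    this.items = urls.map((url, index) => ({
      id: index + 1,
      url,
      status: "pending" as Status,
      claimedBy: null,
    }));
  }

  // Find-and-mark runs as one synchronous step, standing in for the single UPDATE.
  claim(workerId: string): QueueItem | null {
    const item = this.items.find((candidate) => candidate.status === "pending");
    if (!item) {
      return null; // nothing left to claim
    }
    item.status = "claimed";
    item.claimedBy = workerId;
    return { ...item }; // return a copy so callers cannot mutate queue state
  }
}
```

The contrast is a two-step "SELECT a pending row, then UPDATE it" flow, where two workers can select the same row before either update lands and end up scraping the same URL twice.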
Want something like this built for your business?
I'll look at your problem, figure out the right approach, and ship working software. No slideshows.
Book a free consultation