TypeScript · Next.js · Prisma · AI

GlobeScraper

A full-stack content, community, and rental data platform for English teachers relocating to Southeast Asia. Built from scratch with Next.js 14, a 970-line Prisma schema, a 7-source scraping pipeline, and an AI content engine powered by Google Gemini.

API routes: 55

DB models: 30+

Scraper sources: 7

Deploy: Vercel + Hetzner

How it works

1. 7 Scraper Sources
2. Crawl & Discover
3. ScrapeQueue
4. Parallel Workers
5. Content Fingerprint
6. DB Upsert
7. AI Review
8. Analytics Index

Rental Pipeline

The discover phase crawls category pages and enqueues URLs. Workers atomically claim items (SQL UPDATE…LIMIT, no locks), fetch and parse content, then upsert with content fingerprinting to prevent duplicates. The AI review stage uses Gemini to classify listings as residential or non-residential, correct property types, and rewrite descriptions.
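
As a rough illustration of the atomic claim step, here is a minimal sketch of how a worker might build its claim statements. It is not the project's actual code: the `scrape_queue` table, its `status` and `claim_token` columns, and the `buildClaimPlan` helper are all hypothetical, and the SQL assumes a MySQL-style dialect that accepts `ORDER BY` and `LIMIT` on a single-table `UPDATE` (PostgreSQL, for one, would need a subquery instead).

```typescript
import { randomUUID } from "node:crypto";

// A parameterized statement in a driver-agnostic shape (hypothetical type).
interface SqlStatement {
  text: string;
  values: (string | number)[];
}

interface ClaimPlan {
  claimToken: string;
  claim: SqlStatement;        // marks up to `batchSize` pending rows as ours
  fetchClaimed: SqlStatement; // reads back exactly the rows we marked
}

// Builds the two statements a worker would run to claim a batch.
// Because the UPDATE both selects and marks rows in one statement,
// two workers cannot end up holding the same row.
function buildClaimPlan(batchSize: number, claimToken: string = randomUUID()): ClaimPlan {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  return {
    claimToken,
    claim: {
      // Table and column names are assumptions made for this sketch.
      text:
        "UPDATE scrape_queue SET status = 'PROCESSING', claim_token = ? " +
        "WHERE status = 'PENDING' ORDER BY id LIMIT ?",
      values: [claimToken, batchSize],
    },
    fetchClaimed: {
      text: "SELECT id, url FROM scrape_queue WHERE claim_token = ?",
      values: [claimToken],
    },
  };
}
```

A worker would execute `claim` first and then `fetchClaimed` with the same token. The unique token is what lets the second query return only the rows this worker marked, without any application-level lock.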

Platform Features

A rental marketplace with search filters, pagination, and image carousels. Community features include profiles, connections, DMs, meetups, trust panels, and moderation. The AI content engine researches competitors via Serper.dev, generates articles with Gemini, creates images with Imagen 4.0, and auto-publishes with SEO scoring. An analytics heatmap covers 300+ Cambodia districts.

Key Decisions

  • No Tailwind: vanilla CSS + BEM for full control
  • Playwright for Cloudflare bypass (Khmer24 blocks HTTP scrapers)
  • Human-like pacing with jittered delays (1.2-2s)
  • Atomic queue claiming prevents race conditions
  • Gemini over GPT for speed, cost, and reliable JSON output
  • Hetzner VPS for browser automation (can't run Playwright on Vercel serverless)
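
The jittered pacing mentioned above can be sketched roughly as follows. This is an illustrative example only: the helper names (`jitteredDelayMs`, `sleep`, `fetchPaced`) are hypothetical, and the uniform distribution over the 1.2-2s window is an assumption, since the source does not say how the jitter is drawn.

```typescript
// Returns a delay in milliseconds drawn uniformly from [minMs, maxMs).
// `random` is injectable so the function can be tested deterministically.
function jitteredDelayMs(
  minMs: number = 1200,
  maxMs: number = 2000,
  random: () => number = Math.random
): number {
  if (maxMs < minMs) {
    throw new RangeError("maxMs must be >= minMs");
  }
  return minMs + random() * (maxMs - minMs);
}

// Promise-based sleep built on setTimeout.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Fetches URLs one at a time, pausing a jittered interval between requests.
// `fetchOne` is supplied by the caller (for example, a Playwright page load).
async function fetchPaced<T>(
  urls: string[],
  fetchOne: (url: string) => Promise<T>
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) {
      await sleep(jitteredDelayMs());
    }
    results.push(await fetchOne(urls[i]));
  }
  return results;
}
```

Randomizing the gap avoids the fixed request rhythm that a constant delay would produce.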

Security

  • Human-like pacing prevents scraper detection and bans
  • Content fingerprinting prevents duplicate records
  • Authentication via Auth.js v5
  • Atomic queue claiming prevents race conditions
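
Content fingerprinting, listed above, could look something like the sketch below. The `Listing` fields, the normalization rules, and the choice of SHA-256 are assumptions made for illustration; the source does not describe which fields or hash the platform actually uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical listing shape; the real schema fields are not shown in the source.
interface Listing {
  title: string;
  description: string;
  priceUsd: number | null;
  district: string;
}

// Normalizes text so cosmetic differences (case, extra whitespace,
// Unicode compatibility forms) do not change the fingerprint.
function normalizeText(text: string): string {
  return text.normalize("NFKC").toLowerCase().replace(/\s+/g, " ").trim();
}

// Hashes a canonical JSON array of the normalized fields.
// Serializing as an array keeps field boundaries unambiguous.
function contentFingerprint(listing: Listing): string {
  const canonical = JSON.stringify([
    normalizeText(listing.title),
    normalizeText(listing.description),
    listing.priceUsd,
    normalizeText(listing.district),
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}
```

Storing the resulting hex string in a uniquely indexed column would let an upsert keyed on it treat a re-scraped, cosmetically different copy of a listing as the same record.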
