Python · RAG · ChromaDB · LLM

Document Q&A

Upload a PDF, ask questions in plain English, and get answers pulled directly from the document with page-number citations. Built on a Retrieval-Augmented Generation (RAG) pipeline: the document is chunked, embedded, and stored in a vector database, and relevant chunks are retrieved to ground LLM answers in the actual content.

Pipeline: RAG (retrieval-augmented generation)
Vector DB: ChromaDB
Session expiry: 30 min auto-delete
Max document: 200 pages

How it works

1. PDF Upload
2. Text Extract
3. Semantic Chunking
4. Embed
5. ChromaDB Store
6. Query Embed
7. Vector Retrieval
8. Grounded Generation
9. Answer + Citations
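Steps 8–9 above come down to how the final prompt is assembled. A minimal sketch, assuming a hypothetical `build_prompt` helper (not the project's real code): retrieved chunks are wrapped in XML delimiters so document text can't masquerade as instructions, and the answer is constrained to the supplied context with page citations.

```python
def build_prompt(question: str, chunks: list[tuple[int, str]]) -> str:
    """Assemble a grounded-generation prompt from retrieved chunks.

    Each chunk is wrapped in an XML tag carrying its page number; a real
    implementation would also escape any XML in the chunk text itself.
    """
    context = "\n".join(
        f'<chunk page="{page}">{text}</chunk>' for page, text in chunks
    )
    return (
        "Answer ONLY based on the context below. "
        "Cite the page number for every claim. "
        "If the context does not contain the answer, say so.\n"
        f"<context>\n{context}\n</context>\n"
        f"<question>{question}</question>"
    )
```

The XML delimiters double as the prompt-injection defence listed under Security: anything inside `<context>` is treated as quoted material, never as an instruction.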

RAG Architecture

Two phases: ingestion (upload and index once) and query (ask questions, retrieve relevant chunks, generate grounded answers). Structure-aware chunking respects headings and paragraph boundaries and never splits mid-sentence. Adjacent chunks share a 15% overlap to preserve context across boundaries.
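The chunking step above can be sketched as follows. This is an illustrative stand-in, not the project's actual chunker: it packs whole paragraphs up to a size budget and carries trailing whole sentences forward as the ~15% overlap, so no chunk ever starts mid-sentence.

```python
import re


def chunk_text(text: str, max_chars: int = 1000, overlap_ratio: float = 0.15) -> list[str]:
    """Split text on paragraph boundaries into chunks of at most max_chars,
    with adjacent chunks sharing whole-sentence overlap of ~overlap_ratio."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            # Carry trailing whole sentences forward, up to ~15% of the budget.
            sentences = re.split(r"(?<=[.!?])\s+", current)
            budget = int(max_chars * overlap_ratio)
            tail = ""
            for s in reversed(sentences):
                if len(tail) + len(s) > budget:
                    break
                tail = (s + " " + tail).strip()
            current = (tail + "\n\n" + para).strip() if tail else para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

A production chunker would additionally treat headings as hard boundaries, as described above; this sketch only handles paragraphs.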

Retrieval & Generation

Questions are embedded with the same model as the document chunks (text-embedding-3-small, 1536-dimensional vectors). Stored chunks are retrieved via cosine similarity (top-5, with a 0.65 relevance threshold). The system prompt instructs the LLM to answer only from the supplied context and to cite a page number for every claim.
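The retrieval step can be sketched like this. The tiny 2-D vectors below stand in for the real 1536-dimensional embeddings, and the function names are illustrative; only the top-5 cap and 0.65 threshold come from the description above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, chunk_vecs, top_k=5, threshold=0.65):
    """Return (chunk_index, score) pairs above the relevance threshold,
    best match first, capped at top_k results."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored = [(i, s) for i, s in scored if s >= threshold]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]
```

The threshold matters as much as the ranking: with no sufficiently similar chunk, the retriever returns an empty list and the model can honestly say the document doesn't answer the question.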

Security

  • File validation by magic bytes, 20 MB limit, 200 page limit
  • Session-scoped data isolation — collections auto-deleted after 30 min inactivity
  • PII redacted before embedding
  • Prompt injection defence via XML delimiters
  • Rate limiting: 3 uploads/hour, 50 questions/session
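The first bullet, magic-byte validation, can be sketched in a few lines. The function name is illustrative; the `%PDF-` signature and 20 MB cap are the checks described above (the 200-page limit needs an actual PDF parser and is omitted here).

```python
MAX_BYTES = 20 * 1024 * 1024  # 20 MB upload limit


def validate_pdf(data: bytes) -> bool:
    """Accept a file only if it fits the size cap and starts with the PDF
    magic bytes, so a renamed .exe or .html is rejected regardless of its
    file extension."""
    return len(data) <= MAX_BYTES and data.startswith(b"%PDF-")
```

Checking magic bytes rather than the filename or Content-Type header means the client can't smuggle in another file type by lying about what it is.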

Want something like this built for your business?

I'll look at your problem, figure out the right approach, and ship working software. No slideshows.

Book a free consultation