Python · RAG · ChromaDB · LLM

Document Q&A

Upload a PDF, ask questions in plain English, and get answers pulled directly from the document with page-number citations. Built on a Retrieval-Augmented Generation pipeline: the document is chunked, embedded, and stored in a vector database, and relevant chunks are retrieved to ground the LLM's answers in actual content.


  • Pipeline: RAG (retrieval-augmented)
  • Vector DB: ChromaDB
  • Session expiry: 30 min auto-delete
  • Max document: 200 pages

How it works

1. PDF Upload
2. Text Extract
3. Semantic Chunking
4. Embed
5. ChromaDB Store
6. Query Embed
7. Vector Retrieval
8. Grounded Generation
9. Answer + Citations
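The nine steps above can be sketched end to end. This is a minimal illustration only: every helper below is a hypothetical stand-in (a real implementation would use a PDF parser for extraction, an embedding API for vectors, and a ChromaDB collection for storage and retrieval).

```python
# Two-phase skeleton of the pipeline: ingestion (steps 1-5), query (steps 6-9).
# All helpers are stubs standing in for real components.

def extract_text(pdf_bytes: bytes) -> str:          # steps 1-2 (stub: pretend it's text)
    return pdf_bytes.decode("utf-8")

def chunk(text: str) -> list[str]:                  # step 3 (paragraph-level split)
    return [p for p in text.split("\n\n") if p.strip()]

def embed(text: str) -> list[float]:                # steps 4 and 6 (toy 2-dim vector)
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

def ingest(pdf_bytes: bytes) -> list[dict]:         # step 5 (list stands in for ChromaDB)
    return [{"text": c, "vec": embed(c)} for c in chunk(extract_text(pdf_bytes))]

def retrieve(index: list[dict], question: str, k: int = 5) -> list[dict]:
    # Step 7: rank chunks by distance to the query vector (squared distance
    # here; the real pipeline uses cosine similarity in ChromaDB).
    qv = embed(question)
    def dist(entry: dict) -> float:
        return sum((a - b) ** 2 for a, b in zip(entry["vec"], qv))
    return sorted(index, key=dist)[:k]

def answer(index: list[dict], question: str) -> str:
    # Steps 8-9: feed retrieved chunks to the LLM as grounding context.
    context = "\n".join(c["text"] for c in retrieve(index, question))
    return f"[LLM answer grounded in context]\nContext used: {context}"
```

The point of the skeleton is the phase split: `ingest` runs once per upload, while `retrieve` and `answer` run per question against the already-built index.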

RAG Architecture

Two phases: ingestion (upload and index once) and query (ask questions, retrieve relevant chunks, generate grounded answers). Structure-aware chunking respects headings and paragraph boundaries, so no chunk is split mid-sentence. Adjacent chunks share a 15% overlap to preserve context across chunk boundaries.
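The chunking idea can be sketched as follows. This is an assumption-laden simplification: it packs whole paragraphs greedily up to a size budget and seeds each new chunk with the tail of its predecessor; the real pipeline also keys on headings and snaps the overlap to a sentence boundary.

```python
def chunk_paragraphs(text: str, max_chars: int = 800,
                     overlap_frac: float = 0.15) -> list[str]:
    """Greedy structure-aware chunking sketch: paragraphs are kept whole,
    so no split lands mid-sentence, and each chunk carries roughly 15% of
    the previous chunk as overlap for context preservation."""
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for p in paras:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            # Seed the next chunk with the trailing ~15% of the previous one.
            tail = current[-int(max_chars * overlap_frac):]
            current = tail + "\n\n" + p
        else:
            current = (current + "\n\n" + p) if current else p
    if current:
        chunks.append(current)
    return chunks
```

Because paragraphs are appended whole, the only text duplicated between chunks is the deliberate overlap window.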

Retrieval & Generation

Questions are embedded with the same model as the document chunks (text-embedding-3-small, 1536-dimensional vectors). Stored chunks are retrieved by cosine similarity (top 5, with a 0.65 relevance threshold). The LLM's system prompt explicitly states 'Answer ONLY based on context' and requires a page-number citation for every claim.
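A minimal sketch of this retrieval-and-prompting step, assuming an in-memory index of `(vector, text, page)` tuples (in the real pipeline ChromaDB computes the similarity search server-side, and the vectors are 1536-dimensional):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=5, threshold=0.65):
    """Rank stored chunks by cosine similarity and keep the top-k that
    clear the 0.65 relevance floor, so off-topic chunks never reach the
    prompt."""
    scored = sorted(((cosine(query_vec, v), t, p) for v, t, p in index),
                    reverse=True)
    return [(t, p) for s, t, p in scored[:k] if s >= threshold]

def build_prompt(question: str, hits) -> str:
    # XML delimiters keep document text clearly separated from instructions,
    # and the system text pins the model to the retrieved context.
    context = "\n".join(f"<chunk page='{p}'>{t}</chunk>" for t, p in hits)
    return ("Answer ONLY based on the context below and cite the page "
            "number for every claim.\n<context>\n" + context +
            "\n</context>\nQuestion: " + question)
```

The relevance threshold is what lets the system say "the document doesn't cover this" instead of forcing an answer from weakly related chunks.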

Security

  • File validation by magic bytes; 20 MB size limit, 200-page limit
  • Session-scoped data isolation: collections auto-deleted after 30 min of inactivity
  • PII redacted before embedding
  • Prompt injection defence via XML delimiters
  • Rate limiting: 3 uploads/hour, 50 questions/session
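The magic-byte check from the list above can be sketched in a few lines. It inspects file content rather than the filename, so renaming an executable to `doc.pdf` does not get past it (sketch only: the real service also enforces the 200-page limit after parsing):

```python
PDF_MAGIC = b"%PDF-"           # every valid PDF file starts with this header
MAX_BYTES = 20 * 1024 * 1024   # 20 MB upload ceiling

def validate_upload(data: bytes) -> bool:
    """Reject oversized files and files that merely claim to be PDFs:
    the check reads the leading bytes of the content itself."""
    return len(data) <= MAX_BYTES and data.startswith(PDF_MAGIC)
```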

Related service

This project demonstrates the kind of work I do under Custom AI Solutions.

Want something like this built for your business?

I'll look at your problem, figure out the right approach, and ship working software. No slideshows.

Book a free consultation