Python · RAG · ChromaDB · LLM

Document Q&A

Upload a PDF, ask questions in plain English, and get answers pulled directly from the document with page-number citations. Built on a Retrieval-Augmented Generation (RAG) pipeline: the document is chunked, embedded, and stored in a vector database, and relevant chunks are retrieved to ground LLM answers in the actual content.

Pipeline: RAG (retrieval-augmented generation)
Vector DB: ChromaDB
Session expiry: 30 min auto-delete
Max document: 200 pages

How it works

1. PDF Upload
2. Text Extract
3. Semantic Chunking
4. Embed
5. ChromaDB Store
6. Query Embed
7. Vector Retrieval
8. Grounded Generation
9. Answer + Citations
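Steps 8–9 above come down to how the final prompt is assembled. A minimal sketch, assuming a hypothetical `build_prompt` helper (not the project's real code): retrieved chunks are wrapped in XML delimiters so document text can't masquerade as instructions, and the answer is constrained to the supplied context with page citations.

```python
def build_prompt(question: str, chunks: list[tuple[int, str]]) -> str:
    """Assemble a grounded-generation prompt from retrieved chunks.

    Each chunk is wrapped in an XML tag carrying its page number; a real
    implementation would also escape any XML in the chunk text itself.
    """
    context = "\n".join(
        f'<chunk page="{page}">{text}</chunk>' for page, text in chunks
    )
    return (
        "Answer ONLY based on the context below. "
        "Cite the page number for every claim. "
        "If the context does not contain the answer, say so.\n"
        f"<context>\n{context}\n</context>\n"
        f"<question>{question}</question>"
    )
```

The XML delimiters double as the prompt-injection defence listed under Security: anything inside `<context>` is treated as quoted material, never as an instruction.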

RAG Architecture

Two phases: ingestion (upload and index once) and query (ask questions, retrieve relevant chunks, generate grounded answers). Structure-aware chunking respects headings and paragraph boundaries and never splits mid-sentence. Adjacent chunks share a 15% overlap to preserve context across boundaries.
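The chunking step above can be sketched as follows. This is an illustrative stand-in, not the project's actual chunker: it packs whole paragraphs up to a size budget and carries trailing whole sentences forward as the ~15% overlap, so no chunk ever starts mid-sentence.

```python
import re


def chunk_text(text: str, max_chars: int = 1000, overlap_ratio: float = 0.15) -> list[str]:
    """Split text on paragraph boundaries into chunks of at most max_chars,
    with adjacent chunks sharing whole-sentence overlap of ~overlap_ratio."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            # Carry trailing whole sentences forward, up to ~15% of the budget.
            sentences = re.split(r"(?<=[.!?])\s+", current)
            budget = int(max_chars * overlap_ratio)
            tail = ""
            for s in reversed(sentences):
                if len(tail) + len(s) > budget:
                    break
                tail = (s + " " + tail).strip()
            current = (tail + "\n\n" + para).strip() if tail else para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

A production chunker would additionally treat headings as hard boundaries, as described above; this sketch only handles paragraphs.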

Retrieval & Generation

Questions are embedded with the same model as the document chunks (text-embedding-3-small, 1536-dimensional vectors). Stored chunks are retrieved via cosine similarity (top-5, with a 0.65 relevance threshold). The system prompt instructs the LLM to answer only from the supplied context and to cite a page number for every claim.
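The retrieval step can be sketched like this. The tiny 2-D vectors below stand in for the real 1536-dimensional embeddings, and the function names are illustrative; only the top-5 cap and 0.65 threshold come from the description above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, chunk_vecs, top_k=5, threshold=0.65):
    """Return (chunk_index, score) pairs above the relevance threshold,
    best match first, capped at top_k results."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored = [(i, s) for i, s in scored if s >= threshold]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]
```

The threshold matters as much as the ranking: with no sufficiently similar chunk, the retriever returns an empty list and the model can honestly say the document doesn't answer the question.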

Security

  • File validation by magic bytes, 20 MB limit, 200 page limit
  • Session-scoped data isolation — collections auto-deleted after 30 min inactivity
  • PII redacted before embedding
  • Prompt injection defence via XML delimiters
  • Rate limiting: 3 uploads/hour, 50 questions/session
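The first bullet, magic-byte validation, can be sketched in a few lines. The function name is illustrative; the `%PDF-` signature and 20 MB cap are the checks described above (the 200-page limit needs an actual PDF parser and is omitted here).

```python
MAX_BYTES = 20 * 1024 * 1024  # 20 MB upload limit


def validate_pdf(data: bytes) -> bool:
    """Accept a file only if it fits the size cap and starts with the PDF
    magic bytes, so a renamed .exe or .html is rejected regardless of its
    file extension."""
    return len(data) <= MAX_BYTES and data.startswith(b"%PDF-")
```

Checking magic bytes rather than the filename or Content-Type header means the client can't smuggle in another file type by lying about what it is.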

Want something like this built for your business?

I'll look at your problem, figure out the right approach, and ship working software. No slideshows.

Book a free consultation