Document Q&A
Upload a PDF, ask questions in plain English, and get answers pulled directly from the document, with page-number citations. Built on a Retrieval-Augmented Generation (RAG) pipeline: the document is chunked, embedded, and stored in a vector database; relevant chunks are retrieved at query time to ground the LLM's answers in the actual content.
Pipeline: RAG (retrieval-augmented)
Vector DB: ChromaDB
Session expiry: 30 min auto-delete
Max document: 200 pages
How it works
RAG Architecture
Two phases: ingestion (upload and index once) and query (ask questions, retrieve relevant chunks, generate grounded answers). Structure-aware chunking respects headings and paragraph boundaries — it never splits mid-sentence. Adjacent chunks share a 15% overlap to preserve context.
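The chunking step above can be sketched as follows. This is a minimal illustration, not the app's actual code: the function name, the 1000-character chunk size, and the greedy packing strategy are assumptions; only the paragraph-boundary rule and the 15% overlap come from the description.

```python
def chunk_text(paragraphs, max_chars=1000, overlap=0.15):
    """Greedy paragraph-aware chunking (illustrative sketch).

    Paragraphs are packed whole into chunks of up to max_chars, so no
    chunk ever splits mid-sentence; each new chunk starts with the tail
    of the previous one so adjacent chunks share ~15% overlap.
    """
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            # carry the last 15% of the finished chunk into the next one
            current = current[-int(max_chars * overlap):]
        current = current + "\n\n" + para if current else para
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

A real implementation would also split on headings and track the source page of each paragraph so citations can be attached later.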
Retrieval & Generation
Questions are embedded with the same model as the chunks (text-embedding-3-small, 1536-dimensional vectors). Stored chunks are retrieved via cosine similarity (top-5, with a 0.65 relevance threshold). The LLM's system prompt explicitly states 'Answer ONLY based on context' and instructs the model to cite page numbers for every claim.
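The retrieval step can be sketched with plain NumPy. A sketch only: real query vectors would come from the embedding API and the similarity search would run inside ChromaDB; the top-5 cutoff and 0.65 threshold are the ones stated above, everything else is assumed.

```python
import numpy as np

def retrieve(query_vec, chunk_vecs, top_k=5, threshold=0.65):
    """Cosine-similarity retrieval (illustrative sketch).

    chunk_vecs is an (n, d) array of stored chunk embeddings
    (d = 1536 for text-embedding-3-small). Returns (index, score)
    pairs for the top_k chunks whose cosine similarity to the query
    clears the relevance threshold, best first.
    """
    q = np.asarray(query_vec, dtype=float)
    c = np.asarray(chunk_vecs, dtype=float)
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    sims = c @ q                              # cosine similarity per chunk
    order = np.argsort(sims)[::-1][:top_k]    # best-first, at most top_k
    return [(int(i), float(sims[i])) for i in order if sims[i] >= threshold]
```

The threshold matters as much as the ranking: with no sufficiently similar chunk, the function returns an empty list, which lets the app answer "not found in the document" instead of letting the LLM guess.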
Security
- File validation by magic bytes, 20 MB limit, 200 page limit
- Session-scoped data isolation — collections auto-deleted after 30 min inactivity
- PII redacted before embedding
- Prompt injection defence via XML delimiters
- Rate limiting: 3 uploads/hour, 50 questions/session
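The first bullet's magic-byte check can be sketched in a few lines. Only the 20 MB limit and the idea of validating content rather than filename come from the list above; the function name and interface are assumptions, and the page-count and rate limits would be enforced separately.

```python
def looks_like_pdf(data: bytes, max_bytes: int = 20 * 1024 * 1024) -> bool:
    """Validate an upload by content, not extension (illustrative sketch).

    Rejects files over the 20 MB limit or whose leading bytes are not
    the PDF magic number, so a renamed .exe never reaches the parser.
    """
    return len(data) <= max_bytes and data[:5] == b"%PDF-"
```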
Want something like this built for your business?
I'll look at your problem, figure out the right approach, and ship working software. No slideshows.
Book a free consultation