Document Q&A MCP Server
A Python-based Model Context Protocol (MCP) server that provides document-based question answering using OpenAI's API. Upload documents, ask questions, and get answers grounded strictly in the document content.
Live Demo
Web Interface: Start the server and visit http://localhost:8000
Quick Start
# 1. Install dependencies
pip install -r requirements.txt
# 2. Set your OpenAI API key
export OPENAI_API_KEY="your-api-key-here"
# 3. Start the web server
python web_server.py
# 4. Open http://localhost:8000 in your browser
# 5. Upload a document and start asking questions!
Features
- Web File Upload: Drag & drop PDF, TXT, and Markdown files
- Smart Q&A: GPT-4 powered answers based strictly on your documents
- Semantic Search: OpenAI embeddings with cosine similarity
- Zero Hallucinations: Answers come only from document content
- Real-time Dashboard: Live status, confidence scores, source attribution
- MCP Compliant: Standard protocol for AI integration
- Production Ready: Error handling, logging, async support
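The "Zero Hallucinations" behavior can be enforced at the prompt level. The sketch below is illustrative only (the function name and wording are assumptions, not the project's actual API): the model is instructed to answer solely from retrieved chunks and to refuse otherwise.

```python
# Hypothetical sketch of document-grounded prompting; names are
# illustrative and do not reflect the project's internal API.

def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Build a prompt that restricts the model to the supplied chunks."""
    # Number each chunk so the answer can cite its sources.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer ONLY from the context below. If the answer is not in the "
        'context, reply exactly: "I cannot find this in the document."\n\n'
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "What is MCP?",
    ["MCP is a protocol for AI tool integration."],
)
```

The numbered chunk labels also make source attribution straightforward, since the model can be asked to cite chunk indices in its answer.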
Architecture
- Multi-format Support: PDF, TXT, and Markdown files
- Intelligent Chunking: Semantic document splitting with overlap
- Vector Search: OpenAI embeddings with cosine similarity
- Hallucination Prevention: Strict adherence to document content
- MCP Compliant: Standard protocol endpoints
- Production Ready: Clean architecture with error handling
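"Intelligent Chunking" with overlap can be pictured with the simplified sketch below. It splits on fixed character offsets rather than semantic boundaries, so it is an assumption-laden stand-in for the real DocumentChunker, but it shows why overlap matters: the tail of each chunk is repeated at the head of the next, so answers spanning a boundary are not lost.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap.

    Simplified illustration only; the project's DocumentChunker splits
    on semantic boundaries rather than raw character counts.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance by less than a full chunk
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

For example, a 450-character document with `chunk_size=200` and `overlap=50` yields three chunks, and the last 50 characters of each chunk reappear at the start of the next.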
┌──────────────────┐   HTTP/Upload   ┌──────────────────┐   MCP Protocol   ┌──────────────────┐
│   Web Browser    │ ──────────────► │    Web Server    │ ───────────────► │   Document Q&A   │
│                  │                 │                  │                  │    MCP Server    │
│ • File Upload    │                 │ • File Handling  │                  │                  │
│ • Q&A Interface  │                 │ • HTTP Endpoints │                  │ ┌──────────────┐ │
│ • Results        │                 │ • JSON API       │                  │ │DocumentLoader│ │
└──────────────────┘                 └──────────────────┘                  │ └──────────────┘ │
                                                                           │ ┌──────────────┐ │
                                                                           │ │   Chunker    │ │
                                                                           │ └──────────────┘ │
                                                                           │ ┌──────────────┐ │
                                                                           │ │  Embedding   │ │
                                                                           │ │    Store     │ │
                                                                           │ └──────────────┘ │
                                                                           │ ┌──────────────┐ │
                                                                           │ │    Query     │ │
                                                                           │ │   Handler    │ │
                                                                           │ └──────────────┘ │
                                                                           └──────────────────┘
The server consists of five main components:
- DocumentLoader: Handles PDF, TXT, and Markdown file parsing
- DocumentChunker: Intelligently splits documents into semantic chunks
- EmbeddingStore: Manages vector embeddings for similarity search
- QueryHandler: Processes questions and generates context-aware answers
- MCPServer: Exposes MCP-compliant endpoints
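The EmbeddingStore's vector search can be sketched as cosine similarity over embedding vectors, as described above. The helper names below are illustrative, not the project's API; in the real server the vectors come from OpenAI's embedding endpoint rather than being hand-written.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], store: list[tuple[str, list[float]]],
          k: int = 3) -> list[str]:
    """Return the k chunk texts whose embeddings are closest to the query."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

The QueryHandler would then pass the top-ranked chunks as context to GPT-4, which is what keeps answers tied to the uploaded documents.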
Usage Options
Option 1: Web Interface (Recommended)
python web_server.py
# Visit http://localhost:8000
Option 2: Interactive CLI
python interactive_client.py
Option 3: Simple Version (No MCP)
python simple_document_qa.py
# Visit http://localhost:8001
Option 4: Run Tests
python test_server.py
Web Interface Features
- File Upload: Click "Choose File" or drag & drop documents
- Question Input: Type questions in the text area
- Live Dashboard: Real-time status and document info
- Confidence Scores: See how confident the AI is in each answer
- Source Attribution: Know exactly which document parts were used
- Real-time Processing: Instant feedback as uploads and questions are processed
Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| OPENAI_API_KEY | Yes | API key for accessing OpenAI's GPT-4 and embedding services |

Configuration
{"mcpServers": {"document-qa": {"command": "python", "args": ["/path/to/web_server.py"], "env": {"OPENAI_API_KEY": "your-api-key-here"}}}}