AskTheDoc - An AI-Powered PDF Assistant
Chat with multiple PDF documents using LangChain, OpenAI, and FAISS through a clean Streamlit interface.
Python, LangChain, FAISS, OpenAI, Streamlit, PyPDFLoader
Project Overview
AskTheDoc is an AI assistant that lets users query one or more PDF documents interactively. It is built on a LangChain retrieval-augmented generation (RAG) pipeline, OpenAI LLMs, and FAISS for vector search, and it offers both a Streamlit web interface and a CLI.
Documents are split into chunks, embedded with OpenAIEmbeddings, and stored in a persistent FAISS vector store. The app supports multi-file querying, per-file vector store creation, chat export, and session statistics. The project demonstrates retrieval-augmented generation applied to document question answering.
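The chunk, embed, store, and retrieve flow described above can be sketched in plain Python. This is a minimal illustration, not the project's code: `chunk_text`, `embed`, `cosine`, and `ToyVectorStore` are hypothetical names, the word-count "embedding" is a toy stand-in for OpenAIEmbeddings, and the linear scan is a stand-in for a FAISS index.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for OpenAIEmbeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; a linear scan stands in for FAISS."""

    def __init__(self):
        self.records = []  # each record: (source file, chunk text, vector)

    def add_document(self, source, text, chunk_size=200, overlap=50):
        for chunk in chunk_text(text, chunk_size, overlap):
            self.records.append((source, chunk, embed(chunk)))

    def search(self, query, k=3, sources=None):
        """Return the top-k (source, chunk) pairs, optionally limited to some files."""
        q = embed(query)
        candidates = [r for r in self.records if sources is None or r[0] in sources]
        ranked = sorted(candidates, key=lambda r: cosine(q, r[2]), reverse=True)
        return [(source, chunk) for source, chunk, _ in ranked[:k]]
```

Leaving `sources` unset searches every file at once (the merged case), while passing a set of file names restricts retrieval to those files. Because each hit carries its source name, an answer can cite the chunk it came from.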
Key Features
- Multi-PDF upload and selection interface
- FAISS-based per-file and merged vector storage
- Real-time chat with source chunk citations
- Exportable chat history logs
- Session reset and stats panel for analytics
- Persistent vector store saved to disk
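As one way to picture the chat export, session stats, and reset features, here is a small sketch of a session object. The `Turn` and `ChatSession` classes, their field names, and the JSON layout are assumptions made for illustration; the project's actual log format and stats are not specified here.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Turn:
    question: str
    answer: str
    sources: list  # names of the files whose chunks were cited
    asked_at: str  # ISO 8601 timestamp in UTC


@dataclass
class ChatSession:
    """Hypothetical container for one chat session's history."""
    turns: list = field(default_factory=list)

    def add_turn(self, question, answer, sources):
        stamp = datetime.now(timezone.utc).isoformat()
        self.turns.append(Turn(question, answer, list(sources), stamp))

    def stats(self):
        """Summarize the session: question count and distinct files cited."""
        cited = {s for turn in self.turns for s in turn.sources}
        return {"questions": len(self.turns), "files_cited": len(cited)}

    def export_json(self):
        """Serialize the full history as a JSON string for download or logging."""
        return json.dumps([asdict(turn) for turn in self.turns], indent=2)

    def reset(self):
        self.turns.clear()
```

In a Streamlit app an object like this would typically be kept in session state so that it survives reruns, and the exported string could be offered through a download button.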
Technical Challenges
- Maintaining context across multiple documents
- Balancing chunk size and retrieval accuracy
- Handling duplicate or redundant chunk answers
- Ensuring performance with large PDFs or many files
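The duplicate-chunk challenge can be illustrated with a simple filter applied to retrieved chunks before they reach the LLM. This is a sketch of one possible approach (token-set Jaccard similarity with a cutoff), not necessarily what AskTheDoc does; `jaccard`, `dedupe_chunks`, and the 0.8 threshold are illustrative assumptions.

```python
import re


def _tokens(text):
    """Lowercased word set used for the similarity comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts' word sets (1.0 means identical sets)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def dedupe_chunks(chunks, threshold=0.8):
    """Keep chunks in ranked order, dropping any that nearly repeat a kept one."""
    kept = []
    for chunk in chunks:
        if all(jaccard(chunk, earlier) < threshold for earlier in kept):
            kept.append(chunk)
    return kept
```

Because chunks are processed in retrieval order, the highest-ranked copy of a repeated passage is the one that survives. The pairwise comparison is quadratic in the number of chunks, which is acceptable for a handful of retrieved results but would need a different approach at index-building scale.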