Authors :
Madhav Thigale; Aditya Kumar; Chetna Girme; Apurva Gargote
Volume/Issue :
Volume 10 - 2025, Issue 4 - April
Google Scholar :
https://tinyurl.com/bdzf43w7
Scribd :
https://tinyurl.com/tu6xvnzk
DOI :
https://doi.org/10.38124/ijisrt/25apr956
Abstract :
This project builds an interactive application in which users upload multiple PDF documents and ask questions
about them, offering a dynamic way to explore and retrieve information from large texts. The system processes the PDFs by
extracting their text with PyPDF2, splitting it into smaller chunks, and converting each chunk into a numerical embedding
using OpenAI embeddings. The embeddings are stored in a FAISS vector index, which enables efficient similarity search
and fast retrieval of the passages most relevant to a user query. Streamlit serves as the frontend framework for a
user-friendly web app in which users upload PDFs and receive chatbot responses. A language model accessed via API powers
the conversational AI, generating responses with the help of similarity search over the embeddings stored in FAISS.
LangChain orchestrates the interactions between the AI model, memory, and retrieval systems, dotenv manages environment
variables, and ConversationBufferMemory maintains context throughout user interactions.
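The chunk, embed, store, and search steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a word-count vector stands in for the OpenAI embeddings, a plain in-memory list stands in for the FAISS index, and the names `chunk_text`, `embed`, `cosine`, and `ToyVectorStore` are hypothetical, chosen only for this example.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: lowercase word counts (assumption: stands in for a model embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector index (assumption: replaces FAISS here)."""

    def __init__(self):
        self.entries = []  # list of (chunk, vector) pairs

    def add(self, chunks):
        """Embed each chunk and keep it alongside its vector."""
        for chunk in chunks:
            self.entries.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vec, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical extracted text; in the described system it would come from the PDFs.
    document = (
        "FAISS stores vectors for similarity search. "
        "Streamlit renders the upload form. "
        "PyPDF2 extracts text from PDF pages."
    )
    store = ToyVectorStore()
    store.add(chunk_text(document, chunk_size=60, overlap=15))
    for hit in store.search("how is text extracted from a PDF", k=1):
        print(hit)
```

In the described system, the retrieved chunks would then be passed to the language model together with the conversation history. The overlap between neighbouring chunks is a common way to avoid cutting a relevant sentence in half at a chunk boundary.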
Keywords :
PDF Interaction, Conversational AI, NLP, Text Extraction, Semantic Search, Intelligent Search, PyPDF2, User-Friendly Interface, Document Analysis, Information Retrieval.