Designing an LLM with a RAG-Integrated Pipeline for the Legal Domain
Keywords:
RAG, Large Language Model, Vector Database, Legal NLP, BERTScore, Question Answering
Abstract
The rapid growth of legal data, combined with the inherent complexity of statutes, judicial decisions, and constitutional provisions, makes it increasingly difficult for individuals to understand and navigate legal information without expert assistance. Traditional keyword-based search systems fail to capture the contextual and semantic meaning embedded in legal texts, resulting in incomplete, irrelevant, or misleading retrieval outcomes that hinder access to justice for the general public. This paper presents a Legal Chatbot built on a Retrieval-Augmented Generation (RAG) pipeline that integrates semantic vector-based retrieval with an LLM-powered response generation module to improve the interpretability, factual accuracy, and accessibility of legal information. Legal documents, including Indian acts, amendments, and Supreme Court judgments, are collected, cleaned, chunked into overlapping segments, embedded using transformer-based models, and stored in the Pinecone vector database for efficient approximate nearest-neighbour similarity search at inference time. For a given user query, the system retrieves the top-k semantically relevant chunks and supplies them as grounded context to a Large Language Model, thereby reducing hallucination and improving response fidelity to authoritative legal sources. A comparative evaluation of six state-of-the-art embedding models (MiniLM, MPNet, E5-Mistral, BGE-M3, DistilBERT, and Cohere Embed v3) is conducted across four standard NLG metrics: ROUGE-L, BLEU, METEOR, and BERTScore F1. Experimental results on three real Supreme Court of India case queries show that Cohere Embed v3 leads on all generation quality metrics, achieving a ROUGE-L of 81.99%, BLEU of 67.43%, METEOR of 74.07%, and BERTScore F1 of 97.17%, while BGE-M3 provides the best retrieval-efficiency trade-off with near-instantaneous query latency.
These findings demonstrate that combining retrieval-based pipelines with a controlled LLM layer significantly improves the accessibility, accuracy, and interpretability of legal information, and provide actionable guidance for practitioners selecting embedding models based on deployment objectives such as accuracy, latency, or resource efficiency.
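The retrieval flow summarized above (overlapping chunking, embedding, top-k similarity search, and grounded prompting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, chunk sizes, and prompt wording are hypothetical, a bag-of-words vector stands in for the transformer embeddings, and an in-memory sort stands in for the Pinecone index.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=8, overlap=3):
    """Split text into word windows of chunk_size that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy stand-in for a transformer embedding: a bag-of-words count vector."""
    return Counter(w.strip(".,;:?").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query (exact search, not ANN)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble a grounded prompt; the wording here is an assumption."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

In a deployed system, `embed` would call one of the evaluated embedding models and `top_k` would be replaced by a query against the vector database. The prompt returned by `build_prompt` would then be passed to the LLM.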