Agentic Chunking: Enhancing RAG Answers for Completeness and Accuracy
Agentic Chunking improves context preservation in Retrieval Augmented Generation.
Introduction
Generating accurate and complete answers using Large Language Models (LLMs) depends on two main factors:
1. The quality of the LLM.
2. The quality of the context provided to the LLM.
In this post, we’ll focus on the second factor—how to prepare documents to provide high-quality context. This involves processing, chunking, embedding, and retrieving documents using semantic search. One key part of this is chunking, or breaking down documents into smaller, manageable pieces.
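To situate chunking within that pipeline, here is a toy sketch. All names are hypothetical, and the bag-of-words "embedding" and overlap score stand in for a real embedding model and vector store; it only shows where chunk quality enters the flow:

```python
# Toy sketch of where chunking sits in a RAG pipeline (hypothetical names;
# a real system would use an embedding model and a vector store).
from collections import Counter

def chunk(doc: str) -> list[str]:
    # Placeholder splitter; doing this step well is the subject of this post.
    return doc.split("\n\n")

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" standing in for a real embedding model.
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    # Toy word-overlap score standing in for cosine similarity.
    return sum((a & b).values())

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    q = embed(question)
    return sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)[:top_k]

# The retrieved chunks become the LLM's context, so answer quality depends
# directly on how the documents were chunked.
```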
Traditional chunking methods often fall short, losing context and leading to incomplete answers. We’ll quickly go over the limitations of these old methods and introduce a new approach: Agentic Chunking. This method leverages LLMs to create context-rich, meaningful chunks, resulting in more accurate and complete responses.
Traditional Chunking Methods and Their Limitations
| Method | Description | Limitations |
|---|---|---|
| Fixed-Size Character Chunking | Divides text into fixed-size chunks with overlaps. | Cuts off sentences/words, losing meaning. Ignores document structure (headings, lists). Arbitrary breaks mix unrelated topics, disrupting flow. |
| Recursive Text Splitting | Splits text by hierarchical separators (paragraphs, sentences, words). | Overlooks complex structures (headers, tables). Cuts off paragraphs and bullet points midway. Doesn't take semantic meaning into account. |
| Semantic Chunking | Groups sentences based on embedding similarity to create context-aware chunks. | Coherence issues when sentences within a paragraph differ semantically. Computationally intensive, especially with large documents. |
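As a quick, hypothetical illustration of the first row above, fixed-size character chunking splits at arbitrary character offsets regardless of meaning:

```python
# Minimal illustration (not from the original post): fixed-size character
# chunking with overlap splits text at arbitrary positions.
text = (
    "To reset your password, open Settings and select Security. "
    "Then click 'Reset Password' and follow the emailed instructions."
)

chunk_size, overlap = 50, 10
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size - overlap)]

for i, chunk in enumerate(chunks):
    print(f"chunk {i}: {chunk!r}")
# Note how chunk boundaries fall mid-word and mid-sentence, so a retriever
# that returns a single chunk can separate a step from its instructions.
```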
Note: After trying several "state-of-the-art" chunking solutions on the market, we at Gleen.ai found them unsatisfactory. This led us to develop our in-house solution: Agentic Chunking.
Introducing Agentic Chunking
Agentic Chunking is an LLM-based approach that simulates human judgment in text segmentation to create semantically coherent chunks. It overcomes the limitations of traditional methods by intelligently grouping textual elements based on meaning and context.
Key Steps in Agentic Chunking
- Mini-Chunk Creation: The document is first split into mini-chunks (e.g., ~300 characters each) using Recursive Text Splitting, which ensures that mini-chunks do not cut off sentences. (A code sketch of these steps appears after this list.)
- Marking Mini-Chunks: Each mini-chunk is annotated with unique markers to aid the LLM in recognizing chunk divisions. LLMs cannot count characters precisely but can recognize patterns.
- LLM-Assisted Chunk Grouping:
- The annotated document is provided to an LLM along with specific instructions.
- The LLM is tasked with analyzing the sequence of mini-chunks and grouping them into larger, semantically coherent chunks.
- Constraints, such as a maximum number of mini-chunks per chunk, are imposed.
- Chunk Assembly: The final chunks are formed by combining the mini-chunks selected by the LLM, and relevant metadata, such as source information and chunk indices, is attached to each chunk.
- Chunk Overlap for Context Preservation: Include the previous and/or next mini-chunk when forming each chunk to create overlap between adjacent chunks.
- Guardrails and Fallback Mechanisms:
- Chunk Size Limits: Enforce maximum chunk sizes to stay within LLM input constraints.
- Context Window Management: For documents exceeding the LLM's context window, split and process them in manageable parts.
- Validation: Verify that all mini-chunks are included.
- Fallback to Recursive Splitting: If LLM processing fails or is unavailable, default to using recursive text splitting.
- Parallel Processing: Utilize multi-threading for faster processing.
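To make these steps concrete, here is a minimal sketch. It is an illustration built on stated assumptions, not Gleen's actual implementation (which the post does not publish): the helper names (make_mini_chunks, mark_mini_chunks, agentic_chunk), the marker format, and the prompt wording are all hypothetical; the recursive splitter is approximated with a sentence-boundary splitter; and the LLM call is injected as a plain callable so the sketch stays self-contained and runnable.

```python
import json
import re
from typing import Callable, List, Optional


def make_mini_chunks(text: str, max_chars: int = 300) -> List[str]:
    """Step 1: split the document into ~max_chars mini-chunks on sentence
    boundaries, so no mini-chunk cuts a sentence in half."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    minis, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            minis.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        minis.append(current)
    return minis


def mark_mini_chunks(minis: List[str]) -> str:
    """Step 2: annotate each mini-chunk with a unique marker; LLMs cannot
    count characters reliably, but they can recognize these patterns."""
    return "\n".join(f"<<MINI {i}>> {m}" for i, m in enumerate(minis))


# Hypothetical prompt wording, not Gleen's actual prompt.
GROUPING_PROMPT = """You are given a document split into numbered mini-chunks.
Group consecutive mini-chunk indices into semantically coherent chunks,
using at most {max_minis} indices per group. Return JSON only, e.g.
[[0, 1], [2, 3, 4], [5]].

{marked_document}"""


def agentic_chunk(
    text: str,
    llm: Optional[Callable[[str], str]] = None,
    max_minis_per_chunk: int = 5,
    overlap: bool = True,
) -> List[str]:
    """Steps 3-6: LLM-assisted grouping, assembly, overlap, and guardrails."""
    minis = make_mini_chunks(text)
    groups = None
    if llm is not None:
        try:
            prompt = GROUPING_PROMPT.format(
                max_minis=max_minis_per_chunk,
                marked_document=mark_mini_chunks(minis),
            )
            groups = json.loads(llm(prompt))
            # Guardrail: every mini-chunk must appear exactly once.
            if sorted(i for g in groups for i in g) != list(range(len(minis))):
                groups = None
        except Exception:
            groups = None  # LLM failure or malformed output
    if groups is None:
        # Fallback: plain consecutive grouping (recursive-splitting behavior).
        groups = [
            list(range(i, min(i + max_minis_per_chunk, len(minis))))
            for i in range(0, len(minis), max_minis_per_chunk)
        ]
    chunks = []
    for group in groups:
        idxs = list(group)
        if overlap:  # include neighbors so context carries across chunks
            if idxs[0] > 0:
                idxs.insert(0, idxs[0] - 1)
            if idxs[-1] < len(minis) - 1:
                idxs.append(idxs[-1] + 1)
        chunks.append(" ".join(minis[i] for i in idxs))
    return chunks
```

In practice, `llm` would wrap a chat-completion API call; documents that exceed the model's context window would be split and processed in parts, and independent parts can be dispatched across threads (e.g., with `concurrent.futures.ThreadPoolExecutor`) to cover the parallel-processing step.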
Benefits of Agentic Chunking
- Semantic Coherence: Generates semantically meaningful chunks, enhancing the relevance and accuracy of retrieved information.
- Context Preservation: Maintains the flow within chunks, enabling LLMs to produce more accurate and contextually appropriate responses.
- Flexibility: Adapts to documents of varying lengths, structures, and content types, making it suitable for diverse applications.
- Robustness: Features guardrails and fallback mechanisms to ensure consistent performance, even with unexpected document structures or LLM limitations.
- Adaptability: Integrates seamlessly with different LLMs and can be fine-tuned in-house to create coherent chunks.
Results
Implementing Agentic Chunking has significantly improved the completeness and accuracy of answers generated by Retrieval Augmented Generation (RAG) systems:
- 92% Reduction in Incorrect Assumptions: Previously, improper chunking chopped concepts midway, leading the AI to make incorrect assumptions and give inaccurate customer responses. Agentic Chunking reduced these errors by 92%, greatly enhancing reliability.
- Improved Answer Completeness: Incomplete answers were common, especially for lengthy tutorials or guides. Agentic Chunking keeps related content intact within chunks, producing more complete and satisfactory responses for users.
Stay tuned for our next engineering blog post, where we'll dive deeper into how Agentic Chunking serves as a foundational approach in building knowledge graphs to support graph-based Retrieval-Augmented Generation (Graph RAG) and how it improves upon traditional RAG methods.
Build vs. Buy: Implementing Agentic Chunking
While Agentic Chunking is a small component, it plays a crucial role in the AI agent pipeline by producing semantically coherent chunks that enhance answer accuracy and completeness. Here's a look at the pros and cons of building it in-house versus leveraging expert solutions.
Pros of Building In-House
- Control and Customization: Tailor the solution specifically to your use case, designing prompts and algorithms that align perfectly with your needs.
- Specificity: Develop chunking strategies uniquely suited to your data and applications.
Cons of Building In-House
- High Engineering Costs: Requires significant technical expertise and a substantial time investment, leading to increased costs.
- Unpredictable LLM Behavior: Large language model behavior is hard to predict, and building solely for your own use case may not surface all potential failure modes.
- Maintenance Overhead: Ongoing maintenance is needed as generative AI evolves rapidly.
- Production Challenges: Achieving prototype-level accuracy is one thing; making it production-ready with high accuracy (99%+) is a significant challenge requiring substantial effort.
Leave It to the Experts
Implementing Agentic Chunking effectively demands expertise and resources. Instead of building it from scratch, consider leveraging solutions from experts like Gleen.
- Learn More: Gleen AI Products
- Create and Test Your Own AI Agent for Free: Sign Up
- Read Customer Success Stories: Customer Success Stories