How to Observe Your RAG Applications in Production: A Comprehensive Guide with Code Examples

Introduction

Retrieval-Augmented Generation (RAG) systems have become foundational in enterprise AI, combining retrieval from proprietary knowledge bases with large language model generation to deliver grounded, contextual answers. As these systems become mission-critical and deeply embedded in enterprise workflows, the challenge shifts from merely building functional systems to ensuring end-to-end observability at scale: visibility into failure modes and early detection of performance regressions.

This guide provides a practical, end-to-end approach for observing RAG applications in production using Maxim AI. It covers architectural principles, technical instrumentation, best practices, and actionable code examples, equipping engineering and product teams to monitor, debug, and optimize their RAG pipelines.

Why Observability Matters for RAG Applications

Because they are powered by LLMs, RAG systems are inherently non-deterministic and complex, blending retrieval logic (searching external sources) with generative models (producing context-driven responses). Quality depends on a multitude of factors: indexing, chunking, embeddings, re-ranking, prompt templates, model versions, and evaluation strategies. As content evolves and prompts change, regressions can creep in unnoticed. Effective observability enables teams to:

  • Monitor retrieval and generation separately to pinpoint root causes of failures.
  • Track performance and reliability over time, including latency, token usage, and cost metrics.
  • Detect and resolve issues proactively before they impact users.
  • Evaluate agent response quality using both automated and human-in-the-loop evals.
  • Implement end-to-end distributed tracing covering both traditional systems and LLM calls.

For more on RAG evaluation fundamentals, see Mastering RAG Evaluation Using Maxim AI and LLM Observability: How to Monitor Large Language Models in Production.

Key Challenges for RAG Applications in Production

  1. Retrieval Accuracy: Are the right documents surfaced, with adequate coverage and minimal redundancy?
  2. Generation Groundedness: Are answers faithful to the retrieved evidence, complete, and properly cited?
  3. Long Context Sensitivity: Does accuracy degrade as evidence moves within longer contexts?
  4. Fairness and Segment Analysis: Are certain topics, demographics, or dialects favored or disadvantaged in retrieval or generation?
  5. Session-Level Tracking: Can multi-turn conversations and workflows be tracked and debugged holistically?

Maxim AI addresses these challenges with a robust tracing and evaluation framework, supporting granular instrumentation, metadata tagging, and hybrid evaluation strategies. For an in-depth discussion, refer to Session-Level Observability for Conversational AI.

Maxim AI’s Architecture for RAG Observability

Maxim AI is purpose-built for GenAI observability, offering a full-stack platform that spans:

  • Distributed Tracing: Capture every step of a request lifecycle, from retrieval to generation and tool calls. See Tracing Overview.
  • Session-Level Context: Track multi-turn dialogues and workflows across multiple traces and spans.
  • Rich Metadata and Tagging: Add custom key-value pairs for advanced filtering, debugging, and compliance.
  • Real-Time Dashboards and Alerts: Monitor production metrics, set up custom alerts, and visualize trends and patterns.
  • Hybrid Evaluation: Combine LLM-as-judge, statistical, and human evaluations for robust quality assessment.
  • OpenTelemetry Integration: Forward traces to external OTel-compatible platforms (e.g., New Relic, Snowflake) for unified monitoring. See Forwarding via Data Connectors.

Instrumenting RAG Applications: Step-by-Step Guide

1. Set Up Your Maxim Log Repository

A log repository is the central component for storing and analyzing logs. Create separate repositories for production and development, or split by application and service for granular control.

See Tracing Concepts.

// JS/TS example
import { Maxim } from "@maximai/maxim-js";
const maxim = new Maxim({ apiKey: "your-api-key" });
const logger = await maxim.logger({ id: "your-log-repository-id" });

2. Create and Correlate Traces

Each trace represents a complete request lifecycle, including retrieval, generation, and output.

const trace = logger.trace({
  id: "trace-id", // Unique per request
  name: "user-query",
  tags: { userId: "user-123", environment: "production" },
  metadata: { sessionId: "session-456", model: "gpt-4", temperature: 0.7 }
});
trace.input("What are the best places to visit in 2025?");
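// Call trace.end() once the full request lifecycle has been logged
// (shown in the end-to-end example at the end of this guide).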

3. Log Retrievals in the RAG Pipeline

Instrument retrieval steps to capture queries to knowledge bases or vector databases.

const retrieval = trace.retrieval({
  id: "retrieval-id",
  name: "National Geographic survey report 2025.pdf",
  metadata: { vectorDb: "pinecone", indexName: "travel-reports" }
});
retrieval.input("best places 2025");
retrieval.output([
  "Tokyo, Japan",
  "Barcelona, Spain",
  "Singapore",
  "Copenhagen, Denmark",
  "Pune, India",
  "Seoul, South Korea"
]);

See Retrieval Entity Documentation.

4. Log Generations and Model Outputs

Track LLM calls and their results, including model parameters and outputs.

const generation = trace.generation({
  id: "generation-id",
  name: "customer-support--gather-information",
  provider: "openai",
  model: "gpt-4o",
  modelParameters: { temperature: 0.7 },
  messages: [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "My internet is not working." }
  ]
});
generation.result({
  id: "chatcmpl-123",
  object: "chat.completion",
  created: Date.now(),
  model: "gpt-4o",
  choices: [{
    index: 0,
    message: {
      role: "assistant",
      content: "Apologies for the inconvenience. Can you please share your customer id?"
    },
    finish_reason: "stop"
  }],
  usage: { prompt_tokens: 100, completion_tokens: 50, total_tokens: 150 }
});

See Generation Entity Documentation.

5. Add Spans, Events, and Tool Calls

Use spans to group logical units of work, events to mark milestones, and tool calls to track external service interactions.

const span = trace.span({
  id: "span-id",
  name: "plan-query",
  tags: { userId: "user-123" }
});
await span.event({
  id: "event-id",
  name: "retrieval-completed",
  tags: { "status": "success" }
});
const toolCall = trace.toolCall({
  id: "tool-call-id",
  name: "get-current-temperature",
  description: "Get current temperature for a given location.",
  args: { location: "Tokyo" },
  tags: { location: "Tokyo" }
});
toolCall.result({ temperature: "22°C" });

See Spans, Events, and Tool Calls.

6. Attach Files, URLs, and Data Blobs for Context

Enhance trace observability by attaching relevant files, images, or data.

trace.addAttachment({
  id: "file-attachment-id", // use a unique id for each attachment
  type: "file",
  path: "./files/survey2025.pdf"
});
trace.addAttachment({
  id: "url-attachment-id",
  type: "url",
  url: "https://sample-image.com/test-image"
});

See Attachments Documentation.

7. Collect and Log User Feedback

Integrate user feedback for continuous improvement and subjective metric tracking.

trace.feedback({
  score: 5,
  feedback: "Great job!",
  metadata: { flow: "support", properties: { name: "John Doe" } }
});

See User Feedback Documentation.

8. Enable Session-Level Observability

Track multi-turn conversations and workflows by grouping related traces into sessions.

const session = logger.session({
  id: "session-id",
  name: "customer-support-session"
});
const trace = session.trace({
  id: "trace-id",
  name: "user-query"
});

See Sessions Documentation.

9. Export and Forward Traces for Unified Observability

Maxim supports forwarding traces to external platforms like New Relic, Snowflake, or any OpenTelemetry collector for unified observability and compliance.
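
Data connectors themselves are configured in the Maxim platform rather than in application code. If your services also emit spans directly with OpenTelemetry, a generic OTLP/HTTP exporter pointed at your external collector looks like the sketch below. This is standard OpenTelemetry SDK usage, not a Maxim-specific API; the endpoint URL and header values are placeholders to replace with your backend's details.

// Generic OpenTelemetry setup (not Maxim-specific): export spans to any
// OTLP-compatible backend such as New Relic or an OpenTelemetry Collector.
import { BasicTracerProvider, BatchSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const exporter = new OTLPTraceExporter({
  url: "https://your-collector.example.com/v1/traces", // placeholder endpoint
  headers: { "api-key": "your-backend-api-key" }       // placeholder credential
});

// Batch spans before export; API shown for @opentelemetry/sdk-trace-base v1.x.
const provider = new BasicTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();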

Best Practices for RAG Observability in Production

  • Instrument observability early and consistently across all services and workflows.
  • Use rich metadata and tagging for advanced filtering and debugging.
  • Monitor key metrics: latency, error rate, token usage, cost, and evaluation scores (a minimal tracking sketch follows this list).
  • Set up real-time alerts for anomalies, failures, or performance regressions.
  • Leverage hybrid evaluation: combine automated and human-in-the-loop assessments.
  • Track metrics across the entire pipeline, from embeddings and retrieval quality to generations.
  • Export data for compliance, external analytics, and refining workflows.
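
As one concrete illustration of the key-metrics bullet above, the sketch below wraps a single RAG request so that latency and failures are always recorded. It reuses the logger from step 1 and only the trace, span, and event calls shown earlier; the runRagPipeline helper, the query variable, and tag names such as latencyMs are hypothetical placeholders rather than Maxim-defined fields.

import { randomUUID } from "crypto";

const trace = logger.trace({
  id: randomUUID(), // unique per request
  name: "user-query",
  tags: { environment: "production" }
});
trace.input(query);

const span = trace.span({ id: randomUUID(), name: "rag-pipeline" });
const startedAt = Date.now();
try {
  await runRagPipeline(query); // hypothetical call covering retrieval + generation
  await span.event({
    id: randomUUID(),
    name: "pipeline-completed",
    tags: { status: "success", latencyMs: String(Date.now() - startedAt) }
  });
} catch (err) {
  await span.event({
    id: randomUUID(),
    name: "pipeline-failed",
    tags: { status: "error", latencyMs: String(Date.now() - startedAt) }
  });
  throw err;
} finally {
  trace.end(); // always close the trace so the request shows up in dashboards and alerts
}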

For a deeper exploration, refer to Mastering RAG Evaluation Using Maxim AI and Session-Level Observability for Conversational AI.

Code Example: End-to-End RAG Observability Flow

Below is a simplified end-to-end code example for instrumenting a RAG application using Maxim AI’s SDK (JavaScript/TypeScript):

import { Maxim } from "@maximai/maxim-js";
const maxim = new Maxim({ apiKey: "your-api-key" });
const logger = await maxim.logger({ id: "your-log-repository-id" });

const session = logger.session({ id: "session-id", name: "support-session" });
const trace = session.trace({ id: "trace-id", name: "user-query" });

trace.input("What are the best places to visit in 2025?");
const retrieval = trace.retrieval({ id: "retrieval-id", name: "travel-reports" });
retrieval.input("best places 2025");
retrieval.output(["Tokyo", "Barcelona", "Singapore", "Copenhagen", "Pune", "Seoul"]);

const generation = trace.generation({
  id: "generation-id",
  name: "recommendation-generation",
  provider: "openai",
  model: "gpt-4o",
  modelParameters: { temperature: 0.7 },
  messages: [
    { role: "system", content: "You are a travel assistant." },
    { role: "user", content: "What are the best places to visit in 2025?" }
  ]
});
generation.result({
  id: "chatcmpl-123",
  object: "chat.completion",
  created: Date.now(),
  model: "gpt-4o",
  choices: [{
    index: 0,
    message: { role: "assistant", content: "Based on recent reports, top destinations include Tokyo, Barcelona, Singapore, Copenhagen, Pune, and Seoul." },
    finish_reason: "stop"
  }],
  usage: { prompt_tokens: 100, completion_tokens: 50, total_tokens: 150 }
});

trace.feedback({ score: 5, feedback: "Excellent recommendations!", metadata: { flow: "travel" } });
trace.end();

Conclusion and Next Steps

Observing RAG applications in production is critical for delivering trustworthy, high-quality AI experiences. Maxim AI’s end-to-end observability platform empowers teams to instrument, monitor, and evaluate every component of their RAG pipelines with precision and flexibility. By following the steps and best practices outlined above, engineering and product teams can proactively manage reliability, optimize performance, and ship AI applications faster.

To learn more or see Maxim AI in action, book a demo or sign up for free.