How to Implement Observability in Multi-Step Agentic Workflows: A Technical Guide with Code Examples

Introduction
Observability is the backbone of reliable, scalable, and trustworthy AI systems. As AI applications evolve from simple, single-step chatbots to complex, multi-step agentic workflows (incorporating RAG pipelines, tool calls, and multi-turn conversations), the need for robust observability becomes paramount. This blog provides a comprehensive, technical walkthrough for implementing observability in multi-step AI applications, leveraging Maxim AI’s end-to-end platform and SDKs. We’ll cover architectural principles, practical code examples, and best practices for tracking, debugging, and optimizing agentic workflows.
Why Observability Matters in Multi-Step AI Applications
Modern AI applications are rarely monolithic. They often consist of multiple sub-systems, each responsible for a distinct part of the workflow, such as planning, retrieval, generation, and external tool invocation. Without observability, these systems become black boxes, making it difficult to:
- Uncover failure modes and performance bottlenecks
- Monitor critical metrics (token usage, latency, cost, model parameters)
- Detect and resolve issues proactively
- Track retrieval and tool call failures
- Ensure trustworthy AI and compliance
Traditional monitoring tools fall short for LLM observability, as they cannot capture the nuanced, multi-step reasoning and data flows inherent in agentic systems. Maxim AI’s platform addresses these gaps with distributed tracing, real-time monitoring, and flexible evaluation frameworks.
Architectural Principles for AI Observability
1. Distributed Tracing
Distributed tracing tracks the complete lifecycle of a request as it traverses multiple services and components. In Maxim, a trace represents the end-to-end processing of a user request, while spans break down the trace into logical units of work (e.g., planning, retrieval, generation).
- Trace: Represents the end-to-end processing of a user request
- Span: Sub-unit of a trace, can have child spans, tags, metadata
- Session: Groups related traces for multi-turn conversations
Learn more about tracing concepts in Maxim Docs
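For multi-turn applications, related traces can be grouped into a session. Here is a minimal sketch; the session(...) and session.trace(...) helpers below are assumed to mirror the trace(...) config shape used later in this guide, so check the Maxim Docs for the exact signatures.
// Sketch: group each conversation turn's trace under one session.
// Assumes logger.session(...) and session.trace(...) accept the same
// style of config object as logger.trace(...) shown later in this guide.
const session = logger.session({
  id: conversationId, // hypothetical identifier for the multi-turn conversation
  name: "support-conversation",
});
const turnTrace = session.trace({
  id: requestId, // hypothetical per-request identifier
  name: "user-query",
});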
2. Granular Entity Logging
Maxim’s observability framework extends beyond traces and spans to include:
- Generations: Individual LLM calls, with input messages, model parameters, and results
- Retrievals: RAG pipeline queries to vector databases or knowledge bases
- Tool Calls: External system/service invocations triggered by LLM responses
- Events: Milestones or state changes within traces/spans
- Attachments: Files, images, or URLs for richer context
- User Feedback: Structured ratings and comments for quality assessment
3. OpenTelemetry Compatibility
Maxim’s SDKs and APIs are compatible with OpenTelemetry, enabling seamless integration with existing observability stacks (New Relic, Snowflake, OTLP collectors) and standardized trace formats.
Step-by-Step Implementation: Observability in a Multi-Step AI Workflow
Let’s walk through a practical example: instrumenting a multi-step enterprise search chatbot using Maxim’s JavaScript SDK. The architecture includes:
- API Gateway (authentication, routing)
- Planner (execution plan creation)
- Intent Detector (query analysis)
- Answer Generator (prompt creation, RAG context)
- RAG Pipeline (vector database retrieval)
1. Setting Up Maxim Observability
a. Create a Log Repository
Organize logs by environment, application, or service for efficient analysis.
// In Maxim Dashboard: Create repository "Chatbot Production"
b. Install the SDK
npm install @maximai/maxim-js
c. Initialize the Logger
import { Maxim } from "@maximai/maxim-js";
const maxim = new Maxim({ apiKey: "your-api-key" });
const logger = await maxim.logger({ id: "your-repo-id" });
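Before the process exits, flush any buffered logs so traces are not dropped. A minimal sketch, assuming the SDK exposes a cleanup() method for flushing (verify against the SDK docs for your version):
// Sketch: flush pending logs on shutdown.
// Assumes maxim.cleanup() is available for flushing buffered data.
process.on("beforeExit", async () => {
  await maxim.cleanup();
});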
2. Instrumenting the API Gateway
a. Create a Trace for Each User Request
Use a unique request ID (e.g., cf-request-id) to track the request lifecycle.
const trace = logger.trace({
  id: req.headers["cf-request-id"],
  name: "user-query",
  tags: {
    userId: req.body.userId,
    accountId: req.body.accountId,
  },
  metadata: {
    environment: "production",
    sessionId: req.body.sessionId,
  },
});
trace.input(req.body.query);
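Later, once the final answer has been sent back to the client, record the trace output and close the trace so end-to-end latency is captured. A minimal sketch, assuming the trace object exposes output(...) and end(...) alongside input(...):
// Sketch: finalize the trace when the response goes out.
// Assumes trace.output(...) and trace.end() are available, mirroring trace.input(...).
trace.output(finalAnswer); // finalAnswer is a hypothetical variable holding the response text
trace.end();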
3. Adding Spans for Microservice Operations
Each microservice (planner, intent detector, etc.) should create a span within the trace.
const span = trace.span({
  id: uuid(),
  name: "plan-query",
  tags: {
    userId: req.body.userId,
    accountId: req.body.accountId,
  },
  metadata: {
    service: "planner",
  },
});
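Spans can be nested for finer-grained sub-steps and should be closed when their unit of work completes. A minimal sketch, assuming span.span(...) creates a child span and end() closes a span:
// Sketch: nest a child span for a sub-step and close spans when done.
const childSpan = span.span({
  id: uuid(),
  name: "rank-candidate-plans", // hypothetical planner sub-step
});
// ... sub-step logic ...
childSpan.end();
span.end();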
4. Logging RAG Retrievals
Capture context retrievals from vector databases.
const retrieval = trace.retrieval({
  id: uuid(),
  name: "vector-db-retrieval",
  metadata: {
    vectorDb: "pinecone",
    indexName: "company-docs",
  },
});
retrieval.input("search query");
retrieval.output(["doc1", "doc2", "doc3"]);
5. Tracking LLM Generations
Log each LLM call with model details and results.
const generation = span.generation({
  id: uuid(),
  name: "generate-answer",
  provider: "openai",
  model: "gpt-4o",
  modelParameters: { temperature: 0.7 },
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: req.body.query },
  ],
  metadata: {
    promptVersion: "v2.1",
  },
});
// After the LLM response arrives
generation.result({
  id: "chatcmpl-123",
  object: "chat.completion",
  created: Math.floor(Date.now() / 1000), // Unix timestamp in seconds, matching the OpenAI response format
  model: "gpt-4o",
  choices: [
    {
      index: 0,
      message: {
        role: "assistant",
        content: "Here is your answer.",
      },
      finish_reason: "stop",
    },
  ],
  usage: {
    prompt_tokens: 100,
    completion_tokens: 50,
    total_tokens: 150,
  },
});
6. Logging Tool Calls and External Integrations
Track calls to external APIs or services triggered by the agent.
const traceToolCall = trace.toolCall({
  id: uuid(),
  name: "fetch-weather",
  description: "Get current temperature for a given location.",
  args: { location: "New York" },
  tags: { location: "New York" },
});
try {
  const result = callExternalService("fetch-weather", { location: "New York" });
  traceToolCall.result(result);
} catch (error) {
  traceToolCall.error(error);
}
7. Capturing Events and Milestones
Mark significant points in the workflow for debugging and analytics.
await trace.event({
  id: uuid(),
  name: "answer-sent",
  tags: { userId: req.body.userId },
});
8. Attaching Files and URLs for Rich Context
Add attachments for audit trails or debugging.
trace.addAttachment({
  id: uuid(),
  type: "url",
  url: "https://sample-image.com/test-image",
});
9. Collecting User Feedback
Integrate user ratings and comments for continuous improvement.
trace.feedback({
  score: 5,
  feedback: "Great job!",
  metadata: {
    flow: "support",
    properties: { name: "John Doe" },
  },
});
10. Error Tracking and Hallucination Detection
Log errors from LLMs and tool calls for reliability and debugging.
generation.error({
  message: "Rate limit exceeded. Please try again later.",
  type: "RateLimitError",
  code: "429",
});
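The error object above covers provider failures; for hallucination detection, a common pattern is to check the generated answer against the retrieved context and flag suspicious responses on the trace. A minimal sketch, reusing the event API shown earlier and a hypothetical isGroundedInContext(...) helper (for example, a simple overlap heuristic or an evaluator you run in your own pipeline):
// Sketch: flag a suspected hallucination as an event on the trace.
// isGroundedInContext(...) is a hypothetical helper you would implement;
// Maxim's online evaluators can also score logged traces automatically.
const grounded = isGroundedInContext(answerText, retrievedDocs); // hypothetical variables
if (!grounded) {
  await trace.event({
    id: uuid(),
    name: "possible-hallucination",
    tags: { reason: "answer-not-grounded-in-context" },
  });
}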
Advanced: OpenTelemetry Integration and Data Forwarding
Maxim supports OTLP ingestion and can forward enriched traces to platforms like New Relic, Snowflake, or any OpenTelemetry collector. This enables unified observability across your entire stack.
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry import trace as trace_api

maxim_api_key = "your_api_key_here"
repo_id = "your_repository_id_here"

tracer_provider = trace_sdk.TracerProvider()
span_exporter = OTLPSpanExporter(
    endpoint="https://api.getmaxim.ai/v1/otel",
    headers={
        "x-maxim-api-key": f"{maxim_api_key}",
        "x-maxim-repo-id": f"{repo_id}",
    },
)
tracer_provider.add_span_processor(SimpleSpanProcessor(span_exporter))
trace_api.set_tracer_provider(tracer_provider)
OpenTelemetry integration guide
Best Practices for Observability in Multi-Step AI Applications
- Structure log repositories by environment, application, and service for efficient analysis.
- Tag traces, spans, and entities with meaningful context (user ID, session ID, environment, etc.) for powerful filtering and debugging.
- Capture all relevant entities (generations, retrievals, tool calls, events, attachments, feedback) for comprehensive visibility.
- Monitor real-time metrics (cost, latency, token usage, error rates) and set up alerts for critical thresholds.
- Leverage custom dashboards and saved views for rapid issue resolution and deep insights.
- Integrate human-in-the-loop evaluations for last-mile quality checks.
- Forward traces to external platforms for unified observability and long-term analytics.
Conclusion
Implementing observability in multi-step AI applications is not just a technical necessity; it’s a strategic advantage. With Maxim AI’s distributed tracing, granular entity logging, and seamless OpenTelemetry integration, engineering and product teams can build, monitor, and optimize agentic workflows with confidence. From debugging LLM applications to tracking RAG pipelines and tool calls, Maxim empowers you to deliver reliable, high-quality AI experiences and ship AI applications faster.
Ready to experience the future of AI observability? Book a demo or sign up now to get started with Maxim AI.