# OpenTelemetry LLM Observability
This document provides comprehensive guidance for using the OpenTelemetry LLM observability tool with the MCP server. It covers setup, configuration, usage, troubleshooting, and practical examples for all major OpenTelemetry-compatible backends.
## Features
- Universal Compatibility: Works with Jaeger, New Relic, Grafana, Datadog, Honeycomb, and more
- Comprehensive Metrics: Request counts, token usage, latency, error rates
- Distributed Tracing: Full request lifecycle tracking with spans
- Flexible Configuration: Environment-based configuration for different backends
- Zero-Code Integration: Drop-in replacement for existing observability tools
## Quick Start

### 1. Install Dependencies

OpenTelemetry dependencies are included in `package.json`:

```bash
npm install
```
### 2. Configure Your Backend

#### Jaeger (Local Development)

```bash
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
```
#### New Relic

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_NEW_RELIC_LICENSE_KEY"
export OTEL_SERVICE_NAME=llm-observability-mcp
```
#### Grafana Cloud

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n YOUR_INSTANCE_ID:YOUR_API_KEY | base64)"
export OTEL_SERVICE_NAME=llm-observability-mcp
```
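The Basic auth value for Grafana Cloud is your OTLP instance ID and API key joined by a colon and base64-encoded. The `-n` flag on `echo` matters: without it, the trailing newline is encoded into the token and authentication fails. A quick sketch with placeholder credentials (`12345` and `secret` are illustrative only):

```shell
# Placeholder credentials -- substitute your Grafana Cloud instance ID and API key.
INSTANCE_ID=12345
API_KEY=secret

# -n suppresses the trailing newline, which would otherwise be encoded too.
TOKEN=$(echo -n "${INSTANCE_ID}:${API_KEY}" | base64)
echo "Authorization=Basic ${TOKEN}"
```

You can sanity-check the result by decoding it back: `echo "$TOKEN" | base64 -d` should print exactly `INSTANCE_ID:API_KEY` with no trailing newline.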
#### Honeycomb

Note that `OTEL_EXPORTER_OTLP_ENDPOINT` is a base URL; the exporter appends the signal path (`/v1/traces`) itself:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY"
export OTEL_SERVICE_NAME=llm-observability-mcp
```
#### Datadog

Datadog ingests OTLP through the Datadog Agent rather than a direct API endpoint; enable the Agent's OTLP receiver (the Agent holds your `DD_API_KEY`) and point the exporter at it:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
```
#### Lightstep

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.lightstep.com:443/api/v2/otel/trace
export OTEL_EXPORTER_OTLP_HEADERS="lightstep-access-token=YOUR_ACCESS_TOKEN"
export OTEL_SERVICE_NAME=llm-observability-mcp
```
#### Kubernetes Example

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-observability-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-observability-mcp
  template:
    metadata:
      labels:
        app: llm-observability-mcp
    spec:
      containers:
        - name: llm-observability-mcp
          image: llm-observability-mcp:latest
          ports:
            - containerPort: 3000
          env:
            - name: OTEL_SERVICE_NAME
              value: "llm-observability-mcp"
            - name: OTEL_SERVICE_VERSION
              value: "1.2.3"
            - name: OTEL_ENVIRONMENT
              value: "production"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "https://your-backend.com:4318"
            - name: OTEL_EXPORTER_OTLP_HEADERS
              valueFrom:
                secretKeyRef:
                  name: otel-credentials
                  key: headers
```
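The `otel-credentials` Secret referenced by the Deployment must exist in the same namespace. A minimal sketch (the header value is a placeholder; use the same comma-separated `key=value` format as `OTEL_EXPORTER_OTLP_HEADERS`):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: otel-credentials
type: Opaque
stringData:
  # Same format as OTEL_EXPORTER_OTLP_HEADERS: "key1=value1,key2=value2"
  headers: "api-key=YOUR_BACKEND_API_KEY"
```

Using `stringData` lets you write the value in plain text; Kubernetes base64-encodes it into `data` on write.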
### Running the MCP Server

```bash
# Start with stdio transport
npm run mcp:stdio

# Start with HTTP transport
npm run mcp:http
```
## Usage

### OpenTelemetry Tool: `llm_observability_otel`

#### Required Parameters

- `userId`: The distinct ID of the user
- `model`: The model used (e.g., "gpt-4", "claude-3")
- `provider`: The LLM provider (e.g., "openai", "anthropic")

#### Optional Parameters

- `traceId`: Trace ID for grouping related events
- `input`: The input to the LLM (messages, prompt, etc.)
- `outputChoices`: The output from the LLM
- `inputTokens`: Number of tokens in the input
- `outputTokens`: Number of tokens in the output
- `latency`: Latency of the LLM call in seconds
- `httpStatus`: HTTP status code of the LLM call
- `baseUrl`: Base URL of the LLM API
- `operationName`: Name of the operation being performed
- `error`: Error message if the request failed
- `errorType`: Type of error (e.g., "rate_limit", "timeout")
- `mcpToolsUsed`: List of MCP tools used during the request
#### Example Usage

```json
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "inputTokens": 150,
    "outputTokens": 75,
    "latency": 2.3,
    "httpStatus": 200,
    "operationName": "chat-completion",
    "traceId": "trace-abc123",
    "input": "What is the weather like today?",
    "outputChoices": ["The weather is sunny and 75°F today."]
  }
}
```
## Configuration Reference

| Variable | Description | Default |
|---|---|---|
| `OTEL_SERVICE_NAME` | Service name for OpenTelemetry | `llm-observability-mcp` |
| `OTEL_SERVICE_VERSION` | Service version | `1.0.0` |
| `OTEL_ENVIRONMENT` | Environment name | `development` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Default OTLP endpoint | - |
| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | Metrics endpoint | - |
| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Traces endpoint | - |
| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | Logs endpoint | - |
| `OTEL_EXPORTER_OTLP_HEADERS` | Headers for authentication (format: `"key1=value1,key2=value2"`) | - |
| `OTEL_METRIC_EXPORT_INTERVAL` | Metrics export interval in ms | `10000` |
| `OTEL_METRIC_EXPORT_TIMEOUT` | Metrics export timeout in ms | `5000` |
| `OTEL_TRACES_SAMPLER_ARG` | Sampling ratio (0.0-1.0) | `1.0` |
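Putting a few of these together, here is a sketch of a higher-volume profile; the values are illustrative, not recommendations, so tune them to your own traffic:

```shell
# Illustrative profile for a busier deployment -- values are examples only.
export OTEL_SERVICE_NAME=llm-observability-mcp
export OTEL_ENVIRONMENT=production
export OTEL_TRACES_SAMPLER_ARG=0.1        # keep 10% of traces
export OTEL_METRIC_EXPORT_INTERVAL=60000  # export metrics once a minute

env | grep '^OTEL_'
```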
## Metrics Collected

- `llm.requests.total`: Total number of LLM requests
- `llm.tokens.total`: Total tokens used (input + output)
- `llm.latency.duration`: Request latency in milliseconds
- `llm.requests.active`: Number of active requests
## Trace Attributes

`llm.model`, `llm.provider`, `llm.user_id`, `llm.operation`, `llm.input_tokens`, `llm.output_tokens`, `llm.total_tokens`, `llm.latency_ms`, `llm.http_status`, `llm.base_url`, `llm.error`, `llm.error_type`, `llm.input`, `llm.output`, `llm.mcp_tools_used`
## Practical Examples

### Jaeger: View Traces

Open http://localhost:16686 to see your traces.
### Error Tracking Example

```json
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "httpStatus": 429,
    "error": "Rate limit exceeded",
    "errorType": "rate_limit",
    "latency": 0.1,
    "operationName": "chat-completion"
  }
}
```
### Multi-Tool Usage Tracking Example

```json
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "inputTokens": 500,
    "outputTokens": 200,
    "latency": 5.2,
    "httpStatus": 200,
    "operationName": "complex-workflow",
    "mcpToolsUsed": ["file_read", "web_search", "code_execution"],
    "traceId": "complex-workflow-123"
  }
}
```
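The `latency` value in these examples has to be measured by the caller. A minimal shell sketch of timing a call and building the report payload; it assumes the HTTP transport on `localhost:3000`, the `sleep` stands in for the real LLM call, and the final `curl` is left commented out:

```shell
#!/bin/bash
start=$(date +%s.%N)
sleep 0.1   # stand-in for the real LLM call
end=$(date +%s.%N)

# Seconds with millisecond precision, matching the tool's latency unit
latency=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')

payload=$(cat <<EOF
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "latency": $latency,
    "httpStatus": 200,
    "operationName": "chat-completion"
  }
}
EOF
)

echo "$payload"
# curl -X POST http://localhost:3000/mcp -H "Content-Type: application/json" -d "$payload"
```

`awk` is used for the subtraction because plain shell arithmetic is integer-only; note that `date +%s.%N` requires GNU coreutils (on macOS, use `gdate` or an integer `%s` timestamp).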
### Testing Script

```bash
#!/bin/bash
# test-opentelemetry.sh

# Start a throwaway Jaeger instance with the OTLP receiver enabled
docker run -d --name jaeger-test \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
sleep 5

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-test
export OTEL_ENVIRONMENT=test

# Use the HTTP transport so the curl below can reach the server
npm run mcp:http &
sleep 3

curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "llm_observability_otel",
    "arguments": {
      "userId": "test-user",
      "model": "gpt-4",
      "provider": "openai",
      "inputTokens": 100,
      "outputTokens": 50,
      "latency": 1.5,
      "httpStatus": 200,
      "operationName": "test-completion"
    }
  }'

echo "Test complete. View traces at http://localhost:16686"
```
## Migration from PostHog

The OpenTelemetry tool is a drop-in replacement for the PostHog tool. Both can coexist for gradual migration:

- **PostHog Tool**: `llm_observability_posthog`
- **OpenTelemetry Tool**: `llm_observability_otel`

Both accept the same parameters.
## Troubleshooting & Performance

### Common Issues

- **No data in backend**: check endpoint URLs, authentication headers, network connectivity, and server logs
- **High resource usage**: lower the sampling ratio (`OTEL_TRACES_SAMPLER_ARG`) or increase export intervals
- **Missing traces**: verify OpenTelemetry is enabled, then check the logs and the service name
### Debug Mode

```bash
export DEBUG=true
npm run mcp:stdio
```
### Performance Tuning

- Reduce sampling for high-volume deployments: `OTEL_TRACES_SAMPLER_ARG=0.01`
- Increase export intervals: `OTEL_METRIC_EXPORT_INTERVAL=60000`
- Disable metrics/logs if not needed: `unset OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` and `unset OTEL_EXPORTER_OTLP_LOGS_ENDPOINT`
## Support

For issues or questions:

- Check this document and the troubleshooting section
- Review server logs with `DEBUG=true`
- Verify your OpenTelemetry configuration
- Test locally with Jaeger first