
OpenTelemetry LLM Observability Integration

This MCP server now includes comprehensive OpenTelemetry support for LLM observability, compatible with any OpenTelemetry backend including Jaeger, New Relic, Grafana, Datadog, Honeycomb, and more.

Features

  • Universal Compatibility: Works with any OpenTelemetry-compatible backend
  • Comprehensive Metrics: Request counts, token usage, latency, error rates
  • Distributed Tracing: Full request lifecycle tracking with spans
  • Flexible Configuration: Environment-based configuration for different backends
  • Zero-Code Integration: Drop-in replacement for existing observability tools

Quick Start

1. Install Dependencies

The OpenTelemetry dependencies are already included in the package.json:

npm install

2. Configure Your Backend

Jaeger (Local Development)

# Start Jaeger locally
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

# Configure the MCP server
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp

New Relic

export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_NEW_RELIC_LICENSE_KEY"
export OTEL_SERVICE_NAME=llm-observability-mcp

Grafana Cloud

export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n YOUR_INSTANCE_ID:YOUR_API_KEY | base64)"
export OTEL_SERVICE_NAME=llm-observability-mcp

Datadog

# Datadog ingests OTLP through the Datadog Agent rather than its public API.
# Enable OTLP ingestion on the Agent, then point the exporter at its local OTLP/HTTP port:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp

3. Start the MCP Server

# Start with stdio transport
npm run mcp:stdio

# Start with HTTP transport
npm run mcp:http

Usage

Using the OpenTelemetry Tool

The MCP server provides a new tool: capture_llm_observability_opentelemetry

Required Parameters

  • userId: The distinct ID of the user
  • model: The model used (e.g., "gpt-4", "claude-3")
  • provider: The LLM provider (e.g., "openai", "anthropic")

Optional Parameters

  • traceId: Trace ID for grouping related events
  • input: The input to the LLM (messages, prompt, etc.)
  • outputChoices: The output from the LLM
  • inputTokens: Number of tokens in the input
  • outputTokens: Number of tokens in the output
  • latency: Latency of the LLM call in seconds
  • httpStatus: HTTP status code of the LLM call
  • baseUrl: Base URL of the LLM API
  • operationName: Name of the operation being performed
  • error: Error message if the request failed
  • errorType: Type of error (e.g., "rate_limit", "timeout")
  • mcpToolsUsed: List of MCP tools used during the request
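Taken together, the required and optional parameters above can be sketched as a TypeScript shape (the interface name is illustrative, not part of the server's public API; field names mirror the parameter list):

```typescript
// Illustrative shape of the tool's arguments; the interface name is hypothetical.
interface LlmObservabilityEvent {
  // Required
  userId: string;
  model: string;       // e.g. "gpt-4"
  provider: string;    // e.g. "openai"
  // Optional
  traceId?: string;
  input?: unknown;
  outputChoices?: unknown;
  inputTokens?: number;
  outputTokens?: number;
  latency?: number;    // seconds
  httpStatus?: number;
  baseUrl?: string;
  operationName?: string;
  error?: string;
  errorType?: string;  // e.g. "rate_limit", "timeout"
  mcpToolsUsed?: string[];
}

const event: LlmObservabilityEvent = {
  userId: "user-123",
  model: "gpt-4",
  provider: "openai",
  inputTokens: 150,
  outputTokens: 75,
};
```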

Example Usage

{
  "userId": "user-123",
  "model": "gpt-4",
  "provider": "openai",
  "inputTokens": 150,
  "outputTokens": 75,
  "latency": 2.5,
  "httpStatus": 200,
  "operationName": "chat-completion",
  "traceId": "trace-abc123"
}
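When the tool is invoked over MCP, these arguments travel inside a standard JSON-RPC 2.0 tools/call envelope. A sketch of the full request as an MCP client would send it (the wrapper shape follows the MCP specification; the id value is arbitrary):

```typescript
// JSON-RPC 2.0 envelope for an MCP "tools/call" request.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "capture_llm_observability_opentelemetry",
    arguments: {
      userId: "user-123",
      model: "gpt-4",
      provider: "openai",
      inputTokens: 150,
      outputTokens: 75,
      latency: 2.5,
      httpStatus: 200,
      operationName: "chat-completion",
      traceId: "trace-abc123",
    },
  },
};

console.log(JSON.stringify(request));
```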

Configuration Reference

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| OTEL_SERVICE_NAME | Service name for OpenTelemetry | llm-observability-mcp |
| OTEL_SERVICE_VERSION | Service version | 1.0.0 |
| OTEL_ENVIRONMENT | Environment name | development |
| OTEL_EXPORTER_OTLP_ENDPOINT | Default OTLP endpoint | - |
| OTEL_EXPORTER_OTLP_METRICS_ENDPOINT | Metrics endpoint | - |
| OTEL_EXPORTER_OTLP_TRACES_ENDPOINT | Traces endpoint | - |
| OTEL_EXPORTER_OTLP_LOGS_ENDPOINT | Logs endpoint | - |
| OTEL_EXPORTER_OTLP_HEADERS | Headers for authentication (format: "key1=value1,key2=value2") | - |
| OTEL_METRIC_EXPORT_INTERVAL | Metrics export interval in ms | 10000 |
| OTEL_METRIC_EXPORT_TIMEOUT | Metrics export timeout in ms | 5000 |
| OTEL_TRACES_SAMPLER_ARG | Sampling ratio (0.0-1.0) | 1.0 |
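How these variables interact follows the OpenTelemetry OTLP exporter specification: a signal-specific endpoint is used verbatim, while the generic OTEL_EXPORTER_OTLP_ENDPOINT gets the signal path appended for OTLP/HTTP. A self-contained sketch of that resolution logic:

```typescript
// Sketch of OTLP/HTTP endpoint resolution per the OpenTelemetry spec:
// a signal-specific variable wins and is used as-is; otherwise the
// signal path (/v1/traces, /v1/metrics, /v1/logs) is appended to the
// generic OTEL_EXPORTER_OTLP_ENDPOINT.
function resolveOtlpEndpoint(
  signal: "traces" | "metrics" | "logs",
  env: Record<string, string | undefined>,
): string | undefined {
  const specific = env[`OTEL_EXPORTER_OTLP_${signal.toUpperCase()}_ENDPOINT`];
  if (specific) return specific; // used verbatim
  const base = env["OTEL_EXPORTER_OTLP_ENDPOINT"];
  if (!base) return undefined;
  return `${base.replace(/\/$/, "")}/v1/${signal}`;
}

const env = { OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318" };
console.log(resolveOtlpEndpoint("traces", env)); // http://localhost:4318/v1/traces
```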

Backend-Specific Configuration

New Relic

export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_LICENSE_KEY"

Jaeger

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Grafana Cloud

export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n YOUR_INSTANCE_ID:YOUR_API_KEY | base64)"

Honeycomb

# Use the base endpoint; the OTLP/HTTP exporter appends the signal path (e.g. /v1/traces)
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY"

Lightstep

export OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.lightstep.com:443/api/v2/otel/trace
export OTEL_EXPORTER_OTLP_HEADERS="lightstep-access-token=YOUR_ACCESS_TOKEN"

Metrics Collected

Counters

  • llm.requests.total: Total number of LLM requests
  • llm.tokens.total: Total tokens used (input + output)

Histograms

  • llm.latency.duration: Request latency in milliseconds

Gauges

  • llm.requests.active: Number of active requests
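What the server records into each instrument kind can be illustrated with a self-contained sketch (plain TypeScript standing in for the OpenTelemetry SDK; the real server would use @opentelemetry/api instruments with the same metric names):

```typescript
// Minimal in-memory stand-ins for the three instrument kinds,
// illustrating what is recorded per LLM request.
const counters = new Map<string, number>();
const histograms = new Map<string, number[]>();
let activeRequests = 0; // gauge: llm.requests.active

function add(name: string, value: number) {
  counters.set(name, (counters.get(name) ?? 0) + value);
}
function record(name: string, value: number) {
  const bucket = histograms.get(name) ?? [];
  bucket.push(value);
  histograms.set(name, bucket);
}

// Simulate one request: 150 input + 75 output tokens, 2500 ms latency.
activeRequests++;
add("llm.requests.total", 1);
add("llm.tokens.total", 150 + 75);
record("llm.latency.duration", 2500);
activeRequests--;

console.log(counters.get("llm.tokens.total")); // 225
```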

Trace Attributes

  • llm.model: The model used
  • llm.provider: The provider name
  • llm.user_id: The user ID
  • llm.operation: The operation name
  • llm.input_tokens: Input token count
  • llm.output_tokens: Output token count
  • llm.total_tokens: Total token count
  • llm.latency_ms: Latency in milliseconds
  • llm.http_status: HTTP status code
  • llm.base_url: API base URL
  • llm.error: Error message (if any)
  • llm.error_type: Error type classification
  • llm.input: Input content (optional)
  • llm.output: Output content (optional)
  • llm.mcp_tools_used: MCP tools used
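A hypothetical helper shows how a request record maps onto the llm.* attribute keys above, including the seconds-to-milliseconds conversion for latency (the function name is illustrative, not the server's actual API):

```typescript
// Flattens a request record into llm.* span attributes.
// Note: the tool accepts latency in seconds but reports llm.latency_ms.
function toSpanAttributes(e: {
  model: string;
  provider: string;
  userId: string;
  inputTokens?: number;
  outputTokens?: number;
  latency?: number; // seconds
  httpStatus?: number;
}): Record<string, string | number> {
  const attrs: Record<string, string | number> = {
    "llm.model": e.model,
    "llm.provider": e.provider,
    "llm.user_id": e.userId,
  };
  if (e.inputTokens !== undefined) attrs["llm.input_tokens"] = e.inputTokens;
  if (e.outputTokens !== undefined) attrs["llm.output_tokens"] = e.outputTokens;
  if (e.inputTokens !== undefined && e.outputTokens !== undefined)
    attrs["llm.total_tokens"] = e.inputTokens + e.outputTokens;
  if (e.latency !== undefined) attrs["llm.latency_ms"] = e.latency * 1000;
  if (e.httpStatus !== undefined) attrs["llm.http_status"] = e.httpStatus;
  return attrs;
}

const attrs = toSpanAttributes({
  model: "gpt-4", provider: "openai", userId: "user-123",
  inputTokens: 150, outputTokens: 75, latency: 2.5,
});
console.log(attrs["llm.total_tokens"]); // 225
```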

Testing with Jaeger

1. Start Jaeger

docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

2. Configure MCP Server

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
npm run mcp:stdio

3. View Traces

Open http://localhost:16686 to view traces in Jaeger UI.

Migration from PostHog

The OpenTelemetry tool is designed to be a drop-in replacement for the PostHog tool. Both tools can coexist, allowing for gradual migration:

  1. PostHog Tool: capture_llm_observability
  2. OpenTelemetry Tool: capture_llm_observability_opentelemetry

Both tools accept the same parameters, making migration straightforward.

Troubleshooting

Common Issues

No Data in Backend

  1. Verify endpoint URLs are correct
  2. Check authentication headers
  3. Ensure network connectivity
  4. Check server logs for errors

High Resource Usage

  1. Adjust sampling ratio: OTEL_TRACES_SAMPLER_ARG=0.1
  2. Increase export intervals: OTEL_METRIC_EXPORT_INTERVAL=30000

Missing Traces

  1. Verify OpenTelemetry is enabled (check for endpoint configuration)
  2. Check for initialization errors in logs
  3. Ensure proper service name configuration

Debug Mode

Enable debug logging:

export DEBUG=true
npm run mcp:stdio

Advanced Configuration

Custom Headers

export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer token,Custom-Header=value"
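Parsing this comma-separated format correctly requires splitting each pair on the first "=" only, since values such as base64 strings or bearer tokens may themselves contain "=". A sketch of that parsing (a hypothetical helper, not the server's actual code):

```typescript
// Parses the "key1=value1,key2=value2" format used by
// OTEL_EXPORTER_OTLP_HEADERS. Only the first '=' splits a pair, so
// values containing '=' (e.g. base64 padding) survive intact.
function parseOtlpHeaders(raw: string): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const i = pair.indexOf("=");
    if (i === -1) continue; // skip malformed entries
    headers[pair.slice(0, i).trim()] = pair.slice(i + 1).trim();
  }
  return headers;
}

const parsed = parseOtlpHeaders("Authorization=Bearer token,Custom-Header=value");
console.log(parsed["Authorization"]); // Bearer token
```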

Multiple Backends

Configure different endpoints for metrics and traces:

export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=https://metrics.example.com
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://traces.example.com

Sampling Configuration

# Sample 10% of traces
export OTEL_TRACES_SAMPLER_ARG=0.1
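The idea behind ratio-based head sampling can be sketched in a few lines (a simplified illustration, not the sampler the OpenTelemetry SDK actually uses): hash the trace ID into [0, 1) and keep the trace when the hash falls below the configured ratio, so the decision is deterministic per trace.

```typescript
// Simplified ratio-based sampling decision, deterministic per trace ID.
function shouldSample(traceId: string, ratio: number): boolean {
  let h = 0;
  for (const ch of traceId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h / 0x100000000 < ratio;
}

console.log(shouldSample("trace-abc123", 1.0)); // true: ratio 1.0 keeps everything
```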

Support

For issues or questions:

  1. Check the troubleshooting section above
  2. Review server logs with DEBUG=true
  3. Verify OpenTelemetry configuration
  4. Test with Jaeger locally first