# OpenTelemetry LLM Observability Integration

This MCP server now includes comprehensive OpenTelemetry support for LLM observability, compatible with any OpenTelemetry backend, including Jaeger, New Relic, Grafana, Datadog, Honeycomb, and more.

## Features

- **Universal Compatibility**: Works with any OpenTelemetry-compatible backend
- **Comprehensive Metrics**: Request counts, token usage, latency, error rates
- **Distributed Tracing**: Full request lifecycle tracking with spans
- **Flexible Configuration**: Environment-based configuration for different backends
- **Zero-Code Integration**: Drop-in replacement for existing observability tools

## Quick Start

### 1. Install Dependencies

The OpenTelemetry dependencies are already included in `package.json`:

```bash
npm install
```

### 2. Configure Your Backend

#### Jaeger (Local Development)

```bash
# Start Jaeger locally
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

# Configure the MCP server
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
```

#### New Relic

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_NEW_RELIC_LICENSE_KEY"
export OTEL_SERVICE_NAME=llm-observability-mcp
```

#### Grafana Cloud

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n YOUR_INSTANCE_ID:YOUR_API_KEY | base64)"
export OTEL_SERVICE_NAME=llm-observability-mcp
```

#### Datadog

Datadog ingests OTLP through the Datadog Agent (with its OTLP receiver enabled), so point the exporter at the local Agent rather than at Datadog's REST API:

```bash
# Assumes a local Datadog Agent with the OTLP receiver enabled on port 4318
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
```
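Several of the backends above authenticate via `OTEL_EXPORTER_OTLP_HEADERS`, which uses the standard comma-separated `key=value` format. As a rough sketch of how such a value can be parsed into a headers object (a hypothetical helper for illustration, not part of this server's code):

```typescript
// Hypothetical helper (illustration only): parse the standard
// OTEL_EXPORTER_OTLP_HEADERS format, "key1=value1,key2=value2",
// into a headers object an OTLP exporter can consume.
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) return headers;
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("="); // split on the FIRST "=" only, since
    if (idx <= 0) continue;        // values like "Basic abc==" contain "="
    headers[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return headers;
}

console.log(parseOtlpHeaders("api-key=YOUR_LICENSE_KEY"));
```

Note the split on the first `=` only: Base64-encoded values (as in the Grafana Cloud example) often end in `=` padding that must be preserved.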
### 3. Start the MCP Server

```bash
# Start with stdio transport
npm run mcp:stdio

# Start with HTTP transport
npm run mcp:http
```

## Usage

### Using the OpenTelemetry Tool

The MCP server provides a new tool: `capture_llm_observability_opentelemetry`

#### Required Parameters

- `userId`: The distinct ID of the user
- `model`: The model used (e.g., "gpt-4", "claude-3")
- `provider`: The LLM provider (e.g., "openai", "anthropic")

#### Optional Parameters

- `traceId`: Trace ID for grouping related events
- `input`: The input to the LLM (messages, prompt, etc.)
- `outputChoices`: The output from the LLM
- `inputTokens`: Number of tokens in the input
- `outputTokens`: Number of tokens in the output
- `latency`: Latency of the LLM call in seconds
- `httpStatus`: HTTP status code of the LLM call
- `baseUrl`: Base URL of the LLM API
- `operationName`: Name of the operation being performed
- `error`: Error message if the request failed
- `errorType`: Type of error (e.g., "rate_limit", "timeout")
- `mcpToolsUsed`: List of MCP tools used during the request

### Example Usage

```json
{
  "userId": "user-123",
  "model": "gpt-4",
  "provider": "openai",
  "inputTokens": 150,
  "outputTokens": 75,
  "latency": 2.5,
  "httpStatus": 200,
  "operationName": "chat-completion",
  "traceId": "trace-abc123"
}
```

## Configuration Reference

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `OTEL_SERVICE_NAME` | Service name for OpenTelemetry | `llm-observability-mcp` |
| `OTEL_SERVICE_VERSION` | Service version | `1.0.0` |
| `OTEL_ENVIRONMENT` | Environment name | `development` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Default OTLP endpoint | - |
| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | Metrics endpoint | - |
| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Traces endpoint | - |
| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | Logs endpoint | - |
| `OTEL_EXPORTER_OTLP_HEADERS` | Headers for authentication (format: "key1=value1,key2=value2") | - |
| `OTEL_METRIC_EXPORT_INTERVAL` | Metrics export interval in ms | `10000` |
| `OTEL_METRIC_EXPORT_TIMEOUT` | Metrics export timeout in ms | `5000` |
| `OTEL_TRACES_SAMPLER_ARG` | Sampling ratio (0.0-1.0) | `1.0` |

### Backend-Specific Configuration

#### New Relic

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_LICENSE_KEY"
```

#### Jaeger

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```

#### Grafana Cloud

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n YOUR_INSTANCE_ID:YOUR_API_KEY | base64)"
```

#### Honeycomb

```bash
# Use the base OTLP endpoint; signal-specific paths are appended automatically
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY"
```

#### Lightstep

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.lightstep.com:443
export OTEL_EXPORTER_OTLP_HEADERS="lightstep-access-token=YOUR_ACCESS_TOKEN"
```

## Metrics Collected

### Counters

- `llm.requests.total`: Total number of LLM requests
- `llm.tokens.total`: Total tokens used (input + output)

### Histograms

- `llm.latency.duration`: Request latency in milliseconds

### Gauges

- `llm.requests.active`: Number of active requests

### Trace Attributes

- `llm.model`: The model used
- `llm.provider`: The provider name
- `llm.user_id`: The user ID
- `llm.operation`: The operation name
- `llm.input_tokens`: Input token count
- `llm.output_tokens`: Output token count
- `llm.total_tokens`: Total token count
- `llm.latency_ms`: Latency in milliseconds
- `llm.http_status`: HTTP status code
- `llm.base_url`: API base URL
- `llm.error`: Error message (if any)
- `llm.error_type`: Error type classification
- `llm.input`: Input content (optional)
- `llm.output`: Output content (optional)
- `llm.mcp_tools_used`: MCP tools used

## Testing with Jaeger

### 1. Start Jaeger
```bash
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
```

### 2. Configure MCP Server

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
npm run mcp:stdio
```

### 3. View Traces

Open http://localhost:16686 to view traces in the Jaeger UI.

## Migration from PostHog

The OpenTelemetry tool is designed to be a drop-in replacement for the PostHog tool. Both tools can coexist, allowing for gradual migration:

1. **PostHog Tool**: `capture_llm_observability`
2. **OpenTelemetry Tool**: `capture_llm_observability_opentelemetry`

Both tools accept the same parameters, making migration straightforward.

## Troubleshooting

### Common Issues

#### No Data in Backend

1. Verify endpoint URLs are correct
2. Check authentication headers
3. Ensure network connectivity
4. Check server logs for errors

#### High Resource Usage

1. Adjust the sampling ratio: `OTEL_TRACES_SAMPLER_ARG=0.1`
2. Increase export intervals: `OTEL_METRIC_EXPORT_INTERVAL=30000`

#### Missing Traces

1. Verify OpenTelemetry is enabled (check for endpoint configuration)
2. Check for initialization errors in logs
3. Ensure proper service name configuration

### Debug Mode

Enable debug logging:

```bash
export DEBUG=true
npm run mcp:stdio
```

## Advanced Configuration

### Custom Headers

```bash
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer token,Custom-Header=value"
```

### Multiple Backends

Configure different endpoints for metrics and traces:

```bash
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=https://metrics.example.com
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://traces.example.com
```

### Sampling Configuration

```bash
# Sample 10% of traces
export OTEL_TRACES_SAMPLER_ARG=0.1
```

## Support

For issues or questions:

1. Check the troubleshooting section above
2. Review server logs with `DEBUG=true`
3. Verify OpenTelemetry configuration
4. Test with Jaeger locally first
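To make the parameter-to-attribute mapping concrete: the tool takes `latency` in seconds while the `llm.latency_ms` trace attribute is in milliseconds, and `llm.total_tokens` is derived from the two token counts. A minimal sketch of that mapping (a hypothetical `toTraceAttributes` helper for illustration, not the server's actual implementation):

```typescript
// Hypothetical sketch (not this server's actual code): map the tool's
// parameters onto the attributes listed under "Trace Attributes".
interface LlmEvent {
  userId: string;
  model: string;
  provider: string;
  inputTokens?: number;
  outputTokens?: number;
  latency?: number; // seconds, as accepted by the tool
  httpStatus?: number;
  operationName?: string;
}

function toTraceAttributes(e: LlmEvent): Record<string, string | number> {
  const attrs: Record<string, string | number> = {
    "llm.model": e.model,
    "llm.provider": e.provider,
    "llm.user_id": e.userId,
  };
  if (e.operationName !== undefined) attrs["llm.operation"] = e.operationName;
  if (e.inputTokens !== undefined) attrs["llm.input_tokens"] = e.inputTokens;
  if (e.outputTokens !== undefined) attrs["llm.output_tokens"] = e.outputTokens;
  if (e.inputTokens !== undefined && e.outputTokens !== undefined) {
    attrs["llm.total_tokens"] = e.inputTokens + e.outputTokens; // derived
  }
  if (e.latency !== undefined) attrs["llm.latency_ms"] = e.latency * 1000; // s -> ms
  if (e.httpStatus !== undefined) attrs["llm.http_status"] = e.httpStatus;
  return attrs;
}

// The "Example Usage" payload from this document:
console.log(
  toTraceAttributes({
    userId: "user-123",
    model: "gpt-4",
    provider: "openai",
    inputTokens: 150,
    outputTokens: 75,
    latency: 2.5,
    httpStatus: 200,
    operationName: "chat-completion",
  })
);
```

With the example payload above, this yields `llm.total_tokens` of 225 and `llm.latency_ms` of 2500.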