Initial implementation of opentelemetry-llm tool

2025-07-14 16:27:29 -05:00
parent 63cf87a6c6
commit 1f201a093f
14 changed files with 3191 additions and 61 deletions
--- a/OPENTELEMETRY.md
+++ b/OPENTELEMETRY.md
@@ -0,0 +1,301 @@
+# OpenTelemetry LLM Observability Integration
+
+This MCP server now includes OpenTelemetry support for LLM observability. Telemetry is exported over OTLP, so it works with any OpenTelemetry-compatible backend, including Jaeger, New Relic, Grafana, Datadog, and Honeycomb.
+
+## Features
+
+- **Universal Compatibility**: Works with any OpenTelemetry-compatible backend
+- **Comprehensive Metrics**: Request counts, token usage, latency, error rates
+- **Distributed Tracing**: Full request lifecycle tracking with spans
+- **Flexible Configuration**: Environment-based configuration for different backends
+- **Zero-Code Integration**: Drop-in replacement for the existing PostHog tool (`capture_llm_observability`)
+
+## Quick Start
+
+### 1. Install Dependencies
+
+The OpenTelemetry dependencies are already listed in `package.json`, so a regular install pulls them in:
+
+```bash
+npm install
+```
+
+### 2. Configure Your Backend
+
+#### Jaeger (Local Development)
+
+```bash
+# Start Jaeger locally
+docker run -d --name jaeger \
+  -e COLLECTOR_OTLP_ENABLED=true \
+  -p 16686:16686 \
+  -p 4317:4317 \
+  -p 4318:4318 \
+  jaegertracing/all-in-one:latest
+
+# Configure the MCP server
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+export OTEL_SERVICE_NAME=llm-observability-mcp
+```
+
+#### New Relic
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
+export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_NEW_RELIC_LICENSE_KEY"
+export OTEL_SERVICE_NAME=llm-observability-mcp
+```
+
+#### Grafana Cloud
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
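+# Strip newlines: base64 wraps long output on some platforms, which would corrupt the header
+export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(printf '%s' 'YOUR_INSTANCE_ID:YOUR_API_KEY' | base64 | tr -d '\n')"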
+export OTEL_SERVICE_NAME=llm-observability-mcp
+```
+
+#### Datadog
+
+```bash
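+# NOTE: /api/v2/series is Datadog's metrics submission API rather than an OTLP receiver;
+# confirm the OTLP intake URL for your Datadog site, or send to a Datadog Agent with OTLP ingestion enabled
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.datadoghq.com/api/v2/series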
+export OTEL_EXPORTER_OTLP_HEADERS="DD-API-KEY=YOUR_DD_API_KEY"
+export OTEL_SERVICE_NAME=llm-observability-mcp
+```
+
+### 3. Start the MCP Server
+
+```bash
+# Start with stdio transport
+npm run mcp:stdio
+
+# Start with HTTP transport
+npm run mcp:http
+```
+
+## Usage
+
+### Using the OpenTelemetry Tool
+
+The MCP server exposes a new tool, `capture_llm_observability_opentelemetry`.
+
+#### Required Parameters
+
+- `userId`: The distinct ID of the user
+- `model`: The model used (e.g., "gpt-4", "claude-3")
+- `provider`: The LLM provider (e.g., "openai", "anthropic")
+
+#### Optional Parameters
+
+- `traceId`: Trace ID for grouping related events
+- `input`: The input to the LLM (messages, prompt, etc.)
+- `outputChoices`: The output from the LLM
+- `inputTokens`: Number of tokens in the input
+- `outputTokens`: Number of tokens in the output
+- `latency`: Latency of the LLM call in seconds (reported in milliseconds in `llm.latency.duration` and `llm.latency_ms`)
+- `httpStatus`: HTTP status code of the LLM call
+- `baseUrl`: Base URL of the LLM API
+- `operationName`: Name of the operation being performed
+- `error`: Error message if the request failed
+- `errorType`: Type of error (e.g., "rate_limit", "timeout")
+- `mcpToolsUsed`: List of MCP tools used during the request
+
+### Example Usage
+
+```json
+{
+  "userId": "user-123",
+  "model": "gpt-4",
+  "provider": "openai",
+  "inputTokens": 150,
+  "outputTokens": 75,
+  "latency": 2.5,
+  "httpStatus": 200,
+  "operationName": "chat-completion",
+  "traceId": "trace-abc123"
+}
+```
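+
+Over the wire, an MCP client passes these parameters as the `arguments` of a standard `tools/call` request. The sketch below reports a failed call by setting the error fields; the `id`, the 429 status, and the error text are illustrative values, and `rate_limit` is taken from the `errorType` examples above.
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 1,
+  "method": "tools/call",
+  "params": {
+    "name": "capture_llm_observability_opentelemetry",
+    "arguments": {
+      "userId": "user-123",
+      "model": "gpt-4",
+      "provider": "openai",
+      "latency": 0.8,
+      "httpStatus": 429,
+      "operationName": "chat-completion",
+      "error": "Rate limit exceeded",
+      "errorType": "rate_limit"
+    }
+  }
+}
+```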
+
+## Configuration Reference
+
+### Environment Variables
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `OTEL_SERVICE_NAME` | Service name for OpenTelemetry | `llm-observability-mcp` |
+| `OTEL_SERVICE_VERSION` | Service version | `1.0.0` |
+| `OTEL_ENVIRONMENT` | Environment name | `development` |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | Base OTLP endpoint for all signals | - |
+| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | Endpoint for metrics only | - |
+| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Endpoint for traces only | - |
+| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | Endpoint for logs only | - |
+| `OTEL_EXPORTER_OTLP_HEADERS` | Headers for authentication (format: "key1=value1,key2=value2") | - |
+| `OTEL_METRIC_EXPORT_INTERVAL` | Metrics export interval in ms | `10000` |
+| `OTEL_METRIC_EXPORT_TIMEOUT` | Metrics export timeout in ms | `5000` |
+| `OTEL_TRACES_SAMPLER_ARG` | Sampling ratio (0.0-1.0) | `1.0` |
+
+### Backend-Specific Configuration
+
+#### New Relic
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
+export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_LICENSE_KEY"
+```
+
+#### Jaeger
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+```
+
+#### Grafana Cloud
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
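+# Strip newlines: base64 wraps long output on some platforms, which would corrupt the header
+export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(printf '%s' 'YOUR_INSTANCE_ID:YOUR_API_KEY' | base64 | tr -d '\n')"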
+```
+
+#### Honeycomb
+
+```bash
+# Base URL only, matching the other examples: spec-compliant OTLP exporters append /v1/traces themselves
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
+export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY"
+```
+
+#### Lightstep
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.lightstep.com:443/api/v2/otel/trace
+export OTEL_EXPORTER_OTLP_HEADERS="lightstep-access-token=YOUR_ACCESS_TOKEN"
+```
+
+## Metrics Collected
+
+### Counters
+
+- `llm.requests.total`: Total number of LLM requests
+- `llm.tokens.total`: Total tokens used (input + output)
+
+### Histograms
+
+- `llm.latency.duration`: Request latency in milliseconds
+
+### Gauges
+
+- `llm.requests.active`: Number of active requests
+
+### Trace Attributes
+
+- `llm.model`: The model used
+- `llm.provider`: The provider name
+- `llm.user_id`: The user ID
+- `llm.operation`: The operation name
+- `llm.input_tokens`: Input token count
+- `llm.output_tokens`: Output token count
+- `llm.total_tokens`: Total token count
+- `llm.latency_ms`: Latency in milliseconds
+- `llm.http_status`: HTTP status code
+- `llm.base_url`: API base URL
+- `llm.error`: Error message (if any)
+- `llm.error_type`: Error type classification
+- `llm.input`: Input content (optional)
+- `llm.output`: Output content (optional)
+- `llm.mcp_tools_used`: MCP tools used
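+
+For the request shown under Example Usage, the resulting span attributes would look roughly like the fragment below. This is a sketch under two assumptions: that `latency` (2.5 seconds) is multiplied by 1000 to produce `llm.latency_ms`, and that `llm.total_tokens` is the sum of the input and output counts, as it is for `llm.tokens.total`. Attributes for parameters that were not supplied are omitted, and the exact rendering depends on the backend.
+
+```json
+{
+  "llm.model": "gpt-4",
+  "llm.provider": "openai",
+  "llm.user_id": "user-123",
+  "llm.operation": "chat-completion",
+  "llm.input_tokens": 150,
+  "llm.output_tokens": 75,
+  "llm.total_tokens": 225,
+  "llm.latency_ms": 2500,
+  "llm.http_status": 200
+}
+```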
+
+## Testing with Jaeger
+
+### 1. Start Jaeger
+
+```bash
+docker run -d --name jaeger \
+  -e COLLECTOR_OTLP_ENABLED=true \
+  -p 16686:16686 \
+  -p 4317:4317 \
+  -p 4318:4318 \
+  jaegertracing/all-in-one:latest
+```
+
+### 2. Configure MCP Server
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+export OTEL_SERVICE_NAME=llm-observability-mcp
+npm run mcp:stdio
+```
+
+### 3. View Traces
+
+Open <http://localhost:16686> to view traces in the Jaeger UI.
+
+## Migration from PostHog
+
+The OpenTelemetry tool is designed to be a drop-in replacement for the PostHog tool. Both tools can coexist, allowing for gradual migration:
+
+1. **PostHog Tool**: `capture_llm_observability`
+2. **OpenTelemetry Tool**: `capture_llm_observability_opentelemetry`
+
+Both tools accept the same parameters, making migration straightforward.
+
+## Troubleshooting
+
+### Common Issues
+
+#### No Data in Backend
+
+1. Verify endpoint URLs are correct
+2. Check authentication headers
+3. Ensure network connectivity
+4. Check server logs for errors
+
+#### High Resource Usage
+
+1. Lower the sampling ratio, for example `OTEL_TRACES_SAMPLER_ARG=0.1`
+2. Increase the metrics export interval, for example `OTEL_METRIC_EXPORT_INTERVAL=30000`
+
+#### Missing Traces
+
+1. Verify that OpenTelemetry is enabled by confirming an OTLP endpoint is configured
+2. Check for initialization errors in logs
+3. Ensure proper service name configuration
+
+### Debug Mode
+
+Enable debug logging:
+
+```bash
+export DEBUG=true
+npm run mcp:stdio
+```
+
+## Advanced Configuration
+
+### Custom Headers
+
+```bash
+export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer token,Custom-Header=value"
+```
+
+### Multiple Backends
+
+Send metrics and traces to different backends by setting signal-specific endpoints:
+
+```bash
+export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=https://metrics.example.com
+export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://traces.example.com
+```
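+
+A base endpoint and a signal-specific endpoint can also be combined. The fragment below assumes the server follows the usual OpenTelemetry precedence, in which a signal-specific variable overrides `OTEL_EXPORTER_OTLP_ENDPOINT` for that signal only; this document does not state that behavior explicitly, so verify it before relying on it.
+
+```bash
+# Traces (and any other signal without its own override) go to the base endpoint
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+
+# Metrics alone are redirected to a separate backend (placeholder host)
+export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=https://metrics.example.com
+```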
+
+### Sampling Configuration
+
+```bash
+# Sample 10% of traces
+export OTEL_TRACES_SAMPLER_ARG=0.1
+```
+
+## Support
+
+For issues or questions:
+
+1. Check the troubleshooting section above
+2. Review server logs with `DEBUG=true` set
+3. Verify OpenTelemetry configuration
+4. Test with Jaeger locally first