Based on the staged changes, here are the appropriate commit messages:

1. For the documentation reorganization and tool renaming:

   docs(opentelemetry): reorganize documentation and rename observability tools

   - Move OpenTelemetry documentation to docs/ directory
   - Rename tools from 'capture_llm_observability_opentelemetry' to 'llm_observability_otel'
   - Rename PostHog tool from 'capture_llm_observability' to 'llm_observability_posthog'
   - Update README to reflect new tool names and documentation structure

2. For the file deletions and additions:

   chore(docs): remove old documentation files

   - Delete OPENTELEMETRY.md and examples/opentelemetry-usage.md
   - Add new comprehensive docs/opentelemetry.md

3. For the tool implementation changes:

   refactor(tools): update tool names in implementation files

   - Update tool names in opentelemetry-llm.tool.ts and posthog-llm.tool.ts
   - Keep all functionality identical, only change naming
2025-07-15 17:17:01 -05:00
parent fef71122cf
commit 97f358245d
7 changed files with 453 additions and 763 deletions
--- a/docs/opentelemetry.md
+++ b/docs/opentelemetry.md
@@ -0,0 +1,332 @@
+# OpenTelemetry LLM Observability
+
+This document explains how to use the OpenTelemetry LLM observability tool with the MCP server. It covers setup, configuration, usage, and troubleshooting, with examples for several common OpenTelemetry-compatible backends.
+
+## Features
+
+- **Universal Compatibility**: Works with Jaeger, New Relic, Grafana, Datadog, Honeycomb, and more
+- **Comprehensive Metrics**: Request counts, token usage, latency, error rates
+- **Distributed Tracing**: Full request lifecycle tracking with spans
+- **Flexible Configuration**: Environment-based configuration for different backends
+- **Zero-Code Integration**: Drop-in replacement for existing observability tools
+
+---
+
+## Quick Start
+
+### 1. Install Dependencies
+
+OpenTelemetry dependencies are included in `package.json`:
+
+```bash
+npm install
+```
+
+### 2. Configure Your Backend
+
+#### Jaeger (Local Development)
+
+```bash
+docker run -d --name jaeger \
+  -e COLLECTOR_OTLP_ENABLED=true \
+  -p 16686:16686 \
+  -p 4317:4317 \
+  -p 4318:4318 \
+  jaegertracing/all-in-one:latest
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+export OTEL_SERVICE_NAME=llm-observability-mcp
+```
+
+#### New Relic
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
+export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_NEW_RELIC_LICENSE_KEY"
+export OTEL_SERVICE_NAME=llm-observability-mcp
+```
+
+#### Grafana Cloud
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
+export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(printf '%s' 'YOUR_INSTANCE_ID:YOUR_API_KEY' | base64 | tr -d '\n')"
+export OTEL_SERVICE_NAME=llm-observability-mcp
+```
+
+#### Honeycomb
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
+export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY"
+export OTEL_SERVICE_NAME=llm-observability-mcp
+```
+
+#### Datadog
+
+```bash
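+# Datadog accepts OTLP through the Datadog Agent or an OpenTelemetry Collector;
+# /api/v2/series is Datadog's native metrics API, not an OTLP endpoint.
+# Assumes a local Agent with OTLP HTTP ingestion enabled on port 4318 (the Agent holds the API key).
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318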
+export OTEL_SERVICE_NAME=llm-observability-mcp
+```
+
+#### Lightstep
+
+```bash
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.lightstep.com:443/api/v2/otel/trace
+export OTEL_EXPORTER_OTLP_HEADERS="lightstep-access-token=YOUR_ACCESS_TOKEN"
+export OTEL_SERVICE_NAME=llm-observability-mcp
+```
+
+#### Kubernetes Example
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: llm-observability-mcp
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: llm-observability-mcp
+  template:
+    metadata:
+      labels:
+        app: llm-observability-mcp
+    spec:
+      containers:
+      - name: llm-observability-mcp
+        image: llm-observability-mcp:latest
+        ports:
+        - containerPort: 3000
+        env:
+        - name: OTEL_SERVICE_NAME
+          value: "llm-observability-mcp"
+        - name: OTEL_SERVICE_VERSION
+          value: "1.2.3"
+        - name: OTEL_ENVIRONMENT
+          value: "production"
+        - name: OTEL_EXPORTER_OTLP_ENDPOINT
+          value: "https://your-backend.com:4318"
+        - name: OTEL_EXPORTER_OTLP_HEADERS
+          valueFrom:
+            secretKeyRef:
+              name: otel-credentials
+              key: headers
+```
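+
+The Deployment above reads `OTEL_EXPORTER_OTLP_HEADERS` from a Secret named `otel-credentials` under the key `headers`. A minimal sketch of creating that Secret follows. The header shown is the New Relic form from earlier in this document and is only a placeholder; it assumes `kubectl` is pointed at the namespace the Deployment runs in.
+
+```bash
+# Create the Secret referenced by secretKeyRef in the Deployment above.
+# The header value is a placeholder: use the header your backend expects.
+kubectl create secret generic otel-credentials \
+  --from-literal=headers="api-key=YOUR_NEW_RELIC_LICENSE_KEY"
+```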
+
+---
+
+## Running the MCP Server
+
+```bash
+# Start with stdio transport
+npm run mcp:stdio
+# Start with HTTP transport
+npm run mcp:http
+```
+
+---
+
+## Usage
+
+### OpenTelemetry Tool: `llm_observability_otel`
+
+#### Required Parameters
+
+- `userId`: The distinct ID of the user
+- `model`: The model used (e.g., "gpt-4", "claude-3")
+- `provider`: The LLM provider (e.g., "openai", "anthropic")
+
+#### Optional Parameters
+
+- `traceId`: Trace ID for grouping related events
+- `input`: The input to the LLM (messages, prompt, etc.)
+- `outputChoices`: The output from the LLM
+- `inputTokens`: Number of tokens in the input
+- `outputTokens`: Number of tokens in the output
+- `latency`: Latency of the LLM call in seconds
+- `httpStatus`: HTTP status code of the LLM call
+- `baseUrl`: Base URL of the LLM API
+- `operationName`: Name of the operation being performed
+- `error`: Error message if the request failed
+- `errorType`: Type of error (e.g., "rate_limit", "timeout")
+- `mcpToolsUsed`: List of MCP tools used during the request
+
+#### Example Usage
+
+```json
+{
+  "tool": "llm_observability_otel",
+  "arguments": {
+    "userId": "user-12345",
+    "model": "gpt-4",
+    "provider": "openai",
+    "inputTokens": 150,
+    "outputTokens": 75,
+    "latency": 2.3,
+    "httpStatus": 200,
+    "operationName": "chat-completion",
+    "traceId": "trace-abc123",
+    "input": "What is the weather like today?",
+    "outputChoices": ["The weather is sunny and 75°F today."]
+  }
+}
+```
+
+---
+
+## Configuration Reference
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `OTEL_SERVICE_NAME` | Service name for OpenTelemetry | `llm-observability-mcp` |
+| `OTEL_SERVICE_VERSION` | Service version | `1.0.0` |
+| `OTEL_ENVIRONMENT` | Environment name | `development` |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | Default OTLP endpoint | - |
+| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | Metrics endpoint | - |
+| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Traces endpoint | - |
+| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | Logs endpoint | - |
+| `OTEL_EXPORTER_OTLP_HEADERS` | Headers for authentication (format: "key1=value1,key2=value2") | - |
+| `OTEL_METRIC_EXPORT_INTERVAL` | Metrics export interval in ms | `10000` |
+| `OTEL_METRIC_EXPORT_TIMEOUT` | Metrics export timeout in ms | `5000` |
+| `OTEL_TRACES_SAMPLER_ARG` | Sampling ratio (0.0-1.0) | `1.0` |
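+
+The endpoint and header variables interact in ways the table does not spell out. The sketch below assumes the server hands these variables to standard OTLP/HTTP exporters, which append `/v1/traces`, `/v1/metrics`, or `/v1/logs` to the base endpoint but use a signal-specific endpoint exactly as given. The `x-team` header is hypothetical and only illustrates the comma-separated format.
+
+```bash
+# Base endpoint: the exporter appends the per-signal path (for example /v1/traces)
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+# Signal-specific override: used as-is, so include the full path
+export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4318/v1/traces
+# Multiple headers: comma-separated key=value pairs (x-team is a made-up example)
+export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_API_KEY,x-team=platform"
+```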
+
+---
+
+## Metrics Collected
+
+- `llm.requests.total`: Total number of LLM requests
+- `llm.tokens.total`: Total tokens used (input + output)
+- `llm.latency.duration`: Request latency in milliseconds
+- `llm.requests.active`: Number of active requests
+
+### Trace Attributes
+
+- `llm.model`, `llm.provider`, `llm.user_id`, `llm.operation`, `llm.input_tokens`, `llm.output_tokens`, `llm.total_tokens`, `llm.latency_ms`, `llm.http_status`, `llm.base_url`, `llm.error`, `llm.error_type`, `llm.input`, `llm.output`, `llm.mcp_tools_used`
+
+---
+
+## Practical Examples
+
+### Jaeger: View Traces
+
+Open <http://localhost:16686> to see your traces.
+
+### Error Tracking Example
+
+```json
+{
+  "tool": "llm_observability_otel",
+  "arguments": {
+    "userId": "user-12345",
+    "model": "gpt-4",
+    "provider": "openai",
+    "httpStatus": 429,
+    "error": "Rate limit exceeded",
+    "errorType": "rate_limit",
+    "latency": 0.1,
+    "operationName": "chat-completion"
+  }
+}
+```
+
+### Multi-Tool Usage Tracking Example
+
+```json
+{
+  "tool": "llm_observability_otel",
+  "arguments": {
+    "userId": "user-12345",
+    "model": "gpt-4",
+    "provider": "openai",
+    "inputTokens": 500,
+    "outputTokens": 200,
+    "latency": 5.2,
+    "httpStatus": 200,
+    "operationName": "complex-workflow",
+    "mcpToolsUsed": ["file_read", "web_search", "code_execution"],
+    "traceId": "complex-workflow-123"
+  }
+}
+```
+
+### Testing Script
+
+```bash
+#!/bin/bash
+# test-opentelemetry.sh
+
+docker run -d --name jaeger-test \
+  -e COLLECTOR_OTLP_ENABLED=true \
+  -p 16686:16686 \
+  -p 4318:4318 \
+  jaegertracing/all-in-one:latest
+sleep 5
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
+export OTEL_SERVICE_NAME=llm-observability-test
+export OTEL_ENVIRONMENT=test
+# Use the HTTP transport: the curl call below needs a listening port (assumed to be 3000)
+npm run mcp:http &
+sleep 3
+curl -X POST http://localhost:3000/mcp \
+  -H "Content-Type: application/json" \
+  -d '{
+    "tool": "llm_observability_otel",
+    "arguments": {
+      "userId": "test-user",
+      "model": "gpt-4",
+      "provider": "openai",
+      "inputTokens": 100,
+      "outputTokens": 50,
+      "latency": 1.5,
+      "httpStatus": 200,
+      "operationName": "test-completion"
+    }
+  }'
+echo "Test complete. View traces at http://localhost:16686"
+```
+
+---
+
+## Migration from PostHog
+
+The OpenTelemetry tool is a drop-in replacement for the PostHog tool. Both can coexist for gradual migration:
+
+- **PostHog Tool**: `llm_observability_posthog`
+- **OpenTelemetry Tool**: `llm_observability_otel`
+
+Both accept the same parameters.
+
+---
+
+## Troubleshooting & Performance
+
+### Common Issues
+
+- No data in backend: check the endpoint URL, authentication headers, network connectivity, and server logs
+- High resource usage: lower the sampling ratio (`OTEL_TRACES_SAMPLER_ARG`) and increase the export interval (`OTEL_METRIC_EXPORT_INTERVAL`)
+- Missing traces: verify that OpenTelemetry is enabled, then check the server logs and the service name
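+
+For the "no data in backend" case, it can help to confirm that the OTLP receiver is reachable before debugging the MCP server itself. The check below is a sketch. It assumes `OTEL_EXPORTER_OTLP_ENDPOINT` holds a base URL without a path (as in the Jaeger example) and that the receiver accepts OTLP/HTTP JSON. A 2xx status typically means the endpoint is reachable, while a 401 or 403 points at the headers.
+
+```bash
+# Post an empty OTLP/HTTP JSON payload and print only the HTTP status code.
+# Add -H "name: value" flags matching OTEL_EXPORTER_OTLP_HEADERS if the backend needs auth.
+curl -s -o /dev/null -w '%{http_code}\n' \
+  -X POST "${OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces" \
+  -H 'Content-Type: application/json' \
+  -d '{"resourceSpans":[]}'
+```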
+
+### Debug Mode
+
+```bash
+export DEBUG=true
+npm run mcp:stdio
+```
+
+### Performance Tuning
+
+- Reduce sampling for high-volume: `OTEL_TRACES_SAMPLER_ARG=0.01`
+- Increase export intervals: `OTEL_METRIC_EXPORT_INTERVAL=60000`
+- Disable metrics/logs if not needed: `unset OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`, `unset OTEL_EXPORTER_OTLP_LOGS_ENDPOINT`
+
+---
+
+## Support
+
+For issues or questions:
+
+1. Check this document, including the troubleshooting section above
+2. Review server logs with `DEBUG=true`
+3. Verify OpenTelemetry configuration
+4. Test with Jaeger locally first