# OpenTelemetry LLM Observability
This document provides comprehensive guidance for using the OpenTelemetry LLM observability tool with the MCP server. It covers setup, configuration, usage, troubleshooting, and practical examples for all major OpenTelemetry-compatible backends.
## Features
- Universal Compatibility: Works with Jaeger, New Relic, Grafana, Datadog, Honeycomb, and more
- Comprehensive Metrics: Request counts, token usage, latency, error rates
- Distributed Tracing: Full request lifecycle tracking with spans
- Flexible Configuration: Environment-based configuration for different backends
- Zero-Code Integration: Drop-in replacement for existing observability tools
## Quick Start

### 1. Install Dependencies

OpenTelemetry dependencies are included in `package.json`:

```bash
npm install
```
### 2. Configure Your Backend

#### Jaeger (Local Development)

```bash
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
```
#### New Relic

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_NEW_RELIC_LICENSE_KEY"
export OTEL_SERVICE_NAME=llm-observability-mcp
```
#### Grafana Cloud

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n YOUR_INSTANCE_ID:YOUR_API_KEY | base64)"
export OTEL_SERVICE_NAME=llm-observability-mcp
```
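The Basic auth value for Grafana Cloud is your OTLP instance ID and API key joined by a colon and base64-encoded. The `-n` flag on `echo` matters: without it, the trailing newline is encoded into the token and authentication fails. A quick sketch with placeholder credentials (`12345` and `secret` are illustrative only):

```shell
# Placeholder credentials -- substitute your Grafana Cloud instance ID and API key.
INSTANCE_ID=12345
API_KEY=secret

# -n suppresses the trailing newline, which would otherwise be encoded too.
TOKEN=$(echo -n "${INSTANCE_ID}:${API_KEY}" | base64)
echo "Authorization=Basic ${TOKEN}"
```

You can sanity-check the result by decoding it back: `echo "$TOKEN" | base64 -d` should print exactly `INSTANCE_ID:API_KEY` with no trailing newline.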
#### Honeycomb

Note that `OTEL_EXPORTER_OTLP_ENDPOINT` is a base URL; the exporter appends the signal path (`/v1/traces`) itself:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY"
export OTEL_SERVICE_NAME=llm-observability-mcp
```
#### Datadog

Datadog ingests OTLP through the Datadog Agent rather than a direct API endpoint; enable the Agent's OTLP receiver (the Agent holds your `DD_API_KEY`) and point the exporter at it:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
```
#### Lightstep

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.lightstep.com:443/api/v2/otel/trace
export OTEL_EXPORTER_OTLP_HEADERS="lightstep-access-token=YOUR_ACCESS_TOKEN"
export OTEL_SERVICE_NAME=llm-observability-mcp
```
#### Kubernetes Example

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-observability-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-observability-mcp
  template:
    metadata:
      labels:
        app: llm-observability-mcp
    spec:
      containers:
        - name: llm-observability-mcp
          image: llm-observability-mcp:latest
          ports:
            - containerPort: 3000
          env:
            - name: OTEL_SERVICE_NAME
              value: "llm-observability-mcp"
            - name: OTEL_SERVICE_VERSION
              value: "1.2.3"
            - name: OTEL_ENVIRONMENT
              value: "production"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "https://your-backend.com:4318"
            - name: OTEL_EXPORTER_OTLP_HEADERS
              valueFrom:
                secretKeyRef:
                  name: otel-credentials
                  key: headers
```
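The `otel-credentials` Secret referenced by the Deployment must exist in the same namespace. A minimal sketch (the header value is a placeholder; use the same comma-separated `key=value` format as `OTEL_EXPORTER_OTLP_HEADERS`):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: otel-credentials
type: Opaque
stringData:
  # Same format as OTEL_EXPORTER_OTLP_HEADERS: "key1=value1,key2=value2"
  headers: "api-key=YOUR_BACKEND_API_KEY"
```

Using `stringData` lets you write the value in plain text; Kubernetes base64-encodes it into `data` on write.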
### Running the MCP Server

```bash
# Start with stdio transport
npm run mcp:stdio

# Start with HTTP transport
npm run mcp:http
```
## Usage

### OpenTelemetry Tool: `llm_observability_otel`

#### Required Parameters

- `userId`: The distinct ID of the user
- `model`: The model used (e.g., "gpt-4", "claude-3")
- `provider`: The LLM provider (e.g., "openai", "anthropic")

#### Optional Parameters

- `traceId`: Trace ID for grouping related events
- `input`: The input to the LLM (messages, prompt, etc.)
- `outputChoices`: The output from the LLM
- `inputTokens`: Number of tokens in the input
- `outputTokens`: Number of tokens in the output
- `latency`: Latency of the LLM call in seconds
- `httpStatus`: HTTP status code of the LLM call
- `baseUrl`: Base URL of the LLM API
- `operationName`: Name of the operation being performed
- `error`: Error message if the request failed
- `errorType`: Type of error (e.g., "rate_limit", "timeout")
- `mcpToolsUsed`: List of MCP tools used during the request
#### Example Usage

```json
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "inputTokens": 150,
    "outputTokens": 75,
    "latency": 2.3,
    "httpStatus": 200,
    "operationName": "chat-completion",
    "traceId": "trace-abc123",
    "input": "What is the weather like today?",
    "outputChoices": ["The weather is sunny and 75°F today."]
  }
}
```
## Configuration Reference

| Variable | Description | Default |
|---|---|---|
| `OTEL_SERVICE_NAME` | Service name for OpenTelemetry | `llm-observability-mcp` |
| `OTEL_SERVICE_VERSION` | Service version | `1.0.0` |
| `OTEL_ENVIRONMENT` | Environment name | `development` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Default OTLP endpoint | - |
| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | Metrics endpoint | - |
| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Traces endpoint | - |
| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | Logs endpoint | - |
| `OTEL_EXPORTER_OTLP_HEADERS` | Headers for authentication (format: `"key1=value1,key2=value2"`) | - |
| `OTEL_METRIC_EXPORT_INTERVAL` | Metrics export interval in ms | `10000` |
| `OTEL_METRIC_EXPORT_TIMEOUT` | Metrics export timeout in ms | `5000` |
| `OTEL_TRACES_SAMPLER_ARG` | Sampling ratio (0.0-1.0) | `1.0` |
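Putting a few of these together, here is a sketch of a higher-volume profile; the values are illustrative, not recommendations, so tune them to your own traffic:

```shell
# Illustrative profile for a busier deployment -- values are examples only.
export OTEL_SERVICE_NAME=llm-observability-mcp
export OTEL_ENVIRONMENT=production
export OTEL_TRACES_SAMPLER_ARG=0.1        # keep 10% of traces
export OTEL_METRIC_EXPORT_INTERVAL=60000  # export metrics once a minute

env | grep '^OTEL_'
```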
## Metrics Collected

- `llm.requests.total`: Total number of LLM requests
- `llm.tokens.total`: Total tokens used (input + output)
- `llm.latency.duration`: Request latency in milliseconds
- `llm.requests.active`: Number of active requests
## Trace Attributes

`llm.model`, `llm.provider`, `llm.user_id`, `llm.operation`, `llm.input_tokens`, `llm.output_tokens`, `llm.total_tokens`, `llm.latency_ms`, `llm.http_status`, `llm.base_url`, `llm.error`, `llm.error_type`, `llm.input`, `llm.output`, `llm.mcp_tools_used`
## Practical Examples

### Jaeger: View Traces

Open http://localhost:16686 to see your traces.
### Error Tracking Example

```json
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "httpStatus": 429,
    "error": "Rate limit exceeded",
    "errorType": "rate_limit",
    "latency": 0.1,
    "operationName": "chat-completion"
  }
}
```
### Multi-Tool Usage Tracking Example

```json
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "inputTokens": 500,
    "outputTokens": 200,
    "latency": 5.2,
    "httpStatus": 200,
    "operationName": "complex-workflow",
    "mcpToolsUsed": ["file_read", "web_search", "code_execution"],
    "traceId": "complex-workflow-123"
  }
}
```
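The `latency` value in these examples has to be measured by the caller. A minimal shell sketch of timing a call and building the report payload; it assumes the HTTP transport on `localhost:3000`, the `sleep` stands in for the real LLM call, and the final `curl` is left commented out:

```shell
#!/bin/bash
start=$(date +%s.%N)
sleep 0.1   # stand-in for the real LLM call
end=$(date +%s.%N)

# Seconds with millisecond precision, matching the tool's latency unit
latency=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')

payload=$(cat <<EOF
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "latency": $latency,
    "httpStatus": 200,
    "operationName": "chat-completion"
  }
}
EOF
)

echo "$payload"
# curl -X POST http://localhost:3000/mcp -H "Content-Type: application/json" -d "$payload"
```

`awk` is used for the subtraction because plain shell arithmetic is integer-only; note that `date +%s.%N` requires GNU coreutils (on macOS, use `gdate` or an integer `%s` timestamp).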
### Testing Script

```bash
#!/bin/bash
# test-opentelemetry.sh

# Start a throwaway Jaeger instance with the OTLP receiver enabled
docker run -d --name jaeger-test \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
sleep 5

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-test
export OTEL_ENVIRONMENT=test

# Use the HTTP transport so the curl below can reach the server
npm run mcp:http &
sleep 3

curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "llm_observability_otel",
    "arguments": {
      "userId": "test-user",
      "model": "gpt-4",
      "provider": "openai",
      "inputTokens": 100,
      "outputTokens": 50,
      "latency": 1.5,
      "httpStatus": 200,
      "operationName": "test-completion"
    }
  }'

echo "Test complete. View traces at http://localhost:16686"
```
## Migration from PostHog

The OpenTelemetry tool is a drop-in replacement for the PostHog tool. Both can coexist for gradual migration:

- **PostHog Tool**: `llm_observability_posthog`
- **OpenTelemetry Tool**: `llm_observability_otel`

Both accept the same parameters.
## Troubleshooting & Performance

### Common Issues

- **No data in backend**: check endpoint URLs, authentication headers, network connectivity, and server logs
- **High resource usage**: lower the sampling ratio (`OTEL_TRACES_SAMPLER_ARG`) or increase export intervals
- **Missing traces**: verify OpenTelemetry is enabled, then check the logs and the service name
### Debug Mode

```bash
export DEBUG=true
npm run mcp:stdio
```
### Performance Tuning

- Reduce sampling for high-volume deployments: `OTEL_TRACES_SAMPLER_ARG=0.01`
- Increase export intervals: `OTEL_METRIC_EXPORT_INTERVAL=60000`
- Disable metrics/logs if not needed: `unset OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` and `unset OTEL_EXPORTER_OTLP_LOGS_ENDPOINT`
## Support

For issues or questions:

- Check this document and the troubleshooting section
- Review server logs with `DEBUG=true`
- Verify your OpenTelemetry configuration
- Test locally with Jaeger first