# OpenTelemetry LLM Observability

This document provides comprehensive guidance for using the OpenTelemetry LLM observability tool with the MCP server. It covers setup, configuration, usage, troubleshooting, and practical examples for all major OpenTelemetry-compatible backends.

## Features

- **Universal Compatibility**: Works with Jaeger, New Relic, Grafana, Datadog, Honeycomb, and more
- **Comprehensive Metrics**: Request counts, token usage, latency, error rates
- **Distributed Tracing**: Full request lifecycle tracking with spans
- **Flexible Configuration**: Environment-based configuration for different backends
- **Zero-Code Integration**: Drop-in replacement for existing observability tools

---
## Quick Start

### 1. Install Dependencies

OpenTelemetry dependencies are included in `package.json`:

```bash
npm install
```

### 2. Configure Your Backend

#### Jaeger (Local Development)

```bash
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
```

#### New Relic

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_NEW_RELIC_LICENSE_KEY"
export OTEL_SERVICE_NAME=llm-observability-mcp
```

#### Grafana Cloud

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n YOUR_INSTANCE_ID:YOUR_API_KEY | base64)"
export OTEL_SERVICE_NAME=llm-observability-mcp
```

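The Basic auth token in the Grafana Cloud example can also be built in Node.js. This sketch mirrors the shell `base64` pipeline above; `YOUR_INSTANCE_ID` and `YOUR_API_KEY` are the same placeholders as in the shell example.

```typescript
// Build the Grafana Cloud OTLP Authorization header value,
// equivalent to: echo -n YOUR_INSTANCE_ID:YOUR_API_KEY | base64
function grafanaOtlpHeaders(instanceId: string, apiKey: string): string {
  const token = Buffer.from(`${instanceId}:${apiKey}`).toString("base64");
  return `Authorization=Basic ${token}`;
}
```
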
#### Honeycomb

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY"
export OTEL_SERVICE_NAME=llm-observability-mcp
```

Note: set the generic endpoint to the API root; OTLP/HTTP exporters append the per-signal path (`/v1/traces`, `/v1/metrics`) themselves. Use `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` if you need to spell out the full traces URL.

#### Datadog

Datadog ingests OTLP through the Datadog Agent (with OTLP ingest enabled) or an OpenTelemetry Collector running the Datadog exporter, rather than a direct API URL:

```bash
# Point at the local Datadog Agent's OTLP/HTTP port; the Agent holds the API key
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
```

#### Lightstep

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.lightstep.com:443/api/v2/otel/trace
export OTEL_EXPORTER_OTLP_HEADERS="lightstep-access-token=YOUR_ACCESS_TOKEN"
export OTEL_SERVICE_NAME=llm-observability-mcp
```

#### Kubernetes Example

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-observability-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-observability-mcp
  template:
    metadata:
      labels:
        app: llm-observability-mcp
    spec:
      containers:
        - name: llm-observability-mcp
          image: llm-observability-mcp:latest
          ports:
            - containerPort: 3000
          env:
            - name: OTEL_SERVICE_NAME
              value: "llm-observability-mcp"
            - name: OTEL_SERVICE_VERSION
              value: "1.2.3"
            - name: OTEL_ENVIRONMENT
              value: "production"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "https://your-backend.com:4318"
            - name: OTEL_EXPORTER_OTLP_HEADERS
              valueFrom:
                secretKeyRef:
                  name: otel-credentials
                  key: headers
```

---

## Running the MCP Server

```bash
# Start with stdio transport
npm run mcp:stdio

# Start with HTTP transport
npm run mcp:http
```

---

## Usage

### OpenTelemetry Tool: `llm_observability_otel`

#### Required Parameters

- `userId`: The distinct ID of the user
- `model`: The model used (e.g., "gpt-4", "claude-3")
- `provider`: The LLM provider (e.g., "openai", "anthropic")

#### Optional Parameters

- `traceId`: Trace ID for grouping related events
- `input`: The input to the LLM (messages, prompt, etc.)
- `outputChoices`: The output from the LLM
- `inputTokens`: Number of tokens in the input
- `outputTokens`: Number of tokens in the output
- `latency`: Latency of the LLM call in seconds
- `httpStatus`: HTTP status code of the LLM call
- `baseUrl`: Base URL of the LLM API
- `operationName`: Name of the operation being performed
- `error`: Error message if the request failed
- `errorType`: Type of error (e.g., "rate_limit", "timeout")
- `mcpToolsUsed`: List of MCP tools used during the request

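For reference, the parameter list above can be written as a TypeScript shape. The `LlmObservabilityArgs` name and the `missingRequired` helper are illustrative, not part of the implementation files.

```typescript
// Illustrative shape of the tool arguments described above.
interface LlmObservabilityArgs {
  // required
  userId: string;
  model: string;
  provider: string;
  // optional
  traceId?: string;
  input?: unknown;
  outputChoices?: unknown[];
  inputTokens?: number;
  outputTokens?: number;
  latency?: number; // seconds
  httpStatus?: number;
  baseUrl?: string;
  operationName?: string;
  error?: string;
  errorType?: string;
  mcpToolsUsed?: string[];
}

// Minimal pre-flight check for the three required parameters.
function missingRequired(args: Partial<LlmObservabilityArgs>): string[] {
  return (["userId", "model", "provider"] as const).filter((k) => !args[k]);
}
```
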
#### Example Usage

```json
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "inputTokens": 150,
    "outputTokens": 75,
    "latency": 2.3,
    "httpStatus": 200,
    "operationName": "chat-completion",
    "traceId": "trace-abc123",
    "input": "What is the weather like today?",
    "outputChoices": ["The weather is sunny and 75°F today."]
  }
}
```

---

## Configuration Reference

| Variable | Description | Default |
|----------|-------------|---------|
| `OTEL_SERVICE_NAME` | Service name for OpenTelemetry | `llm-observability-mcp` |
| `OTEL_SERVICE_VERSION` | Service version | `1.0.0` |
| `OTEL_ENVIRONMENT` | Environment name | `development` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Default OTLP endpoint | - |
| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | Metrics endpoint | - |
| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Traces endpoint | - |
| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | Logs endpoint | - |
| `OTEL_EXPORTER_OTLP_HEADERS` | Headers for authentication (format: `key1=value1,key2=value2`) | - |
| `OTEL_METRIC_EXPORT_INTERVAL` | Metrics export interval in ms | `10000` |
| `OTEL_METRIC_EXPORT_TIMEOUT` | Metrics export timeout in ms | `5000` |
| `OTEL_TRACES_SAMPLER_ARG` | Sampling ratio (0.0-1.0) | `1.0` |

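To make the `OTEL_EXPORTER_OTLP_HEADERS` format concrete, here is a sketch of how the `key1=value1,key2=value2` string decomposes into a headers object. The `parseOtlpHeaders` helper is ours, for illustration only; the real exporters do this internally.

```typescript
// Decompose "key1=value1,key2=value2" into a headers object.
function parseOtlpHeaders(raw: string): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("="); // split on the FIRST '=' only,
    if (idx <= 0) continue;        // so values may themselves contain '='
    headers[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return headers;
}
```
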
---

## Metrics Collected

- `llm.requests.total`: Total number of LLM requests
- `llm.tokens.total`: Total tokens used (input + output)
- `llm.latency.duration`: Request latency in milliseconds
- `llm.requests.active`: Number of active requests

### Trace Attributes

- `llm.model`, `llm.provider`, `llm.user_id`, `llm.operation`, `llm.input_tokens`, `llm.output_tokens`, `llm.total_tokens`, `llm.latency_ms`, `llm.http_status`, `llm.base_url`, `llm.error`, `llm.error_type`, `llm.input`, `llm.output`, `llm.mcp_tools_used`

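As an illustration of the mapping (not the tool's actual code), a helper like this could project the camelCase tool arguments onto a subset of the span attributes listed above:

```typescript
// Project camelCase tool arguments onto llm.* span attributes.
function toTraceAttributes(args: {
  model: string;
  provider: string;
  userId: string;
  inputTokens?: number;
  outputTokens?: number;
}): Record<string, string | number> {
  const attrs: Record<string, string | number> = {
    "llm.model": args.model,
    "llm.provider": args.provider,
    "llm.user_id": args.userId,
  };
  if (args.inputTokens !== undefined) attrs["llm.input_tokens"] = args.inputTokens;
  if (args.outputTokens !== undefined) attrs["llm.output_tokens"] = args.outputTokens;
  if (args.inputTokens !== undefined && args.outputTokens !== undefined) {
    // total tokens = input + output, as in the llm.tokens.total metric
    attrs["llm.total_tokens"] = args.inputTokens + args.outputTokens;
  }
  return attrs;
}
```
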
---

## Practical Examples

### Jaeger: View Traces

Open <http://localhost:16686> to see your traces.

### Error Tracking Example

```json
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "httpStatus": 429,
    "error": "Rate limit exceeded",
    "errorType": "rate_limit",
    "latency": 0.1,
    "operationName": "chat-completion"
  }
}
```

### Multi-Tool Usage Tracking Example

```json
{
  "tool": "llm_observability_otel",
  "arguments": {
    "userId": "user-12345",
    "model": "gpt-4",
    "provider": "openai",
    "inputTokens": 500,
    "outputTokens": 200,
    "latency": 5.2,
    "httpStatus": 200,
    "operationName": "complex-workflow",
    "mcpToolsUsed": ["file_read", "web_search", "code_execution"],
    "traceId": "complex-workflow-123"
  }
}
```

### Testing Script

```bash
#!/bin/bash
# test-opentelemetry.sh

docker run -d --name jaeger-test \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

sleep 5

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-test
export OTEL_ENVIRONMENT=test

# Use the HTTP transport so the curl request below can reach the server
npm run mcp:http &
sleep 3

curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "tool": "llm_observability_otel",
    "arguments": {
      "userId": "test-user",
      "model": "gpt-4",
      "provider": "openai",
      "inputTokens": 100,
      "outputTokens": 50,
      "latency": 1.5,
      "httpStatus": 200,
      "operationName": "test-completion"
    }
  }'

echo "Test complete. View traces at http://localhost:16686"
```

---

## Migration from PostHog

The OpenTelemetry tool is a drop-in replacement for the PostHog tool. Both can coexist for gradual migration:

- **PostHog Tool**: `llm_observability_posthog`
- **OpenTelemetry Tool**: `llm_observability_otel`

Both accept the same parameters.

---

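Since both tools accept the same parameters, a gradual migration can fan a single payload out to both. A minimal sketch, where `callTool` is a hypothetical stand-in for however your MCP client invokes a tool:

```typescript
// Hypothetical tool-invocation callback supplied by your MCP client.
type CallTool = (tool: string, args: Record<string, unknown>) => void;

// Send the same event to the PostHog and OpenTelemetry tools
// during a side-by-side migration period.
function captureToBoth(callTool: CallTool, args: Record<string, unknown>): void {
  callTool("llm_observability_posthog", args);
  callTool("llm_observability_otel", args);
}
```
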
## Troubleshooting & Performance

### Common Issues

- **No data in backend**: check endpoint URLs, authentication, network, server logs
- **High resource usage**: lower sampling (`OTEL_TRACES_SAMPLER_ARG`), increase export intervals
- **Missing traces**: verify OpenTelemetry is enabled, check logs and the service name

### Debug Mode

```bash
export DEBUG=true
npm run mcp:stdio
```

### Performance Tuning

- Reduce sampling for high-volume workloads: `OTEL_TRACES_SAMPLER_ARG=0.01`
- Increase export intervals: `OTEL_METRIC_EXPORT_INTERVAL=60000`
- Disable metrics/logs if not needed: `unset OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`, `unset OTEL_EXPORTER_OTLP_LOGS_ENDPOINT`

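For intuition on what `OTEL_TRACES_SAMPLER_ARG` controls, here is a simplified ratio-sampling sketch in the spirit of OpenTelemetry's trace-ID-ratio sampler. It is illustrative, not the SDK's exact algorithm: it maps the (hex) trace ID into [0, 1) and compares against the ratio.

```typescript
// Simplified ratio sampler: a fixed slice of trace IDs is kept,
// and the decision is deterministic per trace ID.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Use the low 8 hex digits of the trace ID as a uniform value in [0, 1).
  const bucket = parseInt(traceId.slice(-8), 16) / 0x100000000;
  return bucket < ratio;
}
```
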
---

## Support

For issues or questions:

1. Check this document and the troubleshooting section above
2. Review server logs with `DEBUG=true`
3. Verify your OpenTelemetry configuration
4. Test with Jaeger locally first