Initial implementation of opentelemetry-llm tool
OPENTELEMETRY.md
# OpenTelemetry LLM Observability Integration

This MCP server includes OpenTelemetry support for LLM observability. It works with any OpenTelemetry-compatible backend, including Jaeger, New Relic, Grafana, Datadog, and Honeycomb.

## Features

- **Universal Compatibility**: Works with any OpenTelemetry-compatible backend
- **Comprehensive Metrics**: Request counts, token usage, latency, error rates
- **Distributed Tracing**: Full request lifecycle tracking with spans
- **Flexible Configuration**: Environment-based configuration for different backends
- **Zero-Code Integration**: Drop-in replacement for existing observability tools

## Quick Start

### 1. Install Dependencies

The OpenTelemetry dependencies are already listed in `package.json`, so a regular install pulls them in:

```bash
npm install
```

### 2. Configure Your Backend

#### Jaeger (Local Development)

```bash
# Start Jaeger locally
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

# Configure the MCP server
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
```

#### New Relic

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_NEW_RELIC_LICENSE_KEY"
export OTEL_SERVICE_NAME=llm-observability-mcp
```

#### Grafana Cloud

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
# Strip newlines from the encoded value: GNU base64 wraps long output at 76 characters
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n YOUR_INSTANCE_ID:YOUR_API_KEY | base64 | tr -d '\n')"
export OTEL_SERVICE_NAME=llm-observability-mcp
```

#### Datadog
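
Note that `/api/v2/series` is Datadog's metrics submission API rather than an OTLP receiver. If no data arrives with the settings below, send OTLP to a Datadog Agent or an OpenTelemetry Collector configured for Datadog instead.

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.datadoghq.com/api/v2/series
export OTEL_EXPORTER_OTLP_HEADERS="DD-API-KEY=YOUR_DD_API_KEY"
export OTEL_SERVICE_NAME=llm-observability-mcp
```
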
### 3. Start the MCP Server

```bash
# Start with stdio transport
npm run mcp:stdio

# Start with HTTP transport
npm run mcp:http
```

## Usage

### Using the OpenTelemetry Tool

The MCP server provides a new tool: `capture_llm_observability_opentelemetry`

#### Required Parameters

- `userId`: The distinct ID of the user
- `model`: The model used (e.g., "gpt-4", "claude-3")
- `provider`: The LLM provider (e.g., "openai", "anthropic")

#### Optional Parameters

- `traceId`: Trace ID for grouping related events
- `input`: The input to the LLM (messages, prompt, etc.)
- `outputChoices`: The output from the LLM
- `inputTokens`: Number of tokens in the input
- `outputTokens`: Number of tokens in the output
- `latency`: Latency of the LLM call in seconds
- `httpStatus`: HTTP status code of the LLM call
- `baseUrl`: Base URL of the LLM API
- `operationName`: Name of the operation being performed
- `error`: Error message if the request failed
- `errorType`: Type of error (e.g., "rate_limit", "timeout")
- `mcpToolsUsed`: List of MCP tools used during the request

### Example Usage

```json
{
  "userId": "user-123",
  "model": "gpt-4",
  "provider": "openai",
  "inputTokens": 150,
  "outputTokens": 75,
  "latency": 2.5,
  "httpStatus": 200,
  "operationName": "chat-completion",
  "traceId": "trace-abc123"
}
```

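
The parameter lists above can be written down as a type, which makes it easier to catch a missing required field before calling the tool. The following is a minimal sketch: the `LlmObservabilityArgs` type and the `missingRequiredFields` helper are hypothetical names introduced for illustration, and the server's real input schema may differ in its details.

```typescript
// Hypothetical type mirroring the documented parameters; not the server's actual schema.
type LlmObservabilityArgs = {
  userId: string;
  model: string;
  provider: string;
  traceId?: string;
  input?: unknown;
  outputChoices?: unknown;
  inputTokens?: number;
  outputTokens?: number;
  latency?: number; // seconds, as documented above
  httpStatus?: number;
  baseUrl?: string;
  operationName?: string;
  error?: string;
  errorType?: string;
  mcpToolsUsed?: string[];
};

const REQUIRED_FIELDS = ["userId", "model", "provider"] as const;

// Returns the names of required fields that are absent or empty strings.
function missingRequiredFields(args: Partial<LlmObservabilityArgs>): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = args[field];
    return typeof value !== "string" || value.length === 0;
  });
}

// A payload equivalent to the JSON example above passes the check.
const examplePayload: LlmObservabilityArgs = {
  userId: "user-123",
  model: "gpt-4",
  provider: "openai",
  inputTokens: 150,
  outputTokens: 75,
  latency: 2.5,
};
const exampleMissing = missingRequiredFields(examplePayload); // empty array
```

Running such a check on the client side is optional; it only saves a round trip when a required field was forgotten.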
## Configuration Reference

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `OTEL_SERVICE_NAME` | Service name for OpenTelemetry | `llm-observability-mcp` |
| `OTEL_SERVICE_VERSION` | Service version | `1.0.0` |
| `OTEL_ENVIRONMENT` | Environment name | `development` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Default OTLP endpoint | - |
| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | Metrics endpoint | - |
| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Traces endpoint | - |
| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | Logs endpoint | - |
| `OTEL_EXPORTER_OTLP_HEADERS` | Headers for authentication (format: `key1=value1,key2=value2`) | - |
| `OTEL_METRIC_EXPORT_INTERVAL` | Metrics export interval in ms | `10000` |
| `OTEL_METRIC_EXPORT_TIMEOUT` | Metrics export timeout in ms | `5000` |
| `OTEL_TRACES_SAMPLER_ARG` | Sampling ratio (0.0-1.0) | `1.0` |

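
To make the defaults in the table concrete, here is a rough sketch of how the variables could be read into a typed configuration object. The `TelemetryConfig` shape and the `loadTelemetryConfig` and `numberOr` helpers are assumptions made for this example, not the server's actual code; only the variable names and default values come from the table.

```typescript
// Hypothetical config shape; field names are illustrative.
interface TelemetryConfig {
  serviceName: string;
  serviceVersion: string;
  environment: string;
  endpoint?: string;
  metricExportIntervalMs: number;
  metricExportTimeoutMs: number;
  samplingRatio: number;
}

type Env = Record<string, string | undefined>;

// Parses a numeric variable, falling back when it is unset, blank, or not a number.
function numberOr(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function loadTelemetryConfig(env: Env): TelemetryConfig {
  const ratio = numberOr(env["OTEL_TRACES_SAMPLER_ARG"], 1.0);
  return {
    serviceName: env["OTEL_SERVICE_NAME"] ?? "llm-observability-mcp",
    serviceVersion: env["OTEL_SERVICE_VERSION"] ?? "1.0.0",
    environment: env["OTEL_ENVIRONMENT"] ?? "development",
    endpoint: env["OTEL_EXPORTER_OTLP_ENDPOINT"],
    metricExportIntervalMs: numberOr(env["OTEL_METRIC_EXPORT_INTERVAL"], 10000),
    metricExportTimeoutMs: numberOr(env["OTEL_METRIC_EXPORT_TIMEOUT"], 5000),
    // Clamp to the documented 0.0-1.0 range (the clamping itself is an assumption).
    samplingRatio: Math.min(1, Math.max(0, ratio)),
  };
}
```

In a Node.js process this would typically be called as `loadTelemetryConfig(process.env)`; passing the environment in as an argument keeps the function easy to test.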
### Backend-Specific Configuration

#### New Relic

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=YOUR_LICENSE_KEY"
```

#### Jaeger

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```

#### Grafana Cloud

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
# Strip newlines from the encoded value: GNU base64 wraps long output at 76 characters
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n YOUR_INSTANCE_ID:YOUR_API_KEY | base64 | tr -d '\n')"
```

#### Honeycomb

```bash
# Standard OTLP/HTTP exporters append /v1/traces to OTEL_EXPORTER_OTLP_ENDPOINT themselves;
# if traces do not arrive, try the base URL https://api.honeycomb.io instead.
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io/v1/traces
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY"
```

#### Lightstep

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.lightstep.com:443/api/v2/otel/trace
export OTEL_EXPORTER_OTLP_HEADERS="lightstep-access-token=YOUR_ACCESS_TOKEN"
```

## Metrics Collected

### Counters

- `llm.requests.total`: Total number of LLM requests
- `llm.tokens.total`: Total tokens used (input + output)

### Histograms

- `llm.latency.duration`: Request latency in milliseconds (note that the `latency` tool parameter is given in seconds)

### Gauges

- `llm.requests.active`: Number of active requests

### Trace Attributes

- `llm.model`: The model used
- `llm.provider`: The provider name
- `llm.user_id`: The user ID
- `llm.operation`: The operation name
- `llm.input_tokens`: Input token count
- `llm.output_tokens`: Output token count
- `llm.total_tokens`: Total token count
- `llm.latency_ms`: Latency in milliseconds
- `llm.http_status`: HTTP status code
- `llm.base_url`: API base URL
- `llm.error`: Error message (if any)
- `llm.error_type`: Error type classification
- `llm.input`: Input content (optional)
- `llm.output`: Output content (optional)
- `llm.mcp_tools_used`: MCP tools used

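
The relationship between the tool parameters and the trace attributes listed above can be sketched as a plain mapping function. This is an illustration under assumptions: `CaptureArgs` and `toTraceAttributes` are hypothetical names, only a subset of the attributes is covered, and the seconds-to-milliseconds conversion is inferred from the documented units rather than taken from the server's source.

```typescript
type AttributeValue = string | number | string[];

// Hypothetical subset of the tool parameters used for this sketch.
type CaptureArgs = {
  userId: string;
  model: string;
  provider: string;
  operationName?: string;
  inputTokens?: number;
  outputTokens?: number;
  latency?: number; // seconds
  httpStatus?: number;
};

function toTraceAttributes(args: CaptureArgs): Record<string, AttributeValue> {
  const attrs: Record<string, AttributeValue> = {
    "llm.model": args.model,
    "llm.provider": args.provider,
    "llm.user_id": args.userId,
  };
  if (args.operationName !== undefined) attrs["llm.operation"] = args.operationName;
  if (args.inputTokens !== undefined) attrs["llm.input_tokens"] = args.inputTokens;
  if (args.outputTokens !== undefined) attrs["llm.output_tokens"] = args.outputTokens;
  // The total is derived only when at least one token count was reported.
  if (args.inputTokens !== undefined || args.outputTokens !== undefined) {
    attrs["llm.total_tokens"] = (args.inputTokens ?? 0) + (args.outputTokens ?? 0);
  }
  // Assumed conversion: the parameter is in seconds, the attribute in milliseconds.
  if (args.latency !== undefined) attrs["llm.latency_ms"] = args.latency * 1000;
  if (args.httpStatus !== undefined) attrs["llm.http_status"] = args.httpStatus;
  return attrs;
}

// For 150 input tokens, 75 output tokens, and 2.5 s latency:
// llm.total_tokens is 225 and llm.latency_ms is 2500.
const exampleAttributes = toTraceAttributes({
  userId: "user-123",
  model: "gpt-4",
  provider: "openai",
  inputTokens: 150,
  outputTokens: 75,
  latency: 2.5,
});
```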
## Testing with Jaeger

### 1. Start Jaeger

```bash
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
```

### 2. Configure MCP Server

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=llm-observability-mcp
npm run mcp:stdio
```

### 3. View Traces

Open <http://localhost:16686> to view traces in Jaeger UI.

## Migration from PostHog

The OpenTelemetry tool is designed to be a drop-in replacement for the PostHog tool. Both tools can coexist, allowing for gradual migration:

1. **PostHog Tool**: `capture_llm_observability`
2. **OpenTelemetry Tool**: `capture_llm_observability_opentelemetry`

Both tools accept the same parameters, making migration straightforward.

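
Because both tools take the same parameters, a client can send one payload to both during a migration period. The sketch below builds the two MCP `tools/call` requests from a single argument object. `buildDualWriteCalls` and `ToolCallRequest` are hypothetical names for this example, and how the requests are sent depends on the MCP client in use.

```typescript
// Shape of an MCP "tools/call" JSON-RPC request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Builds one request per tool from the same arguments (dual-write during migration).
function buildDualWriteCalls(
  args: Record<string, unknown>,
  firstId: number,
): ToolCallRequest[] {
  const toolNames = [
    "capture_llm_observability",
    "capture_llm_observability_opentelemetry",
  ];
  return toolNames.map((toolName, index): ToolCallRequest => ({
    jsonrpc: "2.0",
    id: firstId + index,
    method: "tools/call",
    // Copy the arguments so the two requests do not share one object.
    params: { name: toolName, arguments: { ...args } },
  }));
}

const dualWriteCalls = buildDualWriteCalls(
  { userId: "user-123", model: "gpt-4", provider: "openai" },
  1,
);
```

Once the OpenTelemetry data looks right in the backend, the first tool name can be dropped from the list.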
## Troubleshooting

### Common Issues

#### No Data in Backend

1. Verify endpoint URLs are correct
2. Check authentication headers
3. Ensure network connectivity
4. Check server logs for errors

#### High Resource Usage

1. Adjust sampling ratio: `OTEL_TRACES_SAMPLER_ARG=0.1`
2. Increase export intervals: `OTEL_METRIC_EXPORT_INTERVAL=30000`

#### Missing Traces

1. Verify OpenTelemetry is enabled (check for endpoint configuration)
2. Check for initialization errors in logs
3. Ensure proper service name configuration

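
Some of the checks above can be automated before the server starts. The following sketch inspects an environment object and reports likely causes of missing data. `diagnoseTelemetryEnv` is a hypothetical helper, and the rules it applies (for example, treating a missing endpoint as "telemetry disabled") are assumptions based on the checklists rather than on the server's code.

```typescript
type Env = Record<string, string | undefined>;

// Returns human-readable warnings for common misconfigurations.
function diagnoseTelemetryEnv(env: Env): string[] {
  const problems: string[] = [];

  const endpoints = [
    env["OTEL_EXPORTER_OTLP_ENDPOINT"],
    env["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"],
    env["OTEL_EXPORTER_OTLP_METRICS_ENDPOINT"],
  ].filter((value): value is string => value !== undefined && value.trim() !== "");

  if (endpoints.length === 0) {
    problems.push("no OTLP endpoint is set, so telemetry may be disabled");
  }
  for (const endpoint of endpoints) {
    // Only the scheme is checked here; reachability still needs a network test.
    if (!/^https?:\/\/\S+$/.test(endpoint)) {
      problems.push(`endpoint is not an http(s) URL: ${endpoint}`);
    }
  }

  const rawRatio = env["OTEL_TRACES_SAMPLER_ARG"];
  if (rawRatio !== undefined) {
    const ratio = Number(rawRatio);
    if (!Number.isFinite(ratio) || ratio < 0 || ratio > 1) {
      problems.push(`sampling ratio is outside 0.0-1.0: ${rawRatio}`);
    } else if (ratio === 0) {
      problems.push("sampling ratio is 0, so no traces will be recorded");
    }
  }
  return problems;
}
```

An empty result does not prove that data will arrive; it only rules out the configuration mistakes covered here.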
### Debug Mode

Enable debug logging:

```bash
export DEBUG=true
npm run mcp:stdio
```

## Advanced Configuration

### Custom Headers

```bash
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer token,Custom-Header=value"
```

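
The `key1=value1,key2=value2` header format can be parsed with a few lines of code. The sketch below is one possible reading of that format, not the server's actual parser: `parseOtlpHeaders` is a hypothetical helper, and it splits each entry at the first `=` only so that values containing `=` (such as base64 padding) survive intact.

```typescript
// Parses "key1=value1,key2=value2" into a header map.
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) return headers;
  for (const entry of raw.split(",")) {
    const separator = entry.indexOf("=");
    if (separator <= 0) continue; // skip entries without a key or without "="
    const key = entry.slice(0, separator).trim();
    const value = entry.slice(separator + 1).trim();
    if (key !== "") headers[key] = value;
  }
  return headers;
}

const parsedHeaders = parseOtlpHeaders("Authorization=Bearer token,Custom-Header=value");
// parsedHeaders["Authorization"] is "Bearer token"
// parsedHeaders["Custom-Header"] is "value"
```

A consequence of the comma-separated format is that a header value cannot itself contain a comma unless it is encoded in some way.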
### Multiple Backends

Configure different endpoints for metrics and traces:

```bash
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=https://metrics.example.com
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://traces.example.com
```

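
How a signal-specific endpoint interacts with the default one can be sketched as follows. The fallback rule used here comes from the OpenTelemetry OTLP exporter specification for OTLP/HTTP (a signal-specific variable is used as-is, otherwise `/v1/<signal>` is appended to the base endpoint); whether this server resolves endpoints the same way is an assumption, and `resolveSignalEndpoint` is a hypothetical helper.

```typescript
type Signal = "traces" | "metrics" | "logs";
type Env = Record<string, string | undefined>;

function resolveSignalEndpoint(env: Env, signal: Signal): string | undefined {
  // A signal-specific variable takes precedence and is used unchanged.
  const specific = env[`OTEL_EXPORTER_OTLP_${signal.toUpperCase()}_ENDPOINT`];
  if (specific) return specific;

  // Otherwise fall back to the base endpoint plus the per-signal path.
  const base = env["OTEL_EXPORTER_OTLP_ENDPOINT"];
  if (!base) return undefined;
  return `${base.replace(/\/+$/, "")}/v1/${signal}`;
}

const mixedEnv: Env = {
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318",
  OTEL_EXPORTER_OTLP_METRICS_ENDPOINT: "https://metrics.example.com",
};
const tracesUrl = resolveSignalEndpoint(mixedEnv, "traces");   // "http://localhost:4318/v1/traces"
const metricsUrl = resolveSignalEndpoint(mixedEnv, "metrics"); // "https://metrics.example.com"
```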
### Sampling Configuration

```bash
# Sample 10% of traces
export OTEL_TRACES_SAMPLER_ARG=0.1
```

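
To illustrate what a sampling ratio means in practice, the sketch below makes a deterministic keep-or-drop decision from a trace ID. It is a simplified stand-in for ratio-based sampling, not the algorithm used by the OpenTelemetry SDK or by this server: `shouldSample` is a hypothetical function that compares the first 32 bits of a hex trace ID against a threshold derived from the ratio.

```typescript
// Keeps roughly `ratio` of all traces, assuming trace IDs are uniformly random hex strings.
function shouldSample(traceIdHex: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Interpret the first 8 hex characters as an unsigned 32-bit number.
  const prefix = parseInt(traceIdHex.slice(0, 8), 16);
  if (Number.isNaN(prefix)) return false; // not a hex trace ID
  return prefix < ratio * 0x100000000;
}

const keptLow = shouldSample("00000000aaaaaaaaaaaaaaaaaaaaaaaa", 0.1);  // true
const keptHigh = shouldSample("ffffffffaaaaaaaaaaaaaaaaaaaaaaaa", 0.1); // false
```

Deriving the decision from the trace ID rather than from a random number means every span of a given trace gets the same decision, so traces are kept or dropped as a whole.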
## Support

For issues or questions:

1. Check the troubleshooting section above
2. Review server logs with `DEBUG=true`
3. Verify OpenTelemetry configuration
4. Test with Jaeger locally first