LLM Observability Tools Compared: Langfuse vs Datadog LLM Observability

I compared the LLM observability tools Langfuse and Datadog LLM Observability based on primary sources (official documentation and GitHub).

  • Note: This article reflects research as of June 11, 2026. Both products evolve rapidly, so I recommend checking the latest official documentation before adopting either one.

Conclusion

| What you prioritize | Recommendation |
|---|---|
| Data sovereignty (self-hosting, air-gapped environments), prompt management, OSS | Langfuse |
| Already using Datadog, ease of instrumentation, managed evaluations and security integration | Datadog |

Both share the same overall structure of “tracing + evaluation + security features,” but their delivery models contrast sharply.

  • Langfuse: All core features are MIT-licensed OSS. Self-hostable with no usage limits
  • Datadog: SaaS only. Its strengths are auto-instrumentation, managed evaluations, and integration with the existing Datadog stack (APM / logs / Sensitive Data Scanner)
  • Note: Datadog is in the process of renaming “LLM Observability” to “Agent Observability” (as of June 2026).

Comparison Table

| Aspect | Langfuse | Datadog |
|---|---|---|
| Deployment | Both self-hosted / Cloud, air-gap capable | SaaS only |
| License | Core is MIT (Enterprise features are commercial) | Proprietary |
| Tracing | Structured recording of prompts, responses, tokens, cost, and tool calls | Auto-instrumentation requires almost no code changes |
| Evaluation | LLM-as-a-Judge, experiments, datasets | Managed evaluations + 9 templates + Custom LLM-as-a-Judge |
| Prompt management | Yes (versioning, deployment) | Could not confirm |
| Guardrails | (No clear official documentation found) | Sensitive Data Scanner integration + AI Guard (Preview) |
| Integration with existing APM | Weak out of the box | Same stack as APM / logs / RUM |
| OpenTelemetry | OTLP ingestion supported (GenAI convention compliant) | Ingests GenAI convention spans directly (no SDK required) |
| Affinity with AWS | Self-host via official Terraform module, Bedrock / AgentCore integration | Bedrock auto-instrumentation, Bedrock Agents / SageMaker integration |

Langfuse

Pros

1. All core features are MIT-licensed with unlimited usage

Tracing, LLM-as-a-Judge evaluation, prompt management, experiments, datasets, annotations, and even the playground are provided under the MIT license. The GitHub README also explicitly states “MIT licensed, except for the ee folders.”

2. Self-hostable, works even in air-gapped environments

The official documentation explicitly states it “runs anywhere from a laptop to an air-gapped cluster with no artificial usage limits,” which suits requirements where data cannot leave the organization (healthcare, finance, etc.).

3. Prompt management is an explicitly documented feature

It offers prompt management including versioning and deployment. In this research I could not confirm an equivalent on the Datadog side, which makes it a differentiator.

4. LLM-specific tracing data model

It records prompts, responses, token usage, latency, tool calls, and retrieval steps in a structured way. Cost and token tracking applies to observations of type generation and embedding.

Cons

  • Enterprise features require a commercial license: When self-hosting, SCIM, audit logs, data retention policies, etc. are not included in the OSS edition
  • Operational burden is on you: In v3, you need to build and operate the infrastructure yourself, including ClickHouse and other components
  • Unified operation with existing APM / log stacks is not available out of the box

Datadog LLM Observability (Agent Observability)

Pros

1. Auto-instrumentation requires almost no code changes

It integrates with OpenAI, LangChain, AWS Bedrock, Anthropic, Vertex AI, and more. It automatically captures prompts / outputs, token usage and cost, latency, errors, and model parameters (temperature, etc.), although the SDK still needs to be enabled.

2. Multi-layered managed evaluations

  • Managed evaluations that can be enabled from the UI without code
  • Custom LLM-as-a-Judge that lets you define evaluation logic in natural language
  • 9 official templates (Hallucination, Prompt Injection, Toxicity, and the agent-oriented Tool Selection / Tool Argument Correctness, etc.)

Every evaluation is tied to an individual span, and you can review the input/output that the evaluation was based on directly within the trace.

3. Integration with the existing Datadog stack

The same Sensitive Data Scanner (SDS) used for logs / APM / RUM can automatically detect and redact sensitive information in LLM input/output (1 GB of SDS usage is bundled per 10K requests). It also provides “Patterns,” which performs automatic topic clustering of production traffic, and anomaly-detection Insights.

4. The real-time guardrail “AI Guard”

It claims to protect against prompt injection, jailbreaks, tool misuse, and sensitive data exfiltration (in Preview stage).

5. Multi-language SDKs

SDKs are available for Python (3.7+) / Node.js (16+) / Java (8+). They support 7 span kinds (llm, workflow, agent, tool, task, embedding, retrieval) and trace parent-child relationships automatically.

Cons

  • No self-hosting (SaaS only): Data is sent to Datadog. This is the biggest structural difference from Langfuse
  • Prompt management features could not be confirmed: Features such as versioning and deployment could not be confirmed from the official documentation
  • Lock-in to the Datadog ecosystem: Strengths such as evaluations, SDS, and Patterns presuppose a Datadog contract
  • AI Guard is in Preview: The GA timing and billing are undetermined

Pricing Comparison (as of June 2026)

Langfuse Cloud

| Plan | Monthly | Included units | Overage | Data retention |
|---|---|---|---|---|
| Hobby | Free | 50k/month | None (hard cap) | 30 days |
| Core | $29 | 100k/month | From $8/100k | 90 days |
| Pro | $199 | 100k/month | From $8/100k | 3 years |
| Enterprise | $2,499 | 100k/month | From $8/100k | 3 years |

  • The billing unit is the “unit”: every ingested trace data point, counting traces, observations, and scores individually
  • Overage is tiered: $8/100k up to 1M, $7/100k from 1M to 10M, decreasing down to a minimum of $6/100k
  • Self-hosting (OSS) is unlimited and free (only Enterprise features require a commercial license)
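
The tiered overage above can be turned into a rough monthly estimate. The sketch below is only an illustration under stated assumptions: it models the two tiers whose boundaries this article gives ($8/100k up to 1M, $7/100k from 1M to 10M), assumes graduated billing on cumulative units with the plan's included units free, and refuses to estimate beyond 10M because the boundaries of the cheaper tiers are not listed here. The function and constant names are hypothetical.

```python
from typing import Sequence, Tuple

# (upper bound of cumulative monthly units, USD per 100k units).
# Only the two tiers with boundaries stated in the article are modelled.
STATED_TIERS: Tuple[Tuple[int, float], ...] = (
    (1_000_000, 8.0),
    (10_000_000, 7.0),
)


def monthly_cost(
    units: int,
    base_fee: float,
    included: int = 100_000,
    tiers: Sequence[Tuple[int, float]] = STATED_TIERS,
) -> float:
    """Estimate a monthly bill: plan fee plus graduated overage (assumed model)."""
    if units < 0:
        raise ValueError("units must be non-negative")
    if units > tiers[-1][0]:
        raise ValueError("usage exceeds the tiers modelled in this sketch")
    cost = base_fee
    lower = included  # units up to `included` are covered by the plan fee
    for upper, price_per_100k in tiers:
        if units <= lower:
            break
        billable = min(units, upper) - lower
        cost += billable / 100_000 * price_per_100k
        lower = upper
    return cost


if __name__ == "__main__":
    # Core plan ($29) with 2.5M units:
    # 900k units at $8/100k = $72, then 1.5M units at $7/100k = $105.
    print(monthly_cost(2_500_000, base_fee=29.0))  # 206.0
```

Check the actual invoice logic against the official pricing page before budgeting; in particular, whether tier boundaries apply to total units or to overage units only is an assumption here.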

Datadog LLM Observability (Agent Observability)

| Plan | Monthly | Included LLM spans | Data retention |
|---|---|---|---|
| Free | Free | 40k/month | 15 days |
| Pro | From $160 | 100k/month (pay-as-you-go for overage) | 15 days (extendable to 30/60/90 days for a fee) |

  • The billing unit (LLM span) is a single call to an LLM provider. There is no separate charge for evaluations (Evals); LLM calls issued by an evaluation are also counted as LLM spans
  • The usage fee includes 1 GB of Sensitive Data Scanner usage per 10K requests
  • The overage unit price is not published on the pricing page, so confirm it at contract time

Caveats when comparing

Because the billing units differ, you cannot make a simple quantity-based comparison. Langfuse counts observations and scores within a single trace individually, whereas Datadog counts only LLM calls. Even for the same application, the counts can vary significantly.

In addition, Langfuse incurs zero usage-based billing if you self-host (infrastructure costs are separate), whereas Datadog is SaaS only, so usage-proportional billing always applies.
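
To make the difference in billing units concrete, the following sketch counts both units for the same request. It is a simplified model rather than either vendor's metering logic: the `Observation` and `Trace` classes and both counting functions are hypothetical, the Langfuse count follows the description above (one trace plus each observation plus each score), and the Datadog count includes only spans of kind `llm` plus any LLM calls issued by evaluations.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Observation:
    kind: str  # e.g. "workflow", "retrieval", "llm", "tool"
    children: List["Observation"] = field(default_factory=list)


@dataclass
class Trace:
    root: Observation
    scores: int = 0  # number of evaluation scores attached to the trace


def walk(obs: Observation) -> Iterator[Observation]:
    """Yield an observation and all of its descendants, depth first."""
    yield obs
    for child in obs.children:
        yield from walk(child)


def langfuse_units(trace: Trace) -> int:
    """The trace itself, plus every observation, plus every score."""
    return 1 + sum(1 for _ in walk(trace.root)) + trace.scores


def datadog_llm_spans(trace: Trace, eval_llm_calls: int = 0) -> int:
    """Only LLM calls, including those issued by evaluations."""
    return sum(1 for o in walk(trace.root) if o.kind == "llm") + eval_llm_calls


if __name__ == "__main__":
    # One RAG-style request: retrieval, an LLM call, a tool call, a second LLM call.
    request = Trace(
        root=Observation(
            "workflow",
            [Observation("retrieval"), Observation("llm"),
             Observation("tool"), Observation("llm")],
        ),
        scores=1,
    )
    print(langfuse_units(request))     # 7
    print(datadog_llm_spans(request))  # 2
```

In this example the same request consumes 7 units on one side and 2 LLM spans on the other, so the included quotas in the two pricing tables are not directly comparable.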

Affinity with OpenTelemetry

Conclusion: Both can directly ingest traces based on the OTel GenAI semantic conventions, so affinity is high. However, there are differences in coverage.

| Aspect | Langfuse | Datadog |
|---|---|---|
| OTLP ingestion | /api/public/otel (HTTP/JSON, HTTP/protobuf) | OTLP endpoint (http/protobuf + dd-otlp-source=llmobs header) |
| gRPC | Not supported | |
| GenAI semantic conventions | Compliant (since the conventions are still evolving, langfuse.* attributes take priority) | Can directly ingest GenAI convention spans from OTel 1.37+ (no SDK / Agent required) |
| OTel-based instrumentation libraries | OpenLIT, OpenLLMetry, Arize, MLflow, etc. | OpenLLMetry v0.47+ supported / OpenInference and OpenLLMetry below v0.47 not supported |
| OTel Collector | Configuration examples available (filtering possible) | Datadog Distribution of OTel Collector (DDOT) available |
| Limitations | Trace-level attributes (userId, etc.) need to be propagated to all spans | Via OTel, trace display has a 3-5 minute delay, and may also be recorded in APM traces |

  • Langfuse can send data from an OTel SDK / Collector with just environment variable configuration, and provides endpoints for the EU / US / Japan / HIPAA regions
  • Datadog also supports Prompt Tracking, Experiments, and external evaluations via OTel. Since vendor-neutral instrumentation (OTel) can send to either, instrumenting with OTel keeps the cost of switching in the future low
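
As a rough illustration of the environment-variable configuration mentioned above, the sketch below builds the standard OTel exporter variables for each backend. Treat it as a sketch under assumptions: the helper names are hypothetical, the keys are placeholders, and the Langfuse side assumes Basic authentication with the public key as username and the secret key as password against the default cloud host. For Datadog, only the `dd-otlp-source=llmobs` header comes from this article; the `dd-api-key` header name is an assumption, and the endpoint URL is deliberately left as a required argument to be copied from the official documentation.

```python
import base64
from typing import Dict


def langfuse_otlp_env(
    public_key: str,
    secret_key: str,
    base_url: str = "https://cloud.langfuse.com",  # assumed default host
) -> Dict[str, str]:
    """Env vars pointing an OTel exporter at Langfuse's /api/public/otel."""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"{base_url.rstrip('/')}/api/public/otel",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",  # gRPC is not supported
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic {token}",
    }


def datadog_otlp_env(api_key: str, traces_endpoint: str) -> Dict[str, str]:
    """Env vars routing OTLP traces to Datadog as LLM Observability data."""
    return {
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": traces_endpoint,
        "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL": "http/protobuf",
        # dd-otlp-source=llmobs is from the article; dd-api-key is assumed.
        "OTEL_EXPORTER_OTLP_TRACES_HEADERS": (
            f"dd-api-key={api_key},dd-otlp-source=llmobs"
        ),
    }


if __name__ == "__main__":
    # Placeholder credentials; print shell export lines for inspection.
    for name, value in langfuse_otlp_env("pk-lf-example", "sk-lf-example").items():
        print(f"export {name}='{value}'")
```

Because both helpers emit only standard OTel exporter variables, the instrumentation code itself stays the same when the destination changes, which is the low switching cost described above.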

Affinity with AWS

Conclusion: The approaches differ. Langfuse has high affinity as “a stack you self-host on AWS,” while Datadog has high affinity as “a SaaS that auto-instruments Bedrock.”

Langfuse

  • The official Terraform module (langfuse/langfuse-terraform-aws) supports self-hosting on AWS. It deploys a highly available configuration including VPC / RDS / S3 / ElastiCache on ECS Fargate (Langfuse Cloud itself also runs on ECS Fargate)
  • Amazon Bedrock instrumentation: Via frameworks such as LangChain / LlamaIndex / Vercel AI SDK, or manual instrumentation using SDK decorators. It records token counts, model IDs, parameters, and errors
  • Bedrock AgentCore support: Receives traces from the AgentCore runtime via OTel (requires disabling ADOT). It visualizes agent execution flows, tool calls, and MCP interactions
  • For Bedrock connections made by the platform itself (Playground / Evals), the AWS SDK default credentials provider chain (IAM roles, etc.) can be used

Datadog

  • Bedrock auto-instrumentation: Traces Bedrock Runtime SDK (boto3 / botocore) calls without code changes. The Java SDK also supports Bedrock
  • Bedrock Agents monitoring integration: Automatically captures details of latency, error rate, token usage, and tool calls (also featured in the official AWS blog)
  • SageMaker integration: Metrics collection, visualization, and alerting for ML endpoints / jobs (part of the existing Datadog AWS integration)
  • However, since the backend is Datadog SaaS, trace data is sent outside AWS (to Datadog)

Things to Note

  • Datadog is in the middle of a rebrand, so documentation URLs and names may change

Summary

  • Langfuse: For teams that want to retain data sovereignty and cost control through OSS / self-hosting, and want a single tool that also covers prompt management
  • Datadog: For teams that have already built a monitoring stack on Datadog and want to quickly use auto-instrumentation along with managed evaluations and security integration
kenzo0107