Better Stack vs PostHog: A Complete Comparison for 2026

Stanley Ulili
Updated on June 1, 2026

PostHog is a genuinely interesting product. It started as a product analytics platform and has been quietly expanding toward observability, adding error tracking, logs, and session replay to a suite that already covers feature flags, experiments, surveys, and A/B testing. For product teams at early-stage startups, that breadth in a single tool is compelling.

But compelling for product teams and sufficient for engineering teams are two different things. PostHog's own internal handbook acknowledges this plainly: "PostHog is not a complete observability platform (yet). It includes logs, but no APM, cloud monitoring, or other monitoring features." No distributed tracing. No infrastructure monitoring. No incident management. No on-call scheduling. No status pages. No uptime monitoring. No eBPF-based auto-instrumentation.

When something breaks at 3am, your product analytics tool is not the right place to start the investigation.

Better Stack covers the full production stack: logs, metrics, distributed traces, error tracking, session replay, real user monitoring, uptime monitoring, incident management, on-call scheduling, status pages, and an MCP server that connects AI assistants directly to your observability data. It does this from a unified interface with a single query language and volume-based pricing that doesn't penalize you for adding tags, indexing more data, or scaling your team.


Quick comparison at a glance

Category | Better Stack | PostHog
Primary focus | Full-stack observability + incident management | Product analytics + developer tooling
Log management | GA, 100% searchable, SQL + PromQL | GA (launched 2025), $0.25/GB, 14-day retention
APM / distributed tracing | eBPF-based, zero-code, GA | Not available (on roadmap)
Infrastructure monitoring | GA, no cardinality penalties | Not available
Error tracking | GA, Sentry-compatible | GA
Session replay / RUM | GA, unified with backend telemetry | GA, strong product-side integration
Incident management | Built-in (on-call, escalations, phone/SMS) | Not available
Status pages | Built-in | Not available
Uptime monitoring | 30-second check intervals, Playwright checks | Not available
Feature flags / experiments | Not available | Strong (core product)
Product analytics | Not available | Strong (core product)
MCP server | GA (all customers) | Not available
Pricing model | Volume-based, no cardinality penalties | Per-product, usage-based
HIPAA (via BAA) | Not available | Available on platform/enterprise plans
OpenTelemetry | Native, first-class | Supported for logs ingestion

Platform architecture

The fundamental architectural difference between these two platforms reflects their origins. PostHog was built to answer the question "what are users doing?" Better Stack was built to answer the question "why is production broken right now?" Those are related but distinct problems, and the tools reflect that.

Better Stack: observability-first, unified data model

Better Stack's architecture is built around three principles: kernel-level auto-instrumentation via eBPF, OpenTelemetry-native data collection, and a unified storage layer that treats logs, metrics, and traces as wide events in the same data warehouse.

The eBPF collector runs at the kernel level. Deploy it to Kubernetes as a DaemonSet and it automatically discovers services, traces HTTP/gRPC calls, instruments database queries to PostgreSQL, MySQL, Redis, and MongoDB, and starts shipping telemetry without any application code changes. This matters because the alternative, which is what PostHog requires for its observability features, is SDK instrumentation: you install a library, configure it, deploy it, and do that for every service separately.

Unified storage means every log line, metric, and trace lives in the same data warehouse and is queryable with SQL or PromQL. When an alert fires, you don't switch tabs or products; you see service maps, related logs, trace details, and metric anomalies in a single view. That's not a UX choice, it's an architectural one: the data is physically co-located and always correlated.

OpenTelemetry-native means your data is in an open format from day one. If you ever want to route telemetry elsewhere, you change a config line, not your codebase. There's no proprietary agent accumulating lock-in cost.

PostHog: product analytics-first, expanding toward observability

PostHog's architecture makes sense for what it was originally designed to do. Events from your frontend and backend are captured via SDK, stored in ClickHouse, and made available for product analytics queries, session replay, feature flag targeting, and experiment analysis.

The observability features added more recently (error tracking, logs) sit on top of this same event store. This creates an interesting situation: PostHog can answer "how many users hit this error and did it affect your signup funnel?" better than any pure observability tool. But it cannot answer "what was my p99 API latency in the past hour?" or "which service in my dependency graph caused this cascade?" or "who is on call and how do I wake them up?"

What does PostHog's own product team say about this gap? Their internal handbook states directly: "We can't position PostHog as a full Datadog replacement today. Honest assessment: Our Observability story is credible but incomplete." APM and distributed tracing are explicitly listed as "not shipped yet" on the roadmap. (Source: PostHog handbook)

Architecture aspect | Better Stack | PostHog
Data collection | eBPF (kernel-level, zero code) | SDK per service
Storage model | Unified warehouse (logs, metrics, traces) | Event store (optimized for product analytics)
Query language | SQL + PromQL (unified) | HogQL (SQL subset)
APM / tracing | GA, eBPF-based | Not available
Infrastructure monitoring | GA | Not available
Investigation flow | Single interface, all context visible | Switch to separate tools for backend
OpenTelemetry support | Native, first-class | Supported for logs ingestion
Time to first insights | Minutes (zero-code) | Minutes (JS snippet) to hours (backend SDK)

Pricing comparison

PostHog's pricing is generous at small scale, genuinely free up to meaningful volumes, and transparent in structure. The free tier covers 1M product analytics events, 5K session recordings, 1M feature flag requests, 100K error tracking exceptions, and 50GB of logs per month. After those limits, pricing is usage-based per product.

Better Stack pricing is volume-based across the full observability stack: logs, metrics, traces, error tracking, RUM, incident management, and status pages covered under one model. There are no per-feature charges, no cardinality penalties, and no indexing fees.

Better Stack: one bill, full stack

Pricing structure:

  • Logs: $0.10/GB ingestion + $0.05/GB/month retention (all logs searchable, 30-day default retention)
  • Traces: $0.10/GB ingestion + $0.05/GB/month retention
  • Metrics: $0.50/GB/month (no cardinality penalties)
  • Error tracking: $0.000050 per exception
  • Responders: $29/month (unlimited phone/SMS)
  • Monitors: $0.21/month each

What's included that PostHog doesn't offer at any price: distributed tracing, infrastructure monitoring, uptime monitoring, incident management, on-call scheduling, status pages, MCP server integration.

PostHog: free at small scale, accumulates at production scale

PostHog's model is genuinely attractive for early-stage products. The free tier is real and useful. Once you cross the free limits, costs accumulate per product. At production scale you also need to add the observability tools PostHog doesn't cover (an APM, infrastructure monitoring, incident management) as separate line items.

Pricing structure (verified from posthog.com/pricing, May 2026):

  • Product analytics (anonymous events): from $0.00005/event (after 1M free)
  • Product analytics (identified events): from $0.000248/event (after 1M free)
  • Session replay (web): $0.005/recording (5K-15K tier), stepping to $0.0015 at 500K+
  • Session replay (mobile): 2x web pricing across all tiers
  • Feature flags: $0.0001/request (after 1M free)
  • Error tracking: $0.00037/exception (after 100K free)
  • Logs: $0.25/GB ingested (after 50GB free), dropping to $0.15/GB at 300GB+
  • Surveys: $0.10/response (after 1.5K free)

Key log retention difference: PostHog logs have 14-day retention by default (30 and 90-day options described as "coming soon" as of publication). Better Stack provides 30-day default log retention.

What's missing at any price tier: APM, distributed tracing, infrastructure monitoring, uptime monitoring, incident management, on-call, status pages.

At a 100-engineer SaaS company running production workloads, you'd typically need PostHog plus a separate APM tool plus an incident management tool. That's three separate contracts, three separate integrations, and three separate sets of data to manually correlate during incidents.

3-year TCO: full production stack

Assuming a 100-engineer company needing full observability plus incident management:

Category | Better Stack | PostHog + supplements
Logs, metrics, traces | $33,600 | PostHog logs: ~$9,000 + APM (e.g. New Relic): ~$67,000
Error tracking | $9,000 | ~$13,320
Incident management + on-call | $5,220 | PagerDuty Business: ~$21,600
Status pages | $7,488 | Third-party (e.g. Statuspage.io): ~$3,600
Uptime monitoring | Included | Third-party: ~$2,400
Total | ~$55,308 | ~$116,920+

Better Stack handles the full production observability stack in one subscription. PostHog is compelling within its defined scope, but that scope leaves meaningful gaps requiring additional tools.
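The totals follow directly from the line items. As a quick sanity check, here is a minimal sketch that adds up the three-year figures quoted in the table; the dictionary keys are illustrative labels chosen for this example, not names from either vendor's price list.

```python
# Three-year line items copied from the TCO table (USD).
better_stack = {
    "logs_metrics_traces": 33_600,
    "error_tracking": 9_000,
    "incident_management": 5_220,  # equals 5 responders x $29 x 36 months
    "status_pages": 7_488,
    "uptime_monitoring": 0,        # listed as "Included"
}

posthog_plus_supplements = {
    "posthog_logs": 9_000,
    "separate_apm": 67_000,
    "error_tracking": 13_320,
    "pagerduty_business": 21_600,
    "third_party_status_pages": 3_600,
    "third_party_uptime": 2_400,
}

better_stack_total = sum(better_stack.values())
posthog_total = sum(posthog_plus_supplements.values())
ratio = posthog_total / better_stack_total

print(f"Better Stack: ${better_stack_total:,}")
print(f"PostHog + supplements: ${posthog_total:,}")
print(f"Ratio: {ratio:.2f}x")
```

Swapping in your own quotes for the supplemental tools is the useful part of the exercise, since those third-party estimates carry most of the difference.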


Log management

PostHog's log management launched in 2025. It accepts logs via OpenTelemetry, provides SQL-based querying through HogQL, and integrates error context from the error tracking product. The free tier includes 50GB/month.

The limitations are real. PostHog logs have 14-day default retention (compared to Better Stack's 30 days). There is no infrastructure metric correlation, and no distributed trace correlation because there is no APM product. The $0.25/GB rate after the free tier is more than double Better Stack's $0.10/GB.
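To see where the two pricing curves cross, here is a rough sketch of monthly log cost at a given volume. It rests on assumptions the published rates do not spell out: PostHog's tiers are treated as graduated (first 50 GB free, the next 250 GB at $0.25, anything beyond 300 GB at $0.15), and the Better Stack figure is ingestion plus one month of retention with no free allowance. The function names are made up for this example.

```python
def posthog_log_cost(gb: float) -> float:
    """Monthly PostHog log cost in USD, ASSUMING graduated tiers."""
    free_gb, tier_break_gb = 50, 300
    gb_at_high_rate = max(0.0, min(gb, tier_break_gb) - free_gb)
    gb_at_low_rate = max(0.0, gb - tier_break_gb)
    return round(gb_at_high_rate * 0.25 + gb_at_low_rate * 0.15, 2)


def better_stack_log_cost(gb: float, months_retained: int = 1) -> float:
    """Monthly Better Stack log cost in USD: ingestion plus retention."""
    return round(gb * (0.10 + 0.05 * months_retained), 2)


for volume in (40, 300, 500):
    print(volume, posthog_log_cost(volume), better_stack_log_cost(volume))
```

Under these assumptions PostHog is cheaper while you stay inside the free tier (40 GB costs nothing there versus about $6 on Better Stack), and the order flips at higher volumes (500 GB comes to about $92.50 versus $75.00).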

Better Stack: logs as first-class observability data

Better Stack logs treats every ingested log as structured data stored in the same warehouse as metrics and traces. All ingested logs are immediately queryable via SQL or PromQL. No indexing tiers, no choices about which logs to make searchable, no rehydration delays.

Query all of your logs at once using SQL:

 
SELECT 
  service_name,
  COUNT(*) as error_count,
  AVG(duration_ms) as avg_duration
FROM logs
WHERE level = 'error'
  AND timestamp > NOW() - INTERVAL '1 hour'
GROUP BY service_name
ORDER BY error_count DESC
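If you want to try the shape of that query before pointing it at real data, the following sketch runs the same aggregation against an in-memory SQLite table. It is a local stand-in, not the Better Stack API: the table layout and sample rows are invented for illustration, and the time filter uses SQLite's datetime() function because SQLite has no INTERVAL syntax.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def ts(minutes_ago: int) -> str:
    """UTC timestamp in the 'YYYY-MM-DD HH:MM:SS' form SQLite's datetime() emits."""
    moment = datetime.now(timezone.utc) - timedelta(minutes=minutes_ago)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (service_name TEXT, level TEXT, duration_ms REAL, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        ("checkout", "error", 120.0, ts(10)),
        ("checkout", "error", 80.0, ts(20)),
        ("auth", "error", 40.0, ts(5)),
        ("auth", "info", 10.0, ts(5)),          # not an error: filtered out
        ("checkout", "error", 999.0, ts(180)),  # older than one hour: filtered out
    ],
)

rows = conn.execute(
    """
    SELECT service_name, COUNT(*) AS error_count, AVG(duration_ms) AS avg_duration
    FROM logs
    WHERE level = 'error'
      AND timestamp > datetime('now', '-1 hour')
    GROUP BY service_name
    ORDER BY error_count DESC
    """
).fetchall()

print(rows)  # [('checkout', 2, 100.0), ('auth', 1, 40.0)]
```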

Save frequent queries as presets for fast access during incidents.

PostHog: logs with user-session integration as the differentiator

PostHog's genuine differentiation in logging is the connection to product data: you can filter logs by user segment, correlate backend errors with session replays, and ask "what was the user doing when this log error fired?" No pure observability tool does this natively.

Getting there does require some care in setup. Two things to know before you start:

PostHog logs only accept OTLP format. Teams migrating from standard log shippers like Fluentd or Logstash often hit a wall immediately because those tools default to non-OTLP output. The correct ingestion endpoint is https://us.i.posthog.com/i/v1/logs, and you need to configure your shipper to output in OTLP before anything reaches PostHog.

Authentication must use a project token, not a personal API key. Project tokens are prefixed phc_; personal API keys are prefixed phx_. Mixing those up is the most common reason logs never show up in the interface: the request is rejected with 401 Unauthorized, which PostHog's own troubleshooting documentation lists as a top ingestion error. The fix is straightforward once you know the cause: set the Authorization header to Bearer <ph_project_token>, or pass it as a query parameter with ?token=<ph_project_token>.

A second failure mode worth flagging: logs can return a 200 response and still not appear in PostHog. That usually means either the project doesn't have access to the logs feature on its current plan, or the payload doesn't conform to the OTLP spec. Both are silent from the sender's side, which makes them harder to diagnose than a clean error response would be.

Neither issue is a dealbreaker, but they're the kind of friction that slows initial setup and can leave teams assuming they have logging coverage when they don't.
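The two setup rules above can be captured in a few lines. This is a minimal sketch that builds (but does not send) a log request using only the standard library. It assumes the endpoint accepts OTLP over HTTP with JSON encoding, which the text above does not confirm; the helper name, the token value, and the sample message are hypothetical.

```python
import json
import time
import urllib.request

POSTHOG_LOGS_URL = "https://us.i.posthog.com/i/v1/logs"  # endpoint quoted above


def build_log_request(project_token: str, service: str, message: str) -> urllib.request.Request:
    """Build an OTLP/HTTP JSON log request authenticated with a project token."""
    if not project_token.startswith("phc_"):
        # Personal API keys (phx_...) are the common mix-up described above.
        raise ValueError("expected a project token (phc_...), not a personal API key")

    # Minimal OTLP logs payload: one resource, one scope, one log record.
    payload = {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeLogs": [{
                "logRecords": [{
                    "timeUnixNano": str(time.time_ns()),
                    "severityText": "ERROR",
                    "body": {"stringValue": message},
                }],
            }],
        }],
    }
    return urllib.request.Request(
        POSTHOG_LOGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {project_token}",
            "Content-Type": "application/json",  # ASSUMPTION: JSON-encoded OTLP is accepted
        },
        method="POST",
    )


req = build_log_request("phc_example_token", "checkout", "payment provider timed out")
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```

Failing fast on the token prefix turns a confusing missing-logs investigation into an immediate, local error.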

Log management | Better Stack | PostHog
Pricing | $0.10/GB ingestion | $0.25/GB (after 50GB free), $0.15/GB at 300GB+
Default retention | 30 days | 14 days
Searchability | 100% of ingested logs | 100% of ingested logs
Query language | SQL + PromQL | HogQL (SQL subset)
Trace correlation | Automatic | Not available (no APM)
Metric correlation | Automatic | Not available
User session correlation | Via RUM | Built-in (PostHog's differentiator)
Ingestion format | Multiple (including raw) | OTLP required

Application performance monitoring

This is the clearest capability gap in the comparison. Better Stack has a full APM product based on eBPF auto-instrumentation. PostHog has no APM and no distributed tracing. Both are listed as future roadmap items with no committed timeline.

Why does this matter? When a user reports that your checkout flow is slow, "slow" could mean a frontend rendering issue, a slow database query, an overloaded microservice, or a third-party API call timing out. APM and distributed tracing let you trace the request from the frontend all the way through your backend services and find the bottleneck. Without it, you're guessing.

Better Stack: eBPF-based APM, zero code changes

Better Stack's APM captures distributed traces at the kernel level, with no SDK installation or code changes required.

Deploy the collector to Kubernetes and HTTP/gRPC traffic between services is captured immediately. Database queries to PostgreSQL, MySQL, Redis, and MongoDB are automatically traced. The trace data lives in the same warehouse as your logs and metrics, so when a trace shows a slow database query, the correlated log lines and infrastructure metrics are a click away in the same interface.

Frontend-to-backend correlation connects a slow page load directly to the backend service and query that caused it. You see the full request path from browser to database in one view, no manual stitching between products.

OpenTelemetry-native, no lock-in. Traces are stored in OTel format. If you later want to route traces to a different backend, you change a configuration line, not your instrumentation.

PostHog: no APM available

PostHog does not have APM or distributed tracing. This is stated explicitly in their own documentation. If your team needs APM today, PostHog is not currently a viable option for that requirement.

APM feature | Better Stack | PostHog
Distributed tracing | GA (eBPF, zero code) | Not available
Frontend-to-backend correlation | Unified, same interface | Session-level only (no backend tracing)
Database query tracing | Automatic (Postgres, MySQL, Redis, Mongo) | Not available
OpenTelemetry | Native, zero lock-in | Not available for traces
Instrumentation method | eBPF (kernel-level) | SDK required

Infrastructure monitoring

Better Stack monitors infrastructure with a Prometheus-compatible metrics system and no cardinality penalties. PostHog has no infrastructure monitoring product.

Better Stack: Prometheus-native, volume-based pricing

Better Stack metrics is fully PromQL-compatible with no per-metric or per-cardinality charges.

PostHog: no infrastructure monitoring

PostHog does not have an infrastructure monitoring product. CPU usage, memory pressure, disk I/O, Kubernetes pod health, and host-level metrics are not captured by PostHog.

Metrics feature | Better Stack | PostHog
Infrastructure monitoring | GA | Not available
Prometheus / PromQL | Native | Not available
Cardinality penalties | None | N/A
Kubernetes monitoring | Built-in | Not available
Host-level metrics | Built-in | Not available

Incident management

PostHog has no incident management product for customers. There is no on-call scheduling, no escalation policies, no phone or SMS alerting, and no status pages. When PostHog's own engineering team handles incidents internally, they use incident.io, a third-party tool. (PostHog on-call handbook)

For PostHog users who need incident management, the answer is the same as it would be for any analytics-only tool: add PagerDuty, Opsgenie, or a similar product separately.

Better Stack includes all of this natively.

Better Stack: full incident lifecycle, built in

Better Stack incident management includes unlimited phone/SMS alerting, on-call scheduling, escalation policies, Slack-native incident channels, automatic post-mortems, and AI-powered investigation, all at $29/month per responder.

Incident feature | Better Stack | PostHog
On-call scheduling | Built-in | Not available
Phone/SMS alerting | Unlimited ($29/responder) | Not available
Escalation policies | Built-in | Not available
Slack incident channels | Native | Not available
Post-mortems | Automatic + manual | Not available
AI incident investigation | Built-in (AI SRE) | Not available
Monthly cost (5 responders) | $145 | PagerDuty Business: ~$245-415 extra

Session replay and real user monitoring

Both platforms have session replay. The architectural difference determines what you can do with the data.

PostHog's session replay is connected to product analytics, feature flags, and experiments. Better Stack's session replay is connected to backend logs, metrics, and traces. For 5M web events and 50,000 session replays per month, Better Stack costs approximately $102. PostHog comes to approximately $250 for replay alone, before analytics events, if all 50K recordings are billed at the $0.005 first-tier rate; its free allowance and lower-priced volume tiers would bring that figure down somewhat.

Better Stack: RUM unified with backend telemetry

Better Stack RUM captures frontend sessions, JavaScript errors, Core Web Vitals (LCP, CLS, INP), and user behavior analytics. Because all of this lives in the same data warehouse as backend logs, metrics, and traces, a slow page load is traceable from the browser all the way to the database query that caused it, in one interface.

Session replay runs at 2x speed with pause-skipping and filters for rage clicks, dead clicks, and errors. Sensitive fields are excluded at the SDK level.

Pricing: $0.00150/session replay, no per-session indexing fees.

PostHog: RUM with strong product analytics integration

PostHog's session replay is mature and well-regarded. The connection to feature flags and experiments is genuine: you can watch recordings from users who saw a specific experiment variant, or filter sessions by funnel step.

Mobile support (iOS, Android, React Native, Flutter) is broader than Better Stack's current web-focused offering. Mobile recordings cost 2x web pricing. If mobile RUM is your primary requirement, PostHog has broader native SDK coverage.

Known issues with PostHog session replay:

  • iOS SDK: session replay degrades frame rate for Google Maps and Google Street View when active. A reported bug shows "every time a screenshot is captured, there's a noticeable drop in frame rate." (GitHub issue #321)
  • Cookie management tools (CMPs) can cause an infinite loop with PostHog, making the page unresponsive. The fix requires setting persistence: "localStorage+cookie". (PostHog session replay troubleshooting)
  • Session replay data is batched in memory, meaning page reloads before the minimum duration threshold silently drop recordings. (GitHub issue #1099)

RUM / session replay | Better Stack | PostHog
Session replay | GA | GA
Core Web Vitals | LCP, CLS, INP | Yes
Mobile support | Web (mobile coming) | iOS, Android, React Native, Flutter
Mobile pricing | N/A | 2x web pricing
Backend trace correlation | Unified (same interface) | Not available
Product analytics integration | Not available | Strong (PostHog differentiator)
Feature flag correlation | Not available | Built-in
Pricing (web, per recording) | $0.00150 | $0.005 (5K-15K tier), $0.0015 at 500K+
Website analytics | Real-time, UTM, referrers | Basic

AI capabilities and MCP

Better Stack: AI SRE + MCP server (GA)

AI SRE activates automatically during incidents. It analyzes your service map, reviews recent deployments, queries logs, and surfaces likely root causes before you've had time to open your laptop.

Better Stack MCP server connects Claude, Cursor, and any MCP-compatible client directly to your observability data. Your AI assistant can query logs, check who's on-call, acknowledge incidents, and build dashboard queries through natural language.

 
{
  "mcpServers": {
    "betterstack": {
      "type": "http",
      "url": "https://mcp.betterstack.com"
    }
  }
}

The MCP server is GA and available to all customers. It covers uptime monitoring, incident management, log querying, metrics, dashboards, error tracking, and on-call scheduling.

PostHog: PostHog AI, no MCP server

PostHog has an AI assistant called PostHog AI that answers product analytics questions in natural language within the PostHog interface.

PostHog does not have an MCP server. There is no way to connect Claude, Cursor, or other AI coding assistants directly to your PostHog data through the MCP protocol.

AI capability | Better Stack | PostHog
AI SRE (autonomous incident investigation) | GA | Not available
MCP server | GA (all customers) | Not available
Natural language queries | Via MCP in any AI client | PostHog AI (within PostHog UI only)
AI coding integration | Claude Code + Cursor via MCP | Not available
Product analytics AI | Not available | PostHog AI

Error tracking

Both platforms have error tracking. Better Stack is Sentry-compatible and focused on AI-assisted debugging. PostHog integrates error tracking with session replay and product analytics, which is its real differentiator here.

Better Stack: Sentry-compatible, AI-native debugging

Better Stack Error Tracking accepts Sentry SDK payloads with no instrumentation changes required, and connects every error to the full distributed trace that caused it. AI-native debugging via Claude Code and Cursor integration gives you pre-made prompts summarizing error context: copy, paste into your AI coding assistant, and debug without manually reading stack traces.

Pricing: $0.000050/exception after 100K free. PostHog charges $0.00037/exception after 100K free, roughly 7x more at scale.

PostHog: error tracking with product analytics depth

PostHog connects exceptions to session replays, user behavior, and feature flag states. That product-side context is genuinely differentiating: "how many users hit this error?" and "did it cause funnel drop-off?" are questions Better Stack can't answer. The trade-off is maturity. PostHog's own docs acknowledge error tracking "is less mature than dedicated tools," the Sentry SDK is a migration path rather than a first-class integration, and there's no APM trace correlation.

Error tracking | Better Stack | PostHog
Sentry SDK | First-class (primary intake) | Migration path (native SDK preferred)
AI debugging | Claude Code + Cursor (pre-made prompts) | PostHog AI (natural language queries)
Trace correlation | Automatic | Not available (no APM)
Product analytics correlation | Not available | Strong (PostHog differentiator)
Session replay link | Automatic | Strong
Pricing | $0.000050/exception | $0.00037/exception (~7x more)
Free tier | 100K/month | 100K/month

Status pages and uptime monitoring

PostHog has no status pages product and no uptime monitoring product for customers.

Better Stack: built-in status pages and uptime monitoring

Better Stack Status Pages synchronize automatically with incident management and support custom domains, multi-channel subscriber notifications (email, SMS, Slack, webhook), and password or SSO protection for private pages.

Uptime monitoring runs at 30-second check intervals from multiple global locations, with Playwright-based transaction checks running a real Chrome browser to validate multi-step user flows.

PostHog: no status pages or uptime monitoring for customers

PostHog runs its own status page (status.posthog.com) to communicate about their own service health, but there is no equivalent product available to PostHog customers to build status pages for their own services.

Status pages | Better Stack | PostHog
Status pages | GA | Not available
Uptime monitoring | GA (30s intervals, Playwright checks) | Not available
Subscriber notifications | Email, SMS, Slack, webhook | Not available
Incident sync | Automatic | Not available

Product analytics and feature flags

This is the section where PostHog wins clearly, and it's worth saying so directly. Better Stack does not have a product analytics product. There are no funnel analysis tools, no retention cohorts, no experiment tracking, no feature flag management, no A/B testing, and no survey tooling.

If your primary use case is understanding user behavior, measuring experiment impact, or managing feature rollouts safely, PostHog is the stronger platform.

PostHog's genuine strengths

PostHog's product analytics suite is mature and genuinely comprehensive. Funnel analysis, user path exploration, retention tracking, cohort analysis, and correlation analysis are all available with SQL-level flexibility via HogQL for power users.

Feature flags support boolean flags, multivariate flags, percentage rollouts, and user-property targeting, with local evaluation for low-latency performance. Experiments connect flag variants to analytics outcomes automatically. Surveys provide in-product feedback loops integrated with user segments and session data.
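Percentage rollouts and local evaluation are easier to reason about with a concrete picture. The sketch below shows one common way to implement them: hash the flag key together with a stable user ID so the same user always lands in the same bucket, with no network call needed. This is a generic illustration of the technique, not PostHog's actual algorithm, and the function and flag names are hypothetical.

```python
import hashlib


def in_rollout(flag_key: str, distinct_id: str, rollout_percentage: float) -> bool:
    """Deterministically decide whether a user falls inside a percentage rollout."""
    digest = hashlib.sha1(f"{flag_key}.{distinct_id}".encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash to place the user in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < rollout_percentage / 100


users = [f"user-{i}" for i in range(10_000)]
enabled = sum(in_rollout("new-checkout", user, 20) for user in users)
print(f"{enabled / len(users):.1%} of users see the flag")
```

Because the decision depends only on the flag key and the user ID, raising the percentage later keeps everyone who was already enrolled and only adds new users.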

The integration across all of these products is the real differentiator. You can define an experiment, measure its impact on a funnel, watch session replays filtered by experiment variant, and correlate errors with specific flag states, all without leaving the product. No observability tool does this.

Product analytics | Better Stack | PostHog
Funnel analysis | Not available | GA, strong
Retention analysis | Not available | GA
Feature flags | Not available | GA, strong
A/B testing / experiments | Not available | GA, strong
Surveys | Not available | GA
User cohorts | Not available | GA
LLM observability | Not available | GA
Data warehouse | Not available | GA (ClickHouse-based)

Enterprise readiness

Both platforms offer enterprise features. One correction to a common assumption: PostHog does support HIPAA compliance via a Business Associate Agreement (BAA) on its platform package ($750/month add-on) and enterprise plans. This is documented at posthog.com/docs/privacy/hipaa-compliance. Better Stack does not currently offer HIPAA compliance.

Enterprise feature Better Stack PostHog
SOC 2 Type II
GDPR
HIPAA (via BAA) ✓ (platform/enterprise package required)
FedRAMP
SSO (SAML/OIDC) Okta, Azure, Google Google, GitHub, GitLab (SSO enforcement: Scale/Enterprise plans)
SCIM provisioning ✓ (Scale/Enterprise plans)
RBAC
Audit logs ✓ (Scale/Enterprise plans)
Data residency EU + US, optional S3 US (Virginia) + EU (Frankfurt)
Dedicated support Slack channel + account manager Email; Slack-based support at $2K+/month
SLA Enterprise SLA available Available
Open source / self-host Not available MIT licensed, self-hostable
Unlimited team members All paid plans All plans

Deployment and integrations

Better Stack

Deploy via a single Helm chart. The eBPF collector runs as a DaemonSet across Kubernetes nodes and discovers services automatically.

If you're already running OpenTelemetry collectors in your stack, Better Stack integrates natively without replacing your existing pipeline.

For log collection specifically, many teams use Vector as a processing layer between their services and Better Stack.

Integrations cover 100+ sources across all major stacks: MCP, OpenTelemetry, Vector, Prometheus, Kubernetes, Docker, PostgreSQL, MySQL, Redis, MongoDB, Nginx, and more.

PostHog

PostHog deploys via JavaScript snippet or SDK. Setup is fast for frontend tracking. The SDK ecosystem covers major languages (JavaScript, Python, Ruby, PHP, iOS, Android, React Native, Flutter) with good documentation.

PostHog's data pipeline integrations connect to warehouses (BigQuery, Snowflake, Redshift, S3). The reverse ETL feature allows sending PostHog data to CRMs and marketing automation tools, which is useful for product-led growth workflows.

Deployment | Better Stack | PostHog
Time to first data | Minutes (eBPF auto-discover) | Minutes (JS snippet)
Code changes required | Zero (backend), SDK (frontend) | SDK required for all surfaces
Kubernetes deployment | Single DaemonSet | SDK per service
OpenTelemetry support | Native (all signals) | Supported for logs
Data warehouse export | Optional (S3 bucket) | BigQuery, Snowflake, Redshift

Documented PostHog incidents and known error patterns

PostHog publishes detailed post-mortems and maintains a public status page at status.posthog.com. What follows draws exclusively from those primary sources.


1. Feature flags: 14+ hours of outages across 10 days (October 2025)

Source: PostHog post-mortem

Four separate incidents between October 21 and 30, 2025, reported as totaling over 14 hours of cumulative impact (the per-incident durations listed below sum to about 12 hours). Three shared the same root cause: CPU misconfigurations causing Kubernetes to over-pack pods and exhaust connection pools.

Date | Duration | Impact
October 21 | 103 minutes | ~38% of US evaluation requests erroring
October 24 | 72 minutes | ~97% of requests worldwide returning 429
October 28 | 123 minutes | ~34% of US evaluation requests failing
October 29-30 | 7 hours 9 minutes | CPU sustained above 90%, degraded performance

October 24 is worth singling out. A rate limiting deployment (PR #40074) treated all traffic as a single IP (the load balancer), immediately returning this for 97% of legitimate flag requests:

 
HTTP/1.1 429 Too Many Requests
{"type": "authentication_error", "code": "rate_limit_exceeded"}

No alerting existed for 429 errors, so detection relied on customer reports, causing a 62-minute delay.
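The failure mode is easy to reproduce in miniature. The sketch below is a toy counter, not PostHog's code: it shows how a per-IP limit behaves when every request appears to come from the load balancer versus when the limiter keys on the forwarded client address. The field names, addresses, and limit are all invented for illustration.

```python
from collections import Counter
from typing import Callable

Request = dict[str, str]


def count_rejections(requests: list[Request], key_fn: Callable[[Request], str], limit: int) -> int:
    """Count requests that exceed `limit` for their rate-limit key."""
    seen: Counter[str] = Counter()
    rejected = 0
    for request in requests:
        key = key_fn(request)
        seen[key] += 1
        if seen[key] > limit:
            rejected += 1
    return rejected


# 50 distinct clients, all arriving through the same load balancer.
requests = [
    {"peer_ip": "10.0.0.1", "forwarded_for": f"203.0.113.{i}"}
    for i in range(50)
]

# Keying on the TCP peer lumps every client into one bucket.
by_peer = count_rejections(requests, lambda r: r["peer_ip"], limit=5)
# Keying on the forwarded client address gives each client its own bucket.
by_client = count_rejections(requests, lambda r: r["forwarded_for"], limit=5)

print(by_peer, by_client)  # 45 0
```

In a real deployment the forwarded address should only be trusted when it is set by your own proxy, since clients can otherwise spoof it to dodge the limit.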


2. SDK fetch wrapper: production sites taken down (January 2026)

Source: PostHog post-mortem — Severity: Critical

A bug in the JavaScript SDK's fetch wrapper broke requests with a ReadableStream body across 5 SDK versions over 5 days:

 
TypeError: Failed to execute 'fetch' on 'Window': 
Request with GET/HEAD method cannot have body.

At least 4 customers had production sites go down. Two fix releases (1.327.0 and 1.328.0) never reached PostHog's CDN due to a missed manual approval step. The post-mortem: "Even customers who had pinned their SDK version to 1.328.0 continued to receive the broken lazy-loaded script from the CDN." Detection relied entirely on customer reports: "Unlike issues with our backend systems, we do not get alerts when the SDK fails."


3. npm supply chain attack: PostHog SDKs compromised (November 24, 2025)

Sources: PostHog blog post-mortem · PostHog handbook

PostHog was used as "patient zero" in the Shai-Hulud 2.0 npm supply chain attack at 4:11 AM UTC on November 24, 2025. Malicious SDK versions scanned for credentials and exfiltrated them to public GitHub repositories. The attack was contained by 9:30 AM UTC. This was a broad industry attack (Zapier, Postman, and AsyncAPI were also hit), but developers who installed PostHog SDK packages during that roughly 5-hour window should audit their CI/CD environments for credential exposure.


4. Cross-team query log exposure (August 2025)

Source: PostHog security advisories — Advisory PSA-2025-00001, Severity: Medium

An overly permissive table in PostHog's SQL editor exposed queries made by users in unrelated teams (not results, only query text). PostHog could not fully confirm whether the vulnerability was exploited between December 2024 and July 2025 due to a gap in their audit logs.


5. Breaking SDK change: survey responses silently dropped (February 2025)

Source: PostHog surveys troubleshooting

PostHog removed the $survey_response event property in February 2025, silently breaking survey tracking for customers on pinned SDK versions. From the docs: "Since February 25, we changed the way that survey responses are captured. If you're not seeing them, it might be because the $survey_response property is no longer captured." This reflects a recurring pattern: breaking changes to tracking behavior surface as silent data gaps rather than errors, with the January 2026 SDK incident being the most severe version of the same failure mode.

Final thoughts

PostHog and Better Stack are solving different problems, and the comparison only gets confusing if you treat them as direct competitors across every dimension.

Better Stack is an observability platform with incident management. For engineering teams that need logs, metrics, distributed traces, error tracking, RUM, uptime monitoring, incident management, on-call scheduling, and status pages in a unified tool, Better Stack is the more complete platform. The eBPF auto-instrumentation removes the SDK overhead that makes PostHog's current observability story incomplete for engineering use cases. The MCP server integration, the AI SRE, and the volume-based pricing model that eliminates cardinality and indexing games all point in the same direction: a platform built for the engineering team that keeps production running.

Start your Better Stack trial to see how much of your production toolchain consolidates into one subscription.