Avoiding Flaky Tests in Playwright

Playwright is a powerful end-to-end testing framework developed by Microsoft that enables developers to write reliable, cross-browser tests with support for Chromium, Firefox, and WebKit. Its modern architecture and comprehensive API make it ideal for testing complex web applications across multiple environments.

Flaky tests refer to tests that inconsistently pass or fail when run under identical conditions. These unreliable tests are a significant obstacle in the development process: they erode team confidence in the test suite, waste developer time investigating false failures, and can lead teams to ignore legitimate test failures. In the worst case, flaky tests can completely undermine an automated testing strategy.

This article explores comprehensive strategies to minimize test flakiness in Playwright. We'll cover proper waiting techniques, selector strategies, network request handling, state management, and more. By implementing these practices, you can build a more reliable test suite that provides genuine confidence in your application's functionality.

Common causes of flaky tests in Playwright

Test flakiness often stems from several common issues:

Race conditions: Tests proceed before the application is ready, leading to timing failures.
Unstable selectors: Using selectors that change between renders or application states.
Network unpredictability: Variations in API response times or intermittent failures.
State contamination: Tests affecting each other due to shared state.
Environmental differences: Tests that only fail in certain environments (CI vs local).
Resource constraints: Tests failing due to CPU, memory, or network limitations.

These factors, whether individually or combined, can lead to test flakiness. Now, let's explore some best practices to mitigate them!

Understand Playwright's auto-waiting

One of Playwright's strengths is its built-in auto-waiting mechanism, which automatically waits for elements to be actionable before performing actions. This distinguishes it from older frameworks like Selenium, where explicit waits are frequently required.

// No explicit waits needed - Playwright automatically waits
// for the button to be visible and enabled
await page.click('button#submit');
// Similarly, form interactions wait automatically
await page.fill('#username', 'testuser');

Before each action, Playwright runs the actionability checks that apply to that action, waiting until the element is:

Present in the DOM,
Visible (not hidden by CSS),
Stable (not animating or moving),
Enabled (not disabled by the 'disabled' attribute),
Editable (for input actions such as fill),
Receiving events (not covered by other elements).
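To see these checks in action, here is a minimal sketch that uses `page.setContent()` to load an inline page standing in for a real application. The markup, the IDs, and the 500 ms delay are illustrative assumptions. The button starts disabled and is enabled shortly after load, and `click()` simply waits for it:

```typescript
import { test, expect } from '@playwright/test';

test('click waits for a button that becomes enabled later', async ({ page }) => {
  // Inline page: the button starts disabled and is enabled after 500 ms,
  // simulating an app that finishes initialising asynchronously.
  await page.setContent(`
    <button id="submit" disabled>Submit</button>
    <p id="status"></p>
    <script>
      const button = document.getElementById('submit');
      setTimeout(() => { button.disabled = false; }, 500);
      button.addEventListener('click', () => {
        document.getElementById('status').textContent = 'Submitted';
      });
    </script>
  `);

  // No explicit wait: click() keeps re-running its actionability checks
  // until the button is enabled (or the timeout expires).
  await page.getByRole('button', { name: 'Submit' }).click();

  // Web-first assertion: retries until the status text appears.
  await expect(page.locator('#status')).toHaveText('Submitted');
});
```

Run it with `npx playwright test`; no sleep is needed because the "enabled" check is retried automatically.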

However, auto-waiting has limitations. It checks whether the target element is actionable, not whether your application is ready, so it doesn't wait for asynchronous data to load or for background work to finish. Auto-waiting also doesn't help with:

Elements that appear to be interactive but are waiting for background data
Custom JavaScript-based UI controls that don't use standard disabled attributes
Page transitions where the URL doesn't change
Content that loads dynamically after the page appears ready
Micro-animations or transitions that don't affect visibility but may affect interaction
Virtual scrolling where content appears only when scrolled into view

Understanding these limitations is crucial for writing stable tests. When auto-waiting isn't sufficient, explicit waiting strategies become necessary.

Proper use of waitFor methods

Playwright provides several explicit waiting mechanisms to handle scenarios where auto-waiting is insufficient:

// Wait for an element to be visible
await page.waitForSelector('#data-loaded-indicator');

// Wait for an element to disappear
await page.waitForSelector('#loading-spinner', { state: 'hidden' });

// Wait for a specific state (attached, detached, visible, hidden)
await page.waitForSelector('#element', { state: 'visible' });

// Wait with a custom timeout (in milliseconds)
await page.waitForSelector('#slow-element', { timeout: 10000 });

// Wait until at least 10 items exist
// (assumes the .item elements are siblings with no other children before them)
await page.waitForSelector('.item:nth-child(10)');

// Wait for a specific URL
await page.waitForURL('**/dashboard');

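// Wait for a page load state
await page.waitForLoadState('domcontentloaded');
await page.waitForLoadState('load');
// 'networkidle' waits until there has been no network activity for 500 ms;
// Playwright's docs discourage it for testing in favor of web-first assertions
await page.waitForLoadState('networkidle');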

// Wait for a download to start
const downloadPromise = page.waitForEvent('download');
await page.click('#download-button');
const download = await downloadPromise;

Each waiting mechanism has specific use cases:

waitForSelector: Best for waiting for elements to appear, disappear, or change state.
waitForFunction: Ideal for complex conditions involving multiple elements or JavaScript state.
waitForLoadState: Good for ensuring the page has reached a certain loading stage.
waitForURL: Perfect for navigation events and redirects.
waitForEvent: Useful for downloads, dialogs, and other events.
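`waitForFunction` is the only method in this list without an example above. The sketch below assumes a hypothetical application that sets a `window.appReady` flag once its data has loaded; the inline page simulates that with a 300 ms timer:

```typescript
import { test, expect } from '@playwright/test';

test('waitForFunction waits for application state', async ({ page }) => {
  // Hypothetical app: renders its list, then sets window.appReady.
  await page.setContent(`
    <ul id="list"></ul>
    <script>
      setTimeout(() => {
        const list = document.getElementById('list');
        for (let i = 1; i <= 3; i++) {
          const li = document.createElement('li');
          li.textContent = 'Item ' + i;
          list.appendChild(li);
        }
        window.appReady = true;
      }, 300);
    </script>
  `);

  // Polls the predicate inside the page until it returns a truthy value.
  await page.waitForFunction(() => (window as any).appReady === true);

  // The predicate can also receive an argument passed from the test.
  await page.waitForFunction(
    (min) => document.querySelectorAll('#list li').length >= min,
    3,
  );

  await expect(page.getByRole('listitem')).toHaveCount(3);
});
```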

❌ Don't do this

Avoid arbitrary timeouts that may fail on slower systems or be unnecessarily long.

// Bad practice: arbitrary timeout
await page.waitForTimeout(5000); // Wait 5 seconds and hope element appears
// Bad practice: inconsistent timing
await page.click('#load-data');
await page.waitForTimeout(2000); // Arbitrary wait hoping data has loaded
await page.click('.data-item'); // May fail if data takes longer than 2 seconds

✅ Do this instead

Wait for specific UI changes that indicate the application is ready.

// Good practice: wait for a specific condition
await page.click('#load-data');
await page.waitForSelector('[data-test="results-loaded"]');
await page.click('.data-item'); // Now safe because we know data is loaded
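The same pattern can also be written with a web-first assertion, which Playwright's documentation suggests in place of `page.waitForSelector()`. In this sketch the inline page simulates a slow fetch; the 800 ms delay and the markup are illustrative assumptions:

```typescript
import { test, expect } from '@playwright/test';

test('assert on the readiness signal instead of sleeping', async ({ page }) => {
  await page.setContent(`
    <button id="load-data">Load data</button>
    <div id="results"></div>
    <script>
      document.getElementById('load-data').addEventListener('click', () => {
        // Simulated slow fetch; the delay is arbitrary.
        setTimeout(() => {
          const results = document.getElementById('results');
          results.setAttribute('data-test', 'results-loaded');
          results.innerHTML = '<a class="data-item" href="#item-1">Item 1</a>';
        }, 800);
      });
    </script>
  `);

  await page.click('#load-data');
  // Retries until the marker is visible (up to the expect timeout),
  // then continues immediately instead of sleeping a fixed amount.
  await expect(page.locator('[data-test="results-loaded"]')).toBeVisible();
  await page.locator('.data-item').click();
});
```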

Prefer locators to selectors

Selecting DOM elements reliably is the cornerstone of stable Playwright tests. Traditional selection methods like CSS and XPath selectors, while powerful, often create brittle tests that break when your application's structure changes.

Playwright's locator API represents a significant advancement in element selection. Unlike traditional selectors that perform a one-time query, locators are lazy and resilient references to elements that automatically retry until elements become available, wait implicitly for elements to be actionable, and adapt to DOM changes between queries. This approach dramatically reduces flakiness by handling timing issues that plague many end-to-end tests.
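A minimal sketch of that laziness, assuming an inline page that replaces its content after 300 ms to simulate a re-render (the markup and delay are illustrative). The locator is created before any matching element exists:

```typescript
import { test, expect } from '@playwright/test';

test('a locator resolves when used, not when created', async ({ page }) => {
  await page.setContent(`<div id="app">Loading...</div>`);

  // Creating the locator does not query the DOM; nothing matches yet.
  const saveButton = page.getByRole('button', { name: 'Save' });

  // Simulate a re-render that replaces the whole subtree 300 ms later.
  await page.evaluate(() => {
    setTimeout(() => {
      const app = document.getElementById('app');
      if (app) app.innerHTML = '<button>Save</button>';
    }, 300);
  });

  // The same locator is resolved afresh on each use and retried until it matches.
  await saveButton.click();
  await expect(saveButton).toBeEnabled();
});
```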

The most reliable selectors mimic how users actually perceive and interact with your application. Role-based selection is particularly effective:

// Excellent: Uses accessibility roles with name filtering
await page.getByRole('button', { name: 'Submit' }).click();
// Also good for complex controls
await page.getByRole('combobox', { name: 'Country' }).click();
await page.getByRole('option', { name: 'Canada' }).click();

This approach encourages accessibility in your application, remains resilient to implementation changes, and closely models user interaction patterns.

Text-based selection is another powerful approach since users typically identify elements by their visible text:

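// Good: Users identify elements by their text.
// Use a web-first assertion (expect from '@playwright/test'); isVisible() returns immediately without waiting
await expect(page.getByText('Welcome back')).toBeVisible();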
// With additional precision
await page.getByText('Continue', { exact: true }).click();

For form controls, Playwright offers intuitive selection methods that align with user behavior:

// Select input by associated label
await page.getByLabel('Email address').fill('user@example.com');
// Select by placeholder
await page.getByPlaceholder('Enter your password').fill('securePass123');

For maximum reliability, especially in complex applications, consider using testing-specific attributes:

// Most reliable for complex applications
await page.getByTestId('checkout-button').click();
// In your application code (HTML, shown here as a comment):
// <button data-testid="checkout-button">Complete Purchase</button>
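`getByTestId()` looks for the `data-testid` attribute by default. If your application uses a different attribute, such as the `data-test` attribute in the waiting example earlier, you can point it there in the configuration. A minimal sketch; `data-test` is just the example value:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // getByTestId() matches data-testid by default; use the attribute
    // your application actually renders (data-test here, as an example).
    testIdAttribute: 'data-test',
  },
});
```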

Traditional CSS and XPath selectors often create problems. Consider these brittle examples:

// Brittle: Depends on exact class names
await page.locator('.MuiButton-contained.MuiButton-primary').click();
// Brittle: Depends on DOM structure
await page.locator('.header div:nth-child(2) > ul > li:nth-child(3) a').click();
// Brittle: XPath with positional dependencies
await page.locator('//div[@class="results"]/div[3]').click();

These selectors break easily when CSS frameworks generate new class names, designers reorganize layout structures, new items are added to lists or menus, or component libraries are updated.

The best implementation strategy is to:

Add test attributes systematically to key elements in your application.
Prefer role and semantic locators that align with accessibility best practices.
Fall back to text content for user-facing elements.
Document your selection strategy in your team's testing guidelines.
Create helper functions for common selection patterns.
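As an example of the last point, the helpers below wrap two common patterns. `formField` and `buttonNamed` are hypothetical names, and the test IDs and labels you pass in depend on your application:

```typescript
import type { Locator, Page } from '@playwright/test';

// Hypothetical helpers: the names and arguments are illustrative,
// not part of Playwright's API.

/** A labelled field scoped to a form identified by its test ID. */
export function formField(page: Page, formTestId: string, label: string): Locator {
  return page.getByTestId(formTestId).getByLabel(label);
}

/** A button identified by its exact accessible name. */
export function buttonNamed(page: Page, name: string): Locator {
  return page.getByRole('button', { name, exact: true });
}
```

In a test, `await formField(page, 'login-form', 'Email').fill('user@example.com');` then reads the same way everywhere, and a markup change only needs fixing in one place.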

For complex selections, use Playwright's powerful composition features:

// Filter by role, then refine with hasText
const submitButton = page.getByRole('button').filter({ hasText: 'Submit' });
// Navigate deep inside components
const emailField = page
  .getByTestId('login-form')
  .getByLabel('Email');
// Combine strategies with the has option
const activeUserItem = page
  .getByTestId('user-list')
  .getByRole('listitem')
  .filter({
    has: page.getByTestId('status-indicator-active')
  });

By embracing Playwright's locator paradigm instead of traditional selectors, you create tests that are more stable, maintainable, and resistant to UI changes, which dramatically reduces test flakiness.

Configure timeouts appropriately

Playwright's timeout mechanisms are foundational to preventing flaky tests. The framework automatically waits for elements to become available and assertions to pass, but the default timeout values might not always suit your specific application or testing environment.

Proper timeout configuration matters because poorly chosen values are a frequent cause of flakiness. When timeouts are too short, tests fail sporadically during temporary slowdowns. When they're too long, tests take needlessly long to fail when a genuine issue occurs, which slows down feedback.

There are two primary timeout settings to understand:

Test timeout: This defines the maximum duration a single test can run. By default, it is set to 30 seconds and controls the total execution time of the test.
Expect timeout: This specifies the maximum time allowed for assertions to complete. The default setting is 5 seconds, determining how long the test waits for conditions to become true.

Actions such as click() and fill() also accept their own timeout option. It defaults to 0 (no limit) unless you set actionTimeout in your configuration, so by default an action is bounded only by the overall test timeout.
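Both kinds of timeout can also be overridden for a single call. A sketch using an inline page whose content appears after 7 seconds, an illustrative delay chosen to exceed the default 5-second expect timeout:

```typescript
import { test, expect } from '@playwright/test';

test('per-call timeouts for one slow step', async ({ page }) => {
  await page.setContent(`
    <div id="report"></div>
    <script>
      // Simulated slow report generation (7 s).
      setTimeout(() => {
        document.getElementById('report').textContent = 'Report ready';
      }, 7000);
    </script>
  `);

  // Override the expect timeout for this assertion only.
  await expect(page.locator('#report')).toHaveText('Report ready', { timeout: 15_000 });

  // Actions accept a timeout too; here the click must succeed within 2 seconds.
  await page.locator('#report').click({ timeout: 2_000 });
});
```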

This approach works well in most scenarios because Playwright's auto-waiting capabilities handle typical timing issues. For consistent test behavior across different environments, configure timeouts globally in your Playwright configuration file:

import { defineConfig } from '@playwright/test';
export default defineConfig({
  // Test timeout set to 2 minutes
  timeout: 2 * 60 * 1000,
  expect: {
    // Expect timeout set to 10 seconds
    timeout: 10 * 1000
  }
});

For exceptional cases where specific tests need different timeout values, you can override the global settings:

import { test, expect } from '@playwright/test';
test('complex operation with longer processing', async ({ page }) => {
  // Increase timeout for this specific test to 5 minutes
  test.setTimeout(5 * 60 * 1000);
  // Test implementation...
});

Another helpful approach is using the test.slow() method, which multiplies the default timeout by three for tests that consistently take longer but don't need explicit timing values:

import { test, expect } from '@playwright/test';
test('moderately slow integration test', async ({ page }) => {
  // Mark as slow, giving 3x the normal timeout
  test.slow();
  // Test implementation...
});
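`test.slow()` also accepts a condition and a description, which keeps the extra time limited to the cases that need it. A sketch using the built-in `browserName` fixture; the WebKit condition and the reason string are hypothetical:

```typescript
import { test } from '@playwright/test';

test('checkout flow', async ({ page, browserName }) => {
  // Triples the timeout only when the condition is true; the description
  // appears in the report, so the reason stays documented.
  test.slow(browserName === 'webkit', 'Hypothetical: checkout is slower on WebKit');
  // Test implementation...
});
```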

When configuring timeouts, follow these principles:

Set global timeouts based on your typical application behavior.
Avoid excessively long global timeouts as they mask actual issues.
Use per-test timeout extensions sparingly and with clear documentation.
Never set test timeouts to 0 (infinite) as this can lead to hung test runs.
Consider different timeout values for development versus CI environments.
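As a sketch of that last principle, a configuration can read the `CI` environment variable that many CI providers set and choose values accordingly. The numbers below are examples, not recommendations:

```typescript
import { defineConfig } from '@playwright/test';

// Example values only: tune them to your own application's behaviour.
const isCI = !!process.env.CI;

export default defineConfig({
  // Shared CI runners are often slower than a developer machine.
  timeout: isCI ? 60_000 : 30_000,
  expect: {
    timeout: isCI ? 10_000 : 5_000,
  },
  use: {
    // Cap individual actions and navigations so a stuck step fails with a
    // precise error instead of silently consuming the whole test timeout.
    actionTimeout: 15_000,
    navigationTimeout: 30_000,
  },
});
```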

With properly configured timeouts, your tests will achieve the right balance between giving operations enough time to complete and failing promptly when something is genuinely wrong.

Set up automatic retries

Playwright's retry mechanism serves as a practical defense against test flakiness in CI environments. When enabled, it automatically re-runs failed tests, giving intermittently failing tests additional opportunities to pass before being conclusively marked as failures.

By default, Playwright runs each test exactly once with no automatic retries. This behavior provides clear results during test development but can create problems in CI/CD pipelines where environmental factors might cause occasional failures in otherwise solid tests.

To enable automatic retries, configure the retries parameter in your Playwright configuration:

playwright.config.js

import { defineConfig } from '@playwright/test';
export default defineConfig({
  // Give failing tests 3 retry attempts
  retries: 3,
});

Alternatively, you can specify retries directly from the command line, which is useful for specific test runs without modifying your configuration file:

npx playwright test --retries=3

When you enable retries, Playwright categorizes test results in more nuanced ways:

Passed: Tests that succeeded on their first attempt.
Flaky: Tests that initially failed but passed on a subsequent retry.
Failed: Tests that failed on first attempt and all retries.

The "flaky" category is particularly valuable as it identifies tests that need attention while still allowing your pipeline to proceed. Each flaky test represents technical debt that should eventually be addressed, but the retry mechanism prevents these issues from completely blocking your workflow.
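To make the three categories concrete, here is a small model of how an ordered list of attempt results maps to the reported outcome. `classifyAttempts` is a hypothetical helper written for illustration, not part of Playwright's API; it relies on the fact that retries stop as soon as an attempt passes, so only the last attempt can be a pass:

```typescript
// Simplified model of the reported outcome; not Playwright's implementation.
type AttemptResult = 'passed' | 'failed';
type Outcome = 'passed' | 'flaky' | 'failed';

function classifyAttempts(attempts: AttemptResult[]): Outcome {
  if (attempts.length === 0) {
    throw new Error('at least one attempt is required');
  }
  const last = attempts[attempts.length - 1];
  if (last === 'failed') {
    // Failed on the first run and on every retry.
    return 'failed';
  }
  // The final attempt passed: a first-try pass, or a pass after retries.
  return attempts.length === 1 ? 'passed' : 'flaky';
}
```

For example, `['failed', 'passed']` is reported as flaky, while `['failed', 'failed', 'failed', 'failed']` (the first run plus three retries) is a failure.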

For maximum effectiveness, consider implementing environment-specific retry configurations:

playwright.config.js

import { defineConfig } from '@playwright/test';
export default defineConfig({
  // No retries during local development for immediate feedback
  retries: process.env.CI ? 2 : 0,
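  // Optional: different retry counts for different groups of tests.
  // Each project needs its own testDir or testMatch (hypothetical paths here);
  // without one, every project runs the whole suite.
  projects: [
    {
      name: 'stable-features',
      testDir: './tests/stable',
      retries: process.env.CI ? 1 : 0,
    },
    {
      name: 'experimental-features',
      testDir: './tests/experimental',
      retries: process.env.CI ? 3 : 0,
    }
  ]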
});

Remember that while retries help manage flakiness, they shouldn't become a permanent solution for fundamentally unstable tests. Use the flaky test reports to identify which tests need improvement, then address the underlying causes of instability for truly reliable test suites.
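Retries can also be set for a single group of tests, and `testInfo.retry` tells a test which attempt it is on. A sketch using `test.describe.configure()`; the cleanup branch is a hypothetical placeholder and the inline page stands in for a real search UI:

```typescript
import { test, expect } from '@playwright/test';

test.describe('search', () => {
  // Retry only the tests in this group, overriding the global setting.
  test.describe.configure({ retries: 2 });

  test('returns results', async ({ page }, testInfo) => {
    if (testInfo.retry > 0) {
      // Runs only on retries: a place to reset state a failed attempt may
      // have left behind (hypothetical, e.g. deleting test data).
      console.log(`Retry #${testInfo.retry} of "${testInfo.title}"`);
    }
    await page.setContent('<input aria-label="Search"><ul><li>Result</li></ul>');
    await page.getByLabel('Search').fill('playwright');
    await expect(page.getByRole('listitem')).toHaveCount(1);
  });
});
```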

Handle element collections with proper waiting

Playwright's locator.all() method provides a powerful way to interact with multiple elements matching a single locator, but it comes with significant risks for test stability if used improperly. Understanding its behavior is essential for avoiding unpredictable test results.

Unlike most Playwright locator methods that automatically wait for elements to appear, locator.all() performs an immediate snapshot of the current DOM state without waiting. This fundamental difference makes it susceptible to race conditions in dynamic applications.

Here's a basic example of using this method:

for (const item of await page.getByRole('listitem').all()) {
  await item.click();
}

While this code appears straightforward, it can lead to flaky tests when the list of items is loading asynchronously or changing dynamically. If the DOM is still updating when all() executes, you might capture an incomplete set of elements.

To use locator.all() safely, ensure that the elements you're targeting have fully loaded before calling the method. This typically requires an explicit waiting strategy:

Copied!

// Wait for a specific number of items to be present
await page.waitForFunction(() =>
  document.querySelectorAll('li').length >= 5
);

// Now safe to use all() as the list is fully loaded
for (const item of await page.getByRole('listitem').all()) {
  await item.click();
}
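Playwright's web-first `toHaveCount()` assertion is an alternative to the `waitForFunction()` check above: it retries until the locator matches exactly the expected number of elements. A sketch, assuming an inline page whose five items are rendered one at a time (the markup and timing are illustrative):

```typescript
import { test, expect } from '@playwright/test';

test('wait for the expected count before calling all()', async ({ page }) => {
  await page.setContent(`
    <ul id="list"></ul>
    <script>
      // Items arrive one at a time, as with incremental rendering.
      let n = 0;
      const timer = setInterval(() => {
        const li = document.createElement('li');
        li.textContent = 'Item ' + (++n);
        document.getElementById('list').appendChild(li);
        if (n === 5) clearInterval(timer);
      }, 100);
    </script>
  `);

  const items = page.getByRole('listitem');
  // Retries until exactly 5 items are present (or the expect timeout expires).
  await expect(items).toHaveCount(5);

  // The snapshot taken by all() now contains every item.
  const texts: string[] = [];
  for (const item of await items.all()) {
    texts.push(await item.innerText());
  }
  expect(texts).toEqual(['Item 1', 'Item 2', 'Item 3', 'Item 4', 'Item 5']);
});
```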

Another effective approach is to wait for a specific condition that indicates content completion:

// Wait for loading indicator to disappear
await page.getByTestId('loading-spinner').waitFor({ state: 'hidden' });

// Wait for specific content in the last item to verify list completion
await page.getByRole('listitem').last().getByText('Last Item').waitFor();

// Now safe to use all()
const items = await page.getByRole('listitem').all();
console.log(`Found ${items.length} items`);

Remember that even after a list has loaded, elements might be added or removed during test execution. The locators returned by all() are index-based (each behaves like locator.nth(i)), so if the list changes, a stored locator may resolve to a different element than the one you expect. For longer operations, re-query the list rather than relying on previously stored locators.

By implementing proper waiting strategies before using locator.all(), you can significantly reduce flakiness while maintaining the flexibility to interact with dynamic collections of elements.

Debugging flaky tests

Playwright includes powerful diagnostic tools that can help identify the root causes of test flakiness. By capturing detailed recordings of test execution, you can observe exactly what happened during failed tests rather than merely guessing at potential issues.

The most comprehensive debugging tool available is Playwright's trace functionality. Traces record all operations performed during a test, including DOM snapshots, console logs, network requests, and more. This detailed timeline lets you replay test execution step by step to pinpoint where things went wrong.

To enable trace recording in your Playwright configuration:

import { defineConfig } from '@playwright/test';
export default defineConfig({
  retries: 1, // Required for on-first-retry trace recording
  use: {
    trace: 'on-first-retry', // Records traces when tests are retried
  },
});

The trace configuration accepts several values to control when recordings happen:

on-first-retry: Records only on the first retry attempt.
on-all-retries: Records on every retry attempt.
retain-on-failure: Records for all tests but keeps only failures.
on: Records for every test (not recommended due to performance impact).
off: Disables trace recording (the default).
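Because `on-first-retry` only records when a retry happens, it records nothing in a local run with no retries configured. One way to handle that, sketched below with example values, is to choose the trace mode per environment:

```typescript
import { defineConfig } from '@playwright/test';

const isCI = !!process.env.CI;

export default defineConfig({
  retries: isCI ? 2 : 0,
  use: {
    // CI: record only on the first retry, keeping passing runs cheap.
    // Locally there are no retries, so record every test and keep
    // traces only for the ones that fail.
    trace: isCI ? 'on-first-retry' : 'retain-on-failure',
  },
});
```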

When a test fails and is retried, Playwright produces a trace.zip file containing the complete execution record. You can explore this file using the Playwright Trace Viewer, a graphical interface that displays the timeline of test actions alongside screenshots of what the browser displayed at each step.

To analyze a trace file, use the Playwright CLI:

npx playwright show-trace path/to/trace.zip

You can also open traces in your browser at trace.playwright.dev without installing any software locally, which is especially helpful for sharing diagnostics with team members.

For even more visual context, Playwright can capture screenshots and videos of test execution:

import { defineConfig } from '@playwright/test';
export default defineConfig({
  use: {
    // Capture screenshot when tests fail
    screenshot: 'only-on-failure',
    // Record video when tests are retried
    video: 'on-first-retry'
  },
});

Screenshots offer a single point-in-time view of the failure state, while videos provide continuous visual recording of the test execution. For intermittent failures, videos are particularly valuable as they can reveal timing issues, race conditions, or unexpected UI changes that might not be obvious from a static screenshot.

All diagnostic artifacts are stored in the test output directory (typically test-results) with a folder structure that makes it easy to correlate artifacts with specific test failures.
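Traces already capture console output, but a lightweight attachment can help when you only want the browser console in the test report. The sketch below combines `page.on('console')` with `testInfo.attach()`; the attachment name `browser-console` is an arbitrary choice:

```typescript
import { test } from '@playwright/test';

// Browser console lines for the test currently running in this worker.
let consoleLines: string[] = [];

test.beforeEach(async ({ page }) => {
  consoleLines = [];
  page.on('console', (msg) => {
    consoleLines.push(`[${msg.type()}] ${msg.text()}`);
  });
});

test.afterEach(async ({}, testInfo) => {
  // Attach the log only when the test did not end with its expected status.
  if (testInfo.status !== testInfo.expectedStatus) {
    await testInfo.attach('browser-console', {
      body: consoleLines.join('\n'),
      contentType: 'text/plain',
    });
  }
});
```

Placed in a spec file, these hooks apply to every test in that file; the attachment then shows up next to the failing test in the HTML report.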

Final thoughts

Creating reliable Playwright tests requires a multi-faceted approach that addresses the common sources of flakiness. By implementing proper waiting strategies with explicit conditions rather than arbitrary timeouts, you establish a foundation for stability.

Choosing stable selectors through data-test attributes insulates your tests from styling changes, while properly mocking network requests ensures consistent behavior regardless of external services.

Test isolation ensures each test runs independently with a clean state, while flexible assertions accommodate reasonable UI variations. In CI environments, configuring appropriate retry policies and timeouts acknowledges that these environments often behave differently than development machines.

When tests do fail, Playwright's tracing and video recording capabilities provide the visibility needed to diagnose issues quickly. Remember that test reliability is an ongoing process, not a one-time fix. With consistent application of these practices, you can build a test suite that provides genuine confidence in your application, enabling faster development cycles and more reliable releases.
