# Avoiding Flaky Tests in Playwright

Playwright is a powerful end-to-end testing framework developed by Microsoft
that enables developers to write reliable, cross-browser tests with support for
Chromium, Firefox, and WebKit. Its modern architecture and comprehensive API
make it ideal for testing complex web applications across multiple environments.

**Flaky tests refer to tests that inconsistently pass or fail when run under
identical conditions**. These unreliable tests are a significant obstacle in the
development process: they erode team confidence in the test suite, waste
developer time investigating false failures, and can lead teams to ignore
legitimate test failures. In the worst case, flaky tests can completely
undermine an automated testing strategy.

This article explores practical strategies to minimize test flakiness in
Playwright. We'll cover auto-waiting and explicit waits, resilient locators,
timeout and retry configuration, safe handling of element collections, and
debugging tools. By implementing these practices, you can build a more reliable
test suite that provides genuine confidence in your application's functionality.

## Common causes of flaky tests in Playwright

Test flakiness often stems from several common issues:

- **Race conditions**: Tests proceed before the application is ready, leading to
  timing failures.
- **Unstable selectors**: Using selectors that change between renders or
  application states.
- **Network unpredictability**: Variations in API response times or intermittent
  failures.
- **State contamination**: Tests affecting each other due to shared state.
- **Environmental differences**: Tests that only fail in certain environments
  (CI vs local).
- **Resource constraints**: Tests failing due to CPU, memory, or network
  limitations.

These factors, whether individually or combined, can lead to test flakiness.
Now, let's explore some best practices to mitigate them!

## Understand Playwright's auto-waiting

One of Playwright's strengths is its built-in auto-waiting mechanism, which
automatically waits for elements to be actionable before performing actions.
This distinguishes it from older frameworks like Selenium, where explicit waits
are frequently required.

```javascript
// No explicit waits needed - Playwright automatically waits
// for the button to be visible and enabled
await page.click('button#submit');
// Similarly, form interactions wait automatically
await page.fill('#username', 'testuser');
```

Playwright's auto-waiting is intelligent and adaptive, waiting for elements to
be:

- Present in the DOM,
- Visible (not hidden by CSS),
- Stable (not animating or moving),
- Enabled (not disabled by the 'disabled' attribute),
- Receiving events (not covered by other elements).

However, auto-waiting has limitations. The actionability checks apply only to
the element being acted on, so they don't wait for asynchronous data to load or
for animations elsewhere on the page to complete. Auto-waiting also doesn't help
with:

- Elements that appear to be interactive but are waiting for background data
- Custom JavaScript-based UI controls that don't use standard disabled
  attributes
- Page transitions where the URL doesn't change
- Content that loads dynamically after the page appears ready
- Micro-animations or transitions that don't affect visibility but may affect
  interaction
- Virtual scrolling where content appears only when scrolled into view

Understanding these limitations is crucial for writing stable tests. When
auto-waiting isn't sufficient, explicit waiting strategies become necessary.

## Proper use of waitFor methods

Playwright provides several explicit waiting mechanisms to handle scenarios
where auto-waiting is insufficient:

```javascript
// Wait for an element to be visible
await page.waitForSelector('#data-loaded-indicator');

// Wait for an element to disappear
await page.waitForSelector('#loading-spinner', { state: 'hidden' });

// Wait for a specific state (attached, detached, visible, hidden)
await page.waitForSelector('#element', { state: 'visible' });

// Wait with a custom timeout (in milliseconds)
await page.waitForSelector('#slow-element', { timeout: 10000 });

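// Wait for multiple elements to be available
// (matches once an .item is the 10th child of its parent, so this implies
// 10 items only if the parent contains nothing but .item elements)
await page.waitForSelector('.item:nth-child(10)');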

// Wait for a specific URL
await page.waitForURL('**/dashboard');

// Wait for a page load state
await page.waitForLoadState('domcontentloaded');
await page.waitForLoadState('load');
// 'networkidle' is discouraged in the Playwright docs; prefer a concrete UI condition
await page.waitForLoadState('networkidle');

// Wait for a download to start
const downloadPromise = page.waitForEvent('download');
await page.click('#download-button');
const download = await downloadPromise;
```

Each waiting mechanism has specific use cases:

- `waitForSelector`: Best for waiting for elements to appear, disappear, or
  change state.
- `waitForFunction`: Ideal for complex conditions involving multiple elements or
  JavaScript state.
- `waitForLoadState`: Good for ensuring the page has reached a certain loading
  stage.
- `waitForURL`: Perfect for navigation events and redirects.
- `waitForEvent`: Useful for downloads, dialogs, and other events.
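
The snippet above does not show `waitForFunction`, so here is a minimal sketch of how it might be used. `page.waitForFunction` re-evaluates a predicate inside the browser until it returns a truthy value. The helper name `waitForMinRows`, the `#results tbody tr` selector, and the 10-second budget are assumptions made for illustration, not part of Playwright or of any particular application.

```javascript
// Hypothetical helper: resolves once at least `minRows` elements matching
// `rowSelector` exist in the page. Selector and timeout are illustrative.
async function waitForMinRows(page, minRows, rowSelector = '#results tbody tr') {
  await page.waitForFunction(
    // Runs inside the browser, so it only sees the serialized argument below
    ({ selector, min }) => document.querySelectorAll(selector).length >= min,
    { selector: rowSelector, min: minRows },
    { timeout: 10_000 } // assumed budget; tune for your application
  );
}

// Usage inside a test:
// await page.click('#load-data');
// await waitForMinRows(page, 5);
```

Passing the selector and threshold as an argument, rather than closing over them, matters because the predicate is serialized and executed in the page, where variables from the test file are not available.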

### ❌ Don't do this

Avoid arbitrary timeouts that may fail on slower systems or be unnecessarily
long.

```javascript
// Bad practice: arbitrary timeout
await page.waitForTimeout(5000); // Wait 5 seconds and hope element appears
// Bad practice: inconsistent timing
await page.click('#load-data');
await page.waitForTimeout(2000); // Arbitrary wait hoping data has loaded
await page.click('.data-item'); // May fail if data takes longer than 2 seconds
```

### ✅ Do this instead

Wait for specific UI changes that indicate the application is ready.

```javascript
// Good practice: wait for a specific condition
await page.click('#load-data');
await page.waitForSelector('[data-test="results-loaded"]');
await page.click('.data-item'); // Now safe because we know data is loaded
```
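
The same wait can also be written with locators instead of `waitForSelector`. The sketch below reuses the selectors from the example above and wraps the steps in a hypothetical helper named `loadDataAndOpenFirstItem`; `locator.waitFor()` resolves once the element reaches the requested state.

```javascript
// Hypothetical helper showing the locator-based form of the wait above.
async function loadDataAndOpenFirstItem(page) {
  await page.locator('#load-data').click();
  // Resolves when the marker element is visible, or rejects on timeout
  await page.locator('[data-test="results-loaded"]').waitFor({ state: 'visible' });
  // Only click once the marker confirms the data has rendered
  await page.locator('.data-item').first().click();
}

// Usage inside a test:
// await loadDataAndOpenFirstItem(page);
```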

## Prefer locators to selectors

Selecting DOM elements reliably is the cornerstone of stable Playwright tests.
Traditional selection methods like CSS and XPath selectors—while powerful—often
create brittle tests that break when your application's structure changes.

Playwright's locator API represents a significant advancement in element
selection. Unlike traditional selectors that perform a one-time query, locators
are lazy and resilient references to elements that automatically retry until
elements become available, wait implicitly for elements to be actionable, and
adapt to DOM changes between queries. This approach dramatically reduces
flakiness by handling timing issues that plague many end-to-end tests.

The most reliable selectors mimic how users actually perceive and interact with
your application. Role-based selection is particularly effective:

```javascript
// Excellent: Uses accessibility roles with name filtering
await page.getByRole('button', { name: 'Submit' }).click();
// Also good for complex controls
await page.getByRole('combobox', { name: 'Country' }).click();
await page.getByRole('option', { name: 'Canada' }).click();
```

This approach encourages accessibility in your application, remains resilient to
implementation changes, and closely models user interaction patterns.

Text-based selection is another powerful approach since users typically identify
elements by their visible text:

```javascript
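// Good: Users identify elements by their text
// (isVisible() returns immediately without waiting, so assert with expect,
// which is imported from '@playwright/test')
await expect(page.getByText('Welcome back')).toBeVisible();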
// With additional precision
await page.getByText('Continue', { exact: true }).click();
```

For form controls, Playwright offers intuitive selection methods that align with
user behavior:

```javascript
// Select input by associated label
await page.getByLabel('Email address').fill('user@example.com');
// Select by placeholder
await page.getByPlaceholder('Enter your password').fill('securePass123');
```

For maximum reliability, especially in complex applications, consider using
testing-specific attributes:

```javascript
// Most reliable for complex applications
await page.getByTestId('checkout-button').click();
// In your application markup (shown as a comment because it is HTML, not JS):
// <button data-testid="checkout-button">Complete Purchase</button>
```

Traditional CSS and XPath selectors often create problems. Consider these
brittle examples:

```javascript
// Brittle: Depends on exact class names
await page.locator('.MuiButton-contained.MuiButton-primary').click();
// Brittle: Depends on DOM structure
await page.locator('.header div:nth-child(2) > ul > li:nth-child(3) a').click();
// Brittle: XPath with positional dependencies
await page.locator('//div[@class="results"]/div[3]').click();
```

These selectors break easily when CSS frameworks generate new class names,
designers reorganize layout structures, new items are added to lists or menus,
or component libraries are updated.

The best implementation strategy is to add test attributes systematically to key
elements in your application, prefer role and semantic locators that align with
accessibility best practices, fall back to text content for user-facing
elements, document selection strategy in your team's testing guidelines, and
create helper functions for common selection patterns.
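
As a rough illustration of the last point, helper functions for common selection patterns might look like the sketch below. The function names `formField` and `rowByText`, and the test ids in the usage comments, are hypothetical; only `getByTestId`, `getByLabel`, `getByRole`, and `filter` are Playwright locator APIs.

```javascript
// Hypothetical helpers that centralize how tests find common elements,
// so a markup change is fixed in one place instead of in every test.
function formField(page, formTestId, label) {
  // Scope the label lookup to one form to avoid matching fields in other forms
  return page.getByTestId(formTestId).getByLabel(label);
}

function rowByText(page, listTestId, text) {
  // List items inside a named list, narrowed down by their visible text
  return page
    .getByTestId(listTestId)
    .getByRole('listitem')
    .filter({ hasText: text });
}

// Usage inside a test (ids, labels, and text are examples only):
// await formField(page, 'login-form', 'Email').fill('user@example.com');
// await rowByText(page, 'user-list', 'Ada Lovelace').click();
```

Because both helpers return locators rather than resolved elements, the usual auto-waiting and retry behavior still applies at the point where the test acts on them.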

For complex selections, use Playwright's powerful composition features:

```javascript
// Filter by role then refine with has-text
const submitButton = page.getByRole('button').filter({ hasText: 'Submit' });
// Navigate deep inside components
const emailField = page
  .getByTestId('login-form')
  .getByLabel('Email');
// Combine strategies with has-locator
const activeUserItem = page
  .getByTestId('user-list')
  .getByRole('listitem')
  .filter({
    has: page.getByTestId('status-indicator-active')
  });
```

By embracing Playwright's locator paradigm instead of traditional selectors, you
create tests that are more stable, maintainable, and resistant to UI
changes—dramatically reducing test flakiness.

## Configure timeouts appropriately

![266688701-ffca2fd1-5349-41fb-ade9-ace143bb2c58.png](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/0f8eb43a-53d3-46d2-e5ea-fc6872042b00/orig =3743x2027)

Playwright's timeout mechanisms are foundational to preventing flaky tests. The
framework automatically waits for elements to become available and assertions to
pass, but the default timeout values might not always suit your specific
application or testing environment.

Proper timeout configuration is critical because improper values frequently
cause test flakiness. When timeouts are too short, tests fail sporadically
during temporary slowdowns. When they're too long, tests might hang indefinitely
when genuine issues occur.

There are two primary timeout settings to understand:

- **Test timeout**: This defines the maximum duration a single test can run. By
  default, it is set to 30 seconds and controls the total execution time of the
  test.

- **Expect timeout**: This specifies the maximum time allowed for assertions to
  complete. The default setting is 5 seconds, determining how long the test
  waits for conditions to become true.

Most locator actions in Playwright (like `click`, `fill`, etc.) include a
timeout parameter that defaults to 0, which means these actions rely on the
test's overall timeout.

This approach works well in most scenarios because Playwright's auto-waiting
capabilities handle typical timing issues. For consistent test behavior across
different environments, configure timeouts globally in your Playwright
configuration file:

```javascript
import { defineConfig } from '@playwright/test';
export default defineConfig({
  // Test timeout set to 2 minutes
  timeout: 2 * 60 * 1000,
  expect: {
    // Expect timeout set to 10 seconds
    timeout: 10 * 1000
  }
});
```

For exceptional cases where specific tests need different timeout values, you
can override the global settings:

```javascript
import { test, expect } from '@playwright/test';
test('complex operation with longer processing', async ({ page }) => {
  // Increase timeout for this specific test to 5 minutes
  test.setTimeout(5 * 60 * 1000);
  // Test implementation...
});
```

Another helpful approach is using the `test.slow()` method, which multiplies the
default timeout by three for tests that consistently take longer but don't need
explicit timing values:

```javascript
import { test, expect } from '@playwright/test';
test('moderately slow integration test', async ({ page }) => {
  // Mark as slow, giving 3x the normal timeout
  test.slow();
  // Test implementation...
});
```

When configuring timeouts, follow these principles:

- Set global timeouts based on your typical application behavior.
- Avoid excessively long global timeouts as they mask actual issues.
- Use per-test timeout extensions sparingly and with clear documentation.
- Never set test timeouts to 0 (infinite) as this can lead to hung test runs.
- Consider different timeout values for development versus CI environments.
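
The last principle, different values for development and CI, can be sketched as a small function whose result is spread into the configuration. The helper name `timeoutsFor` and every number in it are assumptions chosen for illustration; pick values that reflect how your own application behaves.

```javascript
// Hypothetical helper for playwright.config.js: picks timeout values
// depending on whether the run happens on CI. All numbers are illustrative.
function timeoutsFor(isCI) {
  return {
    // Maximum time for a whole test
    timeout: isCI ? 60 * 1000 : 30 * 1000,
    // Maximum time for each expect() assertion to pass
    expect: { timeout: isCI ? 10 * 1000 : 5 * 1000 },
  };
}

// In playwright.config.js the result can be spread into the config, e.g.:
// export default defineConfig({ ...timeoutsFor(Boolean(process.env.CI)) });
```

Keeping the local values at the defaults means slow assertions still surface quickly during development, while CI gets extra headroom for shared or throttled runners.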

With properly configured timeouts, your tests will achieve the right balance
between giving operations enough time to complete and failing promptly when
something is genuinely wrong.

## Set up automatic retries

Playwright's retry mechanism serves as a practical defense against test
flakiness in CI environments. When enabled, this feature automatically re-runs
failed tests, giving intermittently failing tests additional opportunities to
pass before being conclusively marked as failures.

By default, Playwright runs each test exactly once with no automatic retries.
This behavior provides clear results during test development but can create
problems in CI/CD pipelines where environmental factors might cause occasional
failures in otherwise solid tests.

To enable automatic retries, configure the retries parameter in your Playwright
configuration:

```javascript
[label playwright.config.js]
import { defineConfig } from '@playwright/test';
export default defineConfig({
  // Give failing tests 3 retry attempts
  retries: 3,
});
```

Alternatively, you can specify retries directly from the command line, which is
useful for specific test runs without modifying your configuration file:

```command
npx playwright test --retries=3
```

When you enable retries, Playwright categorizes test results in more nuanced
ways:

- **Passed**: Tests that succeeded on their first attempt.
- **Flaky**: Tests that initially failed but passed on a subsequent retry.
- **Failed**: Tests that failed on first attempt and all retries.

The "flaky" category is particularly valuable as it identifies tests that need
attention while still allowing your pipeline to proceed. Each flaky test
represents technical debt that should eventually be addressed, but the retry
mechanism prevents these issues from completely blocking your workflow.

For maximum effectiveness, consider implementing environment-specific retry
configurations:

```javascript
[label playwright.config.js]
import { defineConfig } from '@playwright/test';
export default defineConfig({
  // No retries during local development for immediate feedback
  retries: process.env.CI ? 2 : 0,
  // Optional: More retries for specific challenging test projects
  projects: [
    {
      name: 'stable-features',
      retries: process.env.CI ? 1 : 0,
    },
    {
      name: 'experimental-features',
      retries: process.env.CI ? 3 : 0,
    }
  ]
});
```

Remember that while retries help manage flakiness, they shouldn't become a
permanent solution for fundamentally unstable tests. Use the flaky test reports
to identify which tests need improvement, then address the underlying causes of
instability for truly reliable test suites.

![183437023-524f1803-84e4-4862-9ce3-1d55af0e023e.png](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/2fcb451a-7e2a-4449-25c4-12a0bb011800/lg1x =1943x1172)

## Handle element collections with proper waiting

Playwright's `locator.all()` method provides a powerful way to interact with
multiple elements matching a single locator, but it comes with significant risks
for test stability if used improperly. Understanding its behavior is essential
for avoiding unpredictable test results.

Unlike most Playwright locator methods that automatically wait for elements to
appear, `locator.all()` performs an immediate snapshot of the current DOM state
without waiting. This fundamental difference makes it susceptible to race
conditions in dynamic applications.

Here's a basic example of using this method:

```javascript
for (const item of await page.getByRole('listitem').all()) {
  await item.click();
}
```

While this code appears straightforward, it can lead to flaky tests when the
list of items is loading asynchronously or changing dynamically. If the DOM is
still updating when `all()` executes, you might capture an incomplete set of
elements.

To use `locator.all()` safely, ensure that the elements you're targeting have
fully loaded before calling the method. This typically requires an explicit
waiting strategy:

```javascript
// Wait for a specific number of items to be present
await page.waitForFunction(() =>
  document.querySelectorAll('li').length >= 5
);

// Now safe to use all() as the list is fully loaded
for (const item of await page.getByRole('listitem').all()) {
  await item.click();
}
```

Another effective approach is to wait for a specific condition that indicates
content completion:

```javascript
// Wait for loading indicator to disappear
await page.getByTestId('loading-spinner').waitFor({ state: 'hidden' });

// Wait for specific content in the last item to verify list completion
await page.getByRole('listitem').last().getByText('Last Item').waitFor();

// Now safe to use all()
const items = await page.getByRole('listitem').all();
console.log(`Found ${items.length} items`);
```
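
If you would rather stay within the locator API than query the DOM with `waitForFunction`, one option is to wait for the item at a known index before iterating. This is a sketch under the assumption that the test knows the minimum number of items to expect; the helper name `clickEachItem` is hypothetical, while `nth()`, `waitFor()`, and `count()` are Playwright locator methods.

```javascript
// Hypothetical helper: waits until at least `minCount` list items are present,
// then clicks every item that currently matches.
async function clickEachItem(page, minCount) {
  const items = page.getByRole('listitem');
  // Resolves once the item at index minCount - 1 is visible,
  // which implies at least minCount items have been rendered
  await items.nth(minCount - 1).waitFor();
  // Count after the wait, then address each item by index
  const count = await items.count();
  for (let i = 0; i < count; i++) {
    await items.nth(i).click();
  }
}

// Usage inside a test:
// await clickEachItem(page, 5);
```

Note that the count is read once, so this still assumes that clicking an item does not add or remove list entries.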

Remember that even after a list has loaded, elements might be removed or added
during test execution. The array returned by `all()` reflects the elements that
matched at the moment of the call, so if the list changes afterwards its entries
may no longer point at the elements you expect. For longer operations, consider
calling `all()` again rather than relying on a previously captured array.

By implementing proper waiting strategies before using `locator.all()`, you can
significantly reduce flakiness while maintaining the flexibility to interact
with dynamic collections of elements.

## Debugging flaky tests

Playwright includes powerful diagnostic tools that can help identify the root
causes of test flakiness. By capturing detailed recordings of test execution,
you can observe exactly what happened during failed tests rather than merely
guessing at potential issues.

The most comprehensive debugging tool available is Playwright's trace
functionality. Traces record all operations performed during a test, including
DOM snapshots, console logs, network requests, and more. This detailed timeline
lets you replay test execution step by step to pinpoint where things went wrong.

![screenshot-2025-02-27-09-31-21.png](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/9072ceaf-9049-4463-0e45-49583532c900/md1x =1794x1015)

To enable trace recording in your Playwright configuration:

```typescript
import { defineConfig } from '@playwright/test';
export default defineConfig({
  retries: 1, // Required for on-first-retry trace recording
  use: {
    trace: 'on-first-retry', // Records traces when tests are retried
  },
});
```

The trace configuration accepts several values to control when recordings
happen:

- `on-first-retry`: Records only on the first retry attempt.
- `on-all-retries`: Records on every retry attempt.
- `retain-on-failure`: Records for all tests but keeps only failures.
- `on`: Records for every test (not recommended due to performance impact).
- `off`: Disables trace recording (the default when `trace` is not set).

When a test fails and is retried, Playwright produces a `trace.zip` file
containing the complete execution record. You can explore this file using the
Playwright Trace Viewer, a graphical interface that displays the timeline of
test actions alongside screenshots of what the browser displayed at each step.

To analyze a trace file, use the Playwright CLI:

```command
npx playwright show-trace path/to/trace.zip
```

You can also open traces in your browser at trace.playwright.dev without
installing any software locally, which is especially helpful when team members
need to inspect a trace you have shared with them.

For even more visual context, Playwright can capture screenshots and videos of
test execution:

```javascript
import { defineConfig } from '@playwright/test';
export default defineConfig({
  use: {
    // Capture screenshot when tests fail
    screenshot: 'only-on-failure',
    // Record video when tests are retried
    video: 'on-first-retry'
  },
});
```

Screenshots offer a single point-in-time view of the failure state, while videos
provide continuous visual recording of the test execution. For intermittent
failures, videos are particularly valuable as they can reveal timing issues,
race conditions, or unexpected UI changes that might not be obvious from a
static screenshot.

All diagnostic artifacts are stored in the test output directory (typically
`test-results`) with a folder structure that makes it easy to correlate
artifacts with specific test failures.

## Final thoughts

Creating reliable Playwright tests requires a multi-faceted approach that
addresses the common sources of flakiness. By implementing proper waiting
strategies with explicit conditions rather than arbitrary timeouts, you
establish a foundation for stability.

Choosing resilient locators based on roles, labels, visible text, and
`data-testid` attributes insulates your tests from styling and structural
changes, while waiting for a list to finish loading before calling
`locator.all()` avoids acting on incomplete collections.

In CI environments, configuring appropriate retry policies and timeouts
acknowledges that these environments often behave differently than development
machines.

When tests do fail, Playwright's tracing and video recording capabilities
provide the visibility needed to diagnose issues quickly. Remember that test
reliability is an ongoing process, not a one-time fix. With consistent
application of these practices, you can build a test suite that provides genuine
confidence in your application, enabling faster development cycles and more
reliable releases.
