End-to-end testing

Have agents write maintainable E2E tests from the user’s point of view: stable selectors, sensible waits, and explicit contracts for environment and test data.

Role, essentials, and scenario decision tree

E2E tests prove that the wiring works and that critical user journeys succeed; they do not replace unit or integration tests. The SKILL should specify the base URL, how tests authenticate, the test-data lifecycle, and where traces/screenshots are stored on failure.
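One way to make that contract explicit is a typed object that the suite validates once, before any test runs. A missing base URL then fails fast instead of surfacing as confusing timeouts. This is a sketch. BASE_URL matches the CI workflow below. The other variable names (E2E_AUTH_STRATEGY, E2E_DATA_LIFECYCLE, E2E_ARTIFACTS_DIR), the field names, and the allowed values are hypothetical.

```typescript
// Hypothetical E2E environment contract: each field is something the SKILL should pin down.
type AuthStrategy = 'api-seeded-user' | 'stored-session';
type DataLifecycle = 'per-test' | 'per-worker';

interface E2EContract {
  baseUrl: string;              // where the app under test lives
  authStrategy: AuthStrategy;   // how tests obtain a logged-in user
  dataLifecycle: DataLifecycle; // how long test data lives before cleanup
  artifactsDir: string;         // where traces/screenshots land on failure
}

// Builds the contract from environment variables and fails fast on anything missing or invalid.
// E2E_AUTH_STRATEGY, E2E_DATA_LIFECYCLE and E2E_ARTIFACTS_DIR are hypothetical variable names.
function loadE2EContract(env: Record<string, string | undefined>): E2EContract {
  const baseUrl = env.BASE_URL;
  if (!baseUrl) throw new Error('BASE_URL is required');
  if (!/^https?:\/\//.test(baseUrl)) throw new Error(`BASE_URL must be absolute: ${baseUrl}`);

  const auth = env.E2E_AUTH_STRATEGY ?? 'api-seeded-user';
  if (auth !== 'api-seeded-user' && auth !== 'stored-session') {
    throw new Error(`Unknown E2E_AUTH_STRATEGY: ${auth}`);
  }
  const lifecycle = env.E2E_DATA_LIFECYCLE ?? 'per-test';
  if (lifecycle !== 'per-test' && lifecycle !== 'per-worker') {
    throw new Error(`Unknown E2E_DATA_LIFECYCLE: ${lifecycle}`);
  }
  return {
    baseUrl,
    authStrategy: auth,
    dataLifecycle: lifecycle,
    artifactsDir: env.E2E_ARTIFACTS_DIR ?? 'test-results',
  };
}
```

In playwright.config.ts, loadE2EContract(process.env).baseUrl could feed use.baseURL. The artifactsDir value could feed the outputDir option, whose Playwright default is also test-results.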

Split login, payments, and other critical flows into composable steps and page objects (or equivalent), and note preview vs production differences (feature flags, third-party sandboxes).

Each scenario has clear Given-When-Then; assertions target user-visible outcomes.
Parallel jobs use separate accounts or isolated datasets.
On failure, screenshots and traces are retained automatically for agent-assisted diagnosis.

Scenario selection: test only the 3–5 flows with the highest business value. Decision tree:

Must write E2E (if any one applies):
  ✓ Directly revenue-related: registration, login, payment, order placement
  ✓ Regulatory/compliance required: identity verification, data export, account deletion
  ✓ Multi-service chain that no single integration test can cover end-to-end

Should NOT write E2E:
  ✗ Pure logic already covered by unit/integration tests (e.g. discount calculation)
  ✗ Infrequently-used back-office features (low risk, high maintenance cost)
  ✗ Visual-only differences (use visual regression tests instead)

Prioritization matrix:
  High business value + High technical risk → Must have E2E (e.g. checkout flow)
  High business value + Low technical risk  → Integration test sufficient (e.g. profile edit)
  Low business value  + High technical risk → E2E optional (e.g. data export, unless compliance requires it)
  Low business value  + Low technical risk  → No E2E needed
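The decision tree and matrix can be encoded as a small function. An agent could then use it to justify why a proposed journey does or does not get an E2E test. This is a sketch with hypothetical names. It assumes the must-write rules override the matrix, so a low-risk login flow still gets E2E because it is revenue-related. That is one reading of the two lists above.

```typescript
// Hypothetical encoding of the scenario decision tree and prioritization matrix.
type Level = 'high' | 'low';
type E2EDecision = 'must' | 'optional' | 'integration-sufficient' | 'none';

interface Journey {
  name: string;
  revenueCritical: boolean;    // registration, login, payment, order placement
  complianceRequired: boolean; // identity verification, data export, account deletion
  multiServiceChain: boolean;  // no single integration test covers it end-to-end
  visualOnly: boolean;         // only the looks differ: use visual regression instead
  businessValue: Level;
  technicalRisk: Level;
}

function e2eDecision(j: Journey): E2EDecision {
  // "Must write" rules: any one is enough and, by assumption, they override the matrix
  if (j.revenueCritical || j.complianceRequired || j.multiServiceChain) return 'must';
  // Visual-only differences belong to visual regression tests
  if (j.visualOnly) return 'none';
  // Prioritization matrix
  if (j.businessValue === 'high') {
    return j.technicalRisk === 'high' ? 'must' : 'integration-sufficient';
  }
  return j.technicalRisk === 'high' ? 'optional' : 'none';
}
```

For example, a checkout journey flagged revenueCritical yields 'must'. A high-value, low-risk profile edit yields 'integration-sufficient'.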

Complete Playwright Page Object example (login + cart checkout):

// tests/e2e/pages/LoginPage.ts
import { Page, Locator } from '@playwright/test';

export class LoginPage {
  private readonly emailInput: Locator;
  private readonly passwordInput: Locator;
  private readonly submitButton: Locator;

  constructor(private page: Page) {
    this.emailInput = page.getByLabel('Email');
    this.passwordInput = page.getByLabel('Password');
    this.submitButton = page.getByRole('button', { name: 'Sign in' });
  }

  async goto() {
    await this.page.goto('/login');
  }

  async login(email: string, password: string) {
    await this.emailInput.fill(email);
    await this.passwordInput.fill(password);
    await this.submitButton.click();
    // Wait for navigation to complete (no sleep — wait for URL change)
    await this.page.waitForURL('/dashboard');
  }
}

// tests/e2e/pages/CartPage.ts
import { Page } from '@playwright/test';

export class CartPage {
  constructor(private page: Page) {}

  async addItem(productName: string) {
    await this.page.getByRole('button', { name: `Add to cart ${productName}` }).click();
    // Wait for cart icon count to update — not sleep
    await this.page.getByTestId('cart-count').filter({ hasText: /[1-9]/ }).waitFor();
  }

  async checkout() {
    await this.page.getByRole('link', { name: 'View cart' }).click();
    await this.page.waitForURL('/cart');
    await this.page.getByRole('button', { name: 'Proceed to checkout' }).click();
    await this.page.waitForURL('/checkout');
  }

  async getItemCount(): Promise<number> {
    const text = await this.page.getByTestId('cart-count').textContent();
    return parseInt(text || '0', 10);
  }
}

// tests/e2e/checkout.spec.ts
import { test, expect } from '@playwright/test';
import { LoginPage } from './pages/LoginPage';
import { CartPage } from './pages/CartPage';

test.describe('Complete checkout flow', () => {
  // Fresh account per test so parallel workers and shards never share user data
  const password = 'Test1234!';
  let email: string;

  test.beforeEach(async ({ page }, testInfo) => {
    email = `checkout-w${testInfo.workerIndex}-${Date.now()}@example.com`;
    // Create the test account via API (not UI registration: faster and more stable)
    const response = await page.request.post('/api/test/create-user', {
      data: { email, password },
    });
    await expect(response).toBeOK();
  });

  test('user can log in and place an order', async ({ page }) => {
    const loginPage = new LoginPage(page);
    const cartPage = new CartPage(page);

    // 1. Login
    await loginPage.goto();
    await loginPage.login(email, password);

    // 2. Add item
    await page.goto('/products');
    await cartPage.addItem('Laptop');
    expect(await cartPage.getItemCount()).toBe(1);

    // 3. Checkout
    await cartPage.checkout();
    await expect(page).toHaveURL('/checkout');
    await expect(page.getByRole('heading', { name: 'Confirm Order' })).toBeVisible();
  });
});

Selector stability

Prefer accessibility queries (role + accessible name), stable product-approved copy, and team-agreed data-testid attributes (or equivalent contract attributes). Avoid style classes, deep DOM paths, auto-generated ids, and long, brittle XPath.

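Ban naked sleep in the SKILL. Use auto-waits (expect assertions that retry), URL or visible-state changes, or framework “stable after action” hooks. Treat network idle as a last resort, since Playwright's own docs discourage the networkidle state for testing. For heavy animation, agree on what “stable” means (route finished, skeleton gone).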

Checklist:
  ✓ Can role + accessible name uniquely target the element?
  ✓ If items repeat, is the locator scoped under a parent (landmark / form) first?
  ✓ Do dynamic lists match by text or an inline testid, not by ordinal index alone?
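Part of the checklist can be enforced mechanically in review. The sketch below is a hypothetical lint for raw selector strings passed to page.locator(). It uses simple regex heuristics, so expect occasional false positives, for example a dot inside an attribute value. Role and label queries such as getByRole never reach it.

```typescript
// Hypothetical selector lint based on the stability rules above (regex heuristics, not a parser).
interface SelectorFinding {
  selector: string;
  reasons: string[];
}

const RULES: ReadonlyArray<{ pattern: RegExp; reason: string }> = [
  { pattern: /:nth-(child|of-type)\(/, reason: 'positional index: scope under a parent and match by text or testid' },
  { pattern: /^(\/\/|xpath=)/, reason: 'XPath: prefer role + accessible name' },
  { pattern: /(^|[\s>+~\w\]])\.[A-Za-z_][\w-]*/, reason: 'style class: prefer role or data-testid' },
  { pattern: /#[A-Za-z_-]*\d{3,}/, reason: 'id looks auto-generated' },
];

function auditSelector(selector: string): SelectorFinding {
  const reasons = RULES.filter(r => r.pattern.test(selector)).map(r => r.reason);
  // Count child combinators (">") as a rough proxy for DOM depth
  const depth = selector.split('>').length - 1;
  if (depth >= 3) reasons.push('deep DOM chain: 3+ child combinators');
  return { selector, reasons };
}
```

For instance, auditSelector('div.card > ul > li:nth-child(2) > a') reports a style class, a positional index, and a deep DOM chain. auditSelector('[data-testid="cart-count"]') reports nothing.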

Flake management (5 root causes with code fixes)

Retries (job- or test-level) are a last resort. The SKILL should require every failure to be assigned a root-cause bucket:
  - race: an assertion runs before the UI or network is ready
  - dirty data: parallel tests share accounts or caches
  - external dependencies: rate limits, clocks
  - environment drift: browser version, regional CDN

❌ Root cause 1: Hard-coded sleep instead of explicit wait
// Bad (fails when page loads slowly)
await page.click('#submit');
await page.waitForTimeout(2000);  // ← fixed 2-second wait
await expect(page.locator('.success')).toBeVisible();

// Fix (wait for a specific condition)
await page.click('#submit');
await expect(page.locator('.success')).toBeVisible({ timeout: 10_000 });

---
❌ Root cause 2: Parallel tests sharing the same account
// Bad (two parallel tests fight over the same user data)
const user = { email: 'fixed-test@example.com' };

// Fix (each test gets its own unique account)
const uniqueEmail = `test-${Date.now()}-${Math.random().toString(36).slice(2)}@example.com`;
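A variant of that fix also records which shard and worker created the account, so leftover data can be traced back to a CI job. This is a sketch; uniqueTestEmail and its options are hypothetical. In Playwright, testInfo.workerIndex could supply the worker number. The shard number would come from the CI matrix, for example through an environment variable you define.

```typescript
// Hypothetical helper: unique, traceable test identities for parallel workers and CI shards.
interface TestEmailOptions {
  prefix?: string;
  shardIndex?: number;  // e.g. from the CI matrix
  workerIndex?: number; // e.g. Playwright's testInfo.workerIndex
}

let emailCounter = 0; // separates calls made within the same millisecond in one process

function uniqueTestEmail(opts: TestEmailOptions = {}): string {
  const { prefix = 'e2e', shardIndex = 0, workerIndex = 0 } = opts;
  emailCounter += 1;
  const random = Math.random().toString(36).slice(2, 8);
  return `${prefix}-s${shardIndex}-w${workerIndex}-${Date.now()}-${emailCounter}-${random}@example.com`;
}
```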

---
❌ Root cause 3: Assertion fires before navigation completes
// Bad
await page.click('[data-testid="order-btn"]');
await expect(page.locator('.order-id')).toBeVisible();  // ← may run against the old page and pass early if it also shows an .order-id

// Fix (wait for URL change first, then assert)
await page.click('[data-testid="order-btn"]');
await page.waitForURL('/order-confirm/**');
await expect(page.locator('.order-id')).toBeVisible();

---
❌ Root cause 4: Screenshot/assertion during animation
// Bad
await page.click('.menu-trigger');
await expect(page.locator('.dropdown-menu')).toHaveScreenshot();  // ← still animating

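// Fix (wait until the menu's CSS animations/transitions have finished)
await page.click('.menu-trigger');
const menu = page.locator('.dropdown-menu');
await menu.waitFor({ state: 'visible' });
// getAnimations() resolves at once when nothing is running, so unlike a late
// 'animationend' listener it cannot hang; cancelled animations are ignored
await menu.evaluate(async el => {
  await Promise.all(el.getAnimations().map(a => a.finished.catch(() => undefined)));
});
// toHaveScreenshot() also disables CSS animations by default (animations: 'disabled')
await expect(menu).toHaveScreenshot();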

---
❌ Root cause 5: System clock dependency causing boundary instability
// Bad (expiry is "now", so the outcome depends on execution timing and clock skew between runner and server)
const coupon = { expiresAt: new Date().toISOString() };

// Fix (use explicit past/future dates in test data)
const pastDate = new Date('2020-01-01').toISOString();
const futureDate = new Date('2099-12-31').toISOString();
const expiredCoupon = { expiresAt: pastDate };   // Always expired
const validCoupon   = { expiresAt: futureDate };  // Always valid
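Fixed dates solve the test-data side. On the logic side, a common pattern is to pass the current time in rather than reading the system clock inside the code under test, which makes the boundary itself deterministic. This is a sketch with a hypothetical Coupon shape. It assumes a coupon stops being valid at the exact expiry instant, which is a product decision to confirm. For browser-side code, recent Playwright versions (1.45+) also provide page.clock, for example page.clock.setFixedTime(), to pin the time the page sees.

```typescript
// Hypothetical coupon check with the clock injected instead of read inside the function.
interface Coupon {
  code: string;
  expiresAt: string; // ISO-8601 timestamp, e.g. '2024-06-15T12:00:00Z'
}

function isCouponValid(coupon: Coupon, now: Date): boolean {
  // Compare instants in epoch milliseconds; valid strictly before the expiry instant (assumption)
  return now.getTime() < Date.parse(coupon.expiresAt);
}

// Tests pin "now" to exact instants around the boundary instead of calling new Date()
const coupon: Coupon = { code: 'SAVE10', expiresAt: '2024-06-15T12:00:00Z' };
const oneSecondBefore = isCouponValid(coupon, new Date('2024-06-15T11:59:59Z')); // true
const atExpiry = isCouponValid(coupon, new Date('2024-06-15T12:00:00Z'));        // false
```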

Track consecutive failures separately from intermittent ones; intermittent issues get a dedicated issue with repro conditions.
Ban “lengthen the sleep until it goes green”: wait on explicit conditions or tighten data preconditions instead.
Known flakes may be quarantined (separate job, non-main gate) but need an expiry and owner.
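These three rules can back a small triage script that runs over recent CI results. The sketch below uses assumed conventions. A test is “consistently failing” if its last three runs all failed; treat that as a real regression. Otherwise it is “intermittent” if it failed at least once in the last ten runs. A quarantine entry violates policy when it has no owner or its expiry date has passed. The names, thresholds, and the YYYY-MM-DD date format are hypothetical.

```typescript
// Hypothetical triage helpers for the tracking and quarantine rules above.
type RunResult = 'pass' | 'fail';
type Verdict = 'healthy' | 'consistently-failing' | 'intermittent';

// Inspects the most recent runs (newest last) and separates real breakage from flakes.
function classifyHistory(runs: RunResult[], window = 10, consecutiveThreshold = 3): Verdict {
  const recent = runs.slice(-window);
  const failures = recent.filter(r => r === 'fail').length;
  if (failures === 0) return 'healthy';
  const tail = recent.slice(-consecutiveThreshold);
  if (tail.length === consecutiveThreshold && tail.every(r => r === 'fail')) {
    return 'consistently-failing'; // likely a real regression: fix it, do not quarantine
  }
  return 'intermittent'; // file a dedicated issue with repro conditions
}

interface QuarantineEntry {
  testId: string;
  owner: string;
  expiresOn: string; // YYYY-MM-DD, so string comparison matches date order
}

// Returns entries that break the policy: no owner, or past their expiry date.
function quarantineViolations(entries: QuarantineEntry[], today: string): QuarantineEntry[] {
  return entries.filter(e => e.owner.trim() === '' || e.expiresOn < today);
}
```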

E2E in CI (4 parallel shards + visual regression)

Order feedback from cheap to expensive: fast signals (lint, typecheck, unit tests) first, E2E last. E2E jobs should reuse build artifacts or preview environments and upload artifacts on failure. When sharding, give each shard an isolated data slice or account pool so shards never step on each other's data.
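One way to give each shard its own account pool is to split a pre-provisioned list into disjoint, contiguous slices. The slices use the same 1-based index as --shard=i/n, and each worker then picks one account inside its slice. This is a sketch; the function names and the idea of a fixed pool are assumptions. Inside a shard, Playwright's testInfo.parallelIndex is a natural index into the slice. It runs from 0 to workers - 1 and is unique among concurrently running workers.

```typescript
// Hypothetical: carve a shared pool of pre-provisioned test accounts into disjoint per-shard slices.
function accountSliceForShard<T>(pool: readonly T[], shardIndex: number, shardTotal: number): T[] {
  if (!Number.isInteger(shardIndex) || shardIndex < 1 || shardIndex > shardTotal) {
    throw new RangeError(`shardIndex must be in 1..${shardTotal}`);
  }
  // 1-based like --shard=1/4; contiguous ranges, remainder spread over the first shards
  const base = Math.floor(pool.length / shardTotal);
  const extra = pool.length % shardTotal;
  const i = shardIndex - 1;
  const start = i * base + Math.min(i, extra);
  const size = base + (i < extra ? 1 : 0);
  return pool.slice(start, start + size);
}

// Picks one account per concurrently running worker; fails loudly instead of sharing accounts.
function accountForWorker<T>(slice: readonly T[], parallelIndex: number): T {
  const account = slice[parallelIndex];
  if (account === undefined) {
    throw new RangeError(`not enough accounts (${slice.length}) for parallelIndex ${parallelIndex}`);
  }
  return account;
}
```

With 10 accounts and 4 shards, the slices hold 3, 3, 2 and 2 accounts. Every account belongs to exactly one shard.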

Playwright visual regression configuration:

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
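  use: {
    // Base for relative URLs in goto(), waitForURL() and toHaveURL(); CI sets BASE_URL
    baseURL: process.env.BASE_URL,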
    // Save screenshots only on failure
    screenshot: 'only-on-failure',
    // Save video only on failure
    video: 'retain-on-failure',
    // Keep a trace for every failed test; 'on-first-retry' would record nothing
    // here because no retries are configured
    trace: 'retain-on-failure',
  },
  // Visual regression config
  expect: {
    toHaveScreenshot: {
      // Allow 0.1% pixel difference (anti-aliasing / font rendering variance)
      maxDiffPixelRatio: 0.001,
      // Masking dynamic content (timestamps, etc.) is done per assertion via the mask option
    },
  },
  // Project selection happens in CI via --project (PRs run smoke tests only)
  projects: [
    {
      name: 'smoke',
      testMatch: '**/*.smoke.spec.ts',
      use: { ...devices['Desktop Chrome'] },
    },
    {
      name: 'full-e2e',
      testMatch: '**/*.spec.ts',
      use: { ...devices['Desktop Chrome'] },
      // Run from CI on main only (see the workflow's e2e-full job)
    },
  ],
});

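// Visual regression test example
import { test, expect } from '@playwright/test';

test('checkout page visual regression', async ({ page }) => {
  await page.goto('/checkout');
  // Wait for a concrete readiness signal instead of networkidle
  await expect(page.getByRole('heading', { name: 'Confirm Order' })).toBeVisible();

  // toHaveScreenshot applies the config's maxDiffPixelRatio, disables animations and
  // retries until two consecutive screenshots match; mask dynamic content
  // (timestamps, countdown timers, etc.)
  await expect(page).toHaveScreenshot('checkout-page.png', {
    mask: [page.getByTestId('countdown-timer')],
  });
});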
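To make the 0.1% tolerance concrete: maxDiffPixelRatio is a fraction of the screenshot's pixels, so the allowed budget grows with image size. The sketch below shows only that arithmetic and does not reproduce Playwright's internal comparison. 1280×720 is the viewport of Playwright's 'Desktop Chrome' device descriptor; the 100×40 button is a hypothetical example.

```typescript
// Approximate number of differing pixels a maxDiffPixelRatio tolerates (arithmetic illustration only)
function diffPixelBudget(width: number, height: number, maxDiffPixelRatio: number): number {
  // total pixels × allowed fraction, rounded down to whole pixels
  return Math.floor(width * height * maxDiffPixelRatio);
}

const viewportBudget = diffPixelBudget(1280, 720, 0.001); // 921 (of 921,600 pixels)
const buttonArea = 100 * 40; // a recolored 100×40 px button changes 4,000 pixels
console.log(viewportBudget, buttonArea > viewportBudget); // 921 true
```

So the setting tolerates scattered anti-aliasing noise while a visibly changed control still exceeds the budget. Full-page screenshots get a proportionally larger budget.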

4 parallel shards + full suite only on main branch (GitHub Actions workflow):

name: E2E Tests

on:
  push:
    branches: [main]   # Full E2E only on main
  pull_request:        # PR runs smoke tests only

jobs:
  e2e-smoke:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: npx playwright test --project=smoke
      - uses: actions/upload-artifact@v4
        if: failure()
        with: { name: smoke-test-results, path: playwright-report/ }

  e2e-full:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false            # let every shard finish and upload its report
      matrix:
        shardIndex: [1, 2, 3, 4]  # 4 shards on 4 parallel runners
        shardTotal: [4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --project=full-e2e --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
        env:
          BASE_URL: ${{ vars.STAGING_URL }}
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: e2e-report-shard-${{ matrix.shardIndex }}
          path: playwright-report/

  [ push / open PR ]
                    │
                    ▼
          [ lint · typecheck · unit tests ]  ← fail fast
                    │
                    ▼
          [ build / preview env / image ]
                    │
                    ▼
   [ E2E: serial critical path OR sharded parallel + data isolation ]
                    │
           ┌────────┴────────┐
           ▼                 ▼
    [ pass: merge gate ]   [ fail: trace / screenshot / video ]
           │                      │
           │                      ▼
           │              [ classify: bug / flake / env ]
           │                      │
           └──────────────────────┘
                    │
                    ▼
            [ Regression or quarantine follow-up ]


Example SKILL file:

---
name: e2e-testing
description: Stable, diagnosable end-to-end user journey tests
---
# Essentials
1. Scenario selection: only revenue-critical, compliance-required, or multi-service journeys; skip what unit/integration tests already cover
2. Page Objects: one class per page; actions return void, query helpers may return values; no assertions inside page objects
3. Selectors: getByRole/getByLabel/getByTestId; no CSS classes or nth-child chains
4. Flake prevention: no sleep; wait on web-first assertions, URL changes, or visible state (network idle only as a last resort)
5. CI: smoke tests on PR; full suite on main across 4 parallel shards; upload artifacts on failure
