
From Test Case to Running Playwright Spec: How Skills Make Agentic AI Test Automation Efficient

April 10, 2026
Mohammed Akbar Ali

About skills and why we used them for Test Automation

We (the Automation COE team at Sogeti India) used the Skills framework from the Anthropic open-source project (https://github.com/anthropics/skills) as the foundation for our AI-driven test automation workflow. Skills are version‑controlled, markdown‑based “knowledge modules” that encode how your automation must be generated — including methodology, naming conventions, Page Object Model rules, locators, assertions, logging, and validation criteria.

For Playwright automation, Skills act as the AI agent’s governing specification. Instead of relying on prompts or templates, the agent reads your Skill before every generation task, performs live DOM recon with playwright-cli, applies your POM architecture, enforces your validation checklist, and produces production-ready Playwright tests with zero manual code.
This blog explains how Skills solve the long‑standing problems in AI test generation and enable truly deterministic, convention-driven test automation.


Generating production Playwright Tests with zero manual code

AI can write Playwright tests. But can it write your Playwright tests — following your architecture, naming rules, and quality bar?
The gap between “AI-generated code” and “production-ready test code” is where most teams struggle.

Playwright Skills close that gap completely, converting plain-English test cases into validated, self‑healing Playwright specs with zero manual effort.


The AI Test Generation problem nobody talks about

The Hidden Cost of “Just Ask AI to Write Tests”

  • Convention ignorance: AI writes raw page.click() calls, ignoring your POM.
  • Hallucinated locators: Selectors invented from training data, not your DOM.
  • No quality gate: Broken imports, missing screenshots, hardcoded URLs.
  • One-off generation: Standards don’t propagate to previously generated tests.
  • Fragile output: When tests break, AI retries randomly with no strategy.

Why prompting AI directly fails

  • Requires long, detailed prompts that drift over time.
  • Output is inconsistent across sessions.
  • Browser tools alone cannot enforce conventions.
  • Prompt templates eventually become stale.

Skills: The Game-Changing approach

What is a Skill?

A Skill is a markdown file inside your repository that encodes how your team writes tests, including:

  • methodology & phases
  • naming conventions
  • architecture rules
  • reference examples
  • validation checklist

The agent reads the Skill at the start of each execution and generates code exactly as specified.

A complete Playwright Skill contains:

  1. Methodology (phases, rules, naming, architecture)
  2. Reference examples (POM + spec used as canonical templates)
  3. Validation checklist (binary rules for self‑review)
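As a hedged illustration, a minimal Skill file covering those three assets might look like the sketch below. The file name, section headings, and rule wording are our assumptions for this example, not a fixed schema:

```markdown
# Skill: playwright-test-generation

## Methodology
1. Parse: break the plain-English test case into ordered steps.
2. Recon: open the target page with playwright-cli and snapshot the DOM.
3. Record: map each step to elements found in the snapshot.
4. Generate: emit page objects and a spec that calls only POM methods.

## Rules
- Never write locators manually; derive them from recon output.
- No hardcoded URLs; read the base URL from configuration.

## Reference examples
- `examples/login.page.ts` (canonical POM)
- `examples/login.spec.ts` (canonical spec)

## Validation checklist
- [ ] All POMs extend the base page class
- [ ] No raw `page.*` calls in spec files
- [ ] Every assertion method captures a screenshot
```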

The agent:

  • reads the methodology
  • performs live DOM recon with playwright-cli
  • generates page objects and specs
  • self-validates using the checklist
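That loop can be sketched as a thin orchestrator that runs the phases in order and stops at the first failure. The phase names follow the methodology above; the types and stub functions are illustrative assumptions, not a real agent:

```typescript
// Illustrative orchestrator skeleton: phase names follow the Skill
// methodology; the phase bodies here are stubs, not a real agent.
type Phase = "parse" | "recon" | "generate" | "validate";

interface PhaseResult {
  phase: Phase;
  ok: boolean;
  notes: string;
}

// Each phase is a pluggable step, so the agent stays thin and the
// Skill (methodology + checklist) holds the actual expertise.
function runPipeline(steps: Record<Phase, () => PhaseResult>): PhaseResult[] {
  const order: Phase[] = ["parse", "recon", "generate", "validate"];
  const results: PhaseResult[] = [];
  for (const phase of order) {
    const result = steps[phase]();
    results.push(result);
    if (!result.ok) break; // stop the pipeline on the first failure
  }
  return results;
}

// Example run with stub phases:
const results = runPipeline({
  parse: () => ({ phase: "parse", ok: true, notes: "3 steps parsed" }),
  recon: () => ({ phase: "recon", ok: true, notes: "snapshot taken" }),
  generate: () => ({ phase: "generate", ok: true, notes: "spec emitted" }),
  validate: () => ({ phase: "validate", ok: true, notes: "checklist passed" }),
});
console.log(results.map((r) => r.phase).join(" -> "));
```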

Why this approach is revolutionary

  • Zero convention drift — every file matches your engineering style.
  • Live locators only — no hallucinated selectors.
  • Built-in quality gate — checklist enforces standards automatically.
  • Co-evolving standards — update the methodology once; all future tests will follow.
  • Structured healing — predictable fix-and-rerun flow, no random retry.

From Test Case to running spec

The Old Way — Raw AI Output

A typical prompt-based AI output:

  • Hardcoded URLs
  • Unreliable selectors
  • No page objects
  • No logging or screenshots
  • No load state coordination

The Skills way — Production architecture

The same test now produces:

  • A 3-layer POM (atomic, composite, assertion)
  • Automatic screenshots
  • Phase/sub-step logging
  • Clean specs calling only POM methods
  • Live selectors from recon
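The 3-layer shape can be sketched as below. The class, locators, and method names are invented for illustration, and `PageLike` is a stand-in for Playwright's `Page` so the structure can be shown without a running browser:

```typescript
// Sketch of a 3-layer page object: atomic actions, composite flows,
// and assertion methods. `PageLike` is an assumed stand-in for
// Playwright's Page interface.
interface PageLike {
  fill(selector: string, value: string): void;
  click(selector: string): void;
  textOf(selector: string): string;
  screenshot(name: string): void;
}

class LoginPage {
  // Locators are derived from recon output, never hand-written.
  private readonly user = 'input[name="username"]';
  private readonly pass = 'input[name="password"]';
  private readonly submit = 'button[type="submit"]';
  private readonly banner = '[data-testid="welcome"]';

  constructor(private readonly page: PageLike) {}

  // --- atomic layer: one element, one action ---
  enterUsername(v: string) { this.page.fill(this.user, v); }
  enterPassword(v: string) { this.page.fill(this.pass, v); }
  clickSubmit() { this.page.click(this.submit); }

  // --- composite layer: a flow that ends with a screenshot ---
  login(u: string, p: string) {
    this.enterUsername(u);
    this.enterPassword(p);
    this.clickSubmit();
    this.page.screenshot("after-login");
  }

  // --- assertion layer: verify, then screenshot ---
  assertWelcome(expected: string) {
    const actual = this.page.textOf(this.banner);
    if (actual !== expected) {
      throw new Error(`expected "${expected}", got "${actual}"`);
    }
    this.page.screenshot("assert-welcome");
  }
}
```

The spec then calls only `login()` and `assertWelcome()`, never raw page methods.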

The recon loop — Live selectors, always

This is the breakthrough:

  1. Agent opens the page with playwright-cli
  2. A DOM snapshot is generated
  3. Agent extracts element refs from the snapshot
  4. Generated code uses real locators derived from the snapshot
  5. Every action refreshes the snapshot

This eliminates the biggest cause of flaky AI‑generated tests: selector hallucination.
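The extraction step (3) can be sketched as a small parser over the snapshot text. The line format assumed here (role, accessible name, `[ref=…]`) mirrors accessibility-style snapshots, but treat the exact shape as an assumption:

```typescript
// Pull element references out of an accessibility-style DOM snapshot.
// Each line is assumed to look like: - role "accessible name" [ref=eNN]
interface ElementRef {
  role: string;
  name: string;
  ref: string;
}

function extractRefs(snapshot: string): ElementRef[] {
  const refs: ElementRef[] = [];
  const line = /-\s+(\w+)\s+"([^"]*)"\s+\[ref=([^\]]+)\]/g;
  for (const m of snapshot.matchAll(line)) {
    refs.push({ role: m[1], name: m[2], ref: m[3] });
  }
  return refs;
}

// Example snapshot fragment (invented for illustration):
const snapshot = `
- textbox "Username" [ref=e3]
- textbox "Password" [ref=e4]
- button "Sign in" [ref=e5]
`;
const refs = extractRefs(snapshot);
console.log(refs.map((r) => `${r.role}:${r.ref}`).join(","));
```

Generated code then locates elements only through these extracted refs, never through invented selectors.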


The heal loop — Structured failure recovery

If the test fails:

  • Agent classifies the error
  • Applies the exact fix needed
  • Re-runs test (max 2 cycles)

Examples:

  • Timeout → fix locator using previous snapshots
  • Strict mode → apply .first() or .nth()
  • Assertion mismatch → update only expected value
  • Import error → fix that line only

If still failing → methodology needs an update. Not more retries.
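The classification step lends itself to a deterministic mapping from error text to fix strategy. The substrings below are based on typical Playwright failure messages, but treat them as assumptions rather than an exhaustive catalogue:

```typescript
// Map a Playwright failure message to one of the structured fixes.
type Fix =
  | "relocate-from-snapshot"   // timeout: re-derive locator from recon
  | "narrow-with-first"        // strict mode: apply .first()/.nth()
  | "update-expected-value"    // assertion mismatch: fix expectation only
  | "fix-import-line"          // import error: touch that line only
  | "escalate-to-methodology"; // unknown: stop retrying, update the Skill

function classifyFailure(message: string): Fix {
  const m = message.toLowerCase();
  if (m.includes("timeout")) return "relocate-from-snapshot";
  if (m.includes("strict mode violation")) return "narrow-with-first";
  if (m.includes("expect") && m.includes("received")) return "update-expected-value";
  if (m.includes("cannot find module")) return "fix-import-line";
  return "escalate-to-methodology";
}

console.log(classifyFailure("Timeout 30000ms exceeded while waiting for locator"));
```

Anything that falls through to `escalate-to-methodology` ends the heal loop rather than triggering another retry.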


Three assets that power Skills

1. The Methodology (Phases + Rules)

Phases:

  1. Parse
  2. Recon
  3. Record
  4. Generate

Rule: Never write locators manually — only from recon output.


2. The validation checklist (Quality Gate)

Binary rules such as:

  • all POMs extend base class
  • no hardcoded URLs
  • screenshot in all assertion methods
  • composite methods must end with screenshot
  • no raw page calls in spec
  • console logs for all phases and sub-steps

Unambiguous ✓/✗ examples are required.
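Because the rules are binary, the gate can be checked mechanically. This sketch runs two of the rules above against generated spec text; the regexes are our assumptions about what counts as a violation:

```typescript
// Binary checklist: each rule answers pass/fail for a generated file.
interface Rule {
  id: string;
  passes: (source: string) => boolean; // true when the file passes
}

const rules: Rule[] = [
  {
    id: "no-hardcoded-urls",
    passes: (src) => !/https?:\/\//.test(src),
  },
  {
    id: "no-raw-page-calls-in-spec",
    // Raw interactions like page.click()/page.goto() are disallowed;
    // specs should call POM methods instead.
    passes: (src) => !/\bpage\.(click|fill|goto)\(/.test(src),
  },
];

function runChecklist(source: string): string[] {
  // Returns the ids of failed rules; an empty array means the gate passes.
  return rules.filter((r) => !r.passes(source)).map((r) => r.id);
}

// A non-compliant spec fragment (invented for illustration):
const badSpec = `await page.goto("https://example.com/login");`;
console.log(runChecklist(badSpec));
```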


3. The Agent Orchestrator (Coordinator)

Responsibilities:

  • read methodology
  • generate code
  • validate using checklist
  • run Playwright tests
  • apply healing if needed
  • output final report

The Skill is the expert; the agent is the orchestrator.


Real-World impact

Input:
A plain-English test case in a .txt or .csv file, or typed directly.

Output:
Fully validated, passing Playwright tests + HTML report.

Metrics:

  • 0 manual code
  • 0 hallucinated selectors
  • Automatic validation
  • 2-cycle max healing
  • Your standards applied consistently

Benefits

  • Instant consistency across teams
  • Living standards that update instantly
  • Self-reviewing output
  • DOM-reliable selectors
  • Phase-level logging
  • Reusable expertise across agents

Implementation best practices

  • Keep methodology + examples version-controlled.
  • Keep reference examples current with the codebase.
  • Make checklist rules binary.
  • Cap healing at two cycles.
  • Reuse snapshots for locator correction.

Limitations

  • Recon requires a running, accessible application.
  • Heal cycle is capped intentionally.
  • Recon snapshots are ephemeral (do not commit).

Getting started

Prerequisites:

  • Node.js 18+
  • Playwright
  • npm install -g @playwright/cli

Quick Setup:

  1. Write methodology
  2. Add reference POM + spec
  3. Create validation checklist
  4. Build thin agent orchestrator
  5. Provide test case → agent does the rest

Key points

  • Skills encode methodology, not prompts
  • playwright-cli ensures live DOM accuracy
  • Checklist provides self-review
  • Heal loop provides controlled recovery
  • Standards propagate automatically
  • Agent is thin; Skill holds expertise

Conclusion

The real bottleneck in AI test automation is not tool access — it’s encoded knowledge.
Skills allow teams to encode their standards, architecture, and quality rules into a versioned, evolving asset that AI agents follow precisely.

Skills turn AI from a code generator into a production-grade test developer — consistent, validated, and aligned with your team’s conventions from the very first run.


About the author

Senior Manager | India
Akbar is a Senior Automation Architect at Sogeti, driving innovation through open-source automation frameworks and GenAI-led test strategies. He has led multiple PoCs, crafted scalable automation assets, and aligned testing solutions with business goals.
