About Skills and why we used them for test automation
We (the Automation CoE team at Sogeti India) used the Skills framework from Anthropic's open-source project (https://github.com/anthropics/skills) as the foundation for our AI-driven test automation workflow. Skills are version-controlled, markdown-based "knowledge modules" that encode how your automation must be generated — including methodology, naming conventions, Page Object Model rules, locators, assertions, logging, and validation criteria.
For Playwright automation, Skills act as the AI agent’s governing specification. Instead of relying on prompts or templates, the agent reads your Skill before every generation task, performs live DOM recon with playwright-cli, applies your POM architecture, enforces your validation checklist, and produces production-ready Playwright tests with zero manual code.
This blog explains how Skills solve the long‑standing problems in AI test generation and enable truly deterministic, convention-driven test automation.
Generating production Playwright tests with zero manual code
AI can write Playwright tests. But can it write your Playwright tests — following your architecture, naming rules, and quality bar?
The gap between “AI-generated code” and “production-ready test code” is where most teams struggle.
Playwright Skills close that gap completely, converting plain-English test cases into validated, self‑healing Playwright specs with zero manual effort.
The AI Test Generation problem nobody talks about
The Hidden Cost of “Just Ask AI to Write Tests”
- Convention ignorance: AI writes raw page.click() calls, ignoring your POM.
- Hallucinated locators: Selectors invented from training data, not your DOM.
- No quality gate: Broken imports, missing screenshots, hardcoded URLs.
- One-off generation: Standards don’t propagate to previously generated tests.
- Fragile output: When tests break, AI retries randomly with no strategy.
Why prompting AI directly fails
- Requires long, detailed prompts that drift over time.
- Output is inconsistent across sessions.
- Browser tools alone cannot enforce conventions.
- Prompt templates eventually become stale.
Skills: The Game-Changing approach
What is a Skill?
A Skill is a markdown file inside your repository that encodes how your team writes tests, including:
- methodology & phases
- naming conventions
- architecture rules
- reference examples
- validation checklist
The agent reads the Skill at the start of each execution and generates code exactly as specified.
A complete Playwright Skill contains:
- Methodology (phases, rules, naming, architecture)
- Reference examples (POM + spec used as canonical templates)
- Validation checklist (binary rules for self‑review)
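A minimal Skill file might look like the sketch below. The YAML frontmatter (`name`, `description`) follows the Anthropic Skills convention; the section names and rules are illustrative, not a canonical format:

```markdown
---
name: playwright-test-generation
description: Generates Playwright tests following our POM conventions.
---

# Methodology
1. Parse the plain-English test case into discrete steps.
2. Recon: open the target page with playwright-cli and snapshot the DOM.
3. Record: map each step to element refs from the snapshot.
4. Generate: emit page objects and a spec per the rules below.

# Rules
- Never write locators manually; derive them from recon output only.
- All page objects extend BasePage; no hardcoded URLs.

# Validation checklist
- [ ] All POMs extend the base class
- [ ] Screenshot in every assertion method
- [ ] No raw page.* calls in specs
```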
The agent:
- reads the methodology
- performs live DOM recon with playwright-cli
- generates page objects and specs
- self-validates using the checklist
Why this approach is revolutionary
- Zero convention drift — every file matches your engineering style.
- Live locators only — no hallucinated selectors.
- Built-in quality gate — checklist enforces standards automatically.
- Co-evolving standards — update the methodology once; all future tests will follow.
- Structured healing — predictable fix-and-rerun flow, no random retry.
From Test Case to running spec
The Old Way — Raw AI Output
A typical prompt-based AI output:
- Hardcoded URLs
- Unreliable selectors
- No page objects
- No logging or screenshots
- No load state coordination
The Skills way — Production architecture
The same test now produces:
- A 3-layer POM (atomic, composite, assertion)
- Automatic screenshots
- Phase/sub-step logging
- Clean specs calling only POM methods
- Live selectors from recon
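The 3-layer POM the Skill enforces can be sketched as follows. This is illustrative TypeScript, not the actual generated code: a minimal `Page` stub stands in for Playwright's `Page` so the shape of the three layers is clear, and the selectors shown are placeholders for what recon would supply.

```typescript
// Minimal stand-in for Playwright's Page, just enough to show the pattern.
interface Page {
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
  screenshot(opts: { path: string }): Promise<void>;
  textContent(selector: string): Promise<string | null>;
}

class BasePage {
  constructor(protected page: Page, protected name: string) {}
  protected log(phase: string, step: string): void {
    console.log(`[${this.name}] ${phase}: ${step}`);
  }
  protected async capture(label: string): Promise<void> {
    await this.page.screenshot({ path: `screenshots/${this.name}-${label}.png` });
  }
}

class LoginPage extends BasePage {
  // Atomic layer: one method per element interaction.
  async fillUsername(v: string) { this.log('atomic', 'fillUsername'); await this.page.fill('#user', v); }
  async fillPassword(v: string) { this.log('atomic', 'fillPassword'); await this.page.fill('#pass', v); }
  async clickSubmit()           { this.log('atomic', 'clickSubmit');  await this.page.click('#submit'); }

  // Composite layer: a business flow built from atomic methods, ending in a screenshot.
  async login(user: string, pass: string) {
    this.log('composite', 'login');
    await this.fillUsername(user);
    await this.fillPassword(pass);
    await this.clickSubmit();
    await this.capture('after-login');
  }

  // Assertion layer: each verification logs and captures a screenshot.
  async assertWelcome(expected: string) {
    this.log('assertion', 'assertWelcome');
    const text = await this.page.textContent('.welcome');
    if (text !== expected) throw new Error(`expected "${expected}", got "${text}"`);
    await this.capture('assert-welcome');
  }
}
```

The spec then calls only `login()` and `assertWelcome()`, never raw `page.*` methods.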
The recon loop — Live selectors, always
This is the breakthrough:
- Agent opens the page with playwright-cli
- A DOM snapshot is generated
- Agent extracts element refs from the snapshot
- Generated code uses real locators derived from the snapshot
- Every action refreshes the snapshot
This eliminates the biggest cause of flaky AI‑generated tests: selector hallucination.
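To make the idea concrete, here is a sketch of how a generator might pull element refs out of a recon snapshot instead of inventing selectors. The snapshot line format shown (`- role "name" [ref=...]`) is a simplified assumption; real playwright-cli output may differ.

```typescript
interface ElementRef { role: string; name: string; ref: string; }

// Extract every "- role \"name\" [ref=xx]" line from a snapshot string.
function extractRefs(snapshot: string): ElementRef[] {
  const refs: ElementRef[] = [];
  const line = /- (\w+) "([^"]*)" \[ref=(\w+)\]/g;
  for (const m of snapshot.matchAll(line)) {
    refs.push({ role: m[1], name: m[2], ref: m[3] });
  }
  return refs;
}

// The generator resolves a step like "click the Login button" against live
// refs; if the element is not in the snapshot, recon must run again rather
// than guessing a selector.
function refFor(refs: ElementRef[], role: string, name: string): string {
  const hit = refs.find(r => r.role === role && r.name.toLowerCase() === name.toLowerCase());
  if (!hit) throw new Error(`No live element for ${role} "${name}"; re-run recon`);
  return hit.ref;
}
```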
The heal loop — Structured failure recovery
If the test fails:
- Agent classifies the error
- Applies the exact fix needed
- Re-runs test (max 2 cycles)
Examples:
- Timeout → fix locator using previous snapshots
- Strict mode → apply .first() or .nth()
- Assertion mismatch → update only expected value
- Import error → fix that line only
If it still fails after that, the methodology needs an update, not more retries.
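The structured heal loop can be sketched as a classifier plus a capped retry loop. The error patterns and fix names below are illustrative assumptions, not the actual agent implementation:

```typescript
type Fix = 'fix-locator-from-snapshot' | 'apply-first-or-nth'
         | 'update-expected-value' | 'fix-import-line' | 'update-methodology';

// Map a failure message to exactly one fix strategy (patterns are illustrative).
function classify(error: string): Fix {
  if (/timeout/i.test(error)) return 'fix-locator-from-snapshot';
  if (/strict mode violation/i.test(error)) return 'apply-first-or-nth';
  if (/expected:/i.test(error)) return 'update-expected-value';
  if (/cannot find module/i.test(error)) return 'fix-import-line';
  return 'update-methodology';
}

// run() returns null on pass, or the failure message; apply() performs one fix.
async function heal(
  run: () => Promise<string | null>,
  apply: (f: Fix) => Promise<void>,
): Promise<boolean> {
  const MAX_CYCLES = 2;
  for (let cycle = 0; cycle < MAX_CYCLES; cycle++) {
    const error = await run();
    if (error === null) return true;
    const fix = classify(error);
    if (fix === 'update-methodology') return false; // not retryable: fix the Skill
    await apply(fix);
  }
  return (await run()) === null; // final verification run
}
```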
Three assets that power Skills
1. The Methodology (Phases + Rules)
Phases:
- Parse
- Recon
- Record
- Generate
Rule: Never write locators manually — only from recon output.
2. The validation checklist (Quality Gate)
Binary rules such as:
- all POMs extend base class
- no hardcoded URLs
- screenshot in all assertion methods
- composite methods must end with screenshot
- no raw page calls in spec
- console logs for all phases and sub-steps
Unambiguous ✓/✗ examples are required.
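Because the rules are binary, they can in principle be checked mechanically. The sketch below expresses a few of them as predicates over a generated file's source text; the rule IDs and patterns are illustrative, and in practice the agent self-reviews against the markdown checklist rather than running code like this:

```typescript
interface Rule { id: string; passes: (source: string) => boolean; }

const specRules: Rule[] = [
  // No hardcoded URLs anywhere in the generated file.
  { id: 'no-hardcoded-urls', passes: s => !/https?:\/\//.test(s) },
  // Specs must call POM methods, never raw page.* actions.
  { id: 'no-raw-page-calls', passes: s => !/\bpage\.(click|fill|goto)\(/.test(s) },
  // Any *Page class must extend the base class.
  { id: 'pom-extends-base', passes: s => !/class \w+Page\b/.test(s) || /extends BasePage/.test(s) },
];

// Returns the ids of every failed rule; empty array means the gate passes.
function validate(source: string, rules: Rule[]): string[] {
  return rules.filter(r => !r.passes(source)).map(r => r.id);
}
```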
3. The Agent Orchestrator (Coordinator)
Responsibilities:
- read methodology
- generate code
- validate using checklist
- run Playwright tests
- apply healing if needed
- output final report
The Skill is the expert; the agent is the orchestrator.
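That division of labor keeps the orchestrator thin. Sketched in TypeScript, with phase implementations left as stubs (the interface and names are assumptions for illustration):

```typescript
// The Skill carries the expertise; the agent merely sequences the phases.
interface SkillPhases {
  parse(testCase: string): string[];                   // plain English -> steps
  recon(steps: string[]): string;                      // live DOM snapshot
  generate(steps: string[], snapshot: string): string; // POM + spec source
  validate(source: string): string[];                  // failed checklist rule ids
}

function orchestrate(
  skill: SkillPhases,
  testCase: string,
): { source: string; failures: string[] } {
  const steps = skill.parse(testCase);
  const snapshot = skill.recon(steps);
  const source = skill.generate(steps, snapshot);
  return { source, failures: skill.validate(source) };
}
```

Running the tests and the heal loop would follow the same shape: further phases invoked in a fixed order, with all decision rules sourced from the Skill.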
Real-World impact
Input:
A plain-English test case, supplied as a .txt or .csv file or typed directly.
Output:
Fully validated, passing Playwright tests + HTML report.
Metrics:
- 0 manual code
- 0 hallucinated selectors
- Automatic validation
- 2-cycle max healing
- Your standards applied consistently
Benefits
- Instant consistency across teams
- Living standards that update instantly
- Self-reviewing output
- DOM-reliable selectors
- Phase-level logging
- Reusable expertise across agents
Implementation best practices
- Keep methodology + examples version-controlled.
- Keep reference examples current with your latest conventions.
- Make checklist rules binary.
- Cap healing at two cycles.
- Reuse snapshots for locator correction.
Limitations
- Recon requires a running, accessible application.
- Heal cycle is capped intentionally.
- Recon snapshots are ephemeral (do not commit).
Getting started
Prerequisites:
- Node.js 18+
- Playwright
- npm install -g @playwright/cli
Quick Setup:
- Write methodology
- Add reference POM + spec
- Create validation checklist
- Build thin agent orchestrator
- Provide test case → agent does the rest
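A minimal repository layout for the steps above might look like this (all paths are illustrative):

```
skills/
  playwright-test-generation/
    SKILL.md            # methodology, rules, naming conventions
    checklist.md        # binary validation rules
    examples/
      login.page.ts     # canonical reference POM
      login.spec.ts     # canonical reference spec
tests/
  generated/            # agent output lands here
```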
Key points
- Skills encode methodology, not prompts
- playwright-cli ensures live DOM accuracy
- Checklist provides self-review
- Heal loop provides controlled recovery
- Standards propagate automatically
- Agent is thin; Skill holds expertise
Conclusion
The real bottleneck in AI test automation is not tool access — it’s encoded knowledge.
Skills allow teams to encode their standards, architecture, and quality rules into a versioned, evolving asset that AI agents follow precisely.
Skills turn AI from a code generator into a production-grade test developer — consistent, validated, and aligned with your team’s conventions from the very first run.