Over the past two decades, I have witnessed testing evolve at a rapid pace: from rigid, script-heavy automation to more flexible, intelligent approaches. Now, we are entering a new phase. With Agentic AI and the Model Context Protocol (MCP) coming together, we are seeing the early signs of a shift in how functional test automation is done: less brittle scripting, more intelligent and adaptive execution.
As Agentic AI and MCPs gain traction across the software development lifecycle, in this post I’ll share some of my insights and perspectives on leveraging MCP and LLMs within test automation frameworks, and how this fusion could reshape test automation.
Accelerating Functional Test Automation with Agentic AI and MCPs
Traditional functional test automation using open-source tools like Selenium and Playwright often involves extensive coding and can become brittle over time. Even minor UI changes can break scripts, and maintaining test suites becomes a bottleneck, especially in fast-paced agile environments.
We experimented by integrating the Playwright MCP server into our automation framework and using GitHub Copilot to feed prompts alongside test cases to generate automation scripts and execute them. The results we observed were promising:
- The agent, powered by Copilot and orchestrated via MCP, was able to invoke the MCP server, navigate the browser, and execute test steps autonomously.
- It generated Playwright TypeScript code with mostly accurate object locators inferred from the application DOM and executed the tests end-to-end (a sketch of this kind of output follows this list).
- This significantly reduced manual effort and showcased how agents can adapt to UI changes and infer intent without needing constant updates.
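To make this concrete, here is a hypothetical sketch of the kind of Playwright TypeScript test the agent produced; the URL, field labels, and assertions are illustrative placeholders, not taken from our actual application.

```typescript
import { test, expect } from "@playwright/test";

// Illustrative login scenario; URL, field labels, and assertions are
// placeholders rather than output from our real application.
test("user can log in and see the dashboard", async ({ page }) => {
  await page.goto("https://example.com/login");

  // Role- and label-based locators like these are what the agent
  // inferred from the DOM, instead of brittle XPath selectors.
  await page.getByLabel("Username").fill("demo-user");
  await page.getByLabel("Password").fill("demo-pass");
  await page.getByRole("button", { name: "Sign in" }).click();

  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});
```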
However, this is still very much experimental. While the results are encouraging, the system wasn’t always consistent, and human intervention was still required at times. Additionally, generating code for every run is not scalable. The next logical step is to evolve toward direct execution of tests from high-level prompts, bypassing the need for intermediate code generation.
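To illustrate what direct execution could look like, the sketch below uses the TypeScript MCP SDK to call the Playwright MCP server’s browser tools directly, so each step is a tool invocation rather than generated test code. The package name and tool names (browser_navigate, browser_click) are assumptions based on the publicly available Playwright MCP server and may differ across versions.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the Playwright MCP server over stdio; assumes the official
// @playwright/mcp package is available via npx.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["@playwright/mcp@latest"],
});

const client = new Client({ name: "test-agent", version: "0.1.0" });
await client.connect(transport);

// Each test step becomes a direct tool call; in a full agentic setup,
// the LLM would plan these calls from a high-level prompt. Tool names
// and argument shapes are assumptions and may vary by server version.
await client.callTool({
  name: "browser_navigate",
  arguments: { url: "https://example.com/login" },
});
await client.callTool({
  name: "browser_click",
  arguments: { element: "Sign in button" },
});

await client.close();
```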
Looking ahead, I see potential in building custom MCP implementations tailored to specific enterprise environments and tools. This could unlock more robust integrations with internal tools, CI/CD pipelines, and observability platforms, making agentic testing more production-ready.
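As a minimal sketch of what such a custom implementation might look like, the snippet below uses the TypeScript MCP SDK to expose a single internal tool; the server name, tool name, schema, and response are hypothetical placeholders for a real enterprise integration.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "enterprise-test-tools", version: "0.1.0" });

// Hypothetical tool wrapping an internal CI/CD trigger; name, schema,
// and response text are placeholders, not a real integration.
server.tool(
  "trigger_regression_suite",
  { suite: z.string(), environment: z.string() },
  async ({ suite, environment }) => ({
    content: [
      {
        type: "text",
        text: `Triggered suite "${suite}" in ${environment} (placeholder response).`,
      },
    ],
  })
);

await server.connect(new StdioServerTransport());
```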
How the Architecture Comes Together
A typical Agentic AI + MCP test architecture might look like this (though it will continue to evolve):
- LLM-powered agents with memory and planning (e.g., copilots)
- MCP server to manage context, tools, and agent communication
- Tool adapters for test frameworks, APIs, databases, etc.
- A layer for logging, debugging, and governance
This architecture can be further modularized and scaled, making it ideal for integration into DevOps and CI/CD pipelines.
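As an illustrative sketch rather than a reference implementation, the adapter and governance layers from the list above could be modeled along these lines; all type and function names are hypothetical.

```typescript
// Hypothetical shape of the tool-adapter layer; names are illustrative
// and not taken from any specific framework.
interface ToolResult {
  ok: boolean;
  output: string;
}

interface ToolAdapter {
  name: string;
  description: string;
  // Invoked by the MCP server on behalf of the agent.
  invoke(args: Record<string, unknown>): Promise<ToolResult>;
}

// Governance layer: every tool call is audit-logged and checked
// against an allowlist before it reaches the underlying adapter.
function withGovernance(adapter: ToolAdapter, allowlist: Set<string>): ToolAdapter {
  return {
    ...adapter,
    async invoke(args) {
      if (!allowlist.has(adapter.name)) {
        return { ok: false, output: `Tool "${adapter.name}" is not permitted.` };
      }
      console.log(`[audit] ${adapter.name}`, JSON.stringify(args));
      return adapter.invoke(args);
    },
  };
}
```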
Where Things Stand Today
Agentic AI and MCP aren’t just buzzwords; they represent a real shift in how we think about test automation. Based on what I’ve explored so far, the benefits are clear: faster test creation, reduced maintenance, and better adaptability to change. However, we’re still in the early stages of this journey, and like any emerging technology, it comes with trade-offs.
There could be cost implications both in terms of compute resources and the effort required to fine-tune agents for specific environments. We also compromise some level of control, as these autonomous agents make decisions that may not always align with human expectations. Reliability can be inconsistent, especially when dealing with complex UI elements or dynamic application states.
In our experiments, for example, the system struggled to consistently identify intricate UI components, and generating code for every run proved unsustainable. Additionally, MCP servers often need to be updated in tandem with tooling changes to support new features. These are not blockers, but they are important considerations as we scale.
That said, the direction is clear: we are moving towards a future where intelligent agents can execute tests directly from high-level scenarios. Code generation may just be the starting point. As these systems mature, we could see agents that interpret user stories, understand application flows, and autonomously validate functionality without needing traditional automation scripts at all.
How to Get Started
If you are already working with enterprise copilots or have a mature automation framework, this is worth exploring. Start small:
- Identify a few stable workflows or regression scenarios.
- Integrate an MCP server into your existing IDE, for example the official Playwright MCP server from Microsoft’s GitHub (see the configuration sketch after this list).
- Build a proof of concept using an MCP server and an LLM agent.
- Measure the effort saved and the adaptability of the agent.
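As an example of the IDE integration step, registering the Playwright MCP server in VS Code might look like the configuration below (typically in .vscode/mcp.json). This is a sketch: the exact file location, schema, and package name depend on your IDE and MCP client version, so check the official Microsoft GitHub repository for current instructions.

```json
{
  "servers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```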
Even in its current state, the acceleration achieved is notable. However, to make this work at scale, we will need solid ways to structure, govern, and evolve these systems.
Final Thoughts
As this space continues to evolve, we’re likely to see more teams building their own MCPs, fine-tuning agents for their environments, and rethinking what “automation” truly means. This isn’t just about adopting new tools—it’s about reimagining how we deliver quality in a world where software is constantly changing.
That said, it’s also important to think about what could go wrong. Agentic systems are still maturing, and there are real risks: misinterpretation of prompts or test intents and inconsistent behaviour across environments can, without proper validation, lead to missed defects or false confidence.
To mitigate these risks, we need robust validation layers, fallback mechanisms, and clear boundaries for autonomous execution.
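A minimal sketch of what such a boundary could look like in practice: retry a step within a bounded budget, then escalate to a human rather than letting the agent decide. Everything here is a hypothetical illustration, not a production pattern.

```typescript
// Hypothetical guardrail around autonomous execution: bounded retries
// with an explicit fallback to human review on repeated failure.
type StepOutcome = { passed: boolean; detail: string };

async function runWithGuardrails(
  step: () => Promise<StepOutcome>,
  maxAttempts = 2
): Promise<StepOutcome> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = await step();
    if (outcome.passed) return outcome;
    console.warn(`[guardrail] attempt ${attempt} failed: ${outcome.detail}`);
  }
  // Clear boundary: an unverified step is never reported as passed.
  return { passed: false, detail: "Escalated for human review." };
}
```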
This shift also has implications for test engineers. Rather than replacing them, it can elevate their roles: designing intelligent workflows, focusing more on prompt engineering and scenario design, validating agent behavior, and shaping how these systems learn, adapt, and scale.
In short, Agentic AI and MCP have the potential to transform test automation from a task-driven activity into a context-based, intelligent process that aligns more closely with modern software development.