Generative Artificial Intelligence (Gen AI) has rapidly evolved from a research novelty to a core component of enterprise solutions. However, while its capabilities are impressive, the assumption that testing Gen AI is straightforward is a significant misconception. Unlike traditional systems, Gen AI does not operate on deterministic logic. It generates outputs based on probabilistic models, training data, and user input—making quality engineering a uniquely complex challenge.
This article explores the nuanced difficulties in testing Gen AI systems, particularly those powered by Large Language Models (LLMs), and outlines why conventional testing approaches fall short.

The Nature of Gen AI Output
Gen AI systems are designed to always produce an output. This output is influenced by:
- The quality and structure of the input prompt
- The training data and fine-tuning applied to the model
- The underlying algorithms and model architecture
Even when the input remains constant, the model may generate different responses across sessions. This behaviour is often intentional: decoding samples from a probability distribution over tokens, and parameters such as temperature deliberately inject randomness. Some systems also treat a repeated prompt as a signal that the previous answer was unsatisfactory and vary the response. While this enhances user experience, it introduces significant variability that complicates testing.
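To make the source of this variability concrete, the self-contained sketch below mimics temperature-scaled sampling at the token level. The toy logits and token strings are illustrative assumptions, not any vendor's actual decoder, but the mechanism is the same one production LLMs use.

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float) -> str:
    """Sample one next token from temperature-scaled softmax probabilities."""
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exps.values())
    tokens = list(exps)
    weights = [exps[tok] / total for tok in tokens]
    return random.choices(tokens, weights=weights)[0]

# Toy logits for the token following "The capital of France is"
logits = {"Paris": 4.0, "paris": 2.5, "the": 1.0, "located": 0.5}

for temperature in (0.1, 0.7, 1.5):
    samples = [sample_token(logits, temperature) for _ in range(10)]
    print(f"temperature={temperature}: {samples}")
```

At low temperature the samples collapse onto the most likely token; as temperature rises, the same input yields increasingly varied outputs. Even at temperature zero, identical responses are not guaranteed in practice, since backend and hardware-level nondeterminism can still shift results.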
Key Testing Challenges
1. Output Consistency
Problem: Repeated inputs do not guarantee identical outputs.
Because decoding is probabilistic, and because many systems deliberately diversify their responses to repeated prompts, identical inputs rarely produce identical outputs. This inconsistency undermines regression testing and automation: traditional test cases that rely on fixed expected outputs become unreliable.
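One practical response is to stop asserting on exact strings and instead assert on invariants that any acceptable output must satisfy. The sketch below assumes a hypothetical generate_summary() call into the system under test that returns JSON; the specific keys and limits are illustrative.

```python
import json

def generate_summary(ticket_text: str) -> str:
    """Hypothetical call into the Gen AI system under test.
    Assumed to return JSON: {"summary": "...", "priority": "low|medium|high"}."""
    raise NotImplementedError  # replace with your real Gen AI integration

def test_summary_invariants():
    raw = generate_summary("Login page returns HTTP 500 for all users.")
    data = json.loads(raw)

    # Invariant 1: output is valid JSON with the expected keys.
    assert {"summary", "priority"} <= data.keys()

    # Invariant 2: priority comes from a closed vocabulary.
    assert data["priority"] in {"low", "medium", "high"}

    # Invariant 3: the summary is non-empty and bounded in length.
    assert 0 < len(data["summary"]) <= 300

    # Invariant 4: key facts from the input survive into the output.
    assert "login" in data["summary"].lower()
```

Because each assertion tolerates any phrasing that satisfies the invariant, the test stays stable across nondeterministic runs while still failing on genuinely broken output.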
2. Output Variety
Problem: Multiple valid outputs for the same input.
Gen AI’s strength lies in its ability to generate diverse, contextually appropriate responses. However, this variety makes it difficult to define a single “correct” output. Manual validation becomes time-consuming and subjective.
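A common tactic here is to accept an output if it is sufficiently similar to any answer in a curated reference set. The sketch below uses a deliberately simple bag-of-words cosine similarity so it runs on the standard library alone; the 0.6 threshold is an illustrative assumption, and a production setup would typically substitute sentence embeddings.

```python
import math
import re
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over lowercase word-count vectors."""
    va, vb = (Counter(re.findall(r"\w+", s.lower())) for s in (a, b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

references = [
    "Restart the service and clear the cache.",
    "Clear the application cache, then restart the service.",
]
candidate = "You should clear the cache and then restart the service."

# Pass if the output is close enough to ANY acceptable reference.
best = max(cosine_similarity(candidate, ref) for ref in references)
assert best >= 0.6, f"No reference matched (best similarity {best:.2f})"
```

This converts a subjective judgement into a repeatable threshold check, while the reference set itself captures the legitimate variety.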
3. Excessive Detail in Output
Problem: LLMs often generate overly detailed responses.
While detailed outputs are beneficial in many contexts, they can be counterproductive in agile development environments. For working prototypes, excessive detail introduces noise, increases review time, and complicates validation.
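Verbosity, at least, is easy to gate automatically. A minimal sketch, with an assumed word budget of 120: flag over-long responses for review rather than failing them outright, since a longer answer is not necessarily a wrong one.

```python
def check_verbosity(output: str, word_budget: int = 120) -> tuple[bool, str]:
    """Return (within_budget, message) for a generated response."""
    words = output.split()
    if len(words) <= word_budget:
        return True, f"OK: {len(words)} words (budget {word_budget})"
    overage = len(words) - word_budget
    return False, f"REVIEW: {overage} words over budget; consider a terser prompt"

ok, msg = check_verbosity("Step 1: open the settings panel. " * 50)
print(ok, msg)  # False REVIEW: 180 words over budget; ...
```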
4. Dependence on Training and Fine-Tuning
Problem: Output quality is heavily dependent on the model’s training and fine-tuning.
A model that has been trained or fine-tuned on domain-specific data may perform well; one that lacks such grounding falls back on general knowledge and assumptions, which can lead to hallucinations or irrelevant outputs.
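A lightweight mitigation is a grounding check: require that the specific entities and figures in an output actually appear in the source material supplied to the model. The sketch below is a crude lexical version of the idea; the example texts are invented, and a real pipeline would use entity extraction or an NLI model rather than a regex.

```python
import re

def ungrounded_terms(answer: str, context: str) -> set[str]:
    """Capitalized words and numbers in the answer that never appear in
    the supplied context -- candidate hallucinations for human review."""
    candidates = set(re.findall(r"\b(?:[A-Z][a-z]+|\d+(?:\.\d+)?)\b", answer))
    context_lower = context.lower()
    return {t for t in candidates if t.lower() not in context_lower}

context = "The Q3 report shows revenue of 4.2M EUR, up 8% year over year."
answer = "Revenue reached 4.2M EUR in Q3, a 12% increase driven by the Berlin office."

print(ungrounded_terms(answer, context))  # e.g. {'Berlin', '12'} -- flag for review
```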
Rethinking Quality Engineering for Gen AI: The Sogeti Gen AI Amplifier Approach
Testing Gen AI requires a paradigm shift. Instead of validating deterministic outputs, testers must evaluate:
- Intent alignment: Does the output match the user’s intent?
- Semantic accuracy: Is the response factually and contextually correct?
- Behavioural consistency: Does the model behave predictably across similar inputs?
This calls for new tools, metrics, and frameworks that are designed for probabilistic systems. Techniques such as prompt templating, output clustering, and embedding-based comparisons are becoming essential in modern Gen AI testing.
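As one example, embedding-based comparison gives behavioural consistency a measurable definition: sample the same prompt several times, embed the outputs, and check how tightly they cluster semantically. The sketch below assumes the open-source sentence-transformers package and its all-MiniLM-L6-v2 model; ask_model() is a hypothetical call into the system under test, and the 0.8 threshold is an illustrative assumption.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def ask_model(prompt: str) -> str:
    """Hypothetical call into the Gen AI system under test."""
    raise NotImplementedError  # replace with your real Gen AI integration

def behavioural_consistency(outputs: list[str]) -> float:
    """Mean pairwise cosine similarity across repeated outputs."""
    embeddings = model.encode(outputs)
    sims = util.cos_sim(embeddings, embeddings)  # NxN similarity matrix
    n = len(outputs)
    pairs = [sims[i][j].item() for i in range(n) for j in range(n) if i != j]
    return sum(pairs) / len(pairs)

outputs = [ask_model("Summarise ticket INC-123") for _ in range(5)]
score = behavioural_consistency(outputs)
assert score >= 0.8, f"Responses drift semantically (consistency {score:.2f})"
```

The same embeddings feed output clustering directly: grouping many sampled responses reveals how many distinct answer families the model actually produces for a given prompt.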
Sogeti has developed the Gen AI Amplifier to support clients across more than 40 use cases in software delivery and quality engineering. Already deployed at 25+ organizations, the solution is specifically designed to address the unique challenges of quality engineering in the context of generative AI.
Conclusion
Generative AI is transforming how we build and interact with software. However, its non-deterministic nature introduces a new class of testing challenges that cannot be addressed with traditional methods. Quality engineering teams must adapt by embracing new strategies, tools, and mindsets to ensure reliability, consistency, and trust in Gen AI systems. Reach out to us to learn how we systematically address the challenges of testing AI, and to explore our Gen AI Amplifier capabilities.