The rapid development of Large Language Models (LLMs) has profoundly reshaped the paradigms of contemporary artificial intelligence. These models, capable of processing and generating content across multiple modalities (text, audio, image, video), require vast amounts of precisely annotated data for effective training. Annotation, as the process of structuring and contextualizing raw data, represents a critical step in building robust, diverse, and representative training corpora.
In this context, we have developed a proof of concept for a multimodal annotation application, designed to meet the growing demands of heterogeneous data processing. This application features a modular architecture that allows dynamic adaptation of both the number and type of data channels to be annotated. It supports multiple synchronous or asynchronous modalities, including time-series data (e.g., from biomedical or environmental sensors), audio streams (e.g., speech, ambient sounds), video recordings (e.g., behaviors, facial expressions, gestural interactions), and textual data (e.g., transcriptions, metadata, semantic annotations).
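To give an intuition of what such a modular channel model might look like, the short Python sketch below shows one possible representation of a session whose channels can be added dynamically and typed by modality. The names used here (Modality, ChannelConfig, AnnotationSession) are illustrative assumptions for this article, not the application's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Modality(Enum):
    """Channel types supported by the annotation tool (illustrative)."""
    TIME_SERIES = "time_series"   # e.g., biomedical or environmental sensors
    AUDIO = "audio"               # e.g., speech, ambient sounds
    VIDEO = "video"               # e.g., behaviors, facial expressions, gestures
    TEXT = "text"                 # e.g., transcriptions, metadata


@dataclass
class ChannelConfig:
    """Configuration for a single data channel to be annotated."""
    name: str
    modality: Modality
    sampling_rate_hz: Optional[float] = None  # None for asynchronous channels (e.g., text)
    source_uri: Optional[str] = None          # file path or stream endpoint


@dataclass
class AnnotationSession:
    """A session holds an arbitrary, dynamically extensible set of channels."""
    channels: List[ChannelConfig] = field(default_factory=list)

    def add_channel(self, config: ChannelConfig) -> None:
        self.channels.append(config)


# Example: a recording combining sensor, audio, video, and text channels.
session = AnnotationSession()
session.add_channel(ChannelConfig("ecg", Modality.TIME_SERIES, sampling_rate_hz=250.0))
session.add_channel(ChannelConfig("room_audio", Modality.AUDIO, sampling_rate_hz=16000.0))
session.add_channel(ChannelConfig("bedside_camera", Modality.VIDEO, sampling_rate_hz=25.0))
session.add_channel(ChannelConfig("transcript", Modality.TEXT))
print([c.name for c in session.channels])
```

In this kind of design, adding a new modality amounts to registering a new channel configuration rather than modifying the annotation interface itself, which is what enables the dynamic adaptation described above.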
One illustrative use case of this application is the multimodal analysis of complex clinical situations. For instance, annotating recordings of patients in hospital settings who are connected to physiological monitoring devices enables the integration of physiological signals (e.g., ECG, respiratory rate), verbal interactions, and observable behaviors captured on video.
Example of annotation on the EAV dataset¹ using the multimodal annotation application developed internally at SogetiLabs:

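To make the clinical scenario more concrete, the sketch below illustrates how labels placed on a shared timeline could span several channels at once. The Annotation class, its fields, and the example labels are hypothetical and only meant to convey the idea of time-aligned, cross-modal annotation; they do not describe the application's internal data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """A label attached to a time interval on one or more channels (illustrative model)."""
    label: str
    start_s: float        # start time in seconds, on the session's shared clock
    end_s: float          # end time in seconds
    channels: List[str]   # names of the channels the label applies to


def overlapping(annotations: List[Annotation], start_s: float, end_s: float) -> List[Annotation]:
    """Return the annotations whose interval intersects [start_s, end_s]."""
    return [a for a in annotations if a.start_s < end_s and a.end_s > start_s]


# Example: labels from different modalities placed on a common timeline.
annotations = [
    Annotation("tachycardia episode", 120.0, 180.0, ["ecg"]),
    Annotation("patient reports chest pain", 130.5, 142.0, ["room_audio", "transcript"]),
    Annotation("clenched posture", 128.0, 175.0, ["bedside_camera"]),
]

# All events co-occurring with the ECG episode, across modalities.
for a in overlapping(annotations, 120.0, 180.0):
    print(a.label, a.channels)
```

Because every label refers to the same clock, an annotator (or a downstream analysis) can query what happened across all modalities during a given event, which is precisely the kind of cross-modal view discussed next.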
This integrated approach offers a more refined and contextualized understanding of the observed phenomena by combining complementary data modalities. It facilitates the identification of complex correlations that are often inaccessible through unimodal analysis, thereby paving the way for significant advancements in fields such as augmented medicine, applied research, and specialized training.