
MULTIMODAL ANNOTATION AS A LEVER FOR TRAINING LARGE LANGUAGE MODELS

September 8, 2025
Robin Heckenauer

The rapid development of Large Language Models (LLMs) has profoundly reshaped the paradigms of contemporary artificial intelligence. These models, capable of processing and generating content across multiple modalities (text, audio, image, video), require vast amounts of precisely annotated data for effective training. Annotation, as the process of structuring and contextualizing raw data, represents a critical step in building robust, diverse, and representative training corpora.

In this context, we have developed a proof of concept for a multimodal annotation application, designed to meet the growing demands of heterogeneous data processing. This application features a modular architecture that allows the number and type of annotated data channels to be adapted dynamically. It supports multiple synchronous or asynchronous modalities, including time series data (e.g., from biomedical or environmental sensors), audio streams (e.g., speech, ambient sounds), video recordings (e.g., behaviors, facial expressions, gestural interactions), and textual data (e.g., transcriptions, metadata, semantic annotations).
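To make the channel-based design more concrete, the sketch below shows one possible way such a modular annotation project could be modeled in Python. The class names (Modality, Channel, Annotation, AnnotationProject) and their fields are hypothetical illustrations, not the application's actual data model.

```python
# Minimal sketch of a channel-based annotation project (hypothetical names,
# not the application's published data model).
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Modality(Enum):
    TIME_SERIES = "time_series"  # e.g. biomedical or environmental sensors
    AUDIO = "audio"              # e.g. speech, ambient sounds
    VIDEO = "video"              # e.g. behaviors, facial expressions, gestures
    TEXT = "text"                # e.g. transcriptions, metadata


@dataclass
class Channel:
    """One data stream to annotate; channels can be added or removed per project."""
    name: str
    modality: Modality
    sampling_rate_hz: Optional[float] = None  # None for untimed modalities such as text


@dataclass
class Annotation:
    """A label attached to a time span on one channel."""
    channel: str
    start_s: float
    end_s: float
    label: str


@dataclass
class AnnotationProject:
    """A project groups an arbitrary number of heterogeneous channels."""
    channels: List[Channel] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def add_channel(self, channel: Channel) -> None:
        self.channels.append(channel)

    def annotate(self, annotation: Annotation) -> None:
        self.annotations.append(annotation)
```

Under this kind of model, a project could register an ECG channel, a video channel, and a transcription channel side by side, and attach labeled time spans to each without changing the application's core.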

One illustrative use case of this application is the multimodal analysis of complex clinical situations. For instance, annotating recordings of patients in hospital settings who are connected to physiological monitoring devices makes it possible to integrate vital signs (e.g., ECG, respiratory rate), verbal interactions, and observable behaviors captured on video.
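As a rough illustration of how time-aligned annotations from different channels can be cross-referenced in such a scenario, the Python sketch below intersects labeled time spans from two modalities (for example, ECG anomalies and facial expressions observed on video). The function name, channel roles, and values are purely illustrative assumptions, not output of the application.

```python
# Hypothetical sketch: given time-stamped annotations on two channels,
# find the intervals where their labels overlap in time.
from typing import List, Tuple

Span = Tuple[float, float]  # (start_s, end_s)


def overlapping_intervals(spans_a: List[Span], spans_b: List[Span]) -> List[Span]:
    """Return the pairwise intersections of two lists of time spans."""
    overlaps = []
    for a_start, a_end in spans_a:
        for b_start, b_end in spans_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # the two spans genuinely overlap
                overlaps.append((start, end))
    return overlaps


# Illustrative values: tachycardia episodes on the ECG channel vs.
# grimaces annotated on the video channel.
ecg_anomalies = [(12.0, 18.5), (40.0, 44.0)]
grimaces = [(13.2, 15.0), (60.0, 62.0)]
print(overlapping_intervals(ecg_anomalies, grimaces))  # [(13.2, 15.0)]
```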

Example of annotation on the EAV dataset [1] using the multimodal annotation application developed internally at SogetiLabs.

This integrated approach offers a more refined and contextualized understanding of the observed phenomena by combining complementary data modalities. It facilitates the identification of complex correlations that are often inaccessible through unimodal analysis, thereby paving the way for significant advancements in fields such as augmented medicine, applied research, and specialized training.

[1] Lee, Min-Ho, Adai Shomanov, Balgyn Begim, Zhuldyz Kabidenova, Aruna Nyssanbay, Adnan Yazici, and Seong-Whan Lee. "EAV: EEG-Audio-Video Dataset for Emotion Recognition in Conversational Contexts." Scientific Data 11, no. 1 (2024): 1026. https://www.nature.com/articles/s41597-024-03838-4

About the author

R&D Project Manager | France
Robin Heckenauer is an AI researcher with a career spanning both academia and industry. In 2024, Robin joined SogetiLabs as an R&D Project Manager, where he leads a team working on cutting-edge AI projects, including pain expression recognition.
