Introduction
Zero-Shot Learning (ZSL) is an approach in artificial intelligence that enables a model to recognize and classify objects or concepts for which it saw no labeled examples during training.
Unlike supervised learning methods, which require a large volume of annotated data, ZSL exploits semantic representations, such as textual descriptions or relationships between categories, to make predictions about new classes. This approach is particularly useful where it is difficult, if not impossible, to obtain annotated data for every category of interest, for example rare diseases or recently discovered species, for which examples may be scarce or non-existent. Even when data can be collected, annotating large quantities of samples is costly and time-consuming.
This article explores the fundamentals of Zero-Shot Learning and presents several concrete applications.
Basic idea
Zero-Shot Learning is based on the idea that relationships exist between known and unknown concepts. These relationships are generally modeled using semantic spaces, where objects are represented as vector embeddings obtained from textual descriptions or other sources of information, such as visual attributes.
For example, if a ZSL model is trained to recognize animals such as cats, dogs and birds, it could be designed to identify new species such as tigers and wolves on the basis of common attributes (e.g. “has fur”, “is carnivorous”) without ever seeing labeled examples of these animals during training.
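The attribute idea above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the attribute list, the class signatures and the scores are invented for the example, and the scores are assumed to come from a hypothetical attribute predictor trained only on seen classes (cats, dogs, birds).

```python
# Hypothetical attribute signatures for classes with no training images.
# Attribute order: has_fur, is_carnivorous, has_stripes, lives_in_packs
UNSEEN_CLASS_ATTRIBUTES = {
    "tiger": [1, 1, 1, 0],
    "wolf": [1, 1, 0, 1],
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_by_attributes(attribute_scores, class_attributes):
    """Return the class whose attribute signature is closest to the scores."""
    return min(
        class_attributes,
        key=lambda name: squared_distance(attribute_scores, class_attributes[name]),
    )


# Invented scores that an attribute predictor might output for one photo:
# furry, carnivorous, striped, not a pack animal.
scores = [0.9, 0.8, 0.85, 0.1]
print(classify_by_attributes(scores, UNSEEN_CLASS_ATTRIBUTES))
```

The classifier never sees a tiger image: it only needs the attribute description of the new class, which is what makes the prediction zero-shot. Nearest-signature matching is one simple choice among several possible ways to compare predicted attributes with class descriptions.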
Methods
- Attribute-based methods: these use object descriptions in terms of shared attributes (e.g. color, texture, shape) to perform zero-shot classification;
- Transfer learning methods: architectures such as CLIP (Contrastive Language-Image Pretraining) and OWL-ViT exploit massive image and text corpora to learn correspondences between visual and semantic representations;
- Embedding-based methods: these rely on vector representations, such as those produced by BERT, word2vec or GloVe (Global Vectors), that capture the features or meaning of different data points. A sample is classified by measuring the similarity between its semantic embedding and the embeddings of the candidate classes, and choosing the closest class.
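The similarity step used by embedding-based methods can be illustrated with a minimal sketch. The three-dimensional vectors below are toy values invented for the example; in practice they would be replaced by embeddings from a model such as word2vec, GloVe or BERT, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(sample_embedding, class_embeddings):
    """Return the most similar class label and the score of every class."""
    scores = {
        label: cosine_similarity(sample_embedding, embedding)
        for label, embedding in class_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy class embeddings (assumed values, not real model output).
CLASS_EMBEDDINGS = {
    "tiger": [0.9, 0.1, 0.3],
    "wolf": [0.2, 0.9, 0.1],
    "zebra": [0.1, 0.2, 0.9],
}

sample = [0.8, 0.2, 0.4]  # toy embedding of the sample to classify
label, scores = zero_shot_classify(sample, CLASS_EMBEDDINGS)
print(label)
```

Adding a new class only requires adding its embedding to the dictionary; no retraining is involved. Cosine similarity is a common choice here, but other similarity measures could be substituted.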
Applications
- Animal species detection
- Monitoring and safety
- Smart chatbots
- Detecting fake news
- Detection of rare diseases
- MRI image analysis
Challenges
- Reliability of semantic descriptions: ZSL accuracy depends heavily on the quality of textual embeddings and relationships between concepts.
- Model biases: Pre-trained models can inherit biases from the datasets on which they were trained.
- Evaluation difficulties: Benchmarks for zero-shot learning are limited and do not always cover the diversity of real-life applications.
Perspectives
- Hybridization with supervised learning: approaches combining ZSL and supervised learning can improve model robustness.
- Use of large multimodal models: architectures such as GPT-4 (with image input) or Grounding DINO combine vision and language to deliver enhanced zero-shot recognition performance.
- Optimization for embedded systems: making ZSL more efficient on platforms with low computing power, such as drones or IoT devices.