This page is part of a multi-part series on Model-Agnostic Meta-Learning. If you are already familiar with the topic, use the menu on the right side to jump straight to the part that interests you. Otherwise, we suggest you start at the beginning.
In this section, we explain why MAML is "model-agnostic" and, along the way, gain a broader overview of the meta-learning field. Metric-based and model-based approaches impose constraints on either the sampling (e.g., episodic training) or the model's architecture. MAML, on the other hand, requires only one very general assumption: the model needs to be optimizable by a gradient-based optimizer. Hence, it was introduced as "model-agnostic". Note, however, that MAML is still not completely free of assumptions. To understand what really sets the method apart in terms of design, assumptions, and approach, it is important to view it in the context of the field, which is what we do on this page.
Applications of meta-learning outside the domain of few-shot learning
include the optimization of the task-level optimizer using
an LSTM network.
The core idea of metric-based approaches is to compare two samples in a
latent (metric) space: In this space, samples of the same class are supposed
to be close to each other, while two samples from different classes are
supposed to have a large distance (the notion of a distance makes
the latent space a metric space).
Model-based approaches are neural architectures that are deliberately designed
for fast adaptation to new tasks without an inclination to overfit.
Memory-Augmented Neural Networks and MetaNet are two examples. Both employ
an external memory while still maintaining the ability to be trained
end-to-end.
MAML goes a different route: The neural network is designed the same way
your usual model might be (in the many-shot case). All the magic happens during the
optimization, which is what makes it "optimization-based".
As a consequence, unlike metric-based and model-based approaches, MAML lets
you choose the model architecture freely.
A great benefit of this freedom is that MAML can be applied not only to
conventional supervised classification tasks but also
to reinforcement learning.
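To make the "optimization-based" idea concrete, here is a minimal sketch of one way the two nested optimization levels can look. It is a toy under stated assumptions, not the authors' implementation: the model is a hypothetical 1-D linear function f(x) = w * x with mean squared error, each task performs a single inner gradient step, and all function names (`adapt`, `maml_meta_grad`, `make_task`, `meta_train`) are made up for illustration. Because the model is so small, the derivative through the inner step can be written out by hand instead of relying on automatic differentiation.

```python
# Toy illustration (hypothetical setup, not from the article):
# a 1-D linear model f(x) = w * x trained with mean squared error.

def mse(w, xs, ys):
    """Mean squared error of the model w * x on the given samples."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, xs, ys):
    """First derivative of the mean squared error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def mse_second_derivative(xs):
    """Second derivative of the loss with respect to w (constant in w and y)."""
    return sum(2.0 * x * x for x in xs) / len(xs)

def adapt(w, support, alpha):
    """Task level: one gradient step on the task's support set."""
    xs, ys = support
    return w - alpha * mse_grad(w, xs, ys)

def post_adaptation_loss(w, task, alpha):
    """Query loss measured after adapting w on the support set."""
    support, query = task
    return mse(adapt(w, support, alpha), *query)

def maml_meta_grad(w, task, alpha):
    """Meta level: derivative of the post-adaptation query loss w.r.t. the initial w."""
    support, query = task
    w_adapted = adapt(w, support, alpha)
    # Chain rule through the inner step: d(w_adapted)/dw = 1 - alpha * L''_support
    d_adapted_d_w = 1.0 - alpha * mse_second_derivative(support[0])
    return mse_grad(w_adapted, *query) * d_adapted_d_w

def make_task(slope):
    """Hypothetical task: regress y = slope * x from two support points and one query point."""
    support = ([1.0, 2.0], [slope * 1.0, slope * 2.0])
    query = ([3.0], [slope * 3.0])
    return support, query

def meta_train(tasks, w, alpha, beta, steps):
    """Gradient descent on the initialization w, averaged over all tasks."""
    for _ in range(steps):
        grad = sum(maml_meta_grad(w, t, alpha) for t in tasks) / len(tasks)
        w = w - beta * grad
    return w
```

Calling `meta_train([make_task(1.0), make_task(3.0)], 0.0, 0.05, 0.01, 200)` in this toy setting yields an initialization between the two task slopes, from which a single inner step moves toward either task. Nothing in the procedure depends on the architecture; the only requirement is that the loss can be differentiated with respect to the parameters.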
In the following figure, you can find a selection of meta-learning methods
that tackle few-shot learning, their performance on Omniglot, as well as
your own accuracy score from the starting page.
This figure shows the results of different methods on the Omniglot dataset. Unless stated otherwise, the results are for the 20-way 1-shot setting, but some differences in the evaluation procedure exist. As usual, accuracy numbers need to be taken with a grain of salt, as differences in the evaluation method, implementation, and model complexity may have a non-negligible impact on the performance.
The generative stroke model was introduced in the same paper that
introduced the Omniglot dataset. The model is based on a latent stroke representation
(including the number and directions of strokes). While it is an
interesting approach, it can hardly be generalized to other
few-shot problems.
The same authors improved the model by learning latent
primitive motor elements and called this process "Hierarchical
Bayesian Program Learning" (HBPL).
While this greatly increased the accuracy, the approach
remains focused on symbol learning.
Siamese Nets consist of two identical networks which produce a latent representation.
From the representations of two samples, a distance is calculated to
assess the similarity of the two samples.
Matching Networks also work by comparing new samples to labeled
examples. They do so by utilizing an attention kernel.
Though the second version of the paper is cited here,
it was first published in 2016.
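The comparison performed by Matching Networks can be sketched as follows. This is a simplified illustration, assuming the samples have already been embedded by some network that is not shown here; the attention kernel is taken to be a softmax over cosine similarities, and the function names are hypothetical. The prediction for a query is the attention-weighted sum of the support labels.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two embedding vectors (assumed to be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def attention(query, support_embeddings):
    """Softmax over the similarities between the query and each support sample."""
    sims = [cosine_similarity(query, s) for s in support_embeddings]
    largest = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def predict(query, support_embeddings, support_labels):
    """Attention-weighted vote over the support labels; returns a score per label."""
    weights = attention(query, support_embeddings)
    scores = {}
    for weight, label in zip(weights, support_labels):
        scores[label] = scores.get(label, 0.0) + weight
    return scores
```

The label with the highest score is the predicted class, for example `max(scores, key=scores.get)`.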
Prototypical Networks use prototype vectors to represent each class
in the metric space. The nearest neighbor (i.e., the closest prototype)
of a sample then determines the prediction.
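The nearest-prototype rule can be sketched in a few lines. This is a minimal illustration, assuming the embeddings are already computed by a network that is not shown, that each prototype is the mean of its class's support embeddings, and that distances are Euclidean; the function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equally sized vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_prototypes(support):
    """support maps each class label to the list of its embedded support samples."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}

def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: euclidean(query, prototypes[label]))
```

With more than one support sample per class, averaging makes the prototype less sensitive to any single example; in the 1-shot case, the prototype is simply the embedding of the one support sample.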
Memory-Augmented Neural Networks (MANNs) use an external memory to make accurate
predictions from a small number of samples.
Meta Networks combine a base learner (task level) with a meta learner
and use a set of slow and rapid weights to capture both
meta-level and task-specific concepts.
In the next section, we will take a close look at MAML and study the math behind the method, as well as explore some of the concepts interactively.
Luis Müller implemented the visualization of MAML, FOMAML, Reptile and the Comparison. Max Ploner created the visualization of iMAML and the svelte elements and components. Both wrote the introduction together and contributed most of the text of the other parts. Thomas Goerttler came up with the idea and sketched out the project. He also wrote parts of the manuscript and helped with finalizing the document. Klaus Obermayer provided feedback on the project.
† equal contributors