This page is part of a multi-part series on Model-Agnostic Meta-Learning. If you are already familiar with the topic, use the menu on the left side to jump straight to the part that interests you. Otherwise, we suggest you start here.
If you tried the exercise above, you undoubtedly achieved a very high accuracy score. Even though you have likely never seen some of the characters before, you can classify them given only a single example, perhaps without realizing that what you just did off the top of your head would be quite impressive for an average deep neural network.
In this article, we give an interactive introduction to model-agnostic meta-learning (MAML).
It is well known in the machine learning community that models must be trained with a large number of examples before meaningful predictions can be made for unseen data. However, we do not always have enough data available to cater to this need: a sufficient amount of data may be expensive or even impossible to acquire. Nevertheless, there are good reasons to believe that this is not an inherent issue of learning. Humans are known to excel at generalizing after seeing only a few samples.
Model-agnostic meta-learning, a method commonly abbreviated as MAML, will be the central topic of this article. It has prominently emerged from research in two fields that each address one of the challenges described above. While introducing these two fields to you, we will also equip you with the most important terms and concepts we will need throughout the rest of the article.
While one sample is clearly not enough for a model without prior knowledge, we can pretrain models on tasks that we assume to be similar to the target tasks. At its core, the idea is to derive an inductive bias from a set of problem classes in order to perform better on other, newly encountered problem classes. This similarity assumption allows the model to collect meta-knowledge obtained not from a single task but from a distribution of tasks. The learning of this meta-knowledge is called "meta-learning".
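The following is a minimal sketch of this idea, not an implementation of any particular method: the functions `sample_task`, `adapt`, and `update_prior` are hypothetical placeholders standing in for whatever meta-learning algorithm is used (MAML instantiates them with gradient-based updates, as later parts of this series explain).

```python
def meta_train(prior, sample_task, adapt, update_prior, num_iterations=1000):
    """Accumulate meta-knowledge from a distribution of tasks (illustrative sketch).

    `prior` holds the inductive bias (e.g. model parameters). Each iteration
    draws one task from the task distribution, adapts to it using only its
    few samples, and folds the outcome back into the prior.
    """
    for _ in range(num_iterations):
        task = sample_task()                         # draw a task from the task distribution
        adapted = adapt(prior, task)                 # few-shot learning on this single task
        prior = update_prior(prior, adapted, task)   # improve the shared prior
    return prior
```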
Achieving rapid convergence of machine learning models on only a few samples is known as "few-shot learning". If you are presented with \(N\) samples of each of \(M\) classes and are expected to learn the corresponding classification problem, we speak of an \(M\)-way-\(N\)-shot problem.
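To make the terminology concrete, here is a minimal sketch of how an \(M\)-way-\(N\)-shot episode could be sampled from a labeled dataset. The function `sample_episode` and its signature are illustrative only and are not taken from any particular library.

```python
import random
from collections import defaultdict

def sample_episode(dataset, m_way, n_shot):
    """Sample an M-way-N-shot episode: M classes with N examples each.

    `dataset` is assumed to be an iterable of (example, label) pairs.
    """
    examples_by_label = defaultdict(list)
    for example, label in dataset:
        examples_by_label[label].append(example)

    # Choose M classes, then N support examples from each chosen class.
    chosen_labels = random.sample(list(examples_by_label), m_way)
    support_set = {
        label: random.sample(examples_by_label[label], n_shot)
        for label in chosen_labels
    }
    return support_set

# For the exercise above, one would call e.g. sample_episode(pairs, m_way=5, n_shot=1)
# to obtain a 5-way-1-shot episode.
```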
The small exercise from the beginning, which we offer either as a \(20\)- or \(5\)-way-\(1\)-shot problem, is a prominent example of a few-shot learning task; its symbols are taken from the Omniglot dataset.
Having set the scene, we can now dig into MAML and its variants. Continue reading on the next page to find out why MAML is called "model-agnostic" or go straight to an explanation of MAML.
Luis Müller implemented the visualizations of MAML, FOMAML, Reptile, and the Comparison. Max Ploner created the visualization of iMAML and the Svelte elements and components. Both wrote the introduction together and contributed most of the text of the other parts. Thomas Goerttler came up with the idea and sketched out the project. He also wrote parts of the manuscript and helped finalize the document. Klaus Obermayer provided feedback on the project.
† equal contributors