An Interactive Introduction to Model-Agnostic Meta-Learning

Exploring the world of model-agnostic meta-learning (MAML) and its variants.

What you have in front of you is a 5- or 20-way-1-shot problem (a classification task with 5 or 20 classes, given only one sample per class to learn from), one that most conventional machine learning systems struggle to solve. To classify a sample (top), drag it to or click on the desired class (bottom) and see if you can do better. Use the drop-down menu on the top right to switch between 5-way and 20-way, which determines the number of classes in the problem.

MAML learns tasks like the ones above by acquiring meta-knowledge about similar problems.

This page is part of a multi-part series on Model-Agnostic Meta-Learning. If you are already familiar with the topic, use the menu on the left side to jump straight to the part that interests you. Otherwise, we suggest you start here.

If you tried the exercise above, you most likely achieved a very high accuracy score. Even though you have probably never seen some of the characters before, you can classify them given only a single example, perhaps without realizing that what you can do off the top of your head would be quite impressive for the average deep neural network.

In this article, we give an interactive introduction to model-agnostic meta-learning (MAML), a well-established method in the area of meta-learning. Meta-learning is a research field that attempts to equip conventional machine learning architectures with the power to gain meta-knowledge about a range of tasks, in order to solve problems like the one above at a human level of accuracy.

Getting Started

It is well known in the machine learning community that models must be trained with a large number of examples before meaningful predictions can be made for unseen data. However, we do not always have enough data available to cater to this need: a sufficient amount of data may be expensive or even impossible to acquire. Nevertheless, there are good reasons to believe that this is not an inherent issue of learning. Humans are known to excel at generalizing after seeing only a few samples. It should, however, also be noted that humans do not learn novel concepts "in a vacuum" but build on a lot of prior knowledge acquired from other (similar) tasks. Enabling machine learning methods to achieve the same would bring us a step closer to the data and energy efficiency of human learning. Consequently, we require algorithms that do the following two things, which humans already do successfully:

Model-agnostic meta-learning, a method commonly abbreviated as MAML, will be the central topic of this article. It prominently emerged from research in two fields that each address one of the above requirements. While introducing these two fields to you, we will also equip you with the most important terms and concepts we will need throughout the rest of the article.

(a) Obtaining Prior Knowledge

While one sample is clearly not enough for a model without prior knowledge, we can pretrain models on tasks that we assume to be similar to the target tasks. At its core, the idea is to derive an inductive bias from a set of problem classes in order to perform better on other, newly encountered problem classes. This similarity assumption allows the model to collect meta-knowledge that is obtained not from a single task but from a distribution of tasks. The learning of this meta-knowledge is called "meta-learning".
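To make the notion of a task distribution concrete, here is a minimal sketch in Python (our own illustrative toy, not part of MAML itself). It draws regression tasks, sine waves with random amplitude and phase, from a distribution; any meta-knowledge the model collects must therefore hold across the whole distribution rather than for a single task:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Sample one task from a simple task distribution:
    regressing a sine wave with random amplitude and phase
    (a common toy problem in the meta-learning literature)."""
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, np.pi)
    return lambda x: amplitude * np.sin(x + phase)

# Meta-training repeatedly draws fresh tasks, so the model never sees
# "the" task -- it sees the distribution, and must extract what all
# tasks have in common (here: the sinusoidal shape of the target).
for step in range(3):
    task = sample_task()
    x = rng.uniform(-5.0, 5.0, size=10)  # only a few samples per task
    y = task(x)
    # ... adapt the model to (x, y), then update the meta-knowledge ...
```

A meta-learner trained this way acquires the inductive bias "targets are sinusoidal", which is exactly the kind of prior knowledge that a single task with few samples cannot provide on its own.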

(b) Generalization on a Few Samples

Achieving rapid convergence of machine learning models on a few samples is known as "few-shot learning". If you are presented with \(N\) samples per class and are expected to learn a classification problem with \(M\) classes, we speak of an \(M\)-way-\(N\)-shot problem. The small exercise from the beginning, which we offer either as a \(5\)- or \(20\)-way-\(1\)-shot problem, is a prominent example of a few-shot learning task. Its symbols are taken from the Omniglot dataset, which contains 1623 different characters across 50 alphabets, with each character represented by 20 instances, each drawn by a different person.

[Figure: Excerpt of the Omniglot characters. Image credit: https://github.com/brendenlake/omniglot/blob/master/omniglot_grid.jpg]

Because of this structure, the original authors of the Omniglot dataset described it as a "transpose" of the well-known MNIST dataset, with MNIST containing only a few classes (the digits 0 to 9) but many instances per class, and Omniglot containing many classes but only a few instances of each.
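To make the episode structure concrete, the following sketch (our own illustration; the dictionary layout and the helper name sample_episode are assumptions for this example, not an official Omniglot API) samples an \(M\)-way-\(N\)-shot task from a labelled dataset: it draws \(M\) classes, and each class contributes \(N\) support samples to learn from plus some query samples to classify:

```python
import random

def sample_episode(dataset, m_way, n_shot, n_query=1, seed=None):
    """Sample an M-way-N-shot episode from a dataset given as
    {class_name: [samples]} (a hypothetical structure for illustration).

    Returns a support set (N labelled samples per class to learn from)
    and a query set (held-out samples to classify)."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), m_way)           # pick M classes
    support, query = [], []
    for label, name in enumerate(classes):
        samples = rng.sample(dataset[name], n_shot + n_query)
        support += [(s, label) for s in samples[:n_shot]]  # N per class
        query += [(s, label) for s in samples[n_shot:]]
    return support, query

# Toy stand-in for Omniglot: 50 "characters" with 20 "drawings" each.
toy = {f"char_{c}": [f"drawing_{c}_{i}" for i in range(20)]
       for c in range(50)}
support, query = sample_episode(toy, m_way=5, n_shot=1, seed=0)
print(len(support), "support samples,", len(query), "queries")  # 5 and 5
```

With m_way=20 and n_shot=1, this reproduces the 20-way-1-shot setting of the exercise above: twenty characters, with a single drawing of each to learn from.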

Having set the scene, we can now dig into MAML and its variants. Continue reading on the next page to find out why MAML is called "model-agnostic" or go straight to an explanation of MAML.

Author Contributions

Luis Müller implemented the visualizations of MAML, FOMAML, Reptile, and the Comparison. Max Ploner created the visualization of iMAML and the Svelte elements and components. Both wrote the introduction together and contributed most of the text of the other parts. Thomas Goerttler came up with the idea and sketched out the project. He also wrote parts of the manuscript and helped finalize the document. Klaus Obermayer provided feedback on the project.

† equal contributors