Generative vs Discriminative

This is a Twitter series on #FoundationsOfML.

❓ Today let's look at two fundamental modelling paradigms that are used throughout the whole ML landscape.

Let's dive into Generative vs Discriminative models...

πŸ‘‡πŸ§΅ 1/20


Say we want to learn to recognize 🐢dogs and 😺cats.

Let's not worry about the input format right now and instead think in terms of abstract features, like "does it have pointy ears?"

There are at least two ways in which we can do it.

πŸ‘‡ 2/20


1️⃣ We can try to learn what a dog is and what a cat is, independently.

That is, the fundamental characteristics that best define each of those classes.

πŸ‘‡ 3/20


πŸ• For example, we can learn that dogs have (generally) four legs, large noses, cute eyes, round ears, fur, and long tongues.

🐈 Similarly, we can learn that cats have (generally) four legs, small noses, sneaky eyes, pointy ears, fur, and short tongues.

πŸ‘‡ 4/20


To classify a new animal 😼 we can then look at its features and ask:

❓ If this is a dog, what are the odds of seeing this kind of fur, these legs, these ears, ...

πŸ‘‡ 5/20


Likewise, we can ask:

❓ If this is a cat, what are the odds of seeing this kind of fur, these legs, these ears, ...

Then we compare how surprised we would be to see a πŸ• versus a 🐈 with these specific features.

πŸ‘‡ 6/20
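
πŸ’» Here is a minimal sketch of that comparison in Python. All the probabilities and names are invented for illustration, and I assume the features are independent (the "naive" assumption) so their individual odds simply multiply:

```python
# P(feature = True | class), picked by hand for illustration.
P_FEATURE_GIVEN_CLASS = {
    "dog": {"pointy_ears": 0.2, "large_nose": 0.8, "long_tongue": 0.7},
    "cat": {"pointy_ears": 0.9, "large_nose": 0.1, "long_tongue": 0.2},
}

def likelihood(features, cls):
    """P(f1, f2, ... | cls) under the independence assumption."""
    result = 1.0
    for name, present in features.items():
        p = P_FEATURE_GIVEN_CLASS[cls][name]
        result *= p if present else (1 - p)
    return result

new_animal = {"pointy_ears": True, "large_nose": False, "long_tongue": False}

for cls in ("dog", "cat"):
    print(cls, likelihood(new_animal, cls))

# The class that makes the observed features least surprising wins:
# dog 0.012 vs cat 0.648, so we answer cat.
```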


πŸ”Ά This is a Generative Model.

These models learn what the fundamental features of a given class are. Formally, they estimate the joint probability P(f1, f2, ..., C) of observing the specific features f1, f2, ... together with a given class C.

πŸ‘‡ 7/20
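
πŸ’» In the simplest case, "estimating" that joint probability just means counting. A toy sketch with a single binary feature and an invented dataset:

```python
from collections import Counter

# Invented toy dataset: (has_pointy_ears, class). One binary feature
# keeps the counting easy to follow.
data = [
    (True, "cat"), (True, "cat"), (True, "cat"), (False, "cat"),
    (False, "dog"), (False, "dog"), (False, "dog"), (True, "dog"),
]

# The empirical joint P(pointy_ears, C) is just the normalized counts.
counts = Counter(data)
joint = {pair: n / len(data) for pair, n in counts.items()}

print(joint[(True, "cat")])   # 3/8 = 0.375
print(joint[(True, "dog")])   # 1/8 = 0.125
```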


The reason they are called Generative is that they try to learn explicitly how an example of a given class is made.

πŸ’‘ You can often use these models to generate random examples of each class, by sampling from P(f1, f2, ... | C).

πŸ‘‡ 8/20
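
πŸ’» For instance, reusing the invented feature table from the earlier sketch, sampling a random 🐈 takes only a couple of lines:

```python
import random

# Same invented table of P(feature = True | class) as before.
P_FEATURE_GIVEN_CLASS = {
    "dog": {"pointy_ears": 0.2, "large_nose": 0.8, "long_tongue": 0.7},
    "cat": {"pointy_ears": 0.9, "large_nose": 0.1, "long_tongue": 0.2},
}

def generate(cls):
    """Sample a random example of class `cls` from P(f1, f2, ... | cls)."""
    return {name: random.random() < p
            for name, p in P_FEATURE_GIVEN_CLASS[cls].items()}

print(generate("cat"))
# e.g. {'pointy_ears': True, 'large_nose': False, 'long_tongue': False}
```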


2️⃣ Alternatively, we can try to learn directly what makes a dog different from a cat.

That is, the fundamental characteristics that best discriminate between those classes.

πŸ‘‡ 9/20


We can learn that the larger the nose or the tongue, the more likely it is to be a πŸ•, and that the pointier the ears and the sneakier the eyes, the more likely it is to be a 🐈.

And we do not care, for example, about the number of legs or the fur.

πŸ‘‡ 10/20


To classify, we look at the features and say:

❓ Given these ears, and this nose, and these eyes, how likely is this to be a dog or a cat?

We compare the two results and answer with the one we are most confident about.

πŸ‘‡ 11/20


πŸ”Ά This is a Discriminative Model.

These models learn which features separate the different classes. Formally, they explicitly estimate the conditional probability P(C | f1, f2, ...) of seeing class C given that we observe the features f1, f2, ...

πŸ‘‡ 12/20
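
πŸ’» A minimal sketch with scikit-learn's LogisticRegression (the data is invented; in practice you'd have real feature vectors):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: columns are [pointy_ears, large_nose, long_tongue],
# labels are 1 = dog, 0 = cat.
X = np.array([
    [0, 1, 1], [0, 1, 0], [1, 1, 1],   # dogs
    [1, 0, 0], [1, 0, 1], [0, 0, 0],   # cats
])
y = np.array([1, 1, 1, 0, 0, 0])

model = LogisticRegression().fit(X, y)

# predict_proba gives P(C | f1, f2, ...) directly; no joint distribution
# is ever modelled.
new_animal = np.array([[1, 0, 0]])      # pointy ears, small nose, short tongue
print(model.predict_proba(new_animal))  # [[P(cat), P(dog)]], cat should win
```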


The reason they're called Discriminative is that they try to learn what makes a class different from all others.

πŸ’‘ You can often use these models to compute feature importance by looking at which features best separate different classes.

πŸ‘‡ 13/20
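
πŸ’» Continuing the same invented setup, the learned coefficients act as a rough feature-importance measure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same invented toy setup: 1 = dog, 0 = cat.
X = np.array([[0, 1, 1], [0, 1, 0], [1, 1, 1],
              [1, 0, 0], [1, 0, 1], [0, 0, 0]])
y = np.array([1, 1, 1, 0, 0, 0])
model = LogisticRegression().fit(X, y)

# Positive weights push the decision towards "dog", negative towards
# "cat"; the magnitude is a rough measure of importance.
for name, w in zip(["pointy_ears", "large_nose", "long_tongue"], model.coef_[0]):
    print(f"{name}: {w:+.2f}")

# Expect pointy_ears < 0 (cat-like) and large_nose > 0 (dog-like).
```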


πŸ”ΉA classic example of a generative model is Naive Bayes.

πŸ”ΉA classic example of a discriminative model is Logistic Regression (and most neural networks).

πŸ‘‡ 14/20
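
πŸ’» Here are the two paradigms side by side on the same invented toy data, as a sketch rather than a comparison of quality:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression

# Same invented toy data as in the previous sketches.
X = np.array([[0, 1, 1], [0, 1, 0], [1, 1, 1],
              [1, 0, 0], [1, 0, 1], [0, 0, 0]])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = dog, 0 = cat

generative = BernoulliNB().fit(X, y)             # models P(f | C) and P(C)
discriminative = LogisticRegression().fit(X, y)  # models P(C | f) directly

new_animal = [[1, 0, 0]]
print(generative.predict_proba(new_animal))
print(discriminative.predict_proba(new_animal))
# Both report P(C | f) in the end; they just get there differently.
```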


❓ Does this difference matter?

If you mostly care about performance, then no, there is no intrinsically best modelling paradigm, and only experimentation can tell you what to use.

However, depending on how you want to use the model, it can matter.

πŸ‘‡ 15/20


Discriminative models are better than generative models at answering why (they think) a given answer is correct.

They focus on the features that matter for the task and disregard anything that doesn't help them score better.

πŸ‘‡ 16/20


The flip side is that discriminative models learn not what we want, but whatever is useful, and that can often be something completely off-track, like spurious correlations or harmful biases in the training set.

πŸ‘‡ 17/20


Generative models often encode stronger inductive biases, because they represent a hypothesis (ours) about how the data is created.

This can make them more robust and controllable, but if that hypothesis is too far from reality, they may not learn anything useful.

πŸ‘‡ 18/20


⭐ As usual, there is no silver bullet. You need to ask the right questions and be mindful of your assumptions.

⚑ And always test your hypotheses!

πŸ‘‡ 19/20


πŸ”– You can read this thread online at https://apiad.net/tweetstorms/ml/generative-vs-discriminative/.

❀️ If you liked this thread, please consider liking and retweeting it, and following me, if you think I've earned it. And make sure to read the whole #FoundationsOfML series. It starts here:

https://twitter.com/AlejandroPiad/status/1348840452670291969