What is Machine Learning?

I'm starting a Twitter series on #FoundationsOfML. Today, I want to answer this simple question.

❓ What is Machine Learning?

This is my preferred way of explaining it... 👇🧵

Machine Learning is a computational approach to problem-solving with four key ingredients:

  • 1️⃣ A task to solve T
  • 2️⃣ A performance metric M
  • 3️⃣ A computer program P
  • 4️⃣ A source of experience E

You have a Machine Learning solution when:

🔑 The performance of program P at task T, as measured by M, improves with access to the experience E.

That's it.

Now let's unpack it with a simple example 👇

Let's say the task is to play chess ♟️.

A good performance metric could be, intuitively, the number of games won against a random opponent.

👉 The "classic" approach to solving this problem is to write a computer program that encodes our knowledge of what a "good" chess player is.

🤔 This could be in the form of a huge number of IF/ELSEs for a bunch of classic openings and endings, plus some heuristics to play the mid-game, possibly based on assigning points to each piece/position and capturing the pieces with the highest points.
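To make that concrete, here's a rough sketch of the hand-coded flavour in Python. The point values are the conventional textbook ones (king left out), and `material` and `pick_capture` are names I'm making up for illustration, not part of any chess library.

```python
from typing import List, Optional

# Hand-picked by humans, not learned: the conventional piece values.
PIECE_POINTS = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material(pieces: str) -> int:
    """Total points of a set of pieces, e.g. "QRP" for queen + rook + pawn."""
    return sum(PIECE_POINTS.get(piece, 0) for piece in pieces)


def pick_capture(capturable: List[str]) -> Optional[str]:
    """One IF/ELSE-style heuristic: grab the most valuable piece on offer."""
    if not capturable:
        return None
    return max(capturable, key=lambda piece: PIECE_POINTS.get(piece, 0))


print(material("QRP"))                 # queen + rook + pawn
print(pick_capture(["P", "R", "N"]))   # the rook is worth the most here
```

Every number and every rule in there is something a human had to decide by hand.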

And this works, but...

It is tremendously difficult first to build that program, and then to maintain it as new strategies are discovered. And we'll never know whether we're playing the optimal strategy.

Now here is the Machine Learning approach to this problem 👇

You have to ask yourself first: is there a source of experience from which one can reasonably learn to play chess?

👉 For instance, a huge database of world-class chess games?

With that experience at hand, how do we actually code a Machine Learning program?

Details vary, but the bottom line is always the same.

🔑 Instead of directly coding the program P that plays chess, what we write is a kind of meta-program, or "trainer", call it Q, that will itself give birth to P by using that source of experience.

To do that, we have to predefine some sort of "mold" or "template" from which P will come out.

🙃 As a simple example, let's assume there are some scores we can assign to each piece/position so we can compute the "value" of any given board.

So P will be a very simple program:

  • Generate every possible board after the current one, applying all valid moves.
  • For each board, compute its value using those (still unknown) scores.
  • Return the move that leads to the highest valued board.
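Here's a minimal sketch of that template in Python. It's deliberately game-agnostic so it can run without a chess engine: a "board" is just a tuple of numbers, and `legal_moves`, `apply_move`, and the toy game at the bottom are hypothetical stand-ins I'm introducing for illustration.

```python
from typing import Callable, Sequence


def board_value(board: Sequence[float], scores: Sequence[float]) -> float:
    """Value of a board: each feature weighted by its (still unknown) score."""
    return sum(s * x for s, x in zip(scores, board))


def choose_move(board, legal_moves: Callable, apply_move: Callable, scores):
    """The template for P: try every valid move, keep the best-valued board."""
    return max(
        legal_moves(board),
        key=lambda move: board_value(apply_move(board, move), scores),
    )


# Toy stand-in for a game: a "move" just adds a delta to the board's numbers.
def toy_legal_moves(board):
    return [(1, 0), (0, 1), (-1, 1)]


def toy_apply_move(board, move):
    return tuple(b + d for b, d in zip(board, move))


best = choose_move((0, 0), toy_legal_moves, toy_apply_move, scores=(1.0, 3.0))
print(best)
```

Notice that `choose_move` never changes. Swap in different `scores` and you get a different player out of the same template.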

The question is, of course, how do we actually find the optimal program P? That is, how do we discover the assignment of scores that leads to optimal gameplay?

⭐ We will write another program Q to find them!

❓ How do we know we found the best P?

Here is where the metric M comes in handy. The best chess program P is the one whose scores make it play such that it wins the most games.

❓ And how do we actually find those scores?

The easiest way to do it is to simply enumerate all possible instances of P, by trying all combinations of scores for all possible piece/position configurations.

💩 But this might take forever!
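A quick sketch of that brute-force idea, assuming each score can only take a handful of candidate values. `toy_metric` is a made-up stand-in for M (in the real thing you'd have to play actual games to evaluate each candidate), and the "true" scores inside it are invented just so the search has something to find.

```python
from itertools import product


def brute_force_search(candidate_values, n_scores, metric):
    """Try every combination of scores and keep the one the metric likes best."""
    best_scores, best_result = None, float("-inf")
    for scores in product(candidate_values, repeat=n_scores):
        result = metric(scores)
        if result > best_result:
            best_scores, best_result = scores, result
    return best_scores


# Hypothetical stand-in for M: pretend the ideal scores happen to be (1, 3, 5).
def toy_metric(scores):
    return -sum((s - t) ** 2 for s, t in zip(scores, (1, 3, 5)))


found = brute_force_search(range(10), 3, toy_metric)   # 10**3 candidates
print(found)
```

Three scores with ten candidate values each is only 1,000 combinations. With the same ten values and, say, 64 scores, it's already 10**64.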

A better approach is to use a bit of clever math.

🤯 If we design those scores the right way, we can come up with a sort of equation system, where all those scores are variables, and we can very quickly find the values that give us the optimal P!

And here is where the experience comes into play.

🤔 To write that equation system, which is huge, we can use each board in each recorded game as a separate equation, which basically says "this board is a winning board, so it should sum to 100" or "this board is a losing board, so it should sum to 0".

⚗️ After this, there is a piece of mathematical magic that tells us how we should assign the scores, such that the vast majority of "winning boards" sum to something close to 100 and the "losing boards" sum to something close to 0.
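The thread doesn't say which piece of magic, so take this as one possible reading: a least-squares fit, done here with plain gradient descent so it runs without any libraries. The boards are hypothetical two-number summaries (a constant 1 plus a material lead) that I invented for the example, and `fit_scores` is my own name for it.

```python
def board_value(board, scores):
    """Left-hand side of one equation: the board's numbers times the scores."""
    return sum(s * x for s, x in zip(scores, board))


def fit_scores(boards, targets, steps=5000, lr=0.01):
    """Nudge the scores downhill on the mean squared error, step by step."""
    n = len(boards[0])
    scores = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for board, target in zip(boards, targets):
            error = board_value(board, scores) - target
            for j, x in enumerate(board):
                grad[j] += 2 * error * x / len(boards)
        scores = [s - lr * g for s, g in zip(scores, grad)]
    return scores


# Hypothetical experience: each board is (constant 1, material lead).
boards = [(1, 3), (1, 5), (1, 2), (1, -3), (1, -4), (1, -2)]
targets = [100, 100, 100, 0, 0, 0]   # winning boards -> 100, losing -> 0

scores = fit_scores(boards, targets)
values = [board_value(b, scores) for b in boards]
print(scores)
print(values)
```

With only two scores to play with, no assignment hits 100 and 0 exactly. The fit settles for putting the winning boards on the high side and the losing boards on the low side.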

And we just made a machine "learn" how to play chess! To summarize...

🎩 In a "classic" approach we would:

  • Define a desired output, i.e., the best move.
  • Think very hard about the process to compute that output.
  • Write the program P that produces the output.

🤖 In a Machine Learning approach, instead we:

  • Assume there is a "template" that any possible program P follows, parameterized with some unknown values.
  • Write a program Q that finds the best values according to some experience E.
  • Run Q on E to find the best program P.
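Those three steps fit in a few lines of Python. This is a toy sketch, not a chess player: the template has a single unknown value (a threshold on material lead), the experience is a handful of invented (lead, won?) records, and `template`, `metric`, and `trainer` are names I'm choosing for illustration.

```python
def template(threshold):
    """Every possible P: call a board 'winning' if its lead beats the threshold."""
    def program(material_lead):
        return material_lead > threshold
    return program


def metric(program, experience):
    """M: fraction of recorded boards the program labels correctly."""
    hits = sum(program(lead) == won for lead, won in experience)
    return hits / len(experience)


def trainer(experience, candidate_thresholds):
    """Q: pick the value whose program does best on the experience E."""
    best = max(candidate_thresholds, key=lambda t: metric(template(t), experience))
    return template(best)


# Hypothetical experience E: (material lead, did that side go on to win?)
E = [(3, True), (5, True), (1, True), (-2, False), (0, False), (-4, False)]

P = trainer(E, candidate_thresholds=range(-5, 6))   # run Q on E to get P
print(metric(P, E))
```

Nobody wrote the threshold inside `P` by hand. `trainer` picked it from the data.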

In conclusion, there is a BIG paradigm shift in the Machine Learning approach.

🌟 Instead of directly writing a program P to solve task T, you actually code a "trainer" program Q that, when run on suitable experience E, finds the best program P (according to some metric M).

🔥 The reason this paradigm is so hot right now is that there is an incredible number of tasks for which we don't know how to write P directly, but it's fairly straightforward to write Q, provided we have enough experience (read: data) to train on.

📝 In ML lingo, a "template" for P is called a "model" or a "hypothesis space", and the actual instance of P, after training, is called the "hypothesis".

Q is any one of a large number of Machine Learning algorithms: the training procedures behind decision trees, neural networks, naive Bayes...

⌛ Next time, we'll talk about the different flavours of "experience" we can have, and how they define what type of "learning" we can actually attempt to do.