AutoML

Today is #TechnicalTuesday 🤓!

Let's talk about practical technologies that you can use today.

In this thread I will tell you about AutoML 🧵👇


AutoML stands for Automated Machine Learning.

It encompasses a bunch of technologies and paradigms to gradually automate the process of creating machine learning solutions.

💡 AutoML is about raising the abstraction level in ML and reducing the grunt work.


❓ What can AutoML do today?

Getting a machine learning solution to work takes a few steps:

  • 1️⃣ collecting data
  • 2️⃣ sanitizing that data
  • 3️⃣ finding the best model
  • 4️⃣ training that model
  • 5️⃣ and beyond, actually building the product❗

Most AutoML frameworks today focus on 3️⃣, i.e., helping you pick, among the plethora of machine learning models, the one that works best for your problem.

This problem is often framed in terms of:

  • 🅰️ model selection
  • 🅱️ hyperparameter optimization

🅰️ Model selection is about deciding, e.g., whether logistic regression, decision trees, or an SVM works better, or whether to encode text with word2vec or TF-IDF.

The "manual" way of doing this is to actually try each algorithm a bunch of times on your data and collect some statistics.


🅱️ Hyperparameter optimization is about selecting the exact value for each tunable thing in your algorithm.

  • How many neurons?
  • How much dropout?
  • Which activation function?
  • Which regularization factor?
  • ...

If you combine both problems, you realize there are literally thousands (and potentially infinitely many) different algorithms you can try on your data.

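If you were to do this yourself, the simplest solution is random search: repeatedly sample a model and a hyperparameter configuration, train it, score it on held-out data, and keep the best one.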
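A minimal sketch of that do-it-yourself loop, written as plain random search. Everything in it is an assumption made for illustration: the toy 1-D dataset, the two tiny model families (`threshold_model` and `knn_model`), and the `SEARCH_SPACE` layout are hypothetical stand-ins for real models and real data.

```python
import random


def make_data(n, rng):
    """Toy 1-D binary dataset: label is 1 when x > 0.6, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.6)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data


def threshold_model(train, threshold):
    """Predict 1 whenever x exceeds a fixed threshold (no training needed)."""
    return lambda x: int(x > threshold)


def knn_model(train, k):
    """Predict the majority label among the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return int(2 * sum(y for _, y in nearest) > k)
    return predict


# Each model family maps to (builder, sampler for its hyperparameters).
SEARCH_SPACE = {
    "threshold": (threshold_model, lambda rng: {"threshold": rng.random()}),
    "knn": (knn_model, lambda rng: {"k": rng.choice([1, 3, 5, 9, 15])}),
}


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


def random_search(train, valid, n_trials, seed=0):
    """Return (best score, model family, hyperparameters) over n_trials samples."""
    rng = random.Random(seed)
    best = (-1.0, None, None)
    for _ in range(n_trials):
        family = rng.choice(sorted(SEARCH_SPACE))  # model selection
        build, sample = SEARCH_SPACE[family]
        params = sample(rng)                       # hyperparameter sampling
        score = accuracy(build(train, **params), valid)
        if score > best[0]:
            best = (score, family, params)
    return best


data_rng = random.Random(42)
train, valid = make_data(200, data_rng), make_data(100, data_rng)
best_score, best_family, best_params = random_search(train, valid, n_trials=30)
print(best_family, best_params, round(best_score, 3))
```

Note that every trial is independent of the previous ones, so nothing learned from earlier results guides what gets tried next.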


⭐ Actually, AutoML algorithms are way smarter and faster than random search.

💡 AutoML frames this problem as an optimization loop on top of the training loop and applies a lot of clever optimization tricks.
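To give a flavor of what such a trick can look like, here is a sketch of successive halving, a budget-allocation strategy that this thread does not name and that any given framework may or may not use. It evaluates many configurations cheaply, drops the worse half, and doubles the budget for the survivors. The `noisy_score` function is a hypothetical stand-in for "train for `budget` epochs and validate".

```python
import random


def noisy_score(config, budget, rng):
    """Hypothetical stand-in for 'train `config` for `budget` epochs, then validate'.

    True quality peaks at learning_rate = 0.1; a larger budget gives a
    less noisy estimate of that quality.
    """
    quality = 1.0 - abs(config["learning_rate"] - 0.1)
    return quality + rng.gauss(0, 0.2 / budget)


def successive_halving(configs, min_budget, rng):
    """Return (winning config, total budget spent)."""
    survivors = list(configs)
    budget = min_budget
    total_cost = 0
    while len(survivors) > 1:
        scores = [noisy_score(c, budget, rng) for c in survivors]
        total_cost += budget * len(survivors)
        # Rank by score (the index breaks ties) and keep the better half.
        ranked = sorted(zip(scores, range(len(survivors))), reverse=True)
        survivors = [survivors[i] for _, i in ranked[: len(survivors) // 2]]
        budget *= 2  # survivors earn a longer, more reliable evaluation
    return survivors[0], total_cost


rng = random.Random(0)
configs = [{"learning_rate": rng.random()} for _ in range(16)]
best, total_cost = successive_halving(configs, min_budget=1, rng=rng)
full_cost = len(configs) * 8  # cost of giving every config the largest budget used
print(best, total_cost, full_cost)
```

With 16 candidates this spends 64 budget units instead of the 128 needed to give every candidate the full 8-unit evaluation, at the risk of discarding a good configuration early because of a noisy cheap estimate.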


👉 AutoML frameworks hide away all that complexity behind an interface that looks as if you are training a single model, but it is ultimately doing the search and optimization loop under the hood.

Let's see a couple of examples 👇


❤️ Auto-Sklearn is an AutoML framework compatible with scikit-learn.

🔗 https://automl.github.io/auto-sklearn/master/

You can basically replace standard scikit-learn code with a generic Auto-Sklearn classifier and suddenly you are evaluating thousands of models.
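A hedged sketch of that swap, assuming auto-sklearn and scikit-learn are installed. The digits dataset, the time budgets, and the wrapper function `fit_auto_sklearn` are choices made for this illustration, not something taken from the thread.

```python
def fit_auto_sklearn(time_budget_seconds=120):
    """Search over scikit-learn pipelines on the digits dataset within a time budget."""
    # Imports live inside the function so that merely defining it does not
    # require auto-sklearn (which can be tricky to install) to be present.
    from autosklearn.classification import AutoSklearnClassifier
    from sklearn.datasets import load_digits
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # Same fit/predict interface as any scikit-learn estimator; the model
    # and hyperparameter search runs inside fit().
    automl = AutoSklearnClassifier(
        time_left_for_this_task=time_budget_seconds,    # total search budget
        per_run_time_limit=time_budget_seconds // 4,    # cap per candidate model
    )
    automl.fit(X_train, y_train)
    return accuracy_score(y_test, automl.predict(X_test))
```

Calling `fit_auto_sklearn()` runs the search for roughly the given budget and returns the test accuracy of the resulting ensemble.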


❤️ Auto-Keras is an AutoML framework specifically designed for deep learning with Keras.

🔗 https://autokeras.com/

Instead of manually designing a neural network, you can use one of Auto-Keras's predefined "meta-models" and it will take care of finding the best architecture.
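A hedged sketch using one such meta-model, `ImageClassifier`, assuming autokeras and TensorFlow are installed. The MNIST dataset, the trial and epoch counts, and the wrapper function `fit_auto_keras` are assumptions made for this illustration.

```python
def fit_auto_keras(max_trials=3, epochs=5):
    """Let Auto-Keras search for an image classifier architecture on MNIST."""
    # Imported lazily so that defining the function does not require
    # autokeras or TensorFlow to be installed.
    import autokeras as ak
    from tensorflow.keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # No layers are defined by hand: each trial is a candidate architecture
    # that Auto-Keras builds, trains, and scores.
    clf = ak.ImageClassifier(max_trials=max_trials, overwrite=True)
    clf.fit(x_train, y_train, epochs=epochs)

    # Return the evaluation metrics of the best model found, plus the
    # model itself exported as a regular Keras model.
    return clf.evaluate(x_test, y_test), clf.export_model()
```

More trials and more epochs widen the search but cost proportionally more compute, so small values like the defaults here are only good for a first smoke test.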


Yeah, I know 🤯!

And AutoML is much more than model selection and hyperparameter search. It can also include automating:

  • data preprocessing
  • feature engineering
  • feature selection
  • dataset augmentation
  • model distillation
  • and more...

🔑 If you are working on a practical problem today there is no reason not to use AutoML.

🔑 Even if you are working on research, AutoML will make you more productive by taking care of the dumb tasks and letting you focus on the important parts.


❗ However, this is no silver bullet.

There are a lot of challenges in making AutoML production-ready. Data cleaning is still a major bottleneck and far from automated. And we need to understand how these methods can exacerbate data bias.


Finally, if you are feeling adventurous, you can try @auto_goal, an experimental AutoML framework that goes beyond "standard" AutoML.

⭐ Check it out at https://autogoal.github.io!


As usual, if you like this topic, reply in this thread or @ me at any time. Feel free to ❤️ like and 🔁 retweet if you think someone else could benefit from knowing this stuff.

🧵 Read this thread online at https://apiad.net/tweetstorms/technicaltuesday-automl


Stay curious 🖖


🗨️ You can see this tweetstorm as originally posted in this Twitter thread.