design a finite state models for how to make pizza
How to make a Pizza with Deep Learning
Can a deep neural network learn how to cook, when only given a picture of a delicious meal? New Deep Learning research from MIT suggests so!
Their recently released paper, "How to make a pizza: Learning a compositional layer-based GAN model", explores how a GAN can be trained to recognise the steps involved in making a pizza. Their PizzaGAN comes in 2 parts:
(1) Given an input image of a pizza, PizzaGAN is trained to predict what toppings the pizza has on it
(2) Given an input image of a pizza, PizzaGAN can apply an ordered set of models to the image, where each model adds or removes a topping from the pizza
What makes up a pizza?
Before trying to train a deep neural network to make a pizza, we'll first need to figure out how to make a pizza ourselves.
Like any great recipe, the process of making a pizza consists of a set of ordered steps. You always start with the dough, sauce, and cheese, and then move on to adding other, more adventurous toppings. This sequential process is reflected in how the pizza looks at each step of the way: its visual appearance changes with each added topping.
Once our target process is well-defined, we can begin to train an actual model that can approximate each of these steps.
For example, let's say that we start out with a good ol' pepperoni pizza. Our friend then comes up to us and says "hey, let's add olives!" We can model the process of going from our original pizza to our new one as a series of steps:
(1) Recognise our current state — pepperoni pizza
(2) Apply a change that gets us to our target state — add olives
After adding the olives, another friend might say: "I don't like pepperoni, let's use ham!" This time we have 3 steps:
(1) Recognise our current state — pepperoni and olives pizza
(2) Apply the first change that gets us closer to our target state — remove pepperoni
(3) Apply the second change that gets us to our target state — add ham
To learn how to build pizzas, the PizzaGAN neural network attempts to model all of these steps.
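The two examples above can be sketched as a tiny state machine over sets of toppings. This is only an illustration of the idea, not PizzaGAN itself: the function names `plan_steps` and `apply_steps` are made up here, and removing toppings before adding new ones is an assumed ordering.

```python
from typing import List, Set, Tuple

Step = Tuple[str, str]  # (action, topping), e.g. ("add", "olives")


def plan_steps(current: Set[str], target: Set[str]) -> List[Step]:
    """Return the add/remove steps that turn `current` into `target`."""
    # Assumption: remove unwanted toppings first, then add the missing ones.
    # Sorting only makes the result deterministic.
    removals = [("remove", t) for t in sorted(current - target)]
    additions = [("add", t) for t in sorted(target - current)]
    return removals + additions


def apply_steps(current: Set[str], steps: List[Step]) -> Set[str]:
    """Apply each step in order and return the resulting topping set."""
    state = set(current)
    for action, topping in steps:
        if action == "add":
            state.add(topping)
        elif action == "remove":
            state.discard(topping)
        else:
            raise ValueError(f"unknown action: {action}")
    return state


steps = plan_steps({"pepperoni", "olives"}, {"olives", "ham"})
print(steps)  # [('remove', 'pepperoni'), ('add', 'ham')]
print(apply_steps({"pepperoni", "olives"}, steps) == {"olives", "ham"})  # True
```

In PizzaGAN, recognising the current state is the job of a classifier and each step is carried out on the image by a trained model; here both are replaced by plain set operations.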
How a GAN can make a pizza
Dataset
The pizza dataset used to train PizzaGAN is composed of 9,213 images, each showing a single pizza. Each image has a set of corresponding labels which describe the toppings that the pizza has on it, excluding the dough, sauce, and base cheese. For example, if the pizza image has ham and mushrooms on it, the labels of that image are:
["ham", "mushrooms"]
During training, the target labels are encoded as a binary (multi-hot) vector with one element per topping. Thus, for a ham and mushrooms pizza, the ham and mushrooms elements of the vector are set to 1.0 while the rest of the elements are set to 0.0.
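The label encoding described above can be sketched in a few lines. The topping vocabulary `TOPPINGS` and its order are hypothetical; the real dataset's topping list may differ.

```python
from typing import List

# Hypothetical topping vocabulary; the dataset's actual list and order may differ.
TOPPINGS = ["pepperoni", "mushrooms", "olives", "ham", "basil"]


def encode_toppings(labels: List[str]) -> List[float]:
    """Turn a list of topping names into a vector with one slot per topping."""
    present = set(labels)
    unknown = present - set(TOPPINGS)
    if unknown:
        raise ValueError(f"unknown toppings: {sorted(unknown)}")
    return [1.0 if topping in present else 0.0 for topping in TOPPINGS]


print(encode_toppings(["ham", "mushrooms"]))  # [0.0, 1.0, 0.0, 1.0, 0.0]
print(encode_toppings([]))                    # [0.0, 0.0, 0.0, 0.0, 0.0]
```

A plain cheese pizza has no extra toppings, so it maps to the all-zeros vector.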
Generator network — adding and removing toppings
Recall that we want to be able to model the building of our pizza as a set of sequential steps. Thus, whatever network is trained must be able to perform a single step at a time — add one topping, remove one topping, cook the pizza, etc.
To that end, a generator network is trained to model the adding or removing of each topping. Given an input image of a pizza, the generator predicts an output image of a pizza as if we added or removed one topping.
Since each generator is trained for one topping at a time, and for either adding or removing only, multiple generator networks are trained: two per topping, one that adds it and one that removes it. An example pair of PizzaGAN generators, one that adds pepperoni and one that removes it, is shown below.
The cheese pizza's classification vector is all 0s, while the pepperoni pizza's vector is all 0s except for the pepperoni index, which is 1.0. Since the input and output images of a PizzaGAN generator always differ by exactly one topping, the sums of the elements of the input and output label vectors also differ by exactly 1.
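The effect of a single generator on the labels can be checked with a small sketch. It works on label vectors only, not images; `apply_operator` is a hypothetical helper, and putting pepperoni at index 0 is an assumption.

```python
from typing import List


def apply_operator(vector: List[float], index: int, action: str) -> List[float]:
    """Mimic, on labels only, what one add/remove generator does to the image."""
    if action not in ("add", "remove"):
        raise ValueError(f"unknown action: {action}")
    # Adding requires the topping to be absent; removing requires it to be present.
    expected_before = 0.0 if action == "add" else 1.0
    if vector[index] != expected_before:
        raise ValueError(f"cannot {action} topping at index {index}")
    result = list(vector)
    result[index] = 1.0 - expected_before
    return result


cheese = [0.0, 0.0, 0.0]  # assumed order: index 0 = pepperoni
pepperoni = apply_operator(cheese, 0, "add")
print(pepperoni)                             # [1.0, 0.0, 0.0]
print(abs(sum(pepperoni) - sum(cheese)))     # 1.0
print(apply_operator(pepperoni, 0, "remove") == cheese)  # True
```

Applying the remove operator after the add operator returns the original vector, which mirrors how the two generators in a pair undo each other.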
Discriminator — recognising pizzas
The PizzaGAN generators cover all of the adding and removing of toppings on the pizza. The discriminator will take care of recognising what toppings are actually on the pizza currently.
Given an input image of a pizza, the discriminator network predicts a set of multi-label classifications. Each element of the output vector corresponds to a particular topping.
For example, in the figure below a PizzaGAN discriminator predicted that the image of the pizza had pepperoni, mushrooms, and olives. The elements of the output vector corresponding to those toppings were predicted as 1.0 at inference (or some value above the user-set threshold).
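Reading toppings off the output vector comes down to thresholding each element. The scores below are made up for illustration, and the default threshold of 0.5 is an assumption rather than a value taken from the paper.

```python
from typing import List


def predicted_toppings(
    scores: List[float], toppings: List[str], threshold: float = 0.5
) -> List[str]:
    """Return every topping whose score is above the threshold."""
    if len(scores) != len(toppings):
        raise ValueError("scores and toppings must have the same length")
    return [name for name, score in zip(toppings, scores) if score > threshold]


toppings = ["pepperoni", "mushrooms", "olives", "ham"]  # hypothetical order
scores = [0.97, 0.81, 0.66, 0.04]  # made-up discriminator outputs

print(predicted_toppings(scores, toppings))       # ['pepperoni', 'mushrooms', 'olives']
print(predicted_toppings(scores, toppings, 0.9))  # ['pepperoni']
```

Raising the threshold makes the prediction more conservative: fewer toppings are reported, but each one with higher confidence.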
GAN models are usually trained by training the generator and discriminator together. The discriminator is trained on some of the generator's outputs, and the discriminator's loss on its predictions is used to train the generator.
PizzaGAN also follows this training scheme. In addition to predicting the labels of the pizza image, the discriminator also predicts whether the image is real or comes from a generator. This helps the generator create images that still look like real pizza images and to have all of the correct toppings.
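One way to picture a discriminator with two jobs is a loss with two terms: one for the real/fake decision and one for the topping labels. The sketch below uses binary cross-entropy for both and an assumed weighting factor `class_weight`. It illustrates the general idea only; the exact loss used in the paper may differ.

```python
import math
from typing import List


def bce(prediction: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a single probability, clipped for stability."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(
    real_fake_pred: float,
    is_real: bool,
    topping_preds: List[float],
    topping_targets: List[float],
    class_weight: float = 1.0,  # assumed weighting, not taken from the paper
) -> float:
    """Sum of a real/fake term and a mean per-topping classification term."""
    if len(topping_preds) != len(topping_targets):
        raise ValueError("predictions and targets must have the same length")
    adversarial = bce(real_fake_pred, 1.0 if is_real else 0.0)
    classification = sum(
        bce(p, t) for p, t in zip(topping_preds, topping_targets)
    ) / len(topping_targets)
    return adversarial + class_weight * classification


# A real pepperoni pizza: a confident, correct discriminator gets a low loss.
good = discriminator_loss(0.9, True, [0.9, 0.1], [1.0, 0.0])
bad = discriminator_loss(0.1, True, [0.1, 0.9], [1.0, 0.0])
print(good < bad)  # True
```

The same two signals can then be fed back to a generator: its output should be judged real and should carry exactly the toppings it was supposed to produce.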
Resulting Pizzas
With the discriminator predicting the toppings on the pizza and the generators having the ability to add and remove toppings, PizzaGAN is able to build and decompose images of pizzas with pretty strong accuracy.
If you'd like to read some more details about how PizzaGAN works, I'd recommend checking out the original paper, published at CVPR 2019!
Beyond that, I leave you with this wonderful quote from the paper:
Pizza is the most photographed food on Instagram with over 38 million posts using the hashtag #pizza.
Source: https://towardsdatascience.com/how-to-make-a-pizza-with-deep-learning-f3548e249dc9