
Reallocating Training

There’s a trend in ML: we are reallocating training.

In the old days, you had to gather (and prepare) a dataset, train your model, validate it and then you were ready to go.

Then models started to get bigger, and fine-tuning was introduced. The thinking was this: build a big model and train it on a vast dataset, then specialize it by fine-tuning. Let’s say you have a model that has been trained on lots of miscellaneous images. By fine-tuning it on a specific set of images (say, ones representing manufacturing defects), you:

  • decrease its ability to distinguish cats from dogs
  • (hopefully!) increase its accuracy in separating cracks from holes.
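To make the idea concrete, here is a toy sketch of that workflow in plain Python (no ML framework; the data and the “backbone” are invented stand-ins, not a real vision model): a pretrained feature extractor stays frozen, and only a small new head is trained on the specialized examples.

```python
import math

# Stand-in for a pretrained backbone: its "weights" are frozen, never updated.
def frozen_backbone(x):
    return [x[0] + x[1], x[0] - x[1]]

# Task-specific head: the only parameters touched during fine-tuning.
w = [0.0, 0.0]
b = 0.0

# Toy "defect" dataset (invented): label 1 = crack, 0 = hole.
data = [([1.0, 0.2], 1), ([0.1, 1.0], 0), ([0.9, 0.3], 1), ([0.2, 0.8], 0)]

def predict(x):
    f = frozen_backbone(x)
    return 1 / (1 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))

# A few gradient steps of logistic regression, on the head only.
lr = 0.5
for _ in range(200):
    for x, y in data:
        f = frozen_backbone(x)
        err = predict(x) - y
        w[0] -= lr * err * f[0]
        w[1] -= lr * err * f[1]
        b -= lr * err

print([round(predict(x)) for x, _ in data])  # → [1, 0, 1, 0]
```

The point of the sketch is the split: the general-purpose part is reused as-is, and the specialization lives in a handful of cheaply trained parameters.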

Now we have prompting. For years, things had been moving towards reducing how many samples fine-tuning needed.

Prompting is a conceptual leap in this direction: you are not using data to adjust the model’s weights; you are just giving it context.
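A minimal sketch of what “just giving it context” means in practice, sticking with the defect example (the task and labels are invented; any chat or completion API would consume the resulting string): the specialization lives entirely in the prompt text, and the model’s weights stay untouched.

```python
# Few-shot prompt construction: labeled examples go into the context,
# not into a training run.
examples = [
    ("surface shows a thin dark line", "crack"),
    ("surface shows a round cavity", "hole"),
]

def build_prompt(description):
    lines = ["Classify each defect as 'crack' or 'hole'."]
    for text, label in examples:
        lines.append(f"Defect: {text}\nLabel: {label}")
    lines.append(f"Defect: {description}\nLabel:")
    return "\n".join(lines)

prompt = build_prompt("surface shows a jagged fracture")
print(prompt)
```

Swapping the task means swapping the string, not re-training: the same frozen model serves every specialization.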

Simplifying a lot: to obtain a usable model, we went from training to fine-tuning to prompting.

Karpathy’s prediction seems to be right¹. Are there any edge cases? I can only think of two:

  • low-power applications. You are still constrained by processing capabilities; there’s no way around that. Admittedly this is a weak case: SOTA performance on embedded systems is usually achieved by training a bigger model and shrinking it afterwards (quantization, pruning, …)
  • vast amounts of private data. Training should still make sense here. But how do you quantify “vast”? And in what applications does it make sense?
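The shrinking step mentioned in the first edge case can be illustrated with a toy post-training quantization sketch (real pipelines use per-channel scales and calibration data; this only shows the core idea of mapping float weights onto a small integer range):

```python
# Toy symmetric int8 quantization: store weights as small integers plus one
# float scale, trading a little precision for a ~4x smaller model.
weights = [0.82, -0.41, 0.05, -0.93, 0.27]  # invented float weights

scale = max(abs(w) for w in weights) / 127   # map the largest weight to 127

quantized = [round(w / scale) for w in weights]   # what gets stored (int8)
dequantized = [q * scale for q in quantized]      # what inference sees

max_err = max(abs(w - d) for w, d in zip(weights, dequantized))
print(quantized)   # integers in [-127, 127]
print(max_err)     # rounding error, bounded by scale / 2
```

The per-weight error is bounded by half the scale, which is why a model trained big and quantized afterwards often beats a model trained small from the start.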

In a way, models are becoming more and more accessible (it is easier to prompt than to fine-tune). At the same time, training a big model from scratch is now impossible for most of us. Less training by you, a lot more by the creators of the original model.

  1. Not surprising at all, since he is one of the brightest minds in the field in my opinion.

Published Apr 25, 2023

Mechatronics Engineer, machine learning enthusiast, busy building Compiuta.