There’s a trend in ML: we are reallocating who does the training.
In the old days, you had to gather (and prepare) a dataset, train your model, validate it, and then you were ready to go.
Then models started to get bigger, and fine-tuning was introduced. The thinking was this: let’s build a big model and train it over a vast dataset. Taking that model and fine-tuning it was the subsequent step, a way of specializing it. Let’s say you have a model that has been trained on lots of miscellaneous images. By taking it and fine-tuning it on a specific set of images (let’s say ones representing manufacturing defects), you get a model specialized to that task without having to gather a huge dataset or train from scratch, as sketched below.
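A minimal sketch of that fine-tuning step, assuming PyTorch and torchvision; the ResNet-18 backbone, the `defects/train` folder, and the two-label setup are illustrative choices, not anything prescribed above:

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Start from a model pretrained on lots of miscellaneous images (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone; only the new head will be trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with one sized for our task,
# e.g. "defect" vs. "no defect".
model.fc = nn.Linear(model.fc.in_features, 2)

# Load the small, specialized dataset (hypothetical folder layout).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
train_data = datasets.ImageFolder("defects/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

# Train only the head for a few epochs.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
model.train()
for epoch in range(3):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```

Freezing the backbone is what makes the small dataset enough: only the new head’s weights are learned.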
Now we have prompting. Things had been moving for years towards reducing how many samples were needed for fine-tuning.
Prompting is a conceptual leap in this direction: you are not using data to adjust the model’s weights; you are just giving it context.
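To make the contrast concrete, here is a minimal sketch of the same kind of specialization done purely through context; the inspection notes, labels, and the `call_llm` stub are all hypothetical:

```python
# Specializing via prompting: the model's weights never change; the few
# examples below live entirely in the text we send.
FEW_SHOT_PROMPT = """Classify each inspection note as DEFECT or OK.

Note: "deep scratch across the left panel"
Label: DEFECT

Note: "packaging slightly dented, unit works fine"
Label: OK

Note: "{note}"
Label:"""

def call_llm(prompt: str) -> str:
    # Stand-in for whichever LLM API you use; the point is that
    # no gradient update happens anywhere in this file.
    raise NotImplementedError

prompt = FEW_SHOT_PROMPT.format(note="motor makes a grinding noise on startup")
print(prompt)  # this context is the entire "specialization"
```

Note what is missing: no optimizer, no dataset, no checkpoint. Changing the task is just changing the string.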
Simplifying a lot: to obtain a usable model, we went from training to fine-tuning to prompting.
Karpathy’s prediction seems to be right¹. Are there any edge cases? I can only think of two:
In a way, models are becoming more and more accessible (it is easier to prompt than it is to fine-tune). At the same time, training a big model from scratch is now impossible for the majority of us. Less training by you, a lot more by the creators of the original model.