Test data isn’t the best validation for your model.
We love to build data science and machine learning models by diligently cutting out test data sets, then training and creating models, and finally testing them on the test data to measure performance.
But what we’re optimizing for with that is only one certain situation, one given environment, and that specific time period.
Instead, I think great models should be:
Timeless
and universal.
Timeless in the sense that they should work on many different periods of time in much the same way. Universal in that they should work in different kinds of contexts, industries, and environments. If I have a recommendation engine and it works on my book selection, it should also work on my DVD selection! If not, I should build a better model that works on both.
The point is not, to build an omnipotent model but rather to have one that isn’t disturbed by slight changes in the environment. That is timeless and universal.