Counterfactuals matter in data science
In any data-driven process, you test a hypothesis, use the data to decide which option is better, and then choose it.
Then you move on to the next decision until, after dozens of them, you've incrementally become a lot better.
Except, sometimes you get this weird feeling that this might not be true. You moved forward step by step. But who’s to say you didn’t walk in a circle?
We have no idea what would've happened on the other paths, with the other options. We don't know the counterfactual: the outcome that never got the chance to exist.
I know a machine learning engineer called Alan. Alan is smart about this; he wants to know the counterfactual. He's developing a large-scale ML algorithm to optimize the profit of an e-commerce shop. Since he's constantly adjusting his work, he keeps wondering: what would happen if we didn't use the algorithm at all?
That's why he has test groups! For a small percentage of the shop's visitors, he deploys a different profit formula: a fixed one. For yet another small group, he deploys a simple percentage formula.
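A minimal sketch of what such a holdout setup could look like in Python; the group splits, function names, and numbers here are illustrative assumptions, not Alan's actual system:

```python
import hashlib

def assign_group(user_id: str) -> str:
    """Deterministically bucket a user into one of three pricing strategies."""
    # Stable hash so the same user always lands in the same group.
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    if bucket < 90:
        return "ml_algorithm"        # main treatment: the ML algorithm
    elif bucket < 95:
        return "fixed_formula"       # holdout 1: fixed profit formula
    else:
        return "percentage_formula"  # holdout 2: simple percentage markup


def estimate_lift(profit: dict, users: dict) -> float:
    """Estimate the ML group's additional profit over the fixed-formula holdout.

    Scales the holdout's per-user profit up to the ML group's size to
    approximate what those users would have generated without the algorithm.
    """
    per_user_baseline = profit["fixed_formula"] / users["fixed_formula"]
    counterfactual_profit = per_user_baseline * users["ml_algorithm"]
    return profit["ml_algorithm"] - counterfactual_profit


if __name__ == "__main__":
    # Illustrative numbers only.
    profit = {"ml_algorithm": 5_400_000, "fixed_formula": 250_000}
    users = {"ml_algorithm": 900_000, "fixed_formula": 50_000}
    print(assign_group("user-42"))
    print(f"Estimated lift: ${estimate_lift(profit, users):,.0f}")
```

The holdout groups are what make the counterfactual observable: the fixed-formula users show roughly what profit would have looked like without the algorithm, and the difference is the lift.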
With this approach, he knows the counterfactual! He knows that his algorithm generated a cool million $$$ of additional profit this year, and that his latest improvements pushed that number up by another $200,000.
Counterfactuals matter.