Open-weight models
Just as Mistral AI announced a $600M funding round, Percy Liang, a professor at Stanford, has turned the discussion to the term “open-source model.”
And for good reason. To some this looks like wordsmithing, but the industry has strong associations and expectations attached to the term “open source.” Those expectations naturally carry over to “open-source models,” where they can be misleading.
So, to make this clearer, let’s take a look at what I call the “three faces of OS,” where OS stands for open source. The idea is that OS doesn’t mean just one thing, “open.” It has several distinct faces, and together they make up the package we commonly refer to as OS:
It is free! You don’t have to pay to use it.
It’s modifiable, so you can change it and then use your changed version.
The source is open, so you can open the hood. You can take a look at the internal engine and understand how it works.
If you put Percy’s argument into this context, I think it becomes clearer. Percy is arguing that open-weight models, as he calls them, like Mixtral, don’t have all of these characteristics and shouldn’t be called open-source. Let’s look at each one in detail:
Yes, they are free, so you can use them.
However, you cannot really modify these models, or only to such a limited extent (via fine-tuning) that it’s hard to justify calling them modifiable.
And it is only half true that you can see the inner workings. Seeing a model’s weights is like seeing a compiled executable without the C# code behind it. You technically can look inside, but you won’t be able to comprehend how it works.
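To make the executable analogy a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the two tiny weight matrices W1 and W2, their values, and the forward function are hypothetical and are not taken from Mixtral or any real model. It only shows that having the weights lets you print them and run them, while the numbers themselves say nothing about why the model behaves the way it does.

```python
# Hypothetical toy "model": two small weight matrices with invented values.
# A real open-weight release ships billions of numbers like these.
W1 = [[0.42, -1.30],
      [0.07, 0.88]]   # 2 inputs -> 2 hidden units
W2 = [[1.10],
      [-0.65]]        # 2 hidden units -> 1 output


def relu(x):
    """Standard rectified linear unit."""
    return max(0.0, x)


def forward(x):
    """Run the toy network: input -> hidden layer (ReLU) -> output."""
    hidden = [relu(sum(xi * w for xi, w in zip(x, col))) for col in zip(*W1)]
    return [sum(h * w for h, w in zip(hidden, col)) for col in zip(*W2)]


if __name__ == "__main__":
    # You can "open the hood": every weight is fully visible...
    print("weights:", W1, W2)
    # ...and you can run the model for free...
    print("output:", forward([1.0, 2.0]))
    # ...but nothing here tells you what data or training code produced
    # these numbers, which is what source code would normally reveal.
```

The weights play the role of the compiled binary here. The training data and training code, which this sketch deliberately does not have, would be the equivalent of the source.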
These so-called open-source models are closer to freeware than to open source, and labeling them as OS is very misleading.