Mixtral is a mystery to me. How in the world is that team on par with/beating GOOGLE, who presumably have all the resources in the world to throw at this?
Mixtral is on par with Gemini Pro, not Gemini Ultra (and even there it is further behind Gemini Pro than Gemini Pro is behind GPT-3.5). But to directly answer your question, they are quite well-funded, having raised over $700M to date. I definitely wouldn't count them out.
Mixtral is missing in half of the benchmarks in that paper. Hardly conclusive. It’s also common knowledge that these benchmarks have a lot of issues[0]. A good litmus test, but not a substitute for actually seeing how the models do in the real world.
On the topic of “hardly conclusive” things, Gemini Pro literally told me just a few minutes ago[1] that the Avatar movies did not have humans in them. There was no funny business in the prompting. At least Mixtral knows that Avatar has humans in it. Most of Gemini Pro’s responses have been fine, but not exceptional.
Right. I'm just pointing out that comparing one model with a distilled version of another and then making broad statements about the companies behind them isn't really useful.
Surely you could make a comparison of two unreleased models, but it wouldn't be interesting because you don't have any real data (and benchmarks don't really mean anything).
Debating the usefulness of hn commentary is a somewhat philosophical issue, but I think it's entirely fair to draw parallels between what is, not what might be.
Gemini Ultra is self-evidently not ready for production. What are the issues? Who knows, but in a game that as of right now is mostly about reducing the amount of brute force required, something as "simple" as not being efficient enough is actually not something to gloss over. If your engine's entire shtick is having the greatest graphics but you can't make it run at acceptable fps, well, then it's not actually a usable product.
An LLM that has not actually been released could very well be in a comparably dire state, and fixing it while also delivering on the promised performance might be entirely non-trivial.
My understanding, however fuzzy, is that all the safety/politeness tuning results in models that are at times less likely to give accurate responses. That said, I suspect that either way both types of models largely give similar answers for soft questions, aside from the politeness and safety differences.
There's a survivorship bias going on here. You've never heard of the thousands of teams out there that are Mistral's size but AREN'T getting results that compete on the global stage, but they do exist. But you've heard of Google, whether they're getting it right or not.
"Thousands of teams" is a vast exaggeration. A tiny handful of companies out there have received funding to the tune of a billion dollars for model training like Mistral has. All of them have researchers with loaded resumes, and most are producing stuff of value. The thousands of other startups in the ecosystem are then taking these APIs and adding trivial abstractions on top.
Slight correction: Mistral AI was founded by two people from Meta (Guillaume Lample, Timothée Lacroix) and one from DeepMind (Arthur Mensch).
For new technologies, what matters most might be the universities people come from, rather than the companies. The founders of Google graduated from Stanford. The founders of Mistral AI graduated from École Polytechnique and École Normale Supérieure, which are renowned in France, notably for their scientific training.
Same as OpenAI, Anthropic, Cohere, Adept and hundreds of other small-to-mid-sized AI startups. When the dust settles and the space gets more mature, the exodus from Google Brain/DeepMind over the last few years will be considered this generation's Fairchild moment.