Wait, Did They Just Leak The Secret Behind GPT-4?

OpenAI’s GPT-4 may owe its capabilities to an old technique from the early 1990s known as “Mixture of Experts”

Wow. Remember how OpenAI has been super secretive about the architecture of GPT-4?

And how we've all indulged in the wildest speculation?!

But now, more and more rumors about the GPT-4 architecture are surfacing. And here comes the "wow" part: they point to a concept from the early 1990s that could be GPT-4's key ingredient.

It’s called “Mixture of Experts”. Let’s take a closer look at what it is and what it would mean if it were indeed the secret sauce behind GPT-4.

Mixture of what now?

Mixture of Experts (MoE) is a machine learning technique in which, instead of a single model learning everything, a combination of sub-models is used, each specialized in solving specific tasks (hence, "expert").

Imagine a team of experts that collaborate to address a complex issue… Obviously, this is an extremely powerful setup.

The challenge, however, is to coordinate these experts efficiently, and this is where a so-called Gating Network comes in. You can think of it as the team leader that assigns tasks to the experts: In the case of ChatGPT’s text completion models, the gating network would decide which expert should contribute to the final text prediction.

What’s fascinating is that the Gating Network doesn’t have to rely on a single expert, but can combine the insights of multiple experts, each bringing their nuanced understanding to the table.

Source: https://machinelearningmastery.com/mixture-of-experts/
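To make this more concrete, here is a minimal sketch of a dense MoE layer in PyTorch: a gating network scores a handful of small feed-forward experts and mixes their outputs. The expert count, layer sizes, and "soft" mixing below are illustrative assumptions for the example, not details of GPT-4's undisclosed architecture.

```python
# Minimal sketch of a Mixture-of-Experts layer with a gating network.
# Illustrative only: expert count, sizes, and soft mixing are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        # The gating network scores how relevant each expert is for a given input.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Turn the gate's scores into mixing weights that sum to 1 per token.
        weights = F.softmax(self.gate(x), dim=-1)                            # (batch, n_experts)
        expert_outputs = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, d_model, n_experts)
        # Combine the experts' outputs, weighted by the gate.
        return torch.einsum("be,bde->bd", weights, expert_outputs)

# Usage: route a batch of 4 token representations through 8 experts.
layer = MixtureOfExperts(d_model=16, d_hidden=32, n_experts=8)
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```

Note that in this simple version every expert runs on every input and the gate only decides how much each one counts; large-scale systems typically make the gate pick just a few experts per token to save compute.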

This concept has been around for quite a while now: the idea dates back to 1991. A few years later, Michael I. Jordan and Robert A. Jacobs wrote in their paper "Hierarchical Mixtures of Experts and the EM Algorithm":

“Overall, our results suggest that mixtures of experts can be a highly effective approach to modeling complex functions in a variety of domains.”

MoE has since been used in many applications. In 2021, an advanced MoE approach powered Google's Switch Transformer, a language model with 1.6 trillion parameters.
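For the curious, here is a rough, illustrative sketch of the sparse routing idea behind models like the Switch Transformer: each token is sent only to its single highest-scoring expert, so the total number of parameters can grow with the number of experts while the compute spent per token stays roughly constant. The sizes and the top-1 rule below are assumptions for the example, not the exact Switch Transformer recipe.

```python
# Sketch of sparse ("switch"-style) routing: each token runs through only its
# single best-scoring expert. Shapes and sizes here are purely illustrative.
import torch
import torch.nn.functional as F

n_experts, d_model = 4, 8
tokens = torch.randn(6, d_model)                       # 6 token representations
gate = torch.nn.Linear(d_model, n_experts)             # gating network
experts = [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]

scores = F.softmax(gate(tokens), dim=-1)               # (6, n_experts)
best = scores.argmax(dim=-1)                           # index of the top expert per token

output = torch.zeros_like(tokens)
for i in range(n_experts):
    mask = best == i                                   # tokens routed to expert i
    if mask.any():
        # Only the selected expert runs for these tokens; all others are skipped.
        output[mask] = experts[i](tokens[mask]) * scores[mask, i:i+1]

print(output.shape)  # torch.Size([6, 8])
```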

GPT-4: AI Breakthrough or Marketing Masterclass?

In a recent article, Alberto Romero explained that by not disclosing GPT-4's architecture, OpenAI has maintained its narrative as the most innovative AI company and, more importantly, fueled people's imagination… and fears (I recommend reading it here).

If GPT-4 is indeed an MoE-based language model, then OpenAI has pulled off a perfect marketing coup: they successfully preserved their reputation and were able to promote both their preferred narrative and the need for planning and regulation.

On the other hand, GPT-4 is undoubtedly a monumental and extremely powerful language model.

If it indeed leverages the MoE paradigm, this could open up new opportunities for the open-source community to adopt and further develop this methodology, potentially narrowing the gap with proprietary models.