Mixture of Experts (MoE)
Mixture of experts (MoE) is a neural network architecture that divides a model into many specialized sub-networks called "experts" and uses a routing mechanism to activate only a small subset of them for each input. This design lets models scale to hundreds of billions or even trillions of total parameters while computing with only a fraction of them per prediction, dramatically reducing inference cost without a comparable loss in capability.
Example
DeepSeek-R1 has 671 billion total parameters split across hundreds of expert sub-networks, but only 37 billion parameters are active for any single token. A small router network scores the experts for each token and activates only the top-scoring few; in practice, experts tend to specialize (some handling math-heavy input, others particular languages), keeping inference fast and cost-effective despite the model's massive total size.
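The routing step described above can be sketched in a few lines. This is a minimal, hypothetical illustration (not DeepSeek's actual implementation): a gate produces one score per expert, the top-k experts are selected, and their outputs are combined with softmax weights computed over the selected scores only. The expert networks here are stand-in linear maps.

```python
import numpy as np

def top_k_routing(x, gate_weights, experts, k=2):
    """Route input x to the top-k experts by gate score (illustrative sketch)."""
    logits = x @ gate_weights              # one score per expert
    top = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    shifted = np.exp(logits[top] - logits[top].max())
    probs = shifted / shifted.sum()        # softmax over the selected experts only
    # Only the chosen k experts run; the others stay idle for this token,
    # which is the source of MoE's compute savings.
    return sum(p * experts[i](x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate = rng.normal(size=(d, n_experts))
# Each "expert" is just a tiny linear layer for illustration.
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda v, W=W: v @ W for W in expert_mats]

x = rng.normal(size=d)
y = top_k_routing(x, gate, experts, k=2)
print(y.shape)
```

With k=2 of 4 experts active, only half the expert parameters are touched per input; production MoE models apply the same idea at each MoE layer with far more experts.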