
The Next Phase of Psyche
Starting today, Psyche will train a number of new models in parallel, all aimed at creating useful, novel open-source AI.
Our initial training run on testnet verified that we can train models over internet bandwidth at significant parameter and dataset sizes; by both measures, our Consilience 40B run was the largest distributed pre-training run ever. But while that run hit some important milestones, more than anything Nous strives to produce high-quality intelligence.
While we're happy to have reached this milestone for provability, Nous has one mission: to open-source and distribute the world's best intelligence. In that spirit, we're moving from provability to performance as we begin our efforts to pre-train, post-train, and apply RL to state-of-the-art open-source models.
Since the launch of the Psyche testnet, we've made substantial improvements to the codebase, including a full trainer abstraction that lets us train arbitrary models and move beyond pre-training to supervised fine-tuning (and eventually reinforcement learning). Because of this, the range of models Psyche can train has greatly expanded, and so we're widening the scope of what Psyche will be doing.
WHAT'S NEXT
For the next phase of Psyche, we will run a series of ablations to improve our next foundation models. An ablation is a controlled training experiment: a baseline model is trained alongside variants that each change one variable, in order to find the best conditions for a full training run. Labs run ablations when deciding their datamix and when choosing hyperparameters for a run.
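To make this concrete, here is a toy sketch of what an ablation grid can look like: a control configuration plus variants that each change a single knob. The knobs and values are illustrative placeholders, not our actual setup.

    # Toy ablation grid: a control config plus single-variable variants.
    # All knobs and values here are illustrative, not Psyche's real settings.
    control = {"lr": 3e-4, "code_fraction": 0.10, "math_fraction": 0.05}

    # Each variant changes exactly one knob relative to the control.
    ablations = (
        [{**control, "lr": lr} for lr in (1e-4, 6e-4)]
        + [{**control, "code_fraction": f} for f in (0.20, 0.30)]
    )

    for cfg in [control] + ablations:
        # In a real ablation, each config would train a small proxy model and
        # be scored on held-out benchmarks; here we just enumerate the variants.
        print(cfg)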
In particular, we're mixing together the best open-source pre-training datasets spanning general world knowledge with more specialized datasets for code and math.¹ This process will refine a training recipe that will serve as the formula for future foundation models.
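As a rough illustration of what a datamix is, the snippet below turns per-dataset weights into sampling proportions and expected token counts. The weights and token budget are made-up placeholders, not the recipe these ablations will settle on.

    # Turn per-dataset mixture weights into sampling proportions.
    # Weights and the token budget below are placeholders for illustration only.
    weights = {
        "FineWeb-Edu": 4.0,
        "FineWeb2-HQ": 3.0,
        "FinePDFs": 1.0,
        "DCLM-Baseline": 2.0,
        "Stack-Edu": 1.5,         # code
        "Nemotron-CC-Math": 0.5,  # math
    }

    total = sum(weights.values())
    proportions = {name: w / total for name, w in weights.items()}

    # Expected token share per dataset for a given (illustrative) training budget.
    budget_tokens = 2_000_000_000_000
    for name, p in proportions.items():
        print(f"{name}: {p:.1%} of batches, ~{p * budget_tokens:,.0f} tokens")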
We'll also be training Hermes 4 on ByteDance's latest 36B model using the Psyche network. Seed-OSS-36B is a performant model in a great weight class that wasn't quite covered in our initial release. It sits in that "sweet spot" – large enough to be powerful, small enough to be deployable locally on your GPU or MacBook. The base model's tool-use and reasoning capabilities are superior to those of the Llama model series, and its native 512k context length and adjustable reasoning lengths make for a great foundation to experiment with.
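For those who want to try the base model locally, a minimal loading sketch with Hugging Face transformers might look like the following. The repository ID is our assumption, so check the ByteDance-Seed organization on the Hub for the exact name, and expect to need quantization or multiple GPUs to fit 36B parameters comfortably.

    # Minimal sketch of loading Seed-OSS-36B with Hugging Face transformers.
    # The model ID below is an assumed identifier; verify it on the Hub.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ByteDance-Seed/Seed-OSS-36B-Base"  # assumed repository name

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # let transformers pick the checkpoint's dtype
        device_map="auto",    # spread layers across available GPUs / CPU
    )

    inputs = tokenizer("The Psyche network trains models", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))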
With Psyche, these runs happen fully in the open: decentralized and coordinated by the Psyche smart contracts on the Solana blockchain, with no human intermediary. See all of the runs and follow their progress on the Psyche dashboard.
¹ We're mixing FineWeb-Edu, FineWeb2-HQ, FinePDFs, DCLM-Baseline, Stack-Edu, and Nemotron-CC-Math, with more to come.