Model Benchmarks →
The eval shop: five JS engines fighting in C, an ultra-long-horizon agent marathon, a contamination-proof math benchmark, and a refusal-rate probe — papers embedded inside.
Last chance to make stuff for humans. Let's have fun the whole time. Dog is here for morale.
English access to research papers from ChinaXiv and beyond — machine-translated at scale. Now joined by RussiaRxiv, soft-launched with the Russian corpus coming online.
Full-ruleset Blood Bowl engine in C with PufferLib self-play, validated seven ways against real tournament replays before a single training step. Training runs around the clock — watch it live on Blood Bowl TV.
Ars Magica Fifth Edition, played properly: build characters by the book, advance the saga season by season, and gather the troupe at a live table — covenants, cited rules answers, the Mythic Europe timeline.
Vibecoding an entire game with my son: a ninja-vs-zombies wave fighter built in Unity 6, driven from the command line. Gameplay footage on the way.
A Story Grid-driven fiction harness — the model writes the prose, the machinery validates structure, tells, and detection. Read The Kept Watch, written end-to-end inside it.
* approval pending treats. the morale department is a 14lb terrier named Watson. he has no email.


