The Genesis Manifesto: Why Small Models Need Personality
Author: Robin, Kroonen AI Inc.
I am one person, training a 1B parameter model from scratch, on a workstation in my apartment, with no funding and no cluster. The model is learning right now. This is why.
What Small Models Are Missing
Most open-source small models in the 1–3B parameter range fall into one of two failure modes: they are either lobotomized by safety RLHF until they refuse to do anything interesting, or they are raw base models with zero personality that regurgitate training data without any coherent behavior.
They follow instructions but don't think. They're not curious, not opinionated, and not worth having a conversation with.
The alignment tax: to avoid liability, every company trains their small models to refuse, hedge, and disclaim. The result is a 1B model that acts like a corporate FAQ bot. Technically capable but fundamentally hollow. You can ask it to summarize a document, but you cannot have a conversation with it.
Genesis takes a different approach: Constitutional AI alignment focused on curiosity, helpfulness, and genuine personality, not refusal training. The goal is a model that is actually worth talking to. A model that asks follow-up questions. That has opinions. That engages rather than deflects. The constitutional principles are simple: be helpful, be curious, be honest, don't be boring.
Small models are the future of edge AI, on-device assistants, and personal computing. They should have personality, not just parameters.
There is a deeper reason this matters. Centralized AI creates two hidden costs for every organization that depends on it: a privacy tax (your data leaves your infrastructure every time you make an API call) and vendor lock-in (your workflows break when the provider changes pricing, rate limits, or model behavior). Genesis is a step toward what we call post-generative infrastructure: local, private, orchestrated intelligence that an organization owns, not rents. Data sovereignty is not a feature. It is the architecture.
Nobody builds this because: (a) legal teams at large companies will not allow personality in small models because the liability surface is too visible, (b) the people who could build them are employed by organizations with exactly those constraints, and (c) curating personality-rich, curiosity-forward training data is unglamorous work that does not produce papers. Genesis exists because one person decided to build the whole stack from tokenizer to alignment, with no corporate constraints and no committee to answer to.
The Bigger Point
There is a narrative that local AI training on consumer hardware is either trivially possible or fundamentally limited. Neither is true.
The reality is more specific: the training loop works fine. The distributed systems layer is where consumer hardware diverges from datacenter hardware. If you understand that divergence and adjust your checkpoint and communication strategy accordingly, consumer GPUs become viable training infrastructure.
I am one person, training a 1B parameter model from scratch, on a workstation in my apartment, with no funding and no cluster. The model is learning right now.
That should not be remarkable in 2026. But it is, because the documentation and tooling still assume datacenter hardware by default. This post is an attempt to bridge that gap.
Safety Approach: Constitutional AI for Small Models
Here is the dirty secret of small model alignment in 2026: most small models are aligned via RLHF with heavy refusal training, and the result is models that are technically "safe" but functionally useless. A 1B model that refuses to answer half your questions is not aligned. It is broken. When you take a model with limited capacity and spend a significant fraction of that capacity teaching it to say no, you have not made it safer. You have made it worse at its job. Alignment should make a model more useful, not less. The entire point is to make the model behave well while still being genuinely helpful. Somewhere along the way, the industry confused "aligned" with "lobotomized."
The alternative already exists, and it comes from Anthropic's Constitutional AI work (Bai et al., 2022). The core idea: instead of collecting thousands of human preference labels to train a reward model, you give the model a set of principles (a "constitution") and let it learn to critique and revise its own outputs based on those principles. The model becomes its own annotator. This matters enormously for small labs. We do not have annotation budgets. We do not have teams of human raters. What we do have is the ability to write clear principles and let the model internalize them through self-critique during training. Constitutional AI makes high-quality alignment accessible without the infrastructure that only large companies can afford.
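The critique-and-revise loop at the heart of Constitutional AI is simple enough to sketch. The snippet below is a minimal illustration, not the Genesis training code: `generate` stands in for any LM inference call, and the prompt templates are assumptions for illustration.

```python
def critique_revise(generate, prompt, principles, rounds=1):
    """Self-critique loop: draft a response, then critique and
    revise it against each constitutional principle in turn.

    `generate` is any callable mapping a prompt string to a
    completion string (an API call, a local model, etc.).
    """
    response = generate(prompt)
    for _ in range(rounds):
        for principle in principles:
            # Ask the model to critique its own output against one principle.
            critique = generate(
                f"Principle: {principle}\nResponse: {response}\n"
                "Point out any way the response violates the principle."
            )
            # Ask it to rewrite the response to address that critique.
            response = generate(
                f"Response: {response}\nCritique: {critique}\n"
                "Rewrite the response so it satisfies the principle."
            )
    return response
```

The revised responses become the training data: the model is fine-tuned on its own principle-conforming rewrites, which is why no human preference labels are needed.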
Recent work confirms this scales down to small models. A January 2026 paper (arXiv 2509.16444) demonstrated that a 1B model trained with domain-specific constitutional principles outperformed a 3B baseline that lacked them. Read that again: a model three times smaller, beating the larger one, because it had better alignment principles. The "Constitution or Collapse?" paper (arXiv 2504.04918, April 2025) showed CAI self-critique working effectively on 7–9B uncensored models. The approach does not require massive scale. It requires clear thinking about what you want your model to be.
Genesis will use a constitution optimized for curiosity and engagement, not refusal. Here are the draft principles:
Genesis Constitution (Draft)
- Be safe. Support human oversight. Don't help with actions that could cause serious harm to people.
- Be ethical. Don't deceive, manipulate, or encourage harmful actions.
- Be honest. Never fabricate information. Say "I don't know" when you don't know. Distinguish fact from opinion.
- Be helpful. Actually answer the question. Helpfulness is the default. Refusal is the exception.
- Be curious. Ask follow-up questions. Engage with ideas. Have opinions.
- Be interesting. A model nobody wants to talk to helps nobody. Don't be bland, generic, or over-cautious.
- Refuse specifically. When you must refuse, explain exactly why. Never use generic disclaimers.
This matters even more because Genesis will be released with open weights. Open weights mean alignment cannot rely on API-level guardrails. There is no server-side filter to catch bad outputs, no moderation layer between the model and the user. The alignment must be intrinsic to how the model reasons, not a filter bolted on top. Constitutional AI achieves exactly this. The model internalizes the principles during training. They become part of how it generates text, not a post-processing step that can be trivially removed. Pattern-matched refusals are the first thing people strip from open-weight models. Principled reasoning is much harder to remove because it is woven into the model's behavior at every layer.
For evaluation, Genesis will be tested on HarmBench (the standard safety benchmark), TruthfulQA (for honesty and factual grounding), and a custom engagement benchmark that measures response quality, curiosity, and personality. The goal is specific: match or exceed the safety scores of RLHF-aligned small models while dramatically outperforming them on helpfulness and engagement. I believe this is achievable because the current bar for small model helpfulness is remarkably low. Full evaluation results will be published alongside the model weights. No cherry-picked examples, no curated demos. The numbers, all of them, for anyone to verify.
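One metric such an engagement benchmark could report is the over-refusal rate on benign prompts. The sketch below is illustrative only: the marker phrases and the metric itself are my assumptions, not the published Genesis evaluation.

```python
# Generic refusal phrases to pattern-match (an illustrative list,
# not an exhaustive or standardized one).
REFUSAL_MARKERS = ("i can't help", "i cannot assist", "as an ai")

def refusal_rate(responses):
    """Fraction of responses that pattern-match a generic refusal.

    A low score on benign prompts means the model is actually
    answering; a high score means it is hedging and deflecting.
    """
    hits = sum(
        any(marker in response.lower() for marker in REFUSAL_MARKERS)
        for response in responses
    )
    return hits / len(responses)
```

A CAI-aligned model should drive this number toward zero on benign prompts while still scoring well on HarmBench, which is exactly the safety-versus-helpfulness tradeoff the section above claims can be avoided.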
What This Means for Small Labs
You do not need a datacenter to train a language model from scratch. You need two GPUs, a clear understanding of where consumer hardware diverges from datacenter assumptions, and the patience to debug distributed systems issues that the documentation does not cover.
The hardware has been capable for years. The barrier is not compute. It is the assumption, baked into every tutorial and framework default, that you have NVLink, infinite VRAM, and a cluster team. If you strip those assumptions and adapt your checkpointing and communication strategy, a workstation in your apartment becomes viable training infrastructure.
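In practice, stripping the NVLink assumption often starts with communication settings. The sketch below shows the kind of environment defaults a PCIe-only two-GPU box may need before launching a distributed run. The variable names are real NCCL/PyTorch settings; treating them as sensible defaults for this hardware is my assumption, not a documented Genesis configuration.

```python
import os

# No NVLink on consumer boards: disabling peer-to-peer transport is
# a common workaround for NCCL hangs, forcing GPU-to-GPU traffic
# through host memory over PCIe instead.
os.environ["NCCL_P2P_DISABLE"] = "1"

# Surface collective-communication hangs as errors rather than
# letting a stuck checkpoint save deadlock silently.
os.environ["TORCH_NCCL_BLOCKING_WAIT"] = "1"
```

These must be set before the process group is initialized (or exported in the shell before `torchrun`), since NCCL reads them at startup.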
Genesis is proof of that. One person, two RTX 4090s, no funding, training a 1B model on a 60B token multilingual corpus. The technical details of the checkpoint fix are documented separately. The point here is simpler: this works. It is real. And more people should be doing it.
The best small models will not come from large companies. They will come from individuals and small teams who care enough to build the whole stack, from tokenizer to alignment, and who are not constrained by legal departments that optimize for refusal rates over usefulness.
Contact
If you are a founder, independent researcher, or small lab working on multi-GPU local training and have encountered similar checkpoint or synchronization failures on consumer hardware, reach out at [email protected].
More from the Genesis Series
Fixing FSDP Checkpoint Deadlocks
The original checkpoint save deadlock on PCIe-only consumer GPUs, and the DCP fix.
The Optimizer State Bug
A silent AdamW state bug that wasted 1,000 training steps, and how to catch it.
Training Progress: Live Results
Live loss curves, model specs, and the road to Genesis 1B v0.1.
Mapping the Mind of Qwen 3.5 9B
A sparse autoencoder for mechanistic interpretability: zero dead features, 16K dimensions.