Mapping the Mind of Qwen 3.5 9B: A Sparse Autoencoder for Mechanistic Interpretability
We release a sparse autoencoder trained on the internal activations of Qwen 3.5 9B, enabling feature-level inspection of what the model represents. It learns 16,384 interpretable features with zero dead features, and it was trained on a single RTX 4090.
interpretability sparse-autoencoder research