Publication
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
A sparse autoencoder-guided framework for using model internals to engineer post-training data for LLM reinforcement learning.