Publication

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

Preprint

Yi Jing, Zao Dai, Jinwu Hu, Zijun Yao, Lei Hou, Juanzi Li, and Xiaozhi Wang

A framework that uses model internals from sparse autoencoders to guide the engineering of post-training data for LLM reinforcement learning.