- [2026.2.4] ✨ Richer Training Data ✨ We have released high-quality visual reasoning data for perspective questions, where the reasoning includes images rendered from the new perspective. Get it here: Reasoning Data. Train your models on it and let us know how it works!
- [2026.2.4] Check out an updated version of SAT that works with current versions of Python and the Hugging Face `datasets` library: Data v2
- [2026.2.4] We released a Qwen2.5-VL model trained on a SAT and Video-R1 mixture as a strong baseline. Check it out here: 🤗 Model
We take actions in a 3D simulator and use privileged 3D information about the assets. We combine this information with natural language descriptions of the assets to generate question-answer pairs about how the 3D layout of the scene changes as a result of the actions taken.
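As a rough illustration of the idea above, here is a minimal sketch that turns privileged asset positions plus a natural language description into a question about how the scene looks after an action. It assumes a simplified top-down 2D scene and a single "turn" action. `Asset`, `Agent`, `side_of`, and `make_turn_question` are hypothetical names for this example, not part of the SAT codebase; a real pipeline would read positions and descriptions from the simulator's 3D metadata.

```python
import math
from dataclasses import dataclass


@dataclass
class Asset:
    # Natural language description of the asset (hypothetical example field).
    description: str
    # Privileged top-down position (x, y) in world coordinates (simplified 2D assumption).
    position: tuple[float, float]


@dataclass
class Agent:
    position: tuple[float, float]
    # Heading in degrees, measured counterclockwise from the +x axis.
    heading_deg: float

    def turned(self, delta_deg: float) -> "Agent":
        # Positive delta turns left (counterclockwise), negative turns right.
        return Agent(self.position, (self.heading_deg + delta_deg) % 360.0)


def side_of(agent: Agent, asset: Asset) -> str:
    """Return 'left' or 'right' for the asset in the agent's egocentric frame.

    Assets lying exactly on the agent's forward/backward axis are reported
    as 'right' in this simplified sketch.
    """
    theta = math.radians(agent.heading_deg)
    dx = asset.position[0] - agent.position[0]
    dy = asset.position[1] - agent.position[1]
    # Project the offset onto the agent's left vector (-sin(theta), cos(theta)).
    left_component = -math.sin(theta) * dx + math.cos(theta) * dy
    return "left" if left_component > 0 else "right"


def make_turn_question(agent: Agent, asset: Asset, delta_deg: float) -> dict:
    """Build one QA pair whose answer depends on the scene after the turn."""
    direction = "left" if delta_deg > 0 else "right"
    after = agent.turned(delta_deg)
    question = (
        f"If you turn {abs(delta_deg):g} degrees to the {direction}, "
        f"will the {asset.description} be to your left or right?"
    )
    return {"question": question, "answer": side_of(after, asset)}


if __name__ == "__main__":
    agent = Agent(position=(0.0, 0.0), heading_deg=90.0)  # facing +y
    chair = Asset(description="red chair", position=(1.0, 1.0))
    print(side_of(agent, chair))  # right
    print(make_turn_question(agent, chair, -90.0))
```

The key point is that the answer is computed from the 3D (here, 2D) state *after* the action, so the question tests whether a model can mentally simulate the change rather than read it off the current image.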