đź”— Sky-T1 - Train your own O1 preview model within $450
UC Berkeley’s Sky Computing Lab (the NovaSky team) announced “Sky-T1”, a model that performs competitively with OpenAI’s o1-preview on reasoning and coding benchmarks while using a small compute budget.
They set out to produce this open-weights model because the current leaders like o1 and Gemini 2.0 Flash Thinking, while excelling at reasoning and able to solve complex tasks by producing a long internal chain of thought, keep their technical details and model weights inaccessible.
- They used QwQ to generate training data (with some cleanup using GPT-4o-mini)
- The training data was then used to fine-tune Qwen2.5-32B-Instruct (a non-reasoning model)
- The result: Sky-T1 performs slightly worse than QwQ but much better than Qwen2.5 on reasoning tasks (pipeline sketched below)
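A minimal sketch of that generate-clean-collect pipeline, assuming QwQ is served behind an OpenAI-compatible endpoint (e.g., via vLLM). The endpoint URL, model IDs, prompts, and the `problems` list are illustrative assumptions, not the exact Sky-T1 recipe:

```python
# Sketch of a distill-then-fine-tune data pipeline (hypothetical endpoints/names).
import json

from openai import OpenAI

# Assumption: QwQ served locally behind an OpenAI-compatible API (e.g., vLLM).
teacher = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
cleaner = OpenAI()  # standard OpenAI client (reads OPENAI_API_KEY) for GPT-4o-mini cleanup

def generate_trace(problem: str) -> str:
    """Ask the teacher (QwQ) for a long-form reasoning trace plus final answer."""
    resp = teacher.chat.completions.create(
        model="Qwen/QwQ-32B-Preview",
        messages=[{"role": "user", "content": problem}],
        temperature=0.7,
        max_tokens=8192,
    )
    return resp.choices[0].message.content

def clean_trace(raw: str) -> str:
    """Use GPT-4o-mini to reformat the raw trace into a consistent structure."""
    resp = cleaner.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Rewrite this reasoning trace into clean, well-structured steps "
                "ending with a final answer. Do not change the logic.")},
            {"role": "user", "content": raw},
        ],
    )
    return resp.choices[0].message.content

# problems: your task-specific prompts (math, coding, etc.) -- placeholder here.
problems = ["Prove that the sum of two even integers is even."]
with open("sft_data.jsonl", "w") as f:
    for p in problems:
        trace = clean_trace(generate_trace(p))
        # Standard chat-format SFT record for fine-tuning Qwen2.5-32B-Instruct.
        f.write(json.dumps({"messages": [
            {"role": "user", "content": p},
            {"role": "assistant", "content": trace},
        ]}) + "\n")
```

The NovaSky team also reports rejection sampling on top of this: traces whose final answers don’t match ground truth get dropped before fine-tuning.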
This is pretty interesting, as it shows how you can fine-tune a foundation model to get better at reasoning.
Inference-time compute is still underutilized in AI deployments. It is better to distill the reasoning from larger models like R1 for specific tasks. Even better, one can mix in custom thinking instructions for specific sub-problems, so that a fine-tuned model learns a mix of task-specific reasoning and custom logic. This can often beat custom prompt iteration.
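One way to implement that mixing, continuing the hypothetical `sft_data.jsonl` format from the sketch above. The sub-problem types, routing keywords, and instruction templates are all illustrative assumptions:

```python
# Sketch: splice custom per-sub-problem thinking instructions into the
# distilled dataset so the student learns them (names/templates are illustrative).
import json

# Hypothetical custom reasoning guidance keyed by sub-problem type.
CUSTOM_THINKING = {
    "date_math": "First normalize all dates to ISO-8601, then compute day deltas.",
    "unit_conversion": "Convert every quantity to SI units before comparing.",
}

def classify(problem: str) -> str | None:
    """Toy keyword router; replace with your own sub-problem detection."""
    text = problem.lower()
    if "date" in text:
        return "date_math"
    if any(u in text for u in ("km", "miles", "kg", "lbs")):
        return "unit_conversion"
    return None

records = []
with open("sft_data.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        kind = classify(rec["messages"][0]["content"])
        if kind:
            # Prepend the custom logic as a system message so the fine-tuned
            # model internalizes it instead of relying on prompt iteration.
            rec["messages"].insert(0, {"role": "system",
                                       "content": CUSTOM_THINKING[kind]})
        records.append(rec)

with open("sft_data_mixed.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```

After fine-tuning on the mixed dataset, the model applies the sub-problem-specific reasoning on its own, without those instructions appearing in the inference prompt.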
đź”— This guide, Train a Reasoning Model | Kiln AI Docs, goes into how to distill reasoning into thinking models.