9 Commits

Author SHA1 Message Date
Zafir Stojanovski
c6663cdb81 fix(training): Prepend <think> token in format reward (#396)
* prepend think token in format reward

* pre commit + fix some default vals

* add checkpoint config
2025-03-28 09:45:17 +01:00
Andreas Koepf
2802066233 remove data/ from main .gitignore 2025-03-07 16:16:40 +01:00
Zafir Stojanovski
5109ed89c9 pre-commit 2025-02-23 13:11:31 +01:00
Zafir Stojanovski
6bbec2ac4e exploratory notebook 2025-02-22 00:46:33 +01:00
tohskai
847442ef0a Add PolynomialMultiplicationDataset (#64)
* Add PolynomialMultiplicationDataset
2025-02-07 14:06:41 +01:00
abdulhakeem
715102c277 Remove .DS_Store 2025-02-01 20:39:37 -06:00
Rich Jones
99bf648989 initial bf working, contrib not committed 2025-01-30 15:38:03 +01:00
Andreas Koepf (aider)
3f80fd7b80 build: Initialize reasoning_gym package structure with packaging and development setup 2025-01-23 10:50:54 +01:00
Andreas Koepf
530cb523c8 chore: Add .gitignore with .aider and .env files 2025-01-23 10:50:53 +01:00