16 Commits

Author SHA1 Message Date
Zafir Stojanovski
0e4582f83b fix(evaluation): Add instructions for running on MMLU Pro (#497)
* add instructions for mmlu pro, format instructions for math benchmarks

* lint

* remove `--fewshot_as_multiturn`
2025-08-01 16:27:56 +02:00
Zafir Stojanovski
a969d8ef05 feat(curriculum): Knights and Knaves configs (#488)
* configs

* reduce complexity of curriculum

* update lower bound

* add failure threshold

* update last_k

* update thresholds for success and failure

* update curriculum file as well

* update run name for noncurriculum

* lint

* dtype model eval

* return binary scoring

* set eval repeats to 3

* fix tests
2025-07-31 10:18:05 +02:00
Zafir Stojanovski
0f5352e5cd fix: Training README.md (#491)
* Update README.md in `training`

* add pip install for verl
2025-07-27 11:56:00 +02:00
joesharratt1229
4b60c32978 Curr exp (#487)
* began curr exp

* added holdout words

* updated config

* added context

* updated base curriculum

* updated

* updated curriculum

* updated

* updated

* updated automatic flag

* updated ray trainer

* update
2025-07-25 20:38:47 +01:00
Zafir Stojanovski
56ce2e79a7 tutorial(training): Add a minimal example with trl (#473)
* v0

* 2 gpu setup

* improve parsing from yaml

* update yaml dataset example

* remove restriction on flash attn

* more comments

* first version of the readme

* pin torch

* simplify requirements

* just flash attn

* use set env instead

* simpler set env

* readme

* add wandb project to setup

* update template

* update model id

* post init to capture the config and weight

* extract metadata

* update config

* update dataset config

* move env for wandb project

* pre-commit

* remove qwen-math from training

* more instructions

* unused import

* remove trl old

* warmup ratio

* warmup ratio

* change model id

* change model_id

* add info about CUDA_VISIBLE_DEVICES
2025-06-21 00:01:31 +02:00
Oliver Stanley
1232a7d1e5 simplify training setup instructions (#454)
* simplify training setup instructions

* tweaks

* update cfgs

* readme update

* readme update
2025-06-06 09:51:29 +01:00
Oliver Stanley
add527ada1 update training dir with external eval details (#437)
* added games

* added llama 3b training conf

* update readme with details of external evals

* readme update

---------

Co-authored-by: joesharratt1229 <joesharratt1229@gmail.com>
2025-05-19 00:35:41 +02:00
Zafir Stojanovski
0cda6b1205 qwen math training code (#435)
* qwen math training code

* pre-commit
2025-05-16 13:19:19 +02:00
Oliver Stanley
85f3c6dd02 updated inter-domain generalisation eval configs (#432)
* tweak eval configs

* add eval configs

* add eval config
2025-05-15 09:08:16 +02:00
joesharratt1229
73e3cb33a4 Added games training and evaluation configuration (#426)
* added games

* Update eval_games_composite.yaml

* Delete training/evaluations/eval_qwen_3b.yaml

* Add files via upload

* Delete training/evaluations/eval_algebraic_composite.yaml

* Delete training/evaluations/eval_algorithmic_composite.yaml

* Delete training/evaluations/eval_arithmetic_composite.yaml

* Delete training/evaluations/eval_cognition_composite.yaml

* Delete training/evaluations/eval_games_composite.yaml
2025-04-26 19:45:32 +01:00
Oliver Stanley
10863ea12b inter-domain generalisation evaluation configs (#424)
* add inter-domain generalisation eval config for algebra

* add algorithmic eval cfg

* vllm infer

* add arithmetic eval cfg

* add geometry eval cfg

* add arc cfg

* add games eval cfg

* add cognition eval cfg

* add graphs eval cfg
2025-04-22 17:32:35 +01:00
joesharratt1229
d0ef136d5b Feat/intragen experiments (#414)
* added curriculum

* readapted readme

* corrected small errors

* Delete eval/eval/r1/algorithmic/word_sorting.json

* removed redundant argument

* added spell

* removed duplicated fit

* changed config

* added composite changes

* added composite changes

* updated yaml

* added spell backward

* updated read me

* added qwen2.5

* added

* Add files via upload

* updated missing trainer func

* updated curr

* updated spell back

* updated correctness score func

* updated configs

* added local evals

* added updates

* updated datasets

* added fsdp to hf utility

* added algorithmic qwen 3b yaml

* updated read me

* updated configs

* added prepend token

* updated with thinking token

* updated test score board

* resolved comments

* added evaluation scripts

* removed results from pr

* added config

* added partial reward scoring

* added evaluation composites

* added training configs

* added games eval

* added rubik's cube

* resolved merge conflicts

* added games config

* added latest eval configs

* updated structure

* Delete training/evaluations/eval_graphs_composite.yaml

---------

Co-authored-by: joesharratt1229 <joesharrat1229@gmail.com>
2025-04-16 08:04:52 +02:00
Oliver Stanley
224532f12a first inter-domain generalisation experiments (#412)
* tweak len reward

* first inter-generalisation experiment config

* update inter algorithmic config

* default to empty config

* fix typo

* change config to match experiment script

* long prompt fixes

* algorithmic training config tweaks

* imports

* update algorithmic training cfgs

* first logic composite config

* fix dset name

* tweaks

* fix syllogisms dataset

* rm temp print

* initial algebra config

* algebra cfg tweaks

* add gc

* add initial games cfg

* rename games cfg

* fix dset name

* fix sokoban metadata

* remove boxnet

* games cfg tweak
2025-04-14 21:06:40 +01:00
joesharratt1229
43c739cb3e Feat/curr adj (#394)
2025-04-02 06:39:14 +01:00
Zafir Stojanovski
c6663cdb81 fix(training): Prepend <think> token in format reward (#396)
* prepend think token in format reward

* pre commit + fix some default vals

* add checkpoint config
2025-03-28 09:45:17 +01:00
Oliver Stanley
eb69916c1b initial verl training codebase (#389)
* fixes for latest verl
* composite dataset training experiment
* use stateful dataloaders to match verl changes
* training readme
* add formatting reward
* length reward impl
* standalone reasoning_gym config section
* curriculum learning, new length reward, more config
2025-03-20 15:04:57 +00:00