reasoning-gym

Mirror of https://github.com/open-thought/reasoning-gym.git, synced 2025-10-09 13:40:09 +03:00

Files

Zafir Stojanovski 56ce2e79a7 tutorial(training): Add a minimal example with trl (#473)

* v0

* 2 gpu setup

* improve parsing from yaml

* update yaml dataset example

* remove restriction on flash attn

* more comments

* first version of the readme

* pin torch

* simplify requirements

* just flash attn

* use set env instead

* simpler set env

* readme

* add wandb project to setup

* update template

* update model id

* post init to capture the config and weight

* extract metadata

* update config

* update dataset config

* move env for wandb project

* pre-commit

* remove qwen-math from training

* more instructions

* unused import

* remove trl old

* warmup ratio

* warmup ratio

* change model id

* change model_id

* add info about CUDA_VISIBLE_DEVICES

2025-06-21 00:01:31 +02:00

open-instruct

Feat/open instruct example (#381)

2025-03-17 23:20:11 +01:00

OpenRLHF

use native types List->list, Dict->dict, Set->set, Tuple->tuple

2025-02-21 15:15:38 +01:00

trl

tutorial(training): Add a minimal example with trl (#473)

2025-06-21 00:01:31 +02:00

unsloth

Better progress tracking