Commit Graph

  • 16d9839071 test {i} Alex Cheema 2024-12-11 12:53:55 +00:00
  • 8269b4b190 t Alex Cheema 2024-12-11 12:38:51 +00:00
  • d4cc2cf13d Merge pull request #480 from blindcrone/train-working Alex Cheema 2024-12-11 11:47:57 +00:00
  • 329efb2381 Model loading and saving for tinygrad Nel Nibcord 2024-12-11 03:21:17 -08:00
  • 16651a3506 dashboard tweaks Alex Cheema 2024-12-11 11:04:25 +00:00
  • b1397b49be Proper sharding in tinygrad Nel Nibcord 2024-12-11 02:45:39 -08:00
  • 7f0c12a98d embed fix Nel Nibcord 2024-12-11 02:32:51 -08:00
  • bd3114457f Dummied up an abstact save_checkpoint Nel Nibcord 2024-12-10 13:21:29 -08:00
  • cc66a0b782 Missed one Nel Nibcord 2024-12-10 13:15:19 -08:00
  • 124a0338b4 Slightly simplified waiting for outstanding requests Nel Nibcord 2024-12-10 13:03:48 -08:00
  • a4313da8d1 Removed statefulModel stuff from mlx impl too Nel Nibcord 2024-12-08 17:31:20 -08:00
  • 0673d6452c Removed ensure_session to clean stuff up. May revisit later Nel Nibcord 2024-12-08 03:12:18 -08:00
  • 6aaea8c74c Abstract load checkpoint method Nel Nibcord 2024-12-08 03:10:31 -08:00
  • 2a3a2e5e67 circular include lol Nel Nibcord 2024-12-08 03:05:04 -08:00
  • 0c5762d18a Node rename Nel Nibcord 2024-12-11 02:56:25 -08:00
  • c2332e2478 Moved nodes around Nel Nibcord 2024-12-11 02:54:36 -08:00
  • 763fbf8486 Updated node refs Nel Nibcord 2024-12-11 02:54:01 -08:00
  • 608a3d800b Proper sharding in tinygrad (train-working-test) Nel Nibcord 2024-12-11 02:45:39 -08:00
  • 8f0d19e9b0 embed fix Nel Nibcord 2024-12-11 02:32:51 -08:00
  • ee97563b45 Dummied up an abstact save_checkpoint Nel Nibcord 2024-12-10 13:21:29 -08:00
  • 02281ebe3d Missed one Nel Nibcord 2024-12-10 13:15:19 -08:00
  • dffa17b2d1 Slightly simplified waiting for outstanding requests Nel Nibcord 2024-12-10 13:03:48 -08:00
  • 720940d563 Removed statefulModel stuff from mlx impl too Nel Nibcord 2024-12-08 17:31:20 -08:00
  • bc2812238f Removed ensure_session to clean stuff up. May revisit later Nel Nibcord 2024-12-08 03:12:18 -08:00
  • e9971f74ae Abstract load checkpoint method Nel Nibcord 2024-12-08 03:10:31 -08:00
  • 223d35cea0 circular include lol Nel Nibcord 2024-12-08 03:05:04 -08:00
  • 8dc73074e6 Nodes don't need an abstract base class Nel Nibcord 2024-12-10 13:12:05 -08:00
  • 59af2dd592 Do we need casting here? Nel Nibcord 2024-12-08 02:50:12 -08:00
  • b22c21ac16 Some session method cleanup Nel Nibcord 2024-12-08 02:35:45 -08:00
  • 98edb393b2 Initialize inference engine session in base class Nel Nibcord 2024-12-08 02:34:31 -08:00
  • bcf87e79b7 Okay let's turn no_grad back on. We'll worry about that when tinygrad training works Nel Nibcord 2024-12-06 04:49:46 -08:00
  • b7bbda3348 Removed tinygrad StatefulModel class, as it's no longer used Nel Nibcord 2024-12-06 01:06:50 -08:00
  • 67f5ae25a5 Fixing tinygrad model Nel Nibcord 2024-12-06 00:25:18 -08:00
  • bfa3b36be5 Fixing tinygrad model Nel Nibcord 2024-12-06 00:24:04 -08:00
  • 37a75d6b96 Fixing tinygrad model Nel Nibcord 2024-12-06 00:18:46 -08:00
  • 0d3abfca95 Made models save properly Nel Nibcord 2024-11-21 21:14:53 -08:00
  • 9283f6d7bd Correct loss propagation so we can see the actual loss instead of just the requestor shard's loss Nel Nibcord 2024-12-06 00:50:52 -08:00
  • 9eadee310b Basic model saving Nel Nibcord 2024-11-20 17:07:15 -08:00
  • 38e368f00b Fixed up the ops so that batches work Nel Nibcord 2024-11-20 16:01:56 -08:00
  • dd3d99043b Working distributed training Nel Nibcord 2024-11-26 17:10:19 -08:00
  • 175ebc1c42 Coordination biz Nel Nibcord 2024-11-21 17:34:03 -08:00
  • 3e869051f6 Okay we should probably await the update Nel Nibcord 2024-11-19 07:59:53 -08:00
  • 75c8650f1f Naive network-propagated loss implementation on MLX Nel Nibcord 2024-12-06 00:50:23 -08:00
  • 836856824e WIP: Training works on mlx Nel Nibcord 2024-12-06 00:49:55 -08:00
  • a6fd7a3430 Generalizing some of the dataset biz while also creating uniform batches Nel Nibcord 2024-11-26 00:18:58 -08:00
  • f5efbe1b8f Initial distributed evaluation implementation Nel Nibcord 2024-12-06 00:49:23 -08:00
  • 1e869a0f15 trigger test Alex Cheema 2024-12-10 02:04:52 +00:00
  • 5a4d128db6 trigger test Alex Cheema 2024-12-09 08:02:29 +00:00
  • 8a5d212cfc test 20 Alex Cheema 2024-12-08 23:38:30 +00:00
  • 53edb8508b test 19 Alex Cheema 2024-12-08 23:38:24 +00:00
  • 29d9df04bf test 18 Alex Cheema 2024-12-08 23:38:18 +00:00
  • 4d6af6e6ca test 17 Alex Cheema 2024-12-08 23:38:13 +00:00
  • 8c7c156f57 test 16 Alex Cheema 2024-12-08 23:38:07 +00:00
  • 310843487f test 15 Alex Cheema 2024-12-08 23:38:01 +00:00
  • a4b221d0a0 test 14 Alex Cheema 2024-12-08 23:37:55 +00:00
  • 286db875de test 13 Alex Cheema 2024-12-08 23:37:49 +00:00
  • d714e40f62 test 12 Alex Cheema 2024-12-08 23:37:43 +00:00
  • e78ef75531 test 11 Alex Cheema 2024-12-08 23:37:37 +00:00
  • 38eaecf087 test 10 Alex Cheema 2024-12-08 23:37:31 +00:00
  • 3cf28f8452 test 9 Alex Cheema 2024-12-08 23:37:26 +00:00
  • 9ba8bbdd70 test 8 Alex Cheema 2024-12-08 23:37:20 +00:00
  • af6048e373 test 7 Alex Cheema 2024-12-08 23:37:14 +00:00
  • d93b8e8948 test 6 Alex Cheema 2024-12-08 23:37:08 +00:00
  • b69cb49a46 test 5 Alex Cheema 2024-12-08 23:37:02 +00:00
  • cc74b1f9b3 test 4 Alex Cheema 2024-12-08 23:36:57 +00:00
  • e78a52de5f test 3 Alex Cheema 2024-12-08 23:36:51 +00:00
  • f6c2c37c4b test 2 Alex Cheema 2024-12-08 23:36:45 +00:00
  • 314a5d9781 test 1 Alex Cheema 2024-12-08 23:36:22 +00:00
  • b4e885bbd2 test range Alex Cheema 2024-12-08 23:36:14 +00:00
  • bd9d11861b sleep before bench Alex Cheema 2024-12-08 23:24:46 +00:00
  • 571b26c50e allowed interface types Alex Cheema 2024-12-08 23:19:58 +00:00
  • b21681931d remove Glen 2024-12-08 23:12:39 +00:00
  • f584e86d8e get rid of lfs stuff Alex Cheema 2024-12-08 22:55:19 +00:00
  • fd05bca1c8 lfs Alex Cheema 2024-12-08 22:46:49 +00:00
  • cbac4d6a3e git version Alex Cheema 2024-12-08 22:44:32 +00:00
  • b0977f97ab t Alex Cheema 2024-12-08 22:43:23 +00:00
  • 1716f637f7 test Glen 2024-12-08 22:32:03 +00:00
  • 903a5aabf7 fix Glen 2024-12-08 22:26:44 +00:00
  • b4f86496ea bootstrap Glen 2024-12-08 22:23:02 +00:00
  • 8e57f3385c trigger test Alex Cheema 2024-12-08 22:14:23 +00:00
  • 3ccbdf19de add DEBUG_DISCOVERY Alex Cheema 2024-12-08 22:07:48 +00:00
  • 3687ba18df bench logs Alex Cheema 2024-12-08 22:02:39 +00:00
  • 6bb7c11bbb enable debug Alex Cheema 2024-12-08 21:54:15 +00:00
  • 54d3c823b9 dash sounds Alex Cheema 2024-12-08 21:53:40 +00:00
  • d953f6f538 add model to benchmark key Alex Cheema 2024-12-08 21:17:06 +00:00
  • c8f93721c5 model matrix Glen 2024-12-08 21:14:36 +00:00
  • fb8d87025f t Alex Cheema 2024-12-08 21:02:42 +00:00
  • 87865f0cd9 list exo processes before test, warmup req in bench Alex Cheema 2024-12-08 20:56:37 +00:00
  • 755dd477dd jobname Glen 2024-12-08 20:37:33 +00:00
  • fb44eb086c simplify bench Alex Cheema 2024-12-08 20:30:07 +00:00
  • 750bfb9d10 use depot runners Alex Cheema 2024-12-08 20:09:36 +00:00
  • 8f259e7c1e own runner test Alex Cheema 2024-12-08 20:03:56 +00:00
  • be8cbc0f56 trigger test Alex Cheema 2024-12-08 19:28:55 +00:00
  • fe8074929f fix Glen 2024-12-08 19:08:47 +00:00
  • 45b3582f13 tiny tweaks Alex Cheema 2024-12-08 19:02:59 +00:00
  • c3c80c61c9 name Glen 2024-12-08 19:02:53 +00:00
  • c138de0875 job_name Glen 2024-12-08 18:56:37 +00:00
  • a44bf6fdc4 t Alex Cheema 2024-12-08 18:51:08 +00:00
  • b1a386af02 add back mlx, use depot runners Alex Cheema 2024-12-08 18:48:14 +00:00
  • 61deb32404 t Alex Cheema 2024-12-08 18:43:08 +00:00