Added arXiv link and demo sample output

This commit is contained in:
Roman Bachmann
2024-06-14 12:58:59 +02:00
parent c5c04e7f29
commit c2642c6ba6
4 changed files with 6 additions and 1 deletion


@@ -74,6 +74,11 @@ preds = sampler({'rgb@224': img.cuda()}, seed=None)
sampler.plot_modalities(preds, save_path=None)
```
You should expect to see an output like the following:
![4M demo sampler output](./assets/4M_demo_sample_darkmode.jpg#gh-dark-mode-only)
![4M demo sampler output](./assets/4M_demo_sample_lightmode.jpg#gh-light-mode-only)
To perform caption-to-all generation, replace the sampler input with: `preds = sampler({'caption': 'A lake house with a boat in front [S_1]'})`.
For a list of available 4M models, please see the model zoo below, and see [README_GENERATION.md](README_GENERATION.md) for more instructions on generation.
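
As a hedged illustration (not part of this commit's changed lines), the two calls shown in the README hunk above can be combined for caption-to-all generation. The sketch assumes `sampler` has already been constructed following the README's earlier setup steps, which lie outside this hunk:

```python
# Sketch only: assumes `sampler` was built as in the README's preceding
# setup steps (not shown in this diff hunk).

# Caption-to-all generation: pass a caption instead of an RGB image,
# using the example caption from the changed line above.
preds = sampler({'caption': 'A lake house with a boat in front [S_1]'})

# Plot all predicted modalities, as in the RGB-to-all example.
sampler.plot_modalities(preds, save_path=None)
```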

Binary file not shown (image added, 1.2 MiB).

Binary file not shown (image added, 1.2 MiB).


@@ -11,7 +11,7 @@
"\n",
"(\\* Equal contribution, random order)\n",
"\n",
"[`Website`](https://4m.epfl.ch) | [`Paper`](TBD) | [`GitHub`](https://github.com/apple/ml-4m)\n",
"[`Website`](https://4m.epfl.ch) | [`Paper`](https://arxiv.org/abs/2406.09406) | [`GitHub`](https://github.com/apple/ml-4m)\n",
"\n",
"We adopt the 4M framework to scale a vision model to tens of tasks and modalities. The resulting model, named 4M-21, has significantly expanded out-of-the-box capabilities, and yields stronger results on downstream transfer tasks. \n",
"\n",