Added arXiv link and demo sample output

This commit is contained in:
Roman Bachmann
2024-06-14 12:58:59 +02:00
parent c5c04e7f29
commit c2642c6ba6
4 changed files with 6 additions and 1 deletion


@@ -74,6 +74,11 @@ preds = sampler({'rgb@224': img.cuda()}, seed=None)
sampler.plot_modalities(preds, save_path=None)
```
You should expect to see an output like the following:
![4M demo sampler output](./assets/4M_demo_sample_darkmode.jpg#gh-dark-mode-only)
![4M demo sampler output](./assets/4M_demo_sample_lightmode.jpg#gh-light-mode-only)
To perform caption-to-all generation, replace the sampler input with: `preds = sampler({'caption': 'A lake house with a boat in front [S_1]'})`.
For a list of available 4M models, please see the model zoo below, and see [README_GENERATION.md](README_GENERATION.md) for more instructions on generation.
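
As a hedged illustration (not part of this commit's changed lines), the two calls shown in the README hunk above can be combined for caption-to-all generation. The sketch assumes `sampler` has already been constructed following the README's earlier setup steps, which lie outside this hunk:

```python
# Sketch only: assumes `sampler` was built as in the README's preceding
# setup steps (not shown in this diff hunk).

# Caption-to-all generation: pass a caption instead of an RGB image,
# using the example caption from the changed line above.
preds = sampler({'caption': 'A lake house with a boat in front [S_1]'})

# Plot all predicted modalities, as in the RGB-to-all example.
sampler.plot_modalities(preds, save_path=None)
```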

Binary file not shown (image added, 1.2 MiB).

Binary file not shown (image added, 1.2 MiB).


@@ -11,7 +11,7 @@
"\n",
"(\\* Equal contribution, random order)\n",
"\n",
"[`Website`](https://4m.epfl.ch) | [`Paper`](TBD) | [`GitHub`](https://github.com/apple/ml-4m)\n",
"[`Website`](https://4m.epfl.ch) | [`Paper`](https://arxiv.org/abs/2406.09406) | [`GitHub`](https://github.com/apple/ml-4m)\n",
"\n",
"We adopt the 4M framework to scale a vision model to tens of tasks and modalities. The resulting model, named 4M-21, has significantly expanded out-of-the-box capabilities, and yields stronger results on downstream transfer tasks. \n",
"\n",