Merge pull request #4 from eltociear/patch-1

Update README.md
This commit is contained in:
Steve Krenzel
2023-05-16 20:07:01 -07:00
committed by GitHub


@@ -339,7 +339,7 @@ You can experiment with a tokenizer here: [https://platform.openai.com/tokenizer
 Different models will use different tokenizers with different levels of granularity. You could, in theory, just feed a model 0s and 1s but then the model needs to learn the concept of characters from bits, and then the concept of words from characters, and so forth. Similarly, you could feed the model a stream of raw characters, but then the model needs to learn the concept of words, and punctuation, etc… and, in general, the models will perform worse.
-To learn more, [HuggingFace has a wonderful introduction to tokenizers](https://huggingface.co/docs/transformers/tokenizer_summary) and why they need to exist.
+To learn more, [Hugging Face has a wonderful introduction to tokenizers](https://huggingface.co/docs/transformers/tokenizer_summary) and why they need to exist.
 There's a lot of nuance around tokenization, such as vocabulary size or different languages treating sentence structure meaningfully different (e.g. words not being separated by spaces). Fortunately, language model APIs will almost always take raw text as input and tokenize it behind the scenes *so you rarely need to think about tokens*.
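To make the granularity point in the diffed text concrete, here is a minimal sketch (not part of this PR) using OpenAI's tiktoken library; the library choice, the `cl100k_base` encoding, and the sample sentence are all assumptions for illustration, and it presumes `pip install tiktoken` has been run:

```python
# Minimal sketch, not part of this PR's diff: show that a tokenizer splits text
# into subword pieces rather than bits or characters. Assumes `pip install tiktoken`.
import tiktoken

# cl100k_base is one common encoding; other models use other tokenizers.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenizers split text into subword pieces, not characters."
token_ids = enc.encode(text)

print(f"{len(text)} characters -> {len(token_ids)} tokens")

# Print each token id next to the text fragment it stands for.
for tid in token_ids:
    print(tid, repr(enc.decode([tid])))
```

The exact split depends on the tokenizer: a character- or byte-level scheme would produce far more tokens for the same sentence, which is the granularity trade-off described above. And, as the last paragraph in the hunk notes, the model APIs run this step behind the scenes, so the sketch is only for building intuition.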