Updates readme

Waldek Mastykarz
2025-01-20 10:19:54 +01:00
parent 155d1a6347
commit 1a4c800f1a


@@ -20,37 +20,13 @@ Select the model to use for tokenization in the Jupyter notebook. You can choose
1. Set the `text` variable to your text.
1. Run all cells.
```python
text = 'Your text here'
num_tokens = tokenizer_fn(text)
print(f'Number of tokens in text: {num_tokens}')
```
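The snippet assumes a `tokenizer_fn` helper defined in an earlier notebook cell. A minimal sketch of what it might look like, assuming (not confirmed by this diff) that the notebook uses the tiktoken library and that `model_name` comes from the model-selection step:
```python
import tiktoken

# Hypothetical helper; the notebook's actual implementation may differ.
# model_name is assumed to be set in the model-selection cell.
def tokenizer_fn(text: str, model_name: str = 'gpt-4o') -> int:
    # Look up the tokenizer encoding registered for the model,
    # encode the text, and count the resulting tokens.
    encoding = tiktoken.encoding_for_model(model_name)
    return len(encoding.encode(text))
```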
### Calculate tokens in a file
1. Set the `file_path` variable to the path of your file.
1. Run all cells.
```python
file_path = 'path/to/your/file.txt'
num_tokens = get_num_tokens_from_file(file_path)
print(f'{file_path}: {num_tokens}')
```
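`get_num_tokens_from_file` is likewise defined in the notebook. A sketch under the assumption that it reads the file as UTF-8 text and reuses the `tokenizer_fn` helper above:
```python
# Hypothetical implementation; the notebook's actual helper may differ.
def get_num_tokens_from_file(file_path: str) -> int:
    # Read the whole file as UTF-8 text and count its tokens.
    with open(file_path, encoding='utf-8') as f:
        return tokenizer_fn(f.read())
```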
### Calculate tokens in files in a folder
1. Set the `folder_path` variable to the path of your folder.
1. Optionally, specify a filter for which files to include.
1. Run all cells.
```python
# pd (pandas) and model_name are defined in earlier notebook cells
folder_path = 'path/to/your/folder'
file_filter = ['.md']  # Include only files with the .md extension
tokens_info = get_num_tokens_from_folder(folder_path, file_filter)
tokens_info_df = pd.DataFrame(tokens_info, columns=['file', 'tokens'])
tokens_info_df = tokens_info_df.sort_values(by='file')
print(f'Tokens in files in folder {folder_path} using model {model_name}:\n')
print(tokens_info_df.to_string(index=False))
```
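`get_num_tokens_from_folder` is also a notebook helper. Since its result is loaded into a DataFrame with `file` and `tokens` columns, it presumably yields `(file, tokens)` pairs; a sketch assuming a recursive walk and the extension filter shown above:
```python
import os

# Hypothetical implementation; the notebook's actual helper may differ.
def get_num_tokens_from_folder(folder_path, file_filter=None):
    tokens_info = []
    for root, _dirs, files in os.walk(folder_path):
        for name in files:
            # Skip files whose extension is not in the filter, if one is given.
            if file_filter and not any(name.endswith(ext) for ext in file_filter):
                continue
            path = os.path.join(root, name)
            tokens_info.append((path, get_num_tokens_from_file(path)))
    return tokens_info
```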