mirror of https://github.com/jfilter/clean-text.git synced 2021-09-19 22:32:58 +03:00

improve README

Johannes Filter
2020-10-17 23:20:30 +02:00
parent 8a688b8d1f
commit 67e343dd08


@@ -33,11 +33,14 @@ You may want to abstain from GPL:
pip install clean-text
```
NB: This package is named `clean-text` and not `cleantext`.
If [unidecode](https://github.com/takluyver/Unidecode) is not available, `clean-text` will resort to Python's [unicodedata.normalize](https://docs.python.org/3.7/library/unicodedata.html#unicodedata.normalize) for [transliteration](https://en.wikipedia.org/wiki/Transliteration).
Transliteration to the closest ASCII symbols involves manual mappings, e.g., `ê` to `e`. Unidecode's mapping is superior, but unicodedata's is sufficient.
Transliteration to the closest ASCII symbols involves manual mappings, e.g., `ê` to `e`.
`unidecode`'s mapping is superior, but unicodedata's is sufficient.
However, you may want to disable this feature altogether depending on your data and use case.
NB: The package is named `clean-text` and not `cleantext`.
To make it clear: There are **inconsistencies** between processing text with or without `unidecode`.
## Usage
@@ -67,7 +70,7 @@ clean("some input",
)
```
Carefully choose the arguments that fit your task. The default parameters are listed above. Whitespace is always normalized.
Carefully choose the arguments that fit your task. The default parameters are listed above.
You may also only use specific functions for cleaning. For this, take a look at the [source code](https://github.com/jfilter/clean-text/blob/master/cleantext/clean.py).
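
The fallback transliteration mentioned in the first hunk can be illustrated with the standard library alone. This is a minimal sketch, not the code `clean-text` actually ships: the helper name `to_ascii_fallback` is hypothetical, and the choice of the `NFKD` normalization form is an assumption.

```python
import unicodedata


def to_ascii_fallback(text: str) -> str:
    """Hypothetical helper: approximate ASCII transliteration, stdlib only."""
    # NFKD (an assumed choice) splits accented characters into a base letter
    # plus separate combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with errors="ignore" drops the combining marks, and
    # also drops any character that has no ASCII decomposition at all.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii_fallback("ê"))       # e
print(to_ascii_fallback("Straße"))  # Strae
```

The second call shows the kind of inconsistency the README warns about: `ß` has no Unicode decomposition, so this approach silently drops it, whereas `unidecode` maps it to `ss`.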