further improve docs

2021-09-19 22:32:58 +03:00 · 2019-04-24 18:35:36 +02:00
parent 220b63fe87
commit 5f7a32f1f6
2 changed files with 11 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -1,9 +1,9 @@
 # clean-text

-Clean your text with `clean-text` to create normalized text represenations. For instance, turn this corrupted input:
+Clean your text with `clean-text` to create normalized text representations. For instance, turn this corrupted input:

 ```txt
-There's a bunch of \\u2018new\\u2019 references, including [Moana](https://en.wikipedia.org/wiki/Moana_%282016_film%29).
+A bunch of \\u2018new\\u2019 references, including [Moana](https://en.wikipedia.org/wiki/Moana_%282016_film%29).


 »Yóù àré     rïght &lt;3!«
@@ -12,12 +12,12 @@ There's a bunch of \\u2018new\\u2019 references, including [Moana](https://en.wi
 into this

 ```txt
-there's a bunch of 'new' references, including [moana](<URL>).
+a bunch of 'new' references, including [moana](<URL>).

 "you are right <3!"
 ```

-`clean-text` uses [ftfy](https://github.com/LuminosoInsight/python-ftfy), [unidecode](https://github.com/takluyver/Unidecode) and numerous hand-crafted rules such as RegEx.
+`clean-text` uses [ftfy](https://github.com/LuminosoInsight/python-ftfy), [unidecode](https://github.com/takluyver/Unidecode) and numerous hand-crafted rules, i.e., RegEx.

 ## Installation

@@ -25,7 +25,7 @@ there's a bunch of 'new' references, including [moana](<URL>).
 pip install clean-text[gpl]
 ```

-This will install the GPL-licensed package [unidecode](https://github.com/takluyver/Unidecode). If it is not available, `clean-text` will resort to Python's [unicodedata.normalize](https://docs.python.org/3.7/library/unicodedata.html#unicodedata.normalize). This is used for transliteration. So `ê` gets turned into `e`. So you can also install it without unidecode.
+This will install the GPL-licensed package [unidecode](https://github.com/takluyver/Unidecode). If it is not available, `clean-text` will resort to Python's [unicodedata.normalize](https://docs.python.org/3.7/library/unicodedata.html#unicodedata.normalize) for [transliteration](https://en.wikipedia.org/wiki/Transliteration). Unicode symbols are mapped to their closest ASCII equivalent. So `ê` gets turned into `e`. However, you may also disable this feature altogether.

 ```bash
 pip install clean-text
@@ -60,6 +60,8 @@ clean("some input",

 Carefully choose the arguments that fit your task. The default parameters are listed above. Whitespace is always normalized.

+You may also only use specific functions for cleaning. For this, take a look at the [source code](https://github.com/jfilter/clean-text/blob/master/cleantext/clean.py).
+
 ## Development

 -   install [Pipenv](https://pipenv.readthedocs.io/en/latest/)
@@ -76,7 +78,7 @@ If you don't like the output of `clean-text`, consider adding a [test](https://g

 ## Acknowledgements

-Built upon the work by [Burton DeWilde](https://github.com/bdewilde)'s [Textacy](https://github.com/chartbeat-labs/textacy).
+Built upon the work by [Burton DeWilde](https://github.com/bdewilde) for [Textacy](https://github.com/chartbeat-labs/textacy).

 ## License

--- a/setup.py
+++ b/setup.py
@@ -7,6 +7,7 @@ with open("README.md", "r") as fh:
 classifiers = [
    'Programming Language :: Python :: 3.5',
    'Programming Language :: Python :: 3.6',
+    'Programming Language :: Python :: 3.7',
    'License :: OSI Approved :: MIT License',
 ]

@@ -14,11 +15,11 @@ version = '0.0.0'

 setup(name='cleantext',
      version=version,
-      description='Clean your dirty text',
+      description='Clean Your Text to Create Normalized Text Representations',
      long_description=long_description,
      long_description_content_type="text/markdown",
      author='Johannes Filter',
-      author_email='ragha@outlook.com, hi@jfilter.de',
+      author_email='hi@jfilter.de',
      url='https://github.com/jfilter/clean-text',
      license='MIT',
      install_requires=['ftfy'],
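
Note on the transliteration fallback described in the README change: the following is a minimal sketch of how `unicodedata.normalize` can approximate ASCII transliteration when `unidecode` is not installed. The function name `to_ascii_fallback` is hypothetical, and this is not necessarily how `clean-text` implements it internally. It assumes that dropping characters with no ASCII decomposition is acceptable.

```python
import unicodedata


def to_ascii_fallback(text: str) -> str:
    """Approximate ASCII transliteration using only the standard library.

    Hypothetical helper for illustration; not part of the clean-text API.
    """
    # NFKD splits accented characters into a base letter plus combining marks,
    # e.g. "ê" becomes "e" followed by U+0302 (combining circumflex).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with errors="ignore" drops the combining marks and any
    # other character that has no ASCII representation.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_fallback("Yóù àré rïght"))
```

Unlike `unidecode`, this approach silently removes symbols that have no decomposition into ASCII characters, so the two code paths can produce different output for the same input.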