mirror of https://github.com/fhamborg/news-please.git synced 2021-09-19 22:26:00 +03:00

add library description

This commit is contained in:
Felix Hamborg
2017-02-24 12:05:40 +01:00
parent 6af8f7c520
commit cdb173d08a

@@ -12,11 +12,17 @@
## Features
* **works out of the box**: install with pip, add URLs of your pages, run :-)
* execute it conveniently with the **CLI** or use it as a **library** within your own software
### CLI mode
* stores extracted results in **JSON files or Elasticsearch** (other storages can be added easily)
* **simple but extensive configuration** (if you want to tweak the results)
* runs on your favorite Python version (2.7+ and 3+)
* revisions: crawl articles multiple times and track changes
### Library mode
* crawl and extract information for a list of article URLs (full-site crawling is currently supported only via the CLI)
## Getting started
It's super easy, we promise!
@@ -27,7 +33,14 @@ It's super easy, we promise!
$ sudo pip install news-please
```
### Run the crawler
### Use within your own code
```
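from newsplease import NewsPleaseLib

# Download a single article and extract its information
article = NewsPleaseLib.download_article('https://www.nytimes.com/2017/02/23/us/politics/cpac-stephen-bannon-reince-priebus.html?hp')
# Extracted fields are accessed by key, e.g. the article title
print(article['title'])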
```
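Library mode is described above as working on a list of article URLs. A minimal sketch of that pattern follows. It is not part of news-please: `collect_titles` and `fake_download` are hypothetical helper names, and the `example.com` URLs are placeholders. The downloader is passed in as a parameter so the sketch runs offline.

```python
def collect_titles(urls, download):
    """Return {url: title} for every URL that could be downloaded.

    `download` is any callable that takes a URL and returns a dict-like
    article with a 'title' key (hypothetical helper, not a news-please API).
    """
    titles = {}
    for url in urls:
        try:
            article = download(url)
        except Exception as exc:
            # Network or parsing failure: report it and move on to the next URL
            print("skipping %s: %s" % (url, exc))
            continue
        if article is None:
            continue
        titles[url] = article['title']
    return titles


def fake_download(url):
    """Offline stand-in for a real downloader (assumption for this sketch)."""
    if "broken" in url:
        raise IOError("could not fetch")
    return {'title': 'Title for ' + url}


result = collect_titles(
    ['http://example.com/a', 'http://example.com/broken', 'http://example.com/b'],
    fake_download,
)
print(result)
```

With news-please installed, passing `NewsPleaseLib.download_article` as the `download` argument should fetch real articles instead, assuming it returns an object with a `'title'` key as in the snippet above.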
### Run the crawler (CLI)
```
$ news-please