mirror of https://github.com/fhamborg/news-please.git synced 2021-09-19 22:26:00 +03:00

98 Commits

Author SHA1 Message Date
Felix Hamborg
520c975926 add cchardet 2021-05-06 10:01:52 +02:00
Felix Hamborg
e6cdd0f22b add warc file filtering 2021-04-22 11:03:45 +02:00
Felix Hamborg
6232b7d60f fix #206 2021-03-16 12:04:59 +01:00
Felix Hamborg
961948477d inc version 2021-02-24 08:53:58 +01:00
Felix Hamborg
a5f2fb4bd1 Update setup.py 2021-02-06 12:20:43 +01:00
Felix Hamborg
6d29e8367a Update setup.py 2021-02-05 16:06:54 +01:00
Felix Hamborg
48ab3300ac Update setup.py 2021-02-02 14:02:25 +01:00
Felix Hamborg
c93bf81a3e make scrapy pipeline item configurable 2020-05-13 11:43:05 +02:00
Felix Hamborg
57fbd8508a Update setup.py 2020-04-28 10:26:31 +02:00
Felix Hamborg
c9489e372c Update setup.py 2020-04-08 19:46:01 +02:00
Felix Hamborg
3595d4ce07 Update setup.py 2020-04-08 19:38:53 +02:00
Felix Hamborg
e070924204 inc version 2020-04-06 20:43:08 +02:00
Andrei Erdoss
054d30a154 Add Postgresql Storage option with init and reset of data 2020-04-05 15:18:54 +03:00
Felix Hamborg
838f230295 continue transition from text to maintext 2020-03-20 11:25:03 +01:00
Felix Hamborg
d3f635c265 Update setup.py 2020-01-14 21:52:03 +01:00
Felix Hamborg
2956e6aff5 Update setup.py 2020-01-04 18:52:57 +01:00
Felix Hamborg
53319ba7c9 remove py2.7 libs 2019-11-14 15:01:50 +01:00
Felix Hamborg
2d43bc3ab1 increase version 2019-11-08 10:29:56 +01:00
Felix Hamborg
947e38151f Update setup.py 2019-08-06 10:32:06 +02:00
Felix Hamborg
3c8af46c14 Update setup.py 2019-08-06 10:29:38 +02:00
Felix Hamborg
8913886e56 add docker script for cc 2019-07-15 17:18:44 +02:00
Felix Hamborg
663c229de4 add docker script for cc 2019-07-15 15:39:32 +02:00
Felix Hamborg
f1e3f87ad8 add entrypoint news-please-cc for commoncrawl script 2019-07-15 15:06:27 +02:00
Felix Hamborg
e2b82f8087 add entrypoint news-please-cc for commoncrawl script 2019-07-15 15:05:23 +02:00
Felix Hamborg
15b27a9ed5 cc: fix path problems; cc: add main params for download locations 2019-07-12 18:13:07 +02:00
Felix Hamborg
14bae0ef60 cc: more comprehensive log 2019-07-12 17:04:22 +02:00
Felix Hamborg
cdb1dab688 cc: more comprehensive logging output; cc: callback for warc completion; fix #105 2019-07-12 16:52:50 +02:00
Felix Hamborg
4c103c5806 from_warc properly handles encoding; increase version 2019-07-11 16:23:33 +02:00
Felix Hamborg
05a3a80994 from_warc properly handles encoding; increase version 2019-07-11 15:41:56 +02:00
Felix Hamborg
8c21ff3f14 various minor fixes and improvements 2019-05-13 16:50:38 +02:00
Felix Hamborg
78529dd0e4 increase version 2019-04-02 09:02:08 -04:00
Felix Hamborg
e5a7553805 remove pathlib which is python 3.4+ only 2019-03-29 13:31:47 +01:00
Felix Hamborg
9453405896 add panda support 2019-03-28 13:37:04 +01:00
Felix Hamborg
50fe56fbd4 Update setup.py 2019-03-10 12:53:31 +01:00
Felix Hamborg
6b18d83924 Update setup.py 2019-02-22 09:50:56 +01:00
Felix Hamborg
fb8b256171 Update setup.py 2019-02-21 13:50:08 +01:00
Felix Hamborg
866c44b9a6 Update setup.py 2018-12-08 09:55:46 +01:00
Felix Hamborg
cd12d32c79 fix #72; increase version 2018-11-14 16:27:42 +01:00
Felix Hamborg
1ab8be3660 fix #72; increase version 2018-11-14 16:24:55 +01:00
Felix Hamborg
bbcb143874 increase version 2018-11-14 14:36:16 +01:00
Felix Hamborg
4067c2c65a pip upload 2018-06-27 13:57:54 +02:00
Felix Hamborg
8215302c4d fix #60 2018-06-05 14:14:06 -05:00
Felix Hamborg
a2bc51210a add timeout to crawling in lib mode; new pypi version including docker 2018-05-29 12:50:50 +02:00
Felix Hamborg
83feaa327b Update setup.py 2018-05-17 11:06:08 +02:00
Felix Hamborg
0f57c602f5 Update setup.py 2018-02-01 16:09:18 +01:00
Felix Hamborg
21f9b8c44a update doc 2017-12-03 11:46:44 +01:00
Felix Hamborg
7182b56cf8 Update setup.py 2017-10-30 12:23:17 +01:00
Felix Hamborg
50aa40b898 fix NewsArticle object conversion to dict 2017-10-05 14:39:24 +02:00
Felix Hamborg
4f0e754408 change to relative imports in commoncrawl scripts so that a git clone is sufficient to run the example and no installation is required; increase version 2017-10-05 12:06:15 +02:00
Felix Hamborg
104c98cb86 add Sören Lachnit as author 2017-09-18 17:17:57 +02:00