Author | Commit | Message | Date
Felix Hamborg | 520c975926 | add cchardet | 2021-05-06 10:01:52 +02:00
Felix Hamborg | e6cdd0f22b | add warc file filtering | 2021-04-22 11:03:45 +02:00
Felix Hamborg | 6232b7d60f | fix #206 | 2021-03-16 12:04:59 +01:00
Felix Hamborg | 961948477d | inc version | 2021-02-24 08:53:58 +01:00
Felix Hamborg | a5f2fb4bd1 | Update setup.py | 2021-02-06 12:20:43 +01:00
Felix Hamborg | 6d29e8367a | Update setup.py | 2021-02-05 16:06:54 +01:00
Felix Hamborg | 48ab3300ac | Update setup.py | 2021-02-02 14:02:25 +01:00
Felix Hamborg | c93bf81a3e | make scrapy pipeline item configurable | 2020-05-13 11:43:05 +02:00
Felix Hamborg | 57fbd8508a | Update setup.py | 2020-04-28 10:26:31 +02:00
Felix Hamborg | c9489e372c | Update setup.py | 2020-04-08 19:46:01 +02:00
Felix Hamborg | 3595d4ce07 | Update setup.py | 2020-04-08 19:38:53 +02:00
Felix Hamborg | e070924204 | inc version | 2020-04-06 20:43:08 +02:00
Andrei Erdoss | 054d30a154 | Add Postgresql Storage option with init and reset of data | 2020-04-05 15:18:54 +03:00
Felix Hamborg | 838f230295 | continue transition from text to maintext | 2020-03-20 11:25:03 +01:00
Felix Hamborg | d3f635c265 | Update setup.py | 2020-01-14 21:52:03 +01:00
Felix Hamborg | 2956e6aff5 | Update setup.py | 2020-01-04 18:52:57 +01:00
Felix Hamborg | 53319ba7c9 | remove py2.7 libs | 2019-11-14 15:01:50 +01:00
Felix Hamborg | 2d43bc3ab1 | increase version | 2019-11-08 10:29:56 +01:00
Felix Hamborg | 947e38151f | Update setup.py | 2019-08-06 10:32:06 +02:00
Felix Hamborg | 3c8af46c14 | Update setup.py | 2019-08-06 10:29:38 +02:00
Felix Hamborg | 8913886e56 | add docker script for cc | 2019-07-15 17:18:44 +02:00
Felix Hamborg | 663c229de4 | add docker script for cc | 2019-07-15 15:39:32 +02:00
Felix Hamborg | f1e3f87ad8 | add entrypoint news-please-cc for commoncrawl script | 2019-07-15 15:06:27 +02:00
Felix Hamborg | e2b82f8087 | add entrypoint news-please-cc for commoncrawl script | 2019-07-15 15:05:23 +02:00
Felix Hamborg | 15b27a9ed5 | cc:fix path problems; cc:add main params for download locations | 2019-07-12 18:13:07 +02:00
Felix Hamborg | 14bae0ef60 | cc: more comprehensive log | 2019-07-12 17:04:22 +02:00
Felix Hamborg | cdb1dab688 | cc: more comprehensive logging output; cc: callback for warc completion; fix #105 | 2019-07-12 16:52:50 +02:00
Felix Hamborg | 4c103c5806 | from_warc properly handles encoding; increase version | 2019-07-11 16:23:33 +02:00
Felix Hamborg | 05a3a80994 | from_warc properly handles encoding; increase version | 2019-07-11 15:41:56 +02:00
Felix Hamborg | 8c21ff3f14 | various minor fixes and improvements | 2019-05-13 16:50:38 +02:00
Felix Hamborg | 78529dd0e4 | increase version | 2019-04-02 09:02:08 -04:00
Felix Hamborg | e5a7553805 | remove pathlib which is python 3.4+ only | 2019-03-29 13:31:47 +01:00
Felix Hamborg | 9453405896 | add panda support | 2019-03-28 13:37:04 +01:00
Felix Hamborg | 50fe56fbd4 | Update setup.py | 2019-03-10 12:53:31 +01:00
Felix Hamborg | 6b18d83924 | Update setup.py | 2019-02-22 09:50:56 +01:00
Felix Hamborg | fb8b256171 | Update setup.py | 2019-02-21 13:50:08 +01:00
Felix Hamborg | 866c44b9a6 | Update setup.py | 2018-12-08 09:55:46 +01:00
Felix Hamborg | cd12d32c79 | fix #72; increase version | 2018-11-14 16:27:42 +01:00
Felix Hamborg | 1ab8be3660 | fix #72; increase version | 2018-11-14 16:24:55 +01:00
Felix Hamborg | bbcb143874 | increase version | 2018-11-14 14:36:16 +01:00
Felix Hamborg | 4067c2c65a | pip upload | 2018-06-27 13:57:54 +02:00
Felix Hamborg | 8215302c4d | fix #60 | 2018-06-05 14:14:06 -05:00
Felix Hamborg | a2bc51210a | add timeout to crawling in lib mode; new pypi version including docker | 2018-05-29 12:50:50 +02:00
Felix Hamborg | 83feaa327b | Update setup.py | 2018-05-17 11:06:08 +02:00
Felix Hamborg | 0f57c602f5 | Update setup.py | 2018-02-01 16:09:18 +01:00
Felix Hamborg | 21f9b8c44a | update doc | 2017-12-03 11:46:44 +01:00
Felix Hamborg | 7182b56cf8 | Update setup.py | 2017-10-30 12:23:17 +01:00
Felix Hamborg | 50aa40b898 | fix NewsArticle object conversion to dict | 2017-10-05 14:39:24 +02:00
Felix Hamborg | 4f0e754408 | change to relative imports in commoncrawl scripts so that a git clone is sufficient to run the example and no installation is required.; increase version | 2017-10-05 12:06:15 +02:00
Felix Hamborg | 104c98cb86 | add Sören Lachnit as author | 2017-09-18 17:17:57 +02:00