Commit Graph

1289 Commits

Author SHA1 Message Date
Reed Allman
04ae223a5d fixup json,http protocols (#772)
* http now buffers the entire request body from the container before copying
it to the response writer (and sets content length). this is a level of sad i
don't feel comfortable talking about but it is what it is.
* json protocol was buffering the entire body so there wasn't any reason for
us to try to write this directly to the container stdin manually, we needed to
add a bufio.Writer around it anyway it was making too many write(fd) syscalls
with the way it was. this is just easier overall and has the same performance
as http now in my tests, whereas previously this was 50% slower [than http].
* add buffer pool for http & json to share/use. json doesn't create a new
buffer every stinkin request. we need to plumb down content length so that we
can properly size the buffer for json, have to add header size and everything
together but it's probably faster than malloc(); punting on properly sizing.
* json now sets content type to the length of the body from the returned json
blurb from the container

this does not handle imposing a maximum size of the response returned from a
container, which we need to add, but this has been open for some time
(specifically, on json). we can impose this by wrapping the pipes, but there's
some discussion to be had for json specifically we won't be able to just cut
off the output stream and use that (http we can do this). anyway, filing a
ticket...

closes #326 :(((((((
2018-02-14 14:06:36 -08:00
Reed Allman
9cbe4ea536 add pprof endpoints, additional spans (#770)
i would split this commit in two if i were a good dev.

the pprof stuff is really useful and this only samples when called. this is
pretty standard go service stuff. expvar is cool, too.

the additional spannos have turned up some interesting tid bits... gonna slide
em in
2018-02-13 20:01:41 -08:00
CI
61f4fe2e24 fnserver: 0.3.338 release [skip ci] 2018-02-14 03:54:25 +00:00
Tolga Ceylan
c132cf1825 fn: dind SIGINT and SIGCHLD changes (#771)
1) in dind, prevent SIGINT reaching to dockerd. This kills
docker and prevents shutdown as fn server is trying to stop.
2) as init process, always reap child processes.
2018-02-13 19:46:53 -08:00
CI
f01b502bc7 fnserver: 0.3.337 release [skip ci] 2018-02-14 03:06:43 +00:00
Reed Allman
1a1250e5ea disable fail whale logs (#768)
we have been getting these from attach all this time and never needed these
anyway.

I ran cpu profiles of dockerd and this was 90% of docker cpu usage (json
logs). woot. this will reduce i/o quite a bit, and we don't have to worry
about them taking up any disk space either.

from tests i get about 50% speedup with these off. the hunt continues...
2018-02-13 17:45:11 -08:00
CI
eebb9ae4e7 fnserver: 0.3.336 release [skip ci] 2018-02-13 19:35:03 +00:00
Reed Allman
f287ad274e support deeper / nesting of image names (#765)
closes #764
2018-02-13 11:26:28 -08:00
CI
46caee5815 fnserver: 0.3.335 release [skip ci] 2018-02-13 02:53:29 +00:00
Reed Allman
cbfd659e7e cap docker retries to fixed number (#762)
previously we would retry infinitely up to the context with some backoff in
between. for hot functions, since we don't set any dead line on pulling or
creating the image, this means it would retry forever without making any
progress if e.g. the registry is inaccessable or any other temporary error
that isn't actually temporary.  this adds a hard cap of 10 retries, which
gives approximately 13s if the ops take no time, still respecting the context
deadline enclosed.

the case where this was coming up is now tested for and was otherwise
confusing for users to debug, now it spits out an ECONNREFUSED with the
address of the registry, which should help users debug without having to poke
around fn logs (though I don't like this as an excuse, not all users will be
operators at some point in the near future, and this one makes sense)

closes #727
2018-02-12 18:45:30 -08:00
CI
726f615a03 fnserver: 0.3.334 release [skip ci] 2018-02-13 01:59:10 +00:00
Reed Allman
97194b3d8b return bad function http resp error (#728)
* return bad function http resp error

this was being thrown into the fn server logs but it's relatively easy to get
this to crop up if a function user forgets that they left a `println` laying
around that gets written to stdout, it garbles the http (or json, in its case)
output and they just see 'internal server error'. for certain clients i could
see that we really do want to keep this as 'internal server error' but for
things like e.g. docker image not authorized we're showing that in the
response, so this seems apt.

json likely needs the same treatment, will file a bug.

as always, my error messages are rarely helpful enough, help me please :)

closes #355

* add formatting directive

* fix up http error

* output bad jasons to user

closes #729

woo
2018-02-12 17:51:45 -08:00
CI
9d3b66d807 fnserver: 0.3.333 release [skip ci] 2018-02-13 00:00:27 +00:00
Tolga Ceylan
567136cb5e fn: required docker version fix (#759) 2018-02-12 15:53:05 -08:00
CI
ab77223d05 fnserver: 0.3.332 release [skip ci] 2018-02-12 22:18:51 +00:00
Tolga Ceylan
c848fc6181 fn: hot container timer improvements (#751)
* fn: hot container timer improvements

With this change, now we are allocating the timers
when the container starts and managing them via
stop/clear as needed, which should not only be more
efficient, but also easier to follow.

For example, previously, if eject time out was
set to 10 secs, this could have delayed idle timeout
up to 10 secs as well. It is also not necessary to do
any math for elapsed time.

Now consumers avoid any requeuing when startDequeuer() is cancelled.
This was triggering additional dequeue/requeue causing
containers to wake up spuriously. Also in startDequeuer(),
we no longer remove the item from the actual queue and
leave this to acquire/eject, which side steps issues related
with item landing in the channel, not consumed, etc.
2018-02-12 14:12:03 -08:00
CI
ffcda9b823 fnserver: 0.3.331 release [skip ci] 2018-02-12 18:42:21 +00:00
Tolga Ceylan
b2c95410f4 fn: test case additions (#755)
1) oom test
2) invalid http resp code test
3) check for error string contents in various error cases
2018-02-12 10:34:35 -08:00
CI
a2aad73664 fnserver: 0.3.330 release [skip ci] 2018-02-12 18:27:15 +00:00
CI
6f3237585d fnserver: 0.3.329 release [skip ci] 2018-02-09 21:30:52 +00:00
Reed Allman
27179ddf54 plumb ctx for container removal spanno (#750)
these were just dangling off on the side, took some plumbing work but not so
bad
2018-02-08 22:48:23 -08:00
CI
aea3bab95e fnserver: 0.3.328 release [skip ci] 2018-02-09 01:23:22 +00:00
Reed Allman
3ab49d4701 limit log size in containers (#748)
closes #317

we could fiddle with this, but we need to at least bound these. this
accomplishes that. 1m is picked since that's our default max log size for the
time being per call, it also takes a little time to generate that many bytes
through logs, typically (i.e. without trying to). I tested with 0, which
spiked the i/o rate on my machine because it's constantly deleting the json
log file. I also tested with 1k and it was similar (for a task that generated
about 1k in logs quickly) -- in testing, this halved my throughput, whereas
using 1m did not change the throughput at all. trying the 'none' driver and
'syslog' driver weren't great, 'none' turns off all stderr and 'syslog' blocks
every log line (boo). anyway, this option seems to have no affect on the
output we get in 'attach', which is what we really care about (i.e. docker is
not logically capping this, just swapping out the log file).

using 1m for this, e.g. if we have 500 hot containers on a machine we have
potentially half a gig of worthless logs laying around. we don't need the
docker logs laying around at all really, but short of writing a storage driver
ourselves there don't seem to be too many better options. open to idears, but
this is likely to hold us over for some time.
2018-02-08 17:16:26 -08:00
CI
6c62bdb18a fnserver: 0.3.327 release [skip ci] 2018-02-08 01:29:10 +00:00
Tolga Ceylan
f27d47f2dd Idle Hot Container Freeze/Preempt Support (#733)
* fn: freeze/unfreeze and eject idle under resource contention
2018-02-07 17:21:53 -08:00
CI
105947d031 fnserver: 0.3.326 release [skip ci] 2018-02-08 00:56:39 +00:00
Tolga Ceylan
dc4d90432b fn: memory limit adjustments (#746)
1) limit kernel memory which was previously unlimited, using
   same limits as user memory for a unified approach.
2) disable swap memory for containers
2018-02-07 16:48:52 -08:00
CI
8f70b622cc fnserver: 0.3.325 release [skip ci] 2018-02-07 00:24:19 +00:00
Tolga Ceylan
ebc6657071 fn: docker version check2 (#744)
1) now required docker version is 17.06
2) enable circle ci latest docker install
3) docker driver & agent check minimum version before start
2018-02-06 16:16:40 -08:00
CI
640a47fe55 fnserver: 0.3.324 release [skip ci] 2018-02-06 00:23:02 +00:00
CI
4d802acc83 fnserver: 0.3.323 release [skip ci] 2018-02-05 20:00:05 +00:00
Reed Allman
235cbc2d67 Fix default setting (#740)
* push validate/defaults into datastore

we weren't setting a timestamp in route insert when we needed to create an app
there. that whole thing isn't atomic, but this fixes the timestamp issue.

closes #738

seems like we should do similar with the FireBeforeX stuff too.

* fix tests

* app name validation was buggy, an upper cased letter failed. now it doesn't.
uses unicode now.
* removes duplicate errors for datastore and models validation that were used
interchangably but weren't.
2018-02-05 11:54:09 -08:00
CI
b49f332e01 fnserver: 0.3.322 release [skip ci] 2018-02-05 19:18:17 +00:00
Tolga Ceylan
fdf5a67f6f fn: error image is now deprecated (#737)
Please use fn-test-utils instead for testing.
2018-02-05 11:12:27 -08:00
CI
18cbf440bd fnserver: 0.3.321 release [skip ci] 2018-02-05 18:07:23 +00:00
Tolga Ceylan
6b5486c699 fn: sleeper image is now deprecated (#736)
Please use fn-test-utils instead for testing.
2018-02-05 10:01:11 -08:00
CI
f76648a18a fnserver: 0.3.320 release [skip ci] 2018-02-05 16:14:17 +00:00
CI
606813a35f fnserver: 0.3.319 release [skip ci] 2018-02-05 15:57:15 +00:00
Nigel Deakin
5089dd6119 Extend /stats API to handle two routes with the same path in different apps (#735)
* Extend deprecated /stats API to handle apps and paths correctly

* More changes (bugfixes) to the JSON structure returned by the stats API call
2018-02-05 15:51:53 +00:00
CI
ac4dfa6077 fnserver: 0.3.318 release [skip ci] 2018-02-01 15:31:13 +00:00
CI
45675bc36d fnserver: 0.3.317 release [skip ci] 2018-02-01 01:30:56 +00:00
Reed Allman
3b261fc144 pipe swapparoo each slot (#721)
* pipe swapparoo each slot

previously, we made a pair of pipes for stdin and stdout for each container,
and then handed them out to each call (slot) to use. this meant that multiple
calls could have a handle on the same stdin pipe and stdout pipe to read/write
to/from from fn's perspective and could mix input/output and get garbage. this
also meant that each was blocked on the previous' reads.

now we make a new pipe every time we get a slot, and swap it out with the
previous ones. calls are no longer blocked from fn's perspective, and we don't
have to worry about timing out dispatch for any hot format. there is still the
issue that if a function does not finish reading the input from the previous
task, from its perspective, and reads the next call's it can error out the
second call. with fn deadline we provide the necessary tools to skirt this,
but without some additional coordination am not sure this is a closable hole
with our current protocols since terminating a previous calls input requires
some protocol specific bytes to go in (json in particular is tricky). anyway,
from fn's side fixing pipes was definitely a hole, but this client hole is
still hanging out. there was an attempt to send an io.EOF but the issue is
that will shut down docker's read on the stdin pipe (and the container). poop.

this adds a test for this behavior, and makes sure 2 containers don't get
launched.

this also closes the response writer header race a little, but not entirely, I
think there's still a chance that we read a full response from a function and
get a timeout while we're changing the headers. I guess we need a thread safe
header bucket, otherwise we have to rely on timings (racy). thinking on it.

* fix stats mu race
2018-01-31 17:25:24 -08:00
CI
4e989b0789 fnserver: 0.3.316 release [skip ci] 2018-01-31 12:32:20 +00:00
Dario Domizioli
e753732bd8 Hot protocols improvements (for 662) (#724)
* Improve deadline handling in streaming protocols

* Move special headers handling down to the protocols

* Adding function format documentation for JSON changes

* Add tests for request url and method in JSON protocol

* Fix  protocol missing fn-specific info

* Fix import

* Add panic for something that should never happen
2018-01-31 12:26:43 +00:00
CI
02c8aa1998 fnserver: 0.3.315 release [skip ci] 2018-01-31 12:13:33 +00:00
Dario Domizioli
e2dad00a83 Add simple test for calling several hot functions in parallel (#675)
* Add test for calling several hot functions in parallel
2018-01-31 12:08:05 +00:00
CI
5f8736500d fnserver: 0.3.314 release [skip ci] 2018-01-26 20:26:48 +00:00
Tolga Ceylan
97d78c584b fn: better slot/container/request state tracking (#719)
* fn: better slot/container/request state tracking
2018-01-26 12:21:11 -08:00
CI
a7223437df fnserver: 0.3.313 release [skip ci] 2018-01-24 17:36:52 +00:00
CI
cc4c157d56 fnserver: 0.3.312 release [skip ci] 2018-01-24 15:43:41 +00:00